# Vespa

The Vespa documentation provides all the information required to use all Vespa features and deploy them in any supported environment.

---
# Source: https://docs.vespa.ai/en/learn/about-documentation.html.md
# About this documentation
The Vespa documentation ([https://docs.vespa.ai/](https://docs.vespa.ai/)) provides all the information required to use all Vespa features and deploy them in any supported environment. It is split into guides and tutorials, which explain features and how to use them to solve problems, and reference documentation, which lists complete information about all features and APIs.
## Applicability
The Vespa platform is open source and can be deployed in self-managed systems and on the Vespa Cloud service. Some add-ons (but no core functionality) are only available under a commercial license. Documents that describe functionality with such limited applicability are clearly marked by one or more of the following chips:
| Chip | Applicability |
| --- | --- |
| Vespa Cloud | Only applicable to Vespa Cloud deployments. |
| Self-managed | Only applicable to self-managed deployments. |
| Enterprise | Not open source: available commercially only (both self-managed and on cloud, unless also marked by one of the other chips above). |
For clarity, any document _not_ marked with any of these chips describes functionality that is open source and available both on Vespa Cloud and in self-managed deployments.
## Contributing
If you find errors or want to improve the documentation, [create an issue](https://github.com/vespa-engine/vespa/issues) or [contribute a fix](contributing). See the [README](https://docs.vespa.ai/README.md) before contributing.
## Notation
_Italic_ is used for:
- Pathnames, filenames, program names, hostnames, and URLs
- New terms where they are defined

`Constant Width` is used for:
- Programming language elements, code examples, keywords, functions, classes, interfaces, methods, etc.
- Commands and command-line output

Commands meant to be run on the command line are shown like this, prepended by a $ for the prompt:
```
$ export PATH=$VESPA_HOME/bin:$PATH
```
Notes and other important pieces of information are shown like this:
**Note:** Some info here
**Important:** Important info here
**Warning:** Warning here
**Deprecated:** Deprecation warning here
---
# Source: https://docs.vespa.ai/en/operations/access-logging.html.md
# Access Logging
The Vespa access log format allows the logs to be processed by a number of available tools handling JSON-based (log) files. With the ability to add custom key/value pairs to the log from any Searcher, you can easily track the decisions made by container components for given requests.
## Vespa Access Log Format
In the Vespa access log, each log event is logged as a JSON object on a single line. The log format defines a list of fields that can be logged with every request. In addition to these fields, [custom key/value pairs](#logging-key-value-pairs-to-the-json-access-log-from-searchers) can be logged via Searcher code. Pre-defined fields:
| Name | Type | Description | Always present |
| --- | --- | --- | --- |
| ip | string | The IP address request came from | yes |
| time | number | UNIX timestamp with millisecond decimal precision (e.g. 1477828938.123) when request is received | yes |
| duration | number | The duration of the request in seconds with millisecond decimal precision (e.g. 0.123) | yes |
| responsesize | number | The size of the response in bytes | yes |
| code | number | The HTTP status code returned | yes |
| method | string | The HTTP method used (e.g. 'GET') | yes |
| uri | string | The request URI from path and beyond (e.g. '/search?query=test') | yes |
| version | string | The HTTP version (e.g. 'HTTP/1.1') | yes |
| agent | string | The user agent specified in the request | yes |
| host | string | The host header provided in the request | yes |
| scheme | string | The scheme of the request | yes |
| port | number | The IP port number of the interface on which the request was received | yes |
| remoteaddr | string | The IP address of the [remote client](#logging-remote-address-port) if specified in HTTP header | no |
| remoteport | string | The port used from the [remote client](#logging-remote-address-port) if specified in HTTP header | no |
| peeraddr | string | Address of immediate client making request if different from _remoteaddr_ | no |
| peerport | string | Port used by immediate client making request if different from _remoteport_ | no |
| user-principal | string | The name of the authenticated user (java.security.Principal.getName()) if principal is set | no |
| ssl-principal | string | The name of the x500 principal if client is authenticated through SSL/TLS | no |
| search | object | Object holding search specific fields | no |
| search.totalhits | number | The total number of hits for the query | no |
| search.hits | number | The hits returned in this specific response | no |
| search.coverage | object | Object holding [query coverage information](../performance/graceful-degradation.html) similar to that returned in result set. | no |
| connection | string | Reference to the connection log entry. See [Connection log](#connection-log) | no |
| attributes | object | Object holding [custom key/value pairs](#logging-key-value-pairs-to-the-json-access-log-from-searchers) logged in searcher. | no |
**Note:** IP addresses can be either IPv4 addresses in standard dotted format (e.g. 127.0.0.1) or IPv6 addresses in standard form with leading zeros omitted (e.g. 2222:1111:123:1234:0:0:0:4321).
An example log line will look like this (here, pretty-printed):
```
{
  "ip": "152.200.54.243",
  "time": 920880005.023,
  "duration": 0.122,
  "responsesize": 9875,
  "code": 200,
  "method": "GET",
  "uri": "/search?query=test&param=value",
  "version": "HTTP/1.1",
  "agent": "Mozilla/4.05 [en] (Win95; I)",
  "host": "localhost",
  "search": {
    "totalhits": 1234,
    "hits": 0,
    "coverage": {
      "coverage": 98,
      "documents": 100,
      "degraded": {
        "non-ideal-state": true
      }
    }
  }
}
```
**Note:** The log format is extendable by design such that the order of the fields can be changed and new fields can be added between minor versions. Make sure any programmatic log handling is using a proper JSON processor.
Example: Decompress, pretty-print, with human-readable timestamps, using [jq](https://stedolan.github.io/jq/):
```
$ jq '. + {iso8601date:(.time | todateiso8601)}' \
  <(unzstd -c /opt/vespa/logs/vespa/access/JsonAccessLog.default.20210601010000.zst)
```
### Logging Remote Address/Port
In some cases, when a request passes through an intermediate service, this service may add HTTP headers indicating the IP address and port of the real origin client. These values are logged as _remoteaddr_ and _remoteport_ respectively. Vespa will log the contents of any of the following HTTP request headers as _remoteaddr_: _X-Forwarded-For_, _Y-RA_, _YahooRemoteIP_ or _Client-IP_. If more than one of these headers is present, the precedence is in the order listed here, i.e. _X-Forwarded-For_ takes precedence over _Y-RA_. The contents of the _Y-RP_ HTTP request header will be logged as _remoteport_.
If the remote address or port differs from that of the peer initiating the HTTP request, the address and port of the immediate client making the request are logged as _peeraddr_ and _peerport_ respectively.
## Configuring Logging
For details on the access logging configuration see [accesslog in the container](../reference/applications/services/container.html#accesslog) element in _services.xml_.
Key configuration options include:
- **fileNamePattern**: Pattern for log file names with time variable support
- **rotationInterval**: Time-based rotation schedule (minutes since midnight)
- **rotationSize**: Size-based rotation threshold in bytes (0 = disabled)
- **rotationScheme**: Either 'sequence' or 'date'
- **compressionFormat**: GZIP or ZSTD compression for rotated files
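As a sketch, an `accesslog` element combining these options could look like the following in _services.xml_ (file name pattern and values are illustrative placeholders; see the [accesslog reference](../reference/applications/services/container.html#accesslog) for the authoritative syntax):
```
<container id="default" version="1.0">
  <accesslog type="json"
             fileNamePattern="logs/vespa/access/JsonAccessLog.%Y%m%d%H%M%S"
             rotationScheme="date"
             symlinkName="JsonAccessLog"
             compressionFormat="zstd" />
</container>
```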
### Logging Request Content
Vespa supports logging of request content for specific URI paths. This is useful for inspecting query content of search POST requests or document operations of Document v1 POST/PUT requests. The request content is logged as a base64-encoded string in the JSON access log.
To configure request content logging, use the [request-content](../reference/applications/services/container.html#request-content) element in the accesslog configuration in _services.xml_.
Here is an example of how the request content appears in the JSON access log:
```
{
  ...
  "method": "POST",
  "uri": "/search",
  ...,
  "request-content": {
    "type": "application/json; charset=utf-8",
    "length": 12345,
    "body": ""
  }
}
```
### File name pattern
The file name pattern is expanded using the time when the file is created. The following parts in the file name are expanded:
| Field | Format | Meaning | Example |
| --- | --- | --- | --- |
| %Y | YYYY | Year | 2003 |
| %m | MM | Month, numeric | 08 |
| %x | MMM | Month, textual | Aug |
| %d | dd | Date | 25 |
| %H | HH | Hour | 14 |
| %M | mm | Minute | 30 |
| %S | ss | Seconds | 35 |
| %s | SSS | Milliseconds | 123 |
| %Z | Z | Time zone | -0400 |
| %T | Long | System.currentTimeMillis | 1349333576093 |
| %% | % | Escape percentage | % |
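For example, a pattern like `JsonAccessLog.%Y%m%d%H%M%S` (an illustrative name) expands as follows for a file created 2021-06-01 01:30:00:
```
JsonAccessLog.%Y%m%d%H%M%S  ->  JsonAccessLog.20210601013000
```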
## Log rotation
Apache httpd style log _rotation_ can be configured by setting the _rotationScheme_. There are two alternatives for the rotation scheme: sequence and date. Rotation can be triggered by time intervals using _rotationInterval_ and/or by file size using _rotationSize_.
### Sequence rotation scheme
The _fileNamePattern_ is used for the active log file name (which in this case will often be a constant string). At rotation, this file is given the name fileNamePattern.N, where N is 1 + the largest integer found by extracting the integers from all files ending in .N in the same directory. A minimal sketch (the file name is an illustrative placeholder):
```
<accesslog type="json"
           fileNamePattern="logs/vespa/access/JsonAccessLog.default"
           rotationScheme="sequence" />
```
### Date rotation scheme
The _fileNamePattern_ is used for the active log file name here too, but the log files are not renamed at rotation. Instead, you must specify a time-dependent fileNamePattern so that each time a new log file is created, the name is unique. In addition, a symlink is created pointing to the active log file. The name of the symlink is specified using _symlinkName_. A minimal sketch (values are illustrative placeholders):
```
<accesslog type="json"
           fileNamePattern="logs/vespa/access/JsonAccessLog.%Y%m%d%H%M%S"
           rotationScheme="date"
           symlinkName="JsonAccessLog" />
```
### Rotation interval
The time of rotation is controlled by setting _rotationInterval_ - a sketch with illustrative values:
```
<accesslog type="json"
           fileNamePattern="logs/vespa/access/JsonAccessLog.%Y%m%d%H%M%S"
           rotationInterval="0 60 ..." />
```
The rotationInterval is a list of numbers specifying when to do rotation. Each element represents the number of minutes since midnight. Ending the list with '...' means continuing the [arithmetic progression](https://en.wikipedia.org/wiki/Arithmetic_progression) defined by the last two numbers for the rest of the day. E.g. "0 100 240 480 ..." is expanded to "0 100 240 480 720 960 1200".
### Log retention
Access logs are rotated, but not deleted by Vespa processes. It is up to the application owner to take care of archiving of access logs.
## Logging Key/Value pairs to the JSON Access Log from Searchers
To add a key/value pair to the access log from a searcher, use
```
query/result.getContext(true).logValue(key,value)
```
Such key/value pairs may be added from any thread participating in handling the query without incurring synchronization overhead.
If the same key is logged multiple times, the values written will be included in the log as an array of strings rather than a single string value.
The key/value pairs are added to the _attributes_ object in the log.
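For illustration, a minimal Searcher along these lines (the class name and keys are made up for the example) could log both a single-valued and a multi-valued attribute:
```
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class LogValueSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // Appears as "singlevalue": "value1" under "attributes" in the access log
        query.getContext(true).logValue("singlevalue", "value1");
        Result result = execution.search(query);
        // Logging the same key twice yields an array: "multivalue": ["value2", "value3"]
        result.getContext(true).logValue("multivalue", "value2");
        result.getContext(true).logValue("multivalue", "value3");
        return result;
    }
}
```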
An example log line will then look something like this:
```
{"ip":"152.200.54.243","time":920880005.023,"duration":0.122,"responsesize":9875,"code":200,"method":"GET","uri":"/search?query=test¶m=value","version":"HTTP/1.1","agent":"Mozilla/4.05 [en] (Win95; I)","host":"localhost","search":{"totalhits":1234,"hits":0},"attributes":{"singlevalue":"value1","multivalue":["value2","value3"]}}
```
A pretty print version of the same example:
```
{
  "ip": "152.200.54.243",
  "time": 920880005.023,
  "duration": 0.122,
  "responsesize": 9875,
  "code": 200,
  "method": "GET",
  "uri": "/search?query=test&param=value",
  "version": "HTTP/1.1",
  "agent": "Mozilla/4.05 [en] (Win95; I)",
  "host": "localhost",
  "search": {
    "totalhits": 1234,
    "hits": 0
  },
  "attributes": {
    "singlevalue": "value1",
    "multivalue": [
      "value2",
      "value3"
    ]
  }
}
```
## Connection log
In addition to the access log, one entry per connection is written to the connection log. This entry is written on connection close. Available fields:
| Name | Type | Description | Always present |
| --- | --- | --- | --- |
| id | string | Unique ID of the connection, referenced from access log. | yes |
| timestamp | number | Timestamp (ISO8601 format) when the connection was opened | yes |
| duration | number | The duration of the connection in seconds with millisecond decimal precision (e.g. 0.123) | yes |
| peerAddress | string | IP address used by immediate client making request | yes |
| peerPort | number | Port used by immediate client making request | yes |
| localAddress | string | The local IP address the request was received on | yes |
| localPort | number | The local port the request was received on | yes |
| remoteAddress | string | Original client ip, if proxy protocol enabled | no |
| remotePort | number | Original client port, if proxy protocol enabled | no |
| httpBytesReceived | number | Number of HTTP bytes received over the connection | no |
| httpBytesSent | number | Number of HTTP bytes sent over the connection | no |
| requests | number | Number of requests sent by the client | no |
| responses | number | Number of responses sent to the client | no |
| ssl | object | Detailed information on ssl connection | no |
## SSL information
| Name | Type | Description | Always present |
| --- | --- | --- | --- |
| clientSubject | string | Client certificate subject | no |
| clientNotBefore | string | Client certificate valid from | no |
| clientNotAfter | string | Client certificate valid to | no |
| sessionId | string | SSL session id | no |
| protocol | string | SSL protocol | no |
| cipherSuite | string | Name of session cipher suite | no |
| sniServerName | string | SNI server name | no |
---
# Source: https://docs.vespa.ai/en/operations/self-managed/admin-procedures.html.md
# Administrative Procedures
## Install
Refer to the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application for a primer on how to set up a cluster - use this as a starting point. Try the [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample app to get familiar with interfaces and behavior.
## Vespa start / stop / restart
Start and stop all services on a node, using [vespa-start-services](../../reference/operations/self-managed/tools.html#vespa-start-services) and [vespa-stop-services](../../reference/operations/self-managed/tools.html#vespa-stop-services):
```
$ $VESPA_HOME/bin/vespa-start-services
$ $VESPA_HOME/bin/vespa-stop-services
```
Likewise, for the config server, using [vespa-start-configserver](../../reference/operations/self-managed/tools.html#vespa-start-configserver) and [vespa-stop-configserver](../../reference/operations/self-managed/tools.html#vespa-stop-configserver):
```
$ $VESPA_HOME/bin/vespa-start-configserver
$ $VESPA_HOME/bin/vespa-stop-configserver
```
There is no _restart_ command; do a _stop_ then a _start_ to restart. Learn more about which processes/services are started at [Vespa startup](config-sentinel.html), read the [start sequence](configuration-server.html#start-sequence), and find training videos in the vespaengine [YouTube channel](https://www.youtube.com/@vespaai).
Use [vespa-sentinel-cmd](../../reference/operations/self-managed/tools.html#vespa-sentinel-cmd) to stop/start individual services.
**Important:** Running _vespa-stop-services_ on a content node will call[prepareRestart](../../reference/operations/self-managed/tools.html#vespa-proton-cmd) to optimize restart time, and is the recommended way to stop Vespa on a node.
See [multinode](multinode-systems.html#aws-ec2) for _systemd_ /_systemctl_ examples. [Docker containers](docker-containers.html) has relevant start/stop information, too.
### Content node maintenance mode
When stopping a content node _temporarily_ (e.g. for a software upgrade), consider manually setting the node into [maintenance mode](../../reference/api/cluster-v2.html#maintenance) _before_ stopping the node to prevent automatic redistribution of data while the node is down. Maintenance mode must be manually removed once the node has come back online. See also: [cluster state](#cluster-state).
Example of setting a node with [distribution key](../../reference/applications/services/content.html#node) 42 into `maintenance` mode using [vespa-set-node-state](../../reference/operations/self-managed/tools.html#vespa-set-node-state), additionally supplying a reason that will be recorded by the cluster controller:
```
$ vespa-set-node-state --type storage --index 42 maintenance "rebooting for software upgrade"
```
After the node has come back online, clear maintenance mode by marking the node as `up`:
```
$ vespa-set-node-state --type storage --index 42 up
```
Note that if the above commands are executed _locally_ on the host running the services for node 42, `--index 42` can be omitted; `vespa-set-node-state` will use the distribution key of the local node if no `--index` has been explicitly specified.
## System status
- Use [vespa-config-status](../../reference/operations/self-managed/tools.html#vespa-config-status) on a node in [hosts.xml](../../reference/applications/hosts.html) to verify all services run with updated config
- Make sure [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) is set and identical on all nodes in hosts.xml
- Use the _cluster controller_ status page (below) to track the status of search/storage nodes.
- Check [logs](../../reference/operations/log-files.html)
- Use performance graphs, System Activity Report (_sar_) or [status pages](#status-pages) to track load
- Use [query tracing](../../reference/api/query.html#trace.level)
- Disk and/or memory might be exhausted and block feeding - recover from [feed block](/en/writing/feed-block.html)
## Status pages
All Vespa services have status pages, for showing health, Vespa version, config, and metrics. Status pages are subject to change at any time - take care when automating. Procedure:
1. **Find the port:** The status pages run on ports assigned by Vespa. To find status page ports, use [vespa-model-inspect](../../reference/operations/self-managed/tools.html#vespa-model-inspect) to list the services run in the application.
```
$ vespa-model-inspect services
```
To find the status page port for a specific node for a specific service, pick the correct service and run:
```
$ vespa-model-inspect service [Options]
```
2. **Get the status and metrics:** _distributor_, _storagenode_, _searchnode_ and _container-clustercontroller_ are content services with status pages. These ports are tagged HTTP. The cluster controller has multiple ports tagged HTTP, where the port tagged STATE is the one with the status page. Try connecting to the root at the port, or to /state/v1/metrics. The _distributor_ and _storagenode_ status pages are available at `/`:
```
$ vespa-model-inspect service searchnode
searchnode @ myhost.mydomain.com : search
search/search/cluster.search/0
tcp/myhost.mydomain.com:19110 (STATUS ADMIN RTC RPC)
tcp/myhost.mydomain.com:19111 (FS4)
tcp/myhost.mydomain.com:19112 (TEST HACK SRMP)
tcp/myhost.mydomain.com:19113 (ENGINES-PROVIDER RPC)
tcp/myhost.mydomain.com:19114 (HEALTH JSON HTTP)

$ curl http://myhost.mydomain.com:19114/state/v1/metrics
...

$ vespa-model-inspect service distributor
distributor @ myhost.mydomain.com : content
search/distributor/0
tcp/myhost.mydomain.com:19116 (MESSAGING)
tcp/myhost.mydomain.com:19117 (STATUS RPC)
tcp/myhost.mydomain.com:19118 (STATE STATUS HTTP)

$ curl http://myhost.mydomain.com:19118/state/v1/metrics
...

$ curl http://myhost.mydomain.com:19118/
...
```
3. **Use the cluster controller status page**: A status page for the cluster controller is available at the status port at `http://hostname:port/clustercontroller-status/v1/`. If _clustername_ is not specified, the available clusters will be listed. The cluster controller leader status page will show if any nodes are operating with differing cluster state versions. It will also show how many data buckets are pending merging (document set reconciliation) due to replicas either missing or being out of sync.
```
$ vespa-model-inspect service container-clustercontroller | grep HTTP
```
With multiple cluster controllers, look at the one with a "/0" suffix in its config ID; it is the preferred leader.
The cluster state version is listed under the _SSV_ table column. Divergence here usually points to host or networking issues.
## Cluster state
Cluster and node state information is available through the [/cluster/v2 API](../../reference/api/cluster-v2.html). This API can also be used to set a _user state_ for a node - alternatively use:
- [vespa-get-cluster-state](../../reference/operations/self-managed/tools.html#vespa-get-cluster-state)
- [vespa-get-node-state](../../reference/operations/self-managed/tools.html#vespa-get-node-state)
- [vespa-set-node-state](../../reference/operations/self-managed/tools.html#vespa-set-node-state)
Also see the cluster controller [status page](#status-pages).
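As a sketch, setting the user state of storage node 42 to maintenance through the API could look like this (hostname, port and cluster name are placeholders - find the cluster controller's STATE port with vespa-model-inspect, as shown in [status pages](#status-pages)):
```
$ curl -X PUT -H "Content-Type: application/json" \
  --data '{"state": {"user": {"state": "maintenance", "reason": "rebooting for software upgrade"}}}' \
  http://myhost.mydomain.com:19050/cluster/v2/mycluster/storage/42
```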
State is persisted in a ZooKeeper cluster, restarting/changing a cluster controller preserves:
- Last cluster state version number, for new cluster controller handover at restarts
- User states, set by operators - i.e. nodes manually set to down / maintenance
If state data is lost, the cluster state is reset - see [cluster controller](../../content/content-nodes.html#cluster-controller) for implications.
## Cluster controller configuration
It is recommended to run cluster controllers on the same hosts as [config servers](configuration-server.html), as they share a ZooKeeper cluster for state, and deploying three nodes is best practice for both. See the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample app for a working example.
To configure the cluster controller, use [services.xml](../../reference/applications/services/content.html#cluster-controller) and/or add [configuration](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) under the _services_ element - a sketch, setting an example parameter from _fleetcontroller.def_:
```
<services version="1.0">
  <config name="vespa.config.content.fleetcontroller">
    <min_time_between_new_systemstates>5000</min_time_between_new_systemstates>
  </config>
  ...
</services>
```
A broken content node may end up with processes constantly restarting. It may die during initialization due to accessing corrupt files, or it may die when it starts receiving requests of a given type triggering a node local bug. This is bad for distributor nodes, as these restarts create constant ownership transfer between distributors, causing windows where buckets are unavailable.
The cluster controller has functionality for detecting such nodes. If a node restarts in a way that is not detected as a controlled shutdown more than [max\_premature\_crashes](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) times, the cluster controller will set the wanted state of this node to down.
Detecting a controlled restart is currently a bit tricky. A controlled restart is typically initiated by sending a TERM signal to the process. Having no other signal to go by, the content layer has to assume that all TERM signals are caused by controlled shutdowns. Thus, if the process keeps being killed by the kernel due to using too much memory, this will look like controlled shutdowns to the content layer.
## Monitor distance to ideal state
Refer to the [distribution algorithm](../../content/idealstate.html). Use distributor [status pages](#status-pages) to inspect state metrics, see [metrics](../../content/content-nodes.html#metrics). `idealstate.merge_bucket.pending` is the best metric to track, it is 0 when the cluster is balanced - a non-zero value indicates buckets out of sync.
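For example, pull that metric from a distributor's state API (hostname and port are placeholders - use the port tagged STATE/HTTP, see [status pages](#status-pages)):
```
$ curl -s http://myhost.mydomain.com:19118/state/v1/metrics | \
  jq '.metrics.values[] | select(.name == "idealstate.merge_bucket.pending")'
```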
## Cluster configuration
- Running `vespa prepare` will not change served configuration until `vespa activate` is run. `vespa prepare` will warn about all config changes that require restart.
- Refer to [schemas](../../basics/schemas.html) for how to add/change/remove these.
- Refer to [elasticity](../../content/elasticity.html) for how to add/remove capacity from a Vespa cluster, procedure below.
- See [chained components](../../applications/chaining.html) for how to add or remove searchers and document processors.
- Refer to the [sizing examples](../../performance/sizing-examples.html) for changing from a _flat_ to _grouped_ content cluster.
## Add or remove a content node
1. **Node setup:** Prepare the node by installing software, set up the file systems/directories and set [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables). [Start](#vespa-start-stop-restart) the node.
2. **Modify configuration:** Add/remove a [node](../../reference/applications/services/content.html#node)-element in _services.xml_ and [hosts.xml](../../reference/applications/hosts.html). Refer to [multinode install](multinode-systems.html). Make sure the _distribution-key_ is unique.
3. **Deploy:** Deploy the changed application package. [Observe metrics](#monitor-distance-to-ideal-state) to track progress as the cluster redistributes documents. Use the [cluster controller](../../content/content-nodes.html#cluster-controller) to monitor the state of the cluster.
4. **Tune performance (optional):** Use [maxpendingidealstateoperations](https://github.com/vespa-engine/vespa/blob/master/storage/src/vespa/storage/config/stor-distributormanager.def) to tune concurrency of bucket merge operations from distributor nodes. Likewise, tune [merges](../../reference/applications/services/content.html#merges) - concurrent merge operations per content node. The tradeoff is speed of bucket replication vs use of resources, which impacts the applications' regular load.
5. **Finish:** The cluster is done redistributing when `idealstate.merge_bucket.pending` is zero on all distributors.
Do not remove more than _redundancy_-1 nodes at a time, to avoid data loss. Observe `idealstate.merge_bucket.pending` to know bucket replica status; when zero on all distributor nodes, it is safe to remove more nodes. If [grouped distribution](../../content/elasticity.html#grouped-distribution) is used to control bucket replicas, all nodes in a group can be removed at once, provided the redundancy settings ensure replicas in each remaining group.
To increase bucket redundancy level before taking nodes out, [retire](../../content/content-nodes.html) nodes. Again, track `idealstate.merge_bucket.pending` to know when done. Use the [/cluster/v2 API](../../reference/api/cluster-v2.html) or [vespa-set-node-state](../../reference/operations/self-managed/tools.html#vespa-set-node-state) to set a node to the _retired_ state. The [cluster controller's](../../content/content-nodes.html#cluster-controller) status page lists node states.
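Using the same syntax as in [content node maintenance mode](#content-node-maintenance-mode) above, retiring node 42 could look like this (the reason string is illustrative):
```
$ vespa-set-node-state --type storage --index 42 retired "scaling down the cluster"
```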
An alternative to increasing cluster size is building a new cluster, then migrate documents to it. This is supported using [visiting](../../writing/visiting.html).
To _merge_ two content clusters, add nodes to the cluster like above, considering:
- [distribution-keys](../../reference/applications/services/content.html#node) must be unique. Modify paths like _$VESPA\_HOME/var/db/vespa/search/mycluster/n3_ before adding the node.
- Set [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables), then start the node.
## Topology change
Read [changing topology first](../../content/elasticity.html#changing-topology), and plan the sequence of steps.
Make sure to not change the `distribution-key` for nodes in _services.xml_.
It is not required to restart nodes as part of this process.
## Add or remove services on a node
It is possible to run multiple Vespa services on the same host. If changing the services on a given host, stop Vespa on the given host before running `vespa activate`. This is because the services are dynamically allocated port numbers, depending on what is running on the host. Consider if some of the services changed are used by services on other hosts. In that case, restart services on those hosts too. Procedure:
1. Edit _services.xml_ and _hosts.xml_
2. Stop Vespa on the nodes that have changes
3. Run `vespa prepare` and `vespa activate`
4. Start Vespa on the nodes that have changes
## Troubleshooting
Also see the [FAQ](../../learn/faq).
**No endpoint**
Most problems with the quick start guides are due to Docker running out of memory. Make sure at least 6G of memory is allocated to Docker:
```
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
```
OOM symptoms include:
```
INFO: Problem with Handshake localhost:8080 ssl=false: localhost:8080 failed to respond
```
The container is named _vespa_ in the guides; to get a shell, run:
```
$ docker exec -it vespa bash
```
**Log viewing**
Use [vespa-logfmt](../../reference/operations/self-managed/tools.html#vespa-logfmt) to view the vespa log - example:
```
$ /opt/vespa/bin/vespa-logfmt -l warning,error
```
**JSON**
For JSON pretty-printing, append
```
| python -m json.tool
```
to commands that output JSON - or use [jq](https://stedolan.github.io/jq/).
**Routing**
Vespa lets application set up custom document processing / indexing, with different feed endpoints. Refer to [indexing](../../writing/indexing.html) for how to configure this in _services.xml_.
[#13193](https://github.com/vespa-engine/vespa/issues/13193) has a summary of problems and solutions.
**Tracing**
Use [tracelevel](../../reference/api/document-v1.html#request-parameters) to dump the routes and hops for a write operation - example:
```
$ curl -H Content-Type:application/json --data-binary @docs.json \
$ENDPOINT/document/v1/mynamespace/doc/docid/1?tracelevel=4 | jq .
{
"pathId": "/document/v1/mynamespace/doc/docid/1",
"id": "id:mynamespace:doc::1",
"trace": [
{ "message": "[1623413878.905] Sending message (version 7.418.23) from client to ..." },
{ "message": "[1623413878.906] Message (type 100004) received at 'default/container.0' ..." },
{ "message": "[1623413878.907] Sending message (version 7.418.23) from 'default/container.0' ..." },
{ "message": "[1623413878.907] Message (type 100004) received at 'default/container.0' ..." },
{ "message": "[1623413878.909] Selecting route" },
{ "message": "[1623413878.909] No cluster state cached. Sending to random distributor." }
```
## Clean start mode
There have been rare occasions where Vespa stored data that was internally inconsistent. For those circumstances, it is possible to start the node in a [validate\_and\_sanitize\_docstore](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/proton.def) mode. This will do its best to clean up inconsistent data. However, detecting that this is required is not easy; consult the Vespa Team first. For this approach to work, all nodes must be stopped before enabling the feature, to make sure the data is not redistributed.
## Content cluster configuration
**Availability vs resources**
Keeping index structures costs resources. Not all replicas of buckets are necessarily searchable, unless configured using [searchable-copies](../../reference/applications/services/content.html#searchable-copies). As Vespa indexes buckets on-demand, the most cost-efficient setting is 1, if one can tolerate temporary coverage loss during node failures.
**Data retention vs size**
When a document is removed, the document data is not immediately purged. Instead, _remove-entries_ (tombstones of removed documents) are kept for a configurable amount of time. The default is two weeks, refer to [removed-db prune age](../../reference/applications/services/content.html#removed-db-prune-age). This ensures that removed documents stay removed in a distributed system where nodes change state. Entries are removed periodically after expiry. Hence, if a node comes back up after being down for more than two weeks, removed documents are available again, unless the data on the node is wiped first. A larger _prune age_ will grow the storage size as this keeps document and tombstones longer.
**Note:** The backend does not store remove-entries for nonexistent documents. This to prevent clients sending wrong document identifiers from filling a cluster with invalid remove-entries. A side effect is that if a problem has caused all replicas of a bucket to be unavailable, documents in this bucket cannot be marked removed until at least one replica is available again. Documents are written in new bucket replicas while the others are down - if these are removed, then older versions of these will not re-emerge, as the most recent change wins.
**Transition time**
See [transition-time](../../reference/applications/services/content.html#transition-time) for tradeoffs for how quickly nodes are set down vs. system stability.
**Removing unstable nodes**
One can configure how many times a node is allowed to crash before it will automatically be removed. The crash count is reset if the node has been up or down continuously for more than the [stable state period](../../reference/applications/services/content.html#stable-state-period). If the crash count exceeds [max premature crashes](../../reference/applications/services/content.html#max-premature-crashes), the node will be disabled. Refer to [troubleshooting](#troubleshooting).
**Minimal number of nodes required to be available**
A cluster is typically sized to handle a given load. A given percentage of the cluster resources are required for normal operations, and the remainder is the available resources that can be used if some of the nodes are no longer usable. If the cluster loses enough nodes, it will be overloaded:
- Remaining nodes may run out of disk space. This will likely fail many write operations, and if the disk is shared with the OS, it may also stop the node from functioning.
- Partition queues will grow to maximum size. As queues are processed in FIFO order, operations are likely to get long latencies.
- Many operations may time out while being processed, causing the operation to be resent, adding more load to the cluster.
- When new nodes are added, they cannot serve requests before data is moved to the new nodes from the already overloaded nodes. Moving data puts even more load on the existing nodes, and as moving data is typically not high priority this may never actually happen.
To configure the minimal cluster size, use [min-distributor-up-ratio](../../reference/applications/services/content.html#min-distributor-up-ratio) and [min-storage-up-ratio](../../reference/applications/services/content.html#min-storage-up-ratio).
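A sketch of what this could look like in _services.xml_, assuming the elements sit under the content cluster's _tuning_ element (placement and the 0.1 values are illustrative - consult the linked reference):
```
<content id="content" version="1.0">
  <tuning>
    <min-distributor-up-ratio>0.1</min-distributor-up-ratio>
    <min-storage-up-ratio>0.1</min-storage-up-ratio>
  </tuning>
  ...
</content>
```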
---
# Source: https://docs.vespa.ai/en/reference/applications/services/admin.html.md
# services.xml - 'admin'
Reference documentation for `<admin>` in [services.xml](services.html). Find a working example of this configuration in the sample application _multinode-HA_ [services.xml](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml).
- admin [version]
  - [adminserver](#adminserver) [hostalias]
  - [cluster-controllers](#cluster-controllers)
    - [cluster-controller](#cluster-controller) [hostalias, baseport, jvm-options, jvm-gc-options]
  - [configservers](#configservers)
    - [configserver](#configserver) [hostalias, baseport]
  - [logserver](#logserver) [jvm-options, jvm-gc-options]
  - [slobroks](#slobroks)
    - [slobrok](#slobrok) [hostalias, baseport]
  - [monitoring](#monitoring) [systemname]
  - [metrics](#metrics)
    - [consumer](#consumer) [id]
      - [metric-set](#metric-set) [id]
      - [metric](#metric) [id]
      - [cloudwatch](#cloudwatch) [region, namespace]
        - [shared-credentials](#shared-credentials) [file, profile]
  - [logging](#logging)

| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| version | required | number | | 2.0 |
## adminserver
The configured node will be the default administration node in your Vespa system, meaning that unless configured otherwise, all administrative services - i.e. the log server, the configuration server, the slobrok, and so on - will run on this node. Use the [configservers](#configservers), [logserver](#logserver) and [slobroks](#slobroks) elements if you need to specify baseport or JVM options for any of these services.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| hostalias | required | string | | |
| baseport | optional | number | | |
## cluster-controllers
Container for one or more [cluster-controller](#cluster-controller) elements. When having one or more [content](content.html) clusters, configuring at least one cluster controller is required.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| standalone-zookeeper | optional | true/false | false | By default, the ZooKeeper instance is shared with the config server. If set to true, a separate ZooKeeper instance is configured and started on the set of nodes running cluster controllers. The set of cluster controller nodes cannot overlap with the set of nodes where config servers run. If this setting is changed from false to true in a running system, all previous cluster state information is lost, as the underlying ZooKeeper changes. Cluster controllers will re-discover the state, but nodes that have been manually set as down will again be considered up. |
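An illustrative three-controller setup (hostaliases are placeholders), following the multinode-HA pattern:
```
<admin version="2.0">
  <adminserver hostalias="node0" />
  <cluster-controllers>
    <cluster-controller hostalias="node1" />
    <cluster-controller hostalias="node2" />
    <cluster-controller hostalias="node3" />
  </cluster-controllers>
</admin>
```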
## cluster-controller
Specifies a host on which to run the [Cluster Controller](../../../content/content-nodes.html#cluster-controller) service. The Cluster Controller manages the state of the cluster in order to provide elasticity and failure detection.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| hostalias | required | string | | |
| baseport | optional | number | | |
| jvm-options | optional | string | | |
## configservers
Container for one or more `configserver` elements.
## configserver
Specifies a host on which to run the [Configuration Server](/en/operations/self-managed/configuration-server.html) service. If contained directly below `<admin>`, you may only have one; if you need to configure multiple instances of this service, contain them within the [`<configservers>`](#configservers) element.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| hostalias | required | string | | |
| baseport | optional | number | | |
## logserver
Specifies a host on which to run the [Vespa Log Server](../../operations/log-files.html#log-server) service. If not specified, the logserver is placed on the [adminserver](#adminserver), like in the [example](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml).
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| hostalias | required | string | | |
| baseport | optional | number | | |
| jvm-options | optional | string | | |
| jvm-gc-options | optional | string | | |
Example (hostaliases are illustrative):
```
<admin version="2.0">
  <adminserver hostalias="node0" />
  <logserver hostalias="node1" />
</admin>
```
## slobroks
This is a container for one or more `slobrok` elements.
## slobrok
Specifies a host on which to run the [Service Location Broker (slobrok)](/en/operations/self-managed/slobrok.html) service.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| hostalias | required | string | | |
| baseport | optional | number | | |
## monitoring
Settings for how to pass metrics to a monitoring service - see [monitoring](/en/operations/self-managed/monitoring.html). Example (the attribute value is illustrative):
```
<monitoring systemname="vespa" />
```
| Attribute | Description |
| --- | --- |
| systemname | The name of the application in question in the monitoring system; the default is "vespa" |
## logging
Used for tuning log levels of Java plug-ins. If you (temporarily) need to enable debug logging from some class or package, or if some third-party component is spamming your log with unnecessary INFO level messages, you can turn levels on or off. A sketch matching the description below (element and package names are illustrative; see the schema for the exact syntax):
```
<logging>
  <package name="org.myorg.tricky.package.foo" levels="all +spam" />
  <package name="org.anotherorg" levels="-info" />
</logging>
```
Note that tuning also affects sub-packages, so the above would also affect all packages with `org.anotherorg.` as prefix. And if there is an `org.myorg.tricky.package.foo.InternalClass`, you will get even "spam" level logging from it!
The default for `levels` is `"all -debug -spam"` and as seen above you can add and remove specific levels.
## metrics
Used for configuring the forwarding of metrics to graphing applications - add `consumer` child elements. Also see [monitoring](/en/operations/self-managed/monitoring.html). Example (the consumer id is illustrative):
```
<metrics>
  <consumer id="my-custom-consumer">
    <metric-set id="default" />
  </consumer>
</metrics>
```
## consumer
Configure a metrics consumer. The metrics contained in this element will be exported to the consumer with the given id. `consumer` is a request parameter in [/metrics/v1/values](../../api/metrics-v1.html), [/metrics/v2/values](../../api/metrics-v2.html) and [/prometheus/v1/values](../../api/prometheus-v1.html).
Add `metric` and/or `metric-set` children.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| id | required | string | | The name of the consumer to export metrics to. |
## metric-set
Include a pre-defined set of metrics to the consumer.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| id | required | string | | The id of the metric set to include. Built-in metric sets are `default` and `Vespa`. |
## metric
Configure a metric.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| id | required | string | | The name of the metric as defined in custom code or in the [process metrics api](../../api/state-v1.html#state-v1-metrics) |
Note that the metric id needs to include the metric-specific suffix, e.g. _.average_.
In this example, one metric is added to a custom consumer in addition to the default metric set. Use the _&consumer=my-custom-consumer_ parameter for the Prometheus endpoint. Also notice the .count suffix, see the [process metrics api](../../api/state-v1.html#state-v1-metrics).
The per-process metrics API endpoint _/state/v1/metrics_ also includes a description of each emitted metric, as well as the metric aggregates (.count, .average, .rate, .max).
A sketch (the metric id is an illustrative pick):
```
<metrics>
  <consumer id="my-custom-consumer">
    <metric-set id="default" />
    <metric id="vds.idealstate.garbage_collection.documents_removed.count" />
  </consumer>
</metrics>
```
## cloudwatch
Specifies that the metrics from this consumer should be forwarded to CloudWatch.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| region | required | string | | Your AWS region |
| namespace | required | string | | The metrics namespace in CloudWatch |
Example (region, namespace and credentials values are illustrative):
```
<metrics>
  <consumer id="my-cloudwatch">
    <metric-set id="default" />
    <cloudwatch region="us-east-1" namespace="my-vespa-metrics">
      <shared-credentials file="/path/to/credentials-file" profile="default" />
    </cloudwatch>
  </consumer>
</metrics>
```
## shared-credentials
Specifies that a profile from a shared-credentials file should be used for authentication to CloudWatch.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| file | required | string | | The path to the shared-credentials file |
| profile | optional | string | default | The profile in the shared-credentials file |
---
# Source: https://docs.vespa.ai/en/reference/api/api.html.md
# Vespa API and interfaces
## Deployment and configuration
- [Deploy API](deploy-v2.html): Deploy [application packages](../../basics/applications.html) to configure a Vespa application
- [Config API](config-v2.html): Get and Set configuration
- [Tenant API](application-v2.html): Configure multiple tenants in the config servers
## Document API
- [Reads and writes](../../writing/reads-and-writes.html): APIs and binaries to read and update documents
- [/document/v1/](document-v1.html): REST API for operations based on document ID (get, put, remove, update)
- [Feeding API](../../clients/vespa-feed-client.html): High performance feeding API, the recommended API for feeding data
- [JSON feed format](../schemas/document-json-format.html): The Vespa Document format
- [Vespa Java Document API](../../writing/document-api-guide.html)
## Query and grouping
- [Query API](../../querying/query-api.html), [Query API reference](query.html)
- [Query Language](../../querying/query-language.html), [Query Language reference](../querying/yql.html), [Simple Query Language reference](../querying/simple-query-language.html), [Predicate fields](../../schemas/predicate-fields.html)
- [Vespa Query Profiles](../../querying/query-profiles.html)
- [Grouping API](../../querying/grouping.html), [Grouping API reference](../querying/grouping-language.html)
## Processing
- [Vespa Processing](../../applications/processing.html): Request-Response processing
- [Vespa Document Processing](../../applications/document-processors.html): Feed processing
## Request processing
- [Searcher API](../../applications/searchers.html)
- [Federation API](../../querying/federation.html)
- [Web service API](../../applications/web-services.html)
## Result processing
- [Custom renderer API](../../applications/result-renderers.html)
## Status and state
- [Health and Metric APIs](../../operations/metrics.html)
- [/cluster/v2 API](cluster-v2.html)
---
# Source: https://docs.vespa.ai/en/reference/applications/application-packages.html.md
# Application package reference
This is the [application package](../../basics/applications.html) reference. An application package is the deployment unit in Vespa. To deploy an application, create an application package and [vespa deploy](../../clients/vespa-cli.html#deployment) or use the [deploy API](../api/deploy-v2.html). The application package is a directory of files and subdirectories:
| Directory/file | Required | Description |
| --- | --- | --- |
| [services.xml](services/services.html) | Yes | Describes which services to run where, and their main configuration. |
| [hosts.xml](hosts.html) | No | Vespa Cloud: Not used. See node counts in [services.xml](services/services.html). Self-managed: The mapping from logical nodes to actual hosts. |
| [deployment.xml](deployment.html) | Yes, for Vespa Cloud | Specifies which environments and regions the application is deployed to during automated application deployment, and as which application instances. This file also specifies other deployment-related configuration like [cloud accounts](../../operations/enclave/enclave) and [private endpoints](../../operations/private-endpoints.html). The file is required when deploying to the [prod environment](../../operations/environments.html#prod) - it is ignored (with some exceptions) when deploying to the _dev_ environment. |
| [validation-overrides.xml](validation-overrides.html) | No | Override, allowing this package to deploy even if it fails validation. |
| [.vespaignore](../../applications/vespaignore.html) | No | Contains a list of path patterns that should be excluded from the `application.zip` deployed to Vespa. |
| [models](../ranking/model-files.html)/ | No | Machine-learned models in the application package. Refer to [stateless model evaluation](../../ranking/stateless-model-evaluation.html), [Tensorflow](../../ranking/tensorflow), [Onnx](../../ranking/onnx), [XGBoost](../../ranking/xgboost), and [LightGBM](../../ranking/lightgbm). |
| [schemas](../../basics/schemas.html)/ | No | Contains the \*.sd files describing the document types of the application and how they should be queried and processed. |
| [schemas/[schema]](../schemas/schemas.html#rank-profile)/ | No | Contains \*.profile files defining [rank profiles](../../basics/ranking.html#rank-profiles). This is an alternative to defining rank profiles inside the schema. |
| [security/clients.pem](../../security/guide) | Yes, for Vespa Cloud | PEM encoded X.509 certificates for data plane access. See the [security guide](../../security/guide) for how to generate and use. |
| [components](../../applications/components.html)/ | No | Contains \*.jar files containing searcher(s) for the JDisc Container. |
| [rules](../querying/semantic-rules.html)/ | No | Contains \*.sr files containing rule bases for semantic recognition and translation of the query |
| [search/query-profiles](../querying/query-profiles.html)/ | No | Contains \*.xml files containing a named set of search request parameters with values |
| [constants](../../ranking/tensor-user-guide.html#constant-tensors)/ | No | Constant tensors |
| [tests](testing.html)/ | No | Test files for automated tests |
| ext/ | No | Files that are guaranteed to be ignored by Vespa: They are excluded when processing the application package and cannot be referenced from any other element in it. |
Additional files and directories can be placed anywhere in the application package. These will not be processed explicitly by Vespa when deploying the application package (i.e. they will only be considered if they are referred to from within the application package), but there is no guarantee as to how they might be processed in a future release. To extend the application package in a way that is guaranteed to be ignored by Vespa in all future releases, use the _ext/_ directory.
## Deploy
**upload**: Uploads an application package to the config server. Normally not used, as _prepare_ includes _upload_.

**prepare** does the following:
1. Verifies that a configuration server is up and running
2. Uploads the application to the configuration server, which stores it in _$VESPA\_HOME/var/db/vespa/config\_server/serverdb/tenants/default/sessions/[sessionid]_. _[sessionid]_ increases for each _prepare_-call. The config server also stores the application in a [ZooKeeper](/en/operations/self-managed/configuration-server.html) instance at _/config/v2/tenants/default/sessions/[sessionid]_ - this distributes the application to all config servers
3. Creates metadata about the deployed application package (which user deployed it, which directory it was deployed from, and at what time it was deployed) and stores it in _...sessions/[sessionid]/.applicationMetaData_
4. Verifies that the application package contains the required files and performs a consistency check
5. Validates the XML config files using the [schema](https://github.com/vespa-engine/vespa/tree/master/config-model/src/main/resources/schema), found in _$VESPA\_HOME/share/vespa/schema_
6. Checks if there are config changes between the active application and this prepared application that require actions like restart or re-feed (like changes to [schemas](../../basics/schemas.html)). These actions are returned as part of the prepare step in the [deployment API](../api/deploy-v2.html#prepare-session). This prevents breaking changes to production - also read about [validation overrides](validation-overrides.html)
7. Distributes constant tensors and bundles with [components](../../applications/components.html) to nodes using [file distribution](/en/applications/deployment.html#file-distribution). Files are downloaded to _$VESPA\_HOME/var/db/vespa/filedistribution_, URL download starts downloading to _$VESPA\_HOME/var/db/vespa/download_

**activate** does the following:
1. Waits for prepare to complete
2. Activates new configuration version
3. Signals to containers to load new bundles - read more in [container components](../../applications/components.html)

**fetch**: Use _fetch_ to download the active application package.
An application package can be zipped for deployment:
```
$ zip -r ../app.zip .
```
Use any name for the zip file - then refer to the file instead of the path in [deploy](../../clients/vespa-cli.html#deployment) commands.
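For example, with the Vespa CLI (file name as created above):
```
$ vespa deploy app.zip
```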
**Important:** Using `tar` / `gzip` is not supported.[Details](https://github.com/vespa-engine/vespa/issues/17837).
## Preprocess directives
Use preprocess directives to:
- _preprocess:properties_: define properties that one can refer to everywhere in _services.xml_
- _preprocess:include_: split _services.xml_ in smaller chunks
Below, _${container.port}_ is replaced by _4099_. The contents of _content.xml_ are placed at the _include_ point. This is applied recursively; one can use preprocess directives in included files, as long as namespaces are defined in the top-level file. A sketch (element ids are illustrative):
```
<services version="1.0" xmlns:deploy="vespa" xmlns:preprocess="properties">
  <preprocess:properties>
    <container.port>4099</container.port>
  </preprocess:properties>
  <container id="default" version="1.0">
    <http>
      <server id="server" port="${container.port}" />
    </http>
  </container>
  <preprocess:include file="content.xml" />
</services>
```
Sample _content.xml_ (an illustrative sketch):
```
<content version="1.0" id="content">
  <redundancy>1</redundancy>
  ...
</content>
```
## Versioning application packages
An application can be given a user-defined version, available at [/ApplicationStatus](../../applications/components.html#monitoring-the-active-application). Configure the version in [services.xml](../applications/services/services.html) (at top level) - a sketch, assuming the application-userdata config holds the version:
```
<services version="1.0">
  <config name="container.handler.observability.application-userdata">
    <version>42</version>
  </config>
  ...
</services>
```
---
# Source: https://docs.vespa.ai/en/reference/api/application-v2.html.md
# /application/v2/tenant API reference
This is the /application/v2/tenant API reference with examples for the HTTP REST API to [list](#list-tenants), [create](#create-tenant) and [delete](#delete-tenant) a tenant, which can be used to [deploy](deploy-v2.html) an application.
The response format is JSON. The tenant value is "default".
The current API version is 2. The API port is 19071 - use [vespa-model-inspect](/en/reference/operations/self-managed/tools.html#vespa-model-inspect) `service configserver` to find config server hosts - example: `http://myconfigserver.mydomain.com:19071/application/v2/tenant/`
## HTTP requests
| HTTP request | application/v2/tenant operation | Description |
| --- | --- | --- |
| GET | List tenants / get tenant | List tenant information. |
| PUT | Create tenant | Create a new tenant. |
| DELETE | Delete tenant | Delete a tenant. |
### List tenants
```
GET /application/v2/tenant/
```
Example response:
```
[
    "default"
]
```
### Get tenant
```
GET /application/v2/tenant/default
```
Example response:
```
{
    "message": "Tenant 'default' exists."
}
```
### Create tenant
```
PUT /application/v2/tenant/default
```
Response: A message with the name of the tenant created - example:
```
{
    "message" : "Tenant default created."
}
```
**Note:** This operation is asynchronous; it will eventually propagate to all config servers.
### Delete tenant
```
DELETE /application/v2/tenant/default
```
Response: A message with the deleted tenant:
```
{
    "message" : "Tenant default deleted."
}
```
**Note:** This operation is asynchronous; it will eventually propagate to all config servers.
## Request parameters
None.
## HTTP status codes
Non-exhaustive list of status codes. Any additional info is included in the body of the return call, JSON-formatted.
| Code | Description |
| --- | --- |
| 400 | Bad request. Client error. The error message should indicate the cause. |
| 404 | Not found. For example using a session id that does not exist. |
| 405 | Method not implemented. E.g. using GET where only POST or PUT is allowed. |
| 500 | Internal server error. Generic error. The error message should indicate the cause. |
## Response format
Responses are in JSON format, with the following fields:
| Field | Description |
| --- | --- |
| message | An info/error message. |
---
# Source: https://docs.vespa.ai/en/basics/applications.html.md
# Vespa applications
You use Vespa by deploying an _application_ to it. Why applications? Because Vespa handles both data and the computations you do over them - together an application.
An application is specified by an _application package_ - a directory with some files. The application package contains _everything_ that is needed to run your application: Config, schemas, components, ML models, and so on.
The _only_ way to change an application is to make the change in the application package and then deploy it again. Vespa will then safely change the running system to match the new application package revision, without impacting queries, writes, or data.
## A minimal application package
You can create a complete application package with just a single file: services.xml. This file specifies the clusters that your application should run. It could just be a single stateless cluster - what's called _container_ - like this:
```
<services version="1.0">
    <container id="default" version="1.0" />
</services>
```
Put this in a file called services.xml, and you have created the world's smallest application package. However, this won't do much. Usually you want a `content` cluster, which can store data, maintain indexes, and run the distributed part of queries, and you'll want your container cluster to load the necessary middleware for this. With that, we get a services file like this:
```
<services version="1.0">
    <container id="default" version="1.0">
        <search />
        <document-api />
    </container>
    <content id="content" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="myschema" mode="index" />
        </documents>
    </content>
</services>
```
This specifies a fairly typical simple Vespa application. Now we need another file: the schema of the document type we'll use. This goes into the directory `schemas/`, so our application package now looks like this:
```
services.xml
schemas/myschema.sd
```
The schema file describes a kind of data and the computations (such as ranking/scoring) you want to do over it. At minimum, it just lists the fields of that data type and whether and how each field should be indexed:
```
schema myschema {
document myschema {
field text type string {
indexing: summary | index
}
field embedding type tensor(x[384]) {
indexing: attribute | index
}
field popularity type double {
indexing: summary | attribute
}
}
}
```
With these two files we have specified a fully functional application that can do text, vector and hybrid search with filtering.
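To illustrate, a hybrid query against this schema could combine text and vector retrieval - a sketch only, assuming a matching rank profile is defined and the query vector is produced outside Vespa:
```
{
    "yql": "select * from myschema where userQuery() or ({targetHits: 10}nearestNeighbor(embedding, q))",
    "query": "how to make pizza",
    "input.query(q)": [0.12, 0.03, ...],
    "hits": 10
}
```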
Rather than creating applications from scratch like this, you can also clone one of our sample applications as a starting point like we did in [getting started](deploy-an-application.html).
To read more on schemas, see the [schemas](schemas.html) guide. To see everything an application package can contain, see the [application package reference](../reference/applications/application-packages.html).
## Deploying applications
To create running instances of an application, or to make changes to one take effect, you _deploy_ it. Deployments to the dev zone and to self-managed clusters set up a single instance, while deployments to production can set up multiple instances in one or more regions.
To deploy an application package you use the [deploy command](../clients/vespa-cli.html#deployment) in Vespa CLI:
```
$ vespa deploy .
```
This will deploy the application package in the current directory to the current target and the default dev zone (use `vespa deploy -h` to see other options).
Deployment to production zones uses a separate command:
```
$ vespa prod deploy .
```
Production deployments also require an additional file in the application package to specify where it should be deployed: deployment.xml. See [production deployment](../operations/production-deployment.html). The recommended way to deploy to production is by setting up a continuous deployment job, see [automated deployments](../operations/automated-deployments.html).
Deploying a change to an application package is generally safe to do at any time. It does not disrupt queries and writes, and invalid or destructive changes are rejected before taking effect. You can also add tests that verify the application before deployment to production zones.
#### Next: [Schemas](schemas.html)
---
# Source: https://docs.vespa.ai/en/querying/approximate-nn-hnsw.html.md
# Approximate nearest neighbor search using HNSW index
This document describes how to speed up searches for nearest neighbors in vector spaces by adding an [HNSW index](../reference/schemas/schemas.html#index-hnsw) to tensor fields. For an introduction to nearest neighbor search, see the [nearest neighbor search](nearest-neighbor-search) documentation; for practical usage of Vespa's nearest neighbor search, see [nearest neighbor search - a practical guide](nearest-neighbor-search-guide); and to have Vespa create vectors for you, see [embedding](../rag/embedding.html).
Vespa implements a modified version of the Hierarchical Navigable Small World (HNSW) graph algorithm ([paper](https://arxiv.org/abs/1603.09320)). The implementation in Vespa supports:
- **Filtering** - The search for nearest neighbors can be constrained by query filters. The [nearestNeighbor](../reference/querying/yql.html#nearestneighbor) query operator can be combined with other filters or query terms using the [Vespa query language](query-language.html). See the query examples in the [practical guide](nearest-neighbor-search-guide#combining-approximate-nearest-neighbor-search-with-query-filters).
- **Multi-field vector indexing** - A schema can include multiple indexed tensor fields, and a query can search any combination of them. This is useful to support multiple models, multiple text sources, and multi-modal search, such as indexing both a textual description and an image for the same entity.
- **Multi-vector indexing** - A single document field can contain any number of vector values by defining it as a mixed tensor (a "map of vectors"). Documents will then be retrieved by the closest vector in each document compared to the query vector. See the [Multi-vector indexing sample application](https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing) for examples. This is commonly used to [index documents with multiple chunks](../rag/working-with-chunks.html). See also [this blog post](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/#implementation).
- **Real-time indexing** - CRUD (Create, Read, Update, Delete) operations on vectors in the index in true real time.
- **Mutable HNSW graph** - No query or indexing overhead from searching multiple _HNSW_ graphs. In Vespa, there is one graph per tensor field per content node - no segmented or partitioned graphs where a query against a content node needs to scan multiple HNSW graphs.
- **Multithreaded indexing** - The costly part of performing real-time changes to the _HNSW_ graph is the distance calculations done while searching the graph layers to find which links to change. These distance calculations are performed by multiple indexing threads.
- **Multiple value types** - The cost driver of vector search is often storing the vectors in memory, which is required to produce accurate results at low latency. An effective way to reduce cost is to reduce the size of each vector value. Vespa supports double, float, bfloat16, int8 and [single-bit values](../rag/binarizing-vectors.html). Changing from float to bfloat16 can halve cost with negligible impact on accuracy, while single-bit values greatly reduce both memory and CPU costs, and can be effectively combined with larger vector values stored on disk as a paged attribute to be used for ranking.
- **Optimized HNSW lookups** - ANN searches in Vespa [support](https://blog.vespa.ai/tweaking-ann-parameters/) both pre- and post-filtering, beam exploration, and filtering before distance calculation ("Acorn 1"). Tuning these parameters makes it possible to strike a good balance between performance and accuracy for any data set. Vespa's [ANN tuning tool](https://vespa-engine.github.io/pyvespa/examples/ann-parameter-tuning-vespa-cloud.html) can be used to automate the process.
## Using Vespa's approximate nearest neighbor search
The query examples in [nearest neighbor search](nearest-neighbor-search) use exact search, which has perfect accuracy. However, this is computationally expensive for large document volumes, as distances are calculated for every document that matches the query filters.
To enable fast approximate matching, the tensor field definition needs an `index` directive. A Vespa [document schema](../basics/schemas.html) can declare multiple tensor fields with `HNSW` enabled.
```
field image_embeddings type tensor(i{},x[512]) {
indexing: summary | attribute | index
attribute {
distance-metric: angular
}
index {
hnsw {
max-links-per-node: 16
neighbors-to-explore-at-insert: 100
}
}
}
field text_embedding type tensor(x[384]) {
indexing: summary | attribute | index
attribute {
distance-metric: prenormalized-angular
}
index {
hnsw {
max-links-per-node: 24
neighbors-to-explore-at-insert: 200
}
}
}
```
In the schema snippet above, fast approximate search is enabled by building an `HNSW` index for the `image_embeddings` and `text_embedding` tensor fields. `image_embeddings` indexes multiple vectors per document, while `text_embedding` indexes one vector per document.
The two vector fields use different [distance-metric](../reference/schemas/schemas.html#distance-metric) and `HNSW` index settings:
- `max-links-per-node` - a higher value increases recall accuracy, but also memory usage, indexing and search cost.
- `neighbors-to-explore-at-insert` - a higher value increases recall accuracy, but also indexing cost.
The values of these parameters affect accuracy, search performance, memory usage and indexing performance. See [Billion-scale vector search with Vespa - part two](https://blog.vespa.ai/billion-scale-knn-part-two/) for a detailed description of these tradeoffs. See the [HNSW index reference](../reference/schemas/schemas.html#index-hnsw) for details on the index parameters.
### Indexing throughput
The `HNSW` settings impact indexing throughput: higher values of `max-links-per-node` and `neighbors-to-explore-at-insert` reduce it. See [Billion-scale vector search with Vespa - part two](https://blog.vespa.ai/billion-scale-knn-part-two/) for example measurements.
### Memory usage
Higher values of `max-links-per-node` mean higher memory usage.
### Accuracy
Higher `max-links-per-node` and `neighbors-to-explore-at-insert` improve the quality of the graph and recall accuracy. As the search-time parameter [hnsw.exploreAdditionalHits](../reference/querying/yql.html#hnsw-exploreadditionalhits) is increased, the lower combination reaches about 70% recall@10, while the higher combination reaches about 92% recall@10. The improvement in accuracy must be weighed against the impact on indexing performance and memory usage.
## Using approximate nearest neighbor search
With an _HNSW_ index enabled on the tensor field, one can choose between approximate and exact (brute-force) search using the [approximate query annotation](../reference/querying/yql.html#approximate):
```
{
"yql": "select * from doc where {targetHits: 100, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
"hits": 10
"input.query(query_image_embedding)": [0.21,0.12,....],
"ranking.profile": "image_similarity"
}
```
By default, `approximate` is true when searching a tensor field with `HNSW` index enabled. The `approximate` parameter allows quantifying the accuracy loss of using approximate search: perform an exact search using `approximate:false`, compare the retrieved documents with those from `approximate:true`, and calculate the overlap@k metric.
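For example, with k=10: if 9 of the 10 documents returned with `approximate:true` are also among the exact top 10, then overlap@10 = 9/10 = 0.9.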
Note that exact searches over a large vector volume require adjustment of the [query timeout](../reference/api/query.html#timeout). The default [query timeout](../reference/api/query.html#timeout) is 500 ms, which will be too low for an exact search over many vectors.
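For example, an exact search can be combined with a raised timeout (the value here is illustrative):
```
{
    "yql": "select * from doc where {targetHits: 100, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
    "input.query(query_image_embedding)": [0.21,0.12,....],
    "ranking.profile": "image_similarity",
    "timeout": "5s",
    "hits": 10
}
```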
In addition to [targetHits](../reference/querying/yql.html#targethits), there is a [hnsw.exploreAdditionalHits](../reference/querying/yql.html#hnsw-exploreadditionalhits) parameter, which controls how many extra nodes in the graph (in addition to `targetHits`) are explored during the graph search. This parameter is used to tune accuracy versus query performance.
## Combining approximate nearest neighbor search with filters
The [nearestNeighbor](../reference/querying/yql.html#nearestneighbor) query operator can be combined with other query filters using the [Vespa query language](../reference/querying/yql.html) and its query operators. There are two high-level strategies for combining query filters with approximate nearest neighbor search:
- [pre-filtering](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#pre-filtering-strategy) (the default)
- [post-filtering](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#post-filtering-strategy)
These strategies can be configured in a rank profile using [approximate-threshold](../reference/schemas/schemas.html#approximate-threshold) and [post-filter-threshold](../reference/schemas/schemas.html#post-filter-threshold). See [Controlling the filtering behavior with approximate nearest neighbor search](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#controlling-the-filtering-behavior-with-approximate-nearest-neighbor-search) for more details.
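A sketch of how these thresholds can be set in a rank profile - the values are illustrative only, see the reference documentation for defaults and semantics:
```
rank-profile filtered_ann {
    post-filter-threshold: 0.75
    approximate-threshold: 0.05
}
```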
Note that when using `pre-filtering`, the following query operators are not included when evaluating the filter part of the query:
- [geoLocation](../reference/querying/yql.html#geolocation)
- [predicate](../reference/querying/yql.html#predicate)
These are instead evaluated after the approximate nearest neighbors are retrieved, more like a `post-filter`. This might cause the search to expose fewer hits to ranking than the requested `targetHits`.
Since Vespa 8.78, the `pre-filter` can be evaluated using [multiple threads per query](../performance/practical-search-performance-guide.html#multithreaded-search-and-ranking). This can be used to reduce query latency for larger vector datasets where the cost of evaluating the `pre-filter` is significant. Note that searching the `HNSW` index is always single-threaded per query. Multithreaded evaluation when using `post-filtering` has always been supported, but this is less relevant, as the `HNSW` index search first reduces the document candidate set based on `targetHits`.
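Multithreaded query evaluation is configured per rank profile - a minimal sketch, assuming the profile is selected by the query; the thread count is illustrative:
```
rank-profile fast_filtering {
    num-threads-per-search: 4
}
```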
## Nearest Neighbor Search Considerations
- **targetHits**: The [targetHits](../reference/querying/yql.html#targethits) annotation specifies how many hits one wants to expose to [ranking](../basics/ranking.html) _per content node_. Nearest neighbor search is typically used as an efficient retriever in a [phased ranking](../ranking/phased-ranking.html) pipeline. See [performance sizing](../performance/sizing-search.html).
- **Pagination**: Pagination uses the standard [hits](../reference/api/query.html#hits) and [offset](../reference/api/query.html#offset) query API parameters. There is no caching of results between pagination requests, so a query for a higher `offset` causes the search to be performed over again. This aspect is no different from [sparse search](../ranking/wand.html) queries not using the nearest neighbor query operator.
- **Total hit count is not accurate**: Technically, all vectors in the searchable index are neighbors; there is no strict boundary between a match and a non-match. Neither exact (`approximate:false`) nor approximate (`approximate:true`) usage of the [nearestNeighbor](../reference/querying/yql.html#nearestneighbor) query operator produces an accurate `totalCount`. This is the same behavior as with sparse dynamic pruning search algorithms like [weakAnd](../reference/querying/yql.html#weakand) and [wand](../reference/querying/yql.html#wand).
- **Grouping counts are not accurate**: Counts from [grouping](grouping.html) are not accurate when using [nearestNeighbor](../reference/querying/yql.html#nearestneighbor) search. This is the same behavior as with other dynamic pruning search algorithms like [weakAnd](../reference/querying/yql.html#weakand) and [wand](../reference/querying/yql.html#wand). See the [Result diversification](https://blog.vespa.ai/result-diversification-with-vespa/) blog post on how grouping can be combined with nearest neighbor search.
## Scaling Approximate Nearest Neighbor Search
### Memory
Vespa tensor fields are [in-memory](../content/attributes.html) data structures, and so is the `HNSW` graph. For large vector datasets, the primary memory cost is the raw vector data itself.
Using lower tensor cell type precision can reduce the memory footprint significantly: for example, using `bfloat16` instead of `float` saves close to 50% memory usage without significant accuracy loss.
Vespa [tensor cell value types](../performance/feature-tuning.html#cell-value-types) include:
- `int8` - 1 byte per value. Also used to represent [packed binary values](../rag/binarizing-vectors.html).
- `bfloat16` - 2 bytes per value. See [bfloat16 floating-point format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format).
- `float` - 4 bytes per value. Standard float.
- `double` - 8 bytes per value. Standard double.
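For example, the `text_embedding` field from the schema snippet earlier could be declared with `bfloat16` cells to roughly halve its vector memory (HNSW settings omitted for brevity):
```
field text_embedding type tensor<bfloat16>(x[384]) {
    indexing: summary | attribute | index
}
```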
### Search latency and document volume
The `HNSW` greedy search algorithm is sublinear (close to log(N), where N is the number of vectors in the graph). This has interesting properties when adding more nodes horizontally using [flat data distribution](../performance/sizing-search.html#data-distribution): even if the document volume per node is reduced by a factor of 10, the search latency is only reduced by 50%. Still, flat scaling helps scale document volume and increases indexing throughput, as vectors are partitioned randomly over a set of nodes.
Pure vector search applications (without filtering or re-ranking) should attempt to scale up document volume by using a larger instance type and maximizing the number of vectors per node. To scale with query throughput, use [grouped data distribution](../performance/sizing-search.html#data-distribution) to replicate content.
Note that strongly sublinear search does not necessarily hold if the application uses nearest neighbor search for candidate retrieval in a [multiphase ranking](../ranking/phased-ranking.html) pipeline, or combines nearest neighbor search with filters.
## HNSW Operations
Changing the [distance-metric](../reference/schemas/schemas.html#distance-metric) for a tensor field with an `hnsw` index requires [restarting](../reference/schemas/schemas.html#changes-that-require-restart-but-not-re-feed), but not re-indexing (re-feeding vectors). Similarly, changing the `max-links-per-node` and `neighbors-to-explore-at-insert` construction parameters requires restarting.
---
# Source: https://docs.vespa.ai/en/operations/kubernetes/architecture.html.md
# Architecture
The Vespa Operator is an implementation of the [Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) that extends Kubernetes with custom orchestration capabilities for Vespa. It relies on a [Custom Resource Definition](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) called a `VespaSet`, which represents a quorum of [ConfigServers](https://docs.vespa.ai/en/operations/self-managed/configuration-server.html) in a Kubernetes namespace. The Vespa Operator is responsible for the deployment and lifecycle of the `VespaSet` resource and its ConfigServers, which together constitute the infrastructure for Vespa on Kubernetes.
[Application Packages](https://docs.vespa.ai/en/basics/applications.html) are deployed to the [ConfigServers](https://docs.vespa.ai/en/operations/self-managed/configuration-server.html) to create Vespa applications. The ConfigServers will dynamically instantiate the services as individual Pods based on the settings provided in the Application Package. After an Application Package is deployed, the ConfigServers will remain responsible for the management and lifecycle of the Vespa application.
---
# Source: https://docs.vespa.ai/en/operations/archive/archive-guide-aws.html.md
# AWS Archive guide
Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to buckets in AWS S3. This guide explains how to access this data. Access to the data must happen through an AWS account controlled by the tenant. Data traffic to access this data is charged to this AWS account.
These resources are needed to get started:
- An AWS account
- An IAM Role in that AWS account
- The [AWS command line client](https://aws.amazon.com/cli/)
Access is configured through the Vespa Cloud Console in the tenant account screen. Choose the "archive" tab to see the settings below.
## Register IAM Role
First, the IAM Role must be granted access to the S3 buckets in Vespa Cloud. This is done by entering the IAM Role in the console setting. Vespa Cloud will then grant that role access to the S3 buckets.
## Grant access to Vespa Cloud resources
Second, the IAM Role must be granted access to resources inside Vespa Cloud. AWS requires the permissions to be registered both in Vespa Cloud's AWS account (step 1) and in the tenant's AWS account (step 2). Copy the policy from the user interface and attach it to the IAM Role - or make your own equivalent policy should you have other requirements. For more information, see the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).
## Access files using AWS CLI
Once permissions have been granted, the IAM Role can access the contents of the archive buckets. Any AWS S3 client will work, but the AWS command line client is an easy tool to use. The settings page lists all buckets where data is stored, typically one bucket per zone the tenant has applications in.
The `--request-payer=requester` parameter is mandatory, to make sure network traffic is charged to the correct AWS account.
Refer to [access-log-lambda](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/access-log-lambda/README.md) for how to install and use `aws cli`, which can be used to download logs, or e.g. list objects:
```
$ aws s3 ls --profile=archive --request-payer=requester \
s3://vespa-cloud-data-prod.aws-us-east-1c-9eb633/vespa-team/
PRE album-rec-searcher/
PRE cord-19/
PRE vespacloud-docsearch/
```
In the example above, the S3 bucket name is _vespa-cloud-data-prod.aws-us-east-1c-9eb633_ and the tenant name is _vespa-team_ (for that particular prod zone). Archiving is per tenant, and a log file is normally stored with a key like:
```
/vespa-team/vespacloud-docsearch/default/h2946a/logs/access/JsonAccessLog.default.20210629100001.zst
```
The URI to this object is hence:
```
s3://vespa-cloud-data-prod.aws-us-east-1c-9eb633/vespa-team/vespacloud-docsearch/default/h2946a/logs/access/JsonAccessLog.default.20210629100001.zst
```
Objects are exported once generated - access log files are compressed and exported at least once per hour.
If you are having problems accessing the files, run
```
$ aws sts get-caller-identity
```
to verify that you are correctly assuming the role which has been granted access.
## Lambda processing
When processing logs using a lambda function, first write a minimal function that just lists objects, to sort out access, keys and roles:
```
const aws = require("aws-sdk");
const s3 = new aws.S3({ apiVersion: "2006-03-01" });

// List object keys under the given prefix.
// RequestPayer is required - transfer cost is charged to the requester.
const findRelevantKeys = ({ Bucket, Prefix }) => {
  console.log(`Finding relevant keys in bucket ${Bucket}`);
  return s3
    .listObjectsV2({ Bucket: Bucket, Prefix: Prefix, RequestPayer: "requester" })
    .promise()
    .then((res) => res.Contents.map((content) => ({ Bucket, Key: content.Key })));
};

exports.handler = async (event, context) => {
  const options = { Bucket: "vespa-cloud-data-prod.aws-us-east-1c-9eb633", Prefix: "MY-TENANT-NAME/" };
  return findRelevantKeys(options)
    .then((res) => {
      console.log("response: ", res);
      return { statusCode: 200 };
    })
    .catch((err) => ({ statusCode: 500, message: err }));
};
```
Note: Always set `RequestPayer: "requester"` to access the objects - transfer cost is assigned to the requester.
Once the above lists the log files from S3, review [access-log-lambda](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/access-log-lambda/README.md)for how to write a function to decompress and handle the log data.
---
# Source: https://docs.vespa.ai/en/operations/archive/archive-guide-gcp.html.md
# GCP Archive guide
Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to buckets in Google Cloud Storage. This guide explains how to access this data. Access to the data is through a GCP project controlled by the tenant. Data traffic to access this data is charged to this GCP project.
These resources are needed to get started:
- A GCP project
- A Google user account
- The [gcloud command line interface](https://cloud.google.com/sdk/docs/install)
Access is configured through the Vespa Cloud Console in the tenant account screen. Choose the "archive" tab, then "GCP" tab to see the settings below.
## Register IAM principal
First, a principal must be granted access to the Cloud Storage bucket in Vespa Cloud. This is done by entering a [principal](https://cloud.google.com/iam/docs/overview) with a supported prefix. See the accepted format in the description below the input field.
## Access files using gcloud CLI
Once permissions have been granted, the GCP principal can access the contents of the archive buckets. Any Cloud Storage client will work, but the `gsutil` command line client is an easy tool to use. The settings page lists all buckets where data is stored, typically one bucket per zone the tenant has applications in.
The `-u user-project` parameter is mandatory to make sure network traffic is charged to the correct GCP project.
```
$ gsutil -u my-project ls \
gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/
gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/album-rec-searcher/
gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/cord-19/
gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/
```
In the example above, the bucket name is _vespa-cloud-data-prod.gcp-us-central1-f-73770f_ and the tenant name is _vespa-team_ (for that particular prod zone). Archiving is per tenant, and a log file is normally stored with a key like:
```
/vespa-team/vespacloud-docsearch/default/h7644a/logs/access/JsonAccessLog.20221011080000.zst
```
The URI to this object is hence:
```
gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h7644a/logs/access/JsonAccessLog.20221011080000.zst
```
Objects are exported once generated - access log files are compressed and exported at least once per hour.
Note: Always set a user project to access the objects - transfer cost is assigned to the requester.
---
# Source: https://docs.vespa.ai/en/operations/archive/archive-guide.html.md
# Archive guide
Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to storage buckets. The bucket system used will depend on which cloud provider is backing the zone your application is running in. AWS S3 will be used in the AWS zones, and Cloud Storage will be used in the GCP zones.
How to access and use the storage buckets is found in the documentation for the respective cloud providers:
- [AWS S3](archive-guide-aws)
- [Google Cloud Storage](archive-guide-gcp)
## Examples
These examples use GCP as the source; replace with AWS commands as needed. Here, _resonant-triode-123456_ is the Google project ID that owns the target bucket _my\_access\_logs_ for the data copy (and is charged the data download cost, if any).
Use the CLUSTERS view in the Vespa Cloud Console to find hostname(s) for the nodes to export logs from - then list contents:
```
$ gsutil -u resonant-triode-123456 ls \
gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/
$ gsutil -u resonant-triode-123456 ls \
gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/myinstance
$ gsutil -u resonant-triode-123456 ls \
gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/myinstance/h404a/logs/access
```
Copy files for a host to the _my\_access\_logs_ bucket:
```
$ gsutil -u resonant-triode-123456 \
-m -o "GSUtil:parallel_process_count=1" \
cp -r \
gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \
gs://my_access_logs/vespa-files
```
`rsync` can be used to reduce the number of files copied, using `-x` to exclude paths:
```
$ gsutil -u resonant-triode-123456 \
-m -o "GSUtil:parallel_process_count=1" \
rsync -r \
-x '.*/connection/.*|.*/vespa/.*|.*/zookeeper/.*' \
gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \
gs://my_access_logs/vespa-files
```
Refer to [cloud-functions](https://github.com/vespa-engine/sample-apps/tree/master/examples/google-cloud/cloud-functions)and [lambda](https://github.com/vespa-engine/sample-apps/tree/master/examples/aws/lambda)for how to write and deploy simple functions to process files in Google Cloud and AWS.
For local processing, copy files for a host to local file system (or use `rsync`):
```
$ gsutil -u resonant-triode-123456 \
-m -o "GSUtil:parallel_process_count=1" \
cp -r \
gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \
.
```
Use [zstd](https://facebook.github.io/zstd/) to decompress files:
```
$ zstd -d *
```
Example: Filter out healthchecks using [jq](https://stedolan.github.io/jq/):
```
$ cat JsonAccessLog.20230117* | jq '. |
select (.uri != "/status.html") |
select (.uri != "/state/v1/metrics") |
select (.uri != "/state/v1/health")'
```
Add a human-readable date field per access log entry:
```
$ cat JsonAccessLog.20230117* | jq '. |
select (.uri != "/status.html") |
select (.uri != "/state/v1/metrics") |
select (.uri != "/state/v1/health") |
. +{iso8601date:(.time|todateiso8601)}'
```
---
# Source: https://docs.vespa.ai/en/operations/enclave/archive.html.md
# Log archive in Vespa Cloud Enclave
**Warning:** The structure of log archive buckets may change without notice
After Vespa Cloud Enclave is established in your cloud provider account using Terraform, the module will have created a storage bucket per Vespa Cloud zone you configured in your enclave. These storage buckets are used to archive logs from the machines that run Vespa inside your account.
There is one storage bucket per Vespa Cloud zone configured in the enclave; the name of the bucket depends on the cloud provider the enclave is set up in.
Files are synchronized to the archive bucket when a file is rotated by the logging system, or when a virtual machine is deprovisioned from the application. Consequently, the upload frequency depends on the activity of the Vespa application.
## Directory structure
The directory structure in the bucket is as follows:
```
/<tenant>/<application>/<instance>/<host>/logs/<logtype>/<logfile>
```
- `tenant` is the tenant ID.
- `application` is the application ID that generated the log.
- `instance` is the instance ID that generated the log, e.g. `default`.
- `host` is the name prefix of the host that generated the log, e.g. `e103a`.
- `logtype` is the type of log in the directory (see below).
- `logfile` is the specific file of the log.
## Log types
There are three log types that are synced to this bucket.
- `vespa`: [Vespa logs](../../reference/operations/log-files.html)
- `access`: [Access logs](../access-logging.html)
- `connection`: [Connection logs](../access-logging.html#connection-log)
---
# Source: https://docs.vespa.ai/en/content/attributes.html.md
# Attributes
An _attribute_ is a [schema](../reference/schemas/schemas.html#attribute) keyword, specifying the indexing for a field:
```
field price type int {
indexing: attribute
}
```
Attribute properties and use cases:
- Flexible [match modes](../reference/schemas/schemas.html#match) including exact match, prefix match, and case-sensitive matching, but not text matching (tokenization and linguistic processing).
- High sustained update rates (avoiding read-apply-write patterns). Any mutating operation against an attribute field is written to Vespa's [transaction log](proton.html#transaction-log) and persisted, but appending to the log is sequential access, not random. Read more in [partial updates](../writing/partial-updates.html).
- Instant query updates - values are immediately searchable.
- [Document Summaries](../querying/document-summaries.html) are memory-only operations if all fields are attributes.
- [Numerical range queries](../reference/querying/yql.html#numeric).
```
where price > 100
```
- [Grouping](../querying/grouping.html) - aggregate results into groups - it is also great for generating diversity in results.
```
all(group(customer) each(max(3) each(output(summary()))))
```
- [Ranking](../basics/ranking.html) - use attribute values directly in rank functions.
```
rank-profile rank_fresh {
first-phase {
expression { freshness(timestamp) }
}
}
```
- [Sorting](../reference/querying/sorting-language.html) - order results by attribute value.
```
order by price asc, release_date desc
```
- [Parent/child](../schemas/parent-child.html) - import attribute values from global parent documents.
```
import field advertiser_ref.name as advertiser_name {}
```
The other field option is _index_ - use [index](proton.html#index) for fields used for [text search](../querying/text-matching.html), with [stemming](../linguistics/linguistics-opennlp.html#stemming) and [normalization](../linguistics/linguistics-opennlp.html#normalization).
An attribute is an in-memory data structure. Attributes speed up query execution and [document updates](../writing/partial-updates.html), trading off memory. As the data structures are regularly optimized, consider both static and temporary resource usage - see [attribute memory usage](#attribute-memory-usage) below. Use attributes in document summaries to avoid accessing the document store when generating result sets.
Configuration overview:
**fast-search**
Also see the [reference](../reference/schemas/schemas.html#attribute). Add an [index structure](#index-structures) to improve query performance:
```
field titles type array<string> {
    indexing : summary | attribute
    attribute: fast-search
}
```
**fast-access**
For high-throughput updates, all nodes with a replica should have the attribute loaded in memory. Depending on replication factor and other configuration, this is not always the case. Use [fast-access](../reference/schemas/schemas.html#attribute) to increase feed rates by having replicas on all nodes in memory - see the [reference](../reference/schemas/schemas.html#attribute) and [sizing feeding](../performance/sizing-feeding.html).
```
field titles type array<string> {
    indexing : summary | attribute
    attribute: fast-access
}
```
**distance-metric**
Features like [nearest neighbor search](../querying/nearest-neighbor-search) require a [distance-metric](../reference/schemas/schemas.html#distance-metric), and can also have an `hnsw` index to speed up queries. Read more in [approximate nearest neighbor](../querying/approximate-nn-hnsw). Pay attention to the field's `index` setting to enable the index:
```
field image_sift_encoding type tensor(x[128]) {
    indexing: summary | attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 500
        }
    }
}
```
## Data structures
The attribute field's data type decides which data structures are used by the attribute to store values for that field across all documents on a content node. For some data types, a combination of data structures is used:
- _Attribute Multivalue Mapping_ stores arrays of values for array and weighted set types.
- _Attribute Enum Store_ stores unique strings for all string attributes and unique values for attributes with [fast-search](attributes.html#fast-search).
- _Attribute Tensor Store_ stores tensor values for all tensor attributes.
In the following, a row represents a document, while a named column ("A" through "D") represents an attribute.
Attributes can be:
| Type | Size | Description |
| --- | --- | --- |
| Single-valued | Fixed | Like the "A" attribute, example `int`. The element size is the size of the type, like 4 bytes for an integer. A memory buffer (indexed by Local ID) holds all values directly. |
| Multi-valued | Fixed | Like the "B" attribute, example `array<int>`. A memory buffer (indexed by Local ID) holds references (32 bit) to where in the _Multivalue Mapping_ the arrays are stored. The _Multivalue Mapping_ consists of multiple memory buffers, where arrays of the same size are co-located in the same buffer. |
| Multi-valued | Variable | Like the "B" attribute, example `array<string>`. A memory buffer (indexed by Local ID) holds references (32 bit) to where in the _Multivalue Mapping_ the arrays are stored. The unique strings are stored in the _Enum Store_, and the arrays in the _Multivalue Mapping_ store references (32 bit) to the strings in the _Enum Store_. The _Enum Store_ consists of multiple memory buffers. |
| Single-valued | Variable | Like the "C" attribute, example `string`. A memory buffer (indexed by Local ID) is holding references (32 bit) to where in the _Enum Store_ the strings are stored. |
| Tensor | Fixed / Variable | Like the "D" attribute, example `tensor(x{},y[64])`. A memory buffer (indexed by Local ID) is holding references (32 bit) to where in the _Tensor Store_ the tensor values are stored. The memory layout in the _Tensor Store_ depends on the tensor type. |
The "A", "B", "C" and "D" attribute memory buffers have attribute values or references in Local ID (LID) order - see [document meta store](#document-meta-store).
When updating an attribute, the full value is written. This also applies to [multivalue](../basics/schemas.html#document-fields) fields - example adding an item to an array:
1. Space for the new array is reserved in a memory buffer
2. The current value is copied
3. The new element is written
This means that larger fields copy more data at updates. It also implies that updates to [weighted sets](../reference/schemas/schemas.html#weightedset) are faster when using numeric keys (less memory and cheaper comparisons).
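For example, adding a single element to an array attribute that already holds 1,000 values rewrites all 1,000 entries (roughly 4 KB of 32-bit references for a string array), not just the new element.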
Data stored in the _Multivalue Mapping_, _Enum Store_ and _Tensor Store_ is referenced using 32 bit references. This address space can become full, at which point feeding is blocked - [learn more](../writing/feed-block.html). For array or weighted set attributes, the max limit on the number of documents that can have the same number of values is approx 2 billion per node. For string attributes or attributes with [fast-search](attributes.html#fast-search), the max limit on the number of unique values is approx 2 billion per node.
## Index structures
Without `fast-search`, attribute access is a memory lookup, reading one value or all values, depending on query execution. An attribute is a linear, array-like data structure - matching documents potentially means scanning _all_ attribute values.
Setting [fast-search](../reference/schemas/schemas.html#attribute) creates an index structure for quicker lookup and search. This consists of a [dictionary](../reference/schemas/schemas.html#dictionary) pointing to posting lists. It uses more memory, and also more CPU when updating documents: it increases steady-state memory usage for all attribute types and adds initialization overhead for numeric types.
The default dictionary is a b-tree of attribute _values_, pointing to an _occurrence_ b-tree (posting list) of local doc IDs for each value. Using `dictionary: hash` on the attribute instead generates a hash table of attribute values pointing to the posting lists (short posting lists are represented as arrays instead of b-trees).
Notes:
- If a value occurs in many documents, the _occurrence_ b-tree grows large. For such values, a boolean-occurrence list (i.e. bitvector) is generated in addition to the b-tree.
- Setting `fast-search` is not observable in the files on disk, other than size.
- `fast-search` causes a memory increase even for empty fields, due to the extra index structures created. E.g. single value fields will have the "undefined value" when empty, and there is a posting list for this value.
- The _value_ b-tree enables fast range-searches in numerical attributes. This is also available for `hash`-based dictionaries, but slower as a full scan is needed.
Using `fast-search` has many implications, read more in [when to use fast-search](../performance/feature-tuning.html#when-to-use-fast-search-for-attribute-fields).
## Attribute memory usage
Attribute structures are regularly optimized, and this causes temporary resource usage - read more in [maintenance jobs](proton.html#proton-maintenance-jobs). The memory footprint of an attribute depends on a few factors, data type being the most important:
- Numeric (int, long, byte, and double) and Boolean (bit) types - fixed length and fixed cost per document
- String type - the footprint depends on the length of the strings and how many unique strings need to be stored.
Collection types like array and weighted set increase the memory usage somewhat, but the main factor is the average number of values per document. String attributes are typically the largest attributes, and require the most memory during initialization - use boolean/numeric types where possible. Example (refer to the formulas below):
```
schema foo {
document bar {
field titles type array<string> {
indexing: summary | attribute
}
}
}
```
- Assume average 10 values per document, average string length 15, 100k unique strings and 20M documents.
- Steady state memory usage is approx 1 GB (20M\*4\*(6/5) + 20M\*10\*4\*(6/5) + 100k\*(15+1+4+4)\*(6/5)).
- During initialization (loading attribute from disk) an additional 2.4 GB is allocated (20M\*10\*(4+4+4)), for each value:
- local document id
- enum value
- weight
- Increasing the average number of values per document to 20 (double) will also double the memory footprint during initialization (4.8 GB).
When doing capacity planning, keep in mind the maximum footprint, which occurs during initialization. For the steady-state footprint, the number of unique values is important for string attributes.
Check the [Example attribute sizing spreadsheet](../../assets/attribute-memory-Vespa.xls), with various data types and collection types. It also contains estimates for how many documents a 48 GB RAM node can hold, taking initialization into account.
[Multivalue](../basics/schemas.html#document-fields) attributes use an adaptive approach in how data is stored in memory, and up to 2 billion documents per node is supported.
**Pro-tip:** The proton _/state/v1/_ interface can be explored for attribute memory usage. This is an undocumented debug-interface, subject to change at any moment - example: _http://localhost:19110/state/v1/custom/component/documentdb/music/subdb/ready/attribute/artist_
## Attribute file usage
Attribute data is stored in two locations on disk:
- The attribute store in memory, which is regularly flushed to disk. At startup, the flushed files are used to quickly populate the memory structures, resulting in a much quicker startup compared to generating the attribute store from the source in the document store.
- The document store on disk. Documents here are used to (re)generate index structures, as well as being the source for replica generation across nodes.
The different field types use various data types for storage (see below); a conservative rule of thumb for steady-state disk usage is hence twice the data size.
## Sizing
Attribute sizing is not an exact science, but rather an approximation, because attributes vary in size: the number of documents, the number of values per document, and the uniqueness of the values are all variable. The components of an attribute that occupy memory are:
| Abbreviation | Concept | Comment |
| --- | --- | --- |
| D | Number of documents | Number of documents on the node, or rather the maximum number of local document ids allocated |
| V | Average number of values per document | Only applicable for arrays and weighted sets |
| U | Number of unique values | Only applies for strings or if [fast-search](../reference/schemas/schemas.html#attribute) is set |
| FW | Fixed data width | sizeof(T) for numerics, 1 byte for strings, 1 bit for boolean |
| WW | Weight width | Width of the weight in a weighted set, 4 bytes. 0 bytes for arrays. |
| EIW | Enum index width | Width of the index into the enum store, 4 bytes. Used by all strings and other attributes if [fast-search](../reference/schemas/schemas.html#attribute) is set |
| VW | Variable data width | strlen(s) for strings, 0 bytes for the rest |
| PW | Posting entry width | Width of a posting list entry, 4 bytes for singlevalue, 8 bytes for array and weighted sets. Only applies if [fast-search](../reference/schemas/schemas.html#attribute) is set. |
| PIW | Posting index width | Width of the index into the store of posting lists; 4 bytes |
| MIW | Multivalue index width | Width of the index into the multivalue mapping; 4 bytes |
| ROF | Resize overhead factor | Default is 6/5. This is the average overhead in any dynamic vector due to resizing strategy. Resize strategy is 50% indicating that structure is 5/6 full on average. |
### Components
| Component | Formula | Approx Factor | Applies to |
| --- | --- | --- | --- |
| Document vector | D \* ((FW or EIW) or MIW) | ROF | FW for singlevalue numeric attributes and MIW for multivalue attributes. EIW for singlevalue string or if the attribute is singlevalue fast-search |
| Multivalue mapping | D \* V \* ((FW or EIW) + WW) | ROF | Applicable only for array or weighted sets. EIW if string or fast-search |
| Enum store | U \* ((FW + VW) + 4 + ((EIW + PIW) or EIW)) | ROF | Applicable for strings or if fast-search is set. (EIW + PIW) if fast-search is set, EIW otherwise. |
| Posting list | D \* V \* PW | ROF | Applicable if fast-search is set |
### Variants
| Type | Components | Formula |
| --- | --- | --- |
| Numeric singlevalue plain | Document vector | D \* FW \* ROF |
| Numeric multivalue value plain | Document vector, Multivalue mapping | D \* MIW \* ROF + D \* V \* (FW+WW) \* ROF |
| Numeric singlevalue fast-search | Document vector, Enum store, Posting List | D \* EIW \* ROF + U \* (FW+4+EIW+PIW) \* ROF + D \* PW \* ROF |
| Numeric multivalue value fast-search | Document vector, Multivalue mapping, Enum store, Posting List | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+4+EIW+PIW) \* ROF + D \* V \* PW \* ROF |
| Singlevalue string plain | Document vector, Enum store | D \* EIW \* ROF + U \* (FW+VW+4+EIW) \* ROF |
| Singlevalue string fast-search | Document vector, Enum store, Posting List | D \* EIW \* ROF + U \* (FW+VW+4+EIW+PIW) \* ROF + D \* PW \* ROF |
| Multivalue string plain | Document vector, Multivalue mapping, Enum store | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+VW+4+EIW) \* ROF |
| Multivalue string fast-search | Document vector, Multivalue mapping, Enum store, Posting list | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+VW+4+EIW+PIW) \* ROF + D \* V \* PW \* ROF |
| Boolean singlevalue | Document vector | D \* FW \* ROF |
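As a cross-check, plugging the earlier _titles_ example (D = 20M, V = 10, U = 100k, average string length 15) into the _Multivalue string plain_ variant reproduces the approximately 1 GB steady-state estimate:
```
Document vector:    D * MIW * ROF           = 20M * 4 * 6/5            ≈  96 MB
Multivalue mapping: D * V * (EIW+WW) * ROF  = 20M * 10 * (4+0) * 6/5   ≈ 960 MB
Enum store:         U * (FW+VW+4+EIW) * ROF = 100k * (1+15+4+4) * 6/5  ≈   3 MB
Total                                                                  ≈ 1.06 GB
```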
## Paged attributes
Regular attribute fields are guaranteed to be in-memory, while the [paged](../reference/schemas/schemas.html#attribute) attribute setting allows paging the attribute data out of memory to disk. The `paged` setting is _not_ supported for the following types:
- [tensor](../reference/schemas/schemas.html#tensor) with [fast-rank](../reference/schemas/schemas.html#attribute).
- [predicate](../reference/schemas/schemas.html#predicate).
For attribute fields using [fast-search](../reference/schemas/schemas.html#attribute), the memory needed for dictionary and index structures is never paged out to disk.
Using the `paged` setting for attributes is an alternative when there are memory resource constraints and the attribute data is only accessed for a limited number of hits per query during ranking - e.g. a dense tensor attribute which is only used during a [re-ranking phase](../ranking/phased-ranking.html), where the number of attribute accesses is limited by the re-ranking phase count.
For example, using a second-phase [rerank-count](../reference/schemas/schemas.html#secondphase-rerank-count) of 100 limits the maximum number of page-ins/disk accesses per query to 100. Running at 100 QPS would then need up to 10K disk accesses per second. This is the worst case, where none of the accessed attribute data was already paged into memory - the actual number depends on access locality and memory pressure (the size of the attribute data versus available memory).
In this example, we have a dense tensor with 1024 [int8](../reference/ranking/tensor.html#tensor-type-spec) values. The tensor attribute is only accessed during re-ranking (second-phase ranking expression):
```
schema foo {
document foo {
field tensordata type tensor<int8>(x[1024]) {
indexing: attribute
attribute: paged
}
}
rank-profile foo {
first-phase {
    expression: nativeRank
}
second-phase {
rerank-count: 100
expression: sum(attribute(tensordata))
}
}
}
```
For some use cases where serving latency SLA is not strict and query throughput is low, the `paged` attribute setting might be a tuning alternative, as it allows storing more data per node.
### Paged attributes disadvantages
The disadvantages of using _paged_ attributes are many:
- Unpredictable query latency as attribute access might touch disk. Limited queries per second throughput per node (depends on the locality of document re-ranking requests).
- Paged attributes are implemented by file-backed memory mappings. The performance depends on the [Linux virtual memory management](https://tldp.org/LDP/tlk/mm/memory.html) ability to page data in and out. Using many threads per search/high query throughput might cause high system (kernel) CPU and system unresponsiveness.
- The content node's total memory utilization will be close to 100% when using paged attributes. It's up to the Linux kernel to determine what part of the attribute data is paged into memory based on access patterns. A good understanding of how the Linux virtual memory management system works is recommended before enabling paged attributes.
- The [memory usage metrics](/en/performance/sizing-search.html#metrics-for-vespa-sizing) from content nodes do not reflect reality when using paged attributes. They can indicate a usage that is much higher than the available memory on the node. This is because attribute memory usage is reported as the amount of data contained in the attribute, and whether this data is paged out to disk is controlled by the Linux kernel.
- Using paged attributes doubles the disk usage of attribute data. For example, if the original attribute size is 92 GB (100M documents with the 1024-value int8 tensor schema above), the `paged` setting will double the attribute disk usage to close to 200 GB.
- Changing the `paged` setting (e.g. removing the option) on a running system might cause hard out-of-memory situations as without `paged`, the content nodes will attempt loading the attribute into memory without the option for page outs.
- Using a paged attribute in [first-phase](../ranking/phased-ranking.html) ranking can result in extremely high query latency if a large amount of the corpus is retrieved by the query. The number of disk accesses will, in the worst case, be equal to the number of hits the query produces. A similar problem can occur if running a query that searches a paged attribute.
- Using `paged` in combination with [HNSW indexing](../querying/approximate-nn-hnsw) is _strongly_ discouraged. _HNSW_ indexing also searches and reads tensors during indexing, causing random access during feeding. Once the system memory usage reaches 100%, the Linux kernel will start paging in and out of memory, which can cause high system (kernel) CPU usage and slows down HNSW indexing throughput significantly.
## Mutable attributes
[Mutable attributes](../reference/schemas/schemas.html#mutate) are document metadata used to improve matching and ranking performance per document.
The attribute values are mutated as part of query execution, as defined in rank profiles - see [rank phase statistics](../ranking/phased-ranking.html#rank-phase-statistics) for details.
## Document meta store
The document meta store is an in-memory data structure for all documents on a node. It is an _implicit attribute_, and is [compacted](proton.html#lid-space-compaction) and [flushed](proton.html#attribute-flush). Memory usage for applications with small documents / no other attributes can be dominated by this attribute.
The document meta store scales linearly with number of documents - using approximately 30 bytes per document. The metric _content.proton.documentdb.ready.attribute.memory\_usage.allocated\_bytes_ for `"field": "[documentmetastore]"` is the size of the document meta store in memory - use the [metric API](../reference/api/state-v1.html#state-v1-metrics) to find the size - in this example, the node has 9M ready documents with 52 bytes in memory per document:
```
{
    "name": "content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes",
    "description": "The number of allocated bytes",
    "values": {
        "average": 4.69736008E8,
        "count": 12,
        "rate": 0.2,
        "min": 469736008,
        "max": 469736008,
        "last": 469736008
    },
    "dimensions": {
        "documenttype": "doctype",
        "field": "[documentmetastore]"
    }
},
```
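Here, 469,736,008 bytes / 9,000,000 documents ≈ 52 bytes per document, matching the number above.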
The above is for the _ready_ documents, also check _removed_ and _notready_ - refer to [sub-databases](proton.html#sub-databases).
---
# Source: https://docs.vespa.ai/en/operations/automated-deployments.html.md
# Automated Deployments
Vespa Cloud provides:
- A [CD test framework](#cd-tests) for safe deployments to production zones.
- [Multi-zone deployments](#deployment-orchestration) with orchestration and test steps.
This guide goes through details of an orchestrated deployment. Read / try [production deployment](production-deployment.html) first to have a baseline understanding. The [developer guide](../applications/developer-guide.html) is useful for writing tests. Use [example GitHub Actions](#automating-with-github-actions) for automation.
## CD tests
Before deployment in production zones, [system tests](#system-tests) and [staging tests](#staging-tests) are run. Tests are run in a dedicated and [downsized](environments.html) environment. These tests are optional; see details in the sections below. Status and logs of ongoing tests can be found in the _Deployment_ view in the [Vespa Cloud Console](https://console.vespa-cloud.com/).
These tests are also run during [Vespa Cloud upgrades](#vespa-cloud-upgrades).
Find deployable example applications in [CI-CD](https://github.com/vespa-cloud/examples/tree/main/CI-CD).
### System tests
When a system test is run, the application is deployed in the [test environment](environments.html#test). The system test suite is then run against the endpoints of the test deployment. The test deployment is empty when the test execution begins. The application package and Vespa platform version are the same as those to be deployed to production.
A test suite includes at least one [system test](../applications/testing.html#system-tests). An application can be deployed to a production zone without system tests - this step will then only test that the application starts successfully. See [production deployment](production-deployment.html) for an example without tests.
Read more about [system tests](../applications/testing.html#system-tests).
### Staging tests
A staging test verifies the transition of a deployment from one application package to the next - i.e., from application package `Appold` to `Appnew`. A test suite includes at least one [staging setup](../applications/testing.html#staging-tests) and one [staging test](../applications/testing.html#staging-tests).
1. All production zone deployments are polled for the current versions. As there can be multiple versions already being deployed (i.e. multiple `Appold`), there can be a series of staging test runs.
2. The application at revision `Appold` is deployed in the [staging environment](environments.html#staging).
3. The staging setup test code is run, typically making the cluster reasonably similar to a production cluster.
4. The test deployment is then upgraded to application revision `Appnew`.
5. Finally, the staging test code is run, to verify the deployment works as expected after the upgrade.
An application can be deployed to a production zone without staging tests - this step will then only test that the application starts successfully before and after the change. See [production deployment](production-deployment.html) for an example without tests.
Read more about [staging tests](../applications/testing.html#staging-tests).
### Disabling tests
The test steps are always run, regardless of _deployment.xml_ - to deploy without testing, remove the test files from the application package.
To temporarily deploy without testing, run `deploy` and hit the "Abort" button (see illustration above, hover over the test step in the Console) - this skips the test step and makes the orchestration progress to the next step.
### Running tests only
To run a system test without deploying to any nodes afterwards, add a new test instance. In _deployment.xml_, add the instance without `dev` or `prod` elements, like:
```
<!-- hedged sketch - the instance id is illustrative; note: no dev or prod elements -->
<deployment version="1.0">
    <instance id="integration-test" />
</deployment>
```
Note that this will leave an empty instance in the console, as the deployment is for testing only - no resources are deployed to after the tests.
Make sure to run `vespa prod deploy` to invoke the pipeline for testing, and use a separate application for this test.
## Deployment orchestration
The _deployment orchestration_ is flexible: configure dependencies between deployments to production zones, production verification tests, and delays, by ordering these in parallel and serial blocks of steps:

On a higher level, instances can also depend on each other in the same way. This makes it easy to configure a deployment process which gradually rolls out changes to increasingly larger subsets of production nodes, as confidence grows with successful production verification tests. Refer to [deployment.xml](../reference/applications/deployment.html) for details.
Deployments run sequentially by default, but can be configured to [run in parallel](../reference/applications/deployment.html). Inside each zone, Vespa Cloud orchestrates the deployment, such that the change is applied without disruption to read or write traffic against the application. A production deployment in a zone is complete when the new configuration is active on all nodes.
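A hedged deployment.xml sketch of serial and parallel steps with a production verification test (region names are illustrative):
```
<deployment version="1.0">
    <instance id="default">
        <prod>
            <region>aws-us-east-1c</region>
            <test>aws-us-east-1c</test>
            <delay hours="2"/>
            <parallel>
                <region>aws-us-west-2a</region>
                <region>aws-eu-west-1a</region>
            </parallel>
        </prod>
    </instance>
</deployment>
```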
Most changes are instant, making this a quick process. If node restarts are needed, e.g., during platform upgrades, these will happen automatically and safely as part of the deployment. When this is necessary, deployments will take longer to complete.
System and staging tests, if present, must always be successfully run before the application package is deployed to production zones.
### Source code repository integration
Each new _submission_ is assigned an increasing build number, which can be used to track the roll-out of the new package to the instances and their zones. With the submission, add a source code repository reference for easy integration - this makes it easy to track changes:

Add the source diff link to the pull request - see example [GitHub Action](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/.github/workflows/deploy-vespa-documentation-search.yaml):
```
$ vespa prod deploy \
--source-url "$(git config --get remote.origin.url | sed 's+git@\(.*\):\(.*\)\.git+https://\1/\2+')/commit/$(git rev-parse HEAD)"
```
### Block-windows
Use block-windows to block deployments during certain windows throughout the week, e.g., avoid rolling out changes during peak hours / during vacations. Hover over the instance (here "default") to find block status - see [block-change](../reference/applications/deployment.html#block-change):

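A hedged deployment.xml sketch of a block window (attribute values are illustrative - see the block-change reference for all options):
```
<deployment version="1.0">
    <instance id="default">
        <!-- no changes roll out on weekends -->
        <block-change days="sat,sun" hours="0-23"/>
        <prod>
            <region>aws-us-east-1c</region>
        </prod>
    </instance>
</deployment>
```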
### Validation overrides
Some configuration changes are potentially destructive or change the application behavior - examples are removing fields and changing linguistic processing. These changes are disallowed by default, and the deploy command will fail. To override and force a deployment, use a [validation override](../reference/applications/validation-overrides.html):
```
<!-- hedged sketch of validation-overrides.xml - the until date is illustrative -->
<validation-overrides>
    <allow until="2026-02-01">tensor-type-change</allow>
</validation-overrides>
```
### Production tests
Production tests are optional and configured in [deployment.xml](../reference/applications/deployment.html). Production tests do not have access to the Vespa endpoints, for security reasons. Dependent steps in the release pipeline will stop if the tests fail, but upgraded regions will remain on the version where the test failed. A production test is hence used to block deployments to subsequent zones and only makes sense in a multi-zone deployment.
### Deploying Components
Vespa is [backwards compatible](../learn/releases.html#versions) within major versions, and major versions rarely change. This means that [Components](../applications/components.html) compiled against an older version of Vespa APIs can always be run on the same major version. However, if the application package is compiled against a newer API version, and then deployed to an older runtime version in production, it might fail. See [vespa:compileVersion](production-deployment.html#production-deployment-with-components) for how to solve this.
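A hedged sketch of the Maven flow (goal and property names follow the production-deployment guide - verify against your plugin version):
```
# writes the version to compile against to target/vespa.compile.version
$ mvn vespa:compileVersion
# build the application package against that version instead of the latest
$ mvn -U package -Dvespa.compile.version="$(cat target/vespa.compile.version)"
```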
## Automating with GitHub Actions
Auto-deploy production applications using GitHub Actions - examples:
- [deploy-vector-search.yaml](https://github.com/vespa-cloud/vector-search/blob/main/.github/workflows/deploy-vector-search.yaml) deploys an application to a production environment - a good example to start from!
- [deploy.yaml](https://github.com/vespa-cloud/examples/blob/main/.github/workflows/deploy.yaml) deploys an application with basic HTTP tests.
- [deploy-vespa-documentation-search.yaml](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/.github/workflows/deploy-vespa-documentation-search.yaml) deploys an application with Java-tests.
The automation scripts use an API key to deploy:
```
$ vespa auth api-key
```
This creates a key, or outputs:
```
Error: refusing to overwrite /Users/me/.vespa/mytenant.api-key.pem
Hint: Use -f to overwrite it
This is your public key:
-----BEGIN PUBLIC KEY-----
ABCDEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEB2UFsh8ZjoWNtkrDhyuMyaZQe1ze
qLB9qquTKUDQTuM2LOr2dawUs02nfSc3UTfC08Lgr/dvnTnHpc0/fY+3Aw==
-----END PUBLIC KEY-----
Its fingerprint is:
12:34:56:78:65:30:77:90:30:ab:83:ee:a9:67:68:2c
To use this key in Vespa Cloud click 'Add custom key' at
https://console.vespa-cloud.com/tenant/mytenant/account/keys
and paste the entire public key including the BEGIN and END lines.
```
This means an existing key is never overwritten - the command is safe to run. Make sure to add the deploy key to the tenant using the Vespa Cloud Console.
After the deploy-key is added, everything is ready for deployment.
You can upload or create new application keys in the console and store them as a secret in the repository, like in the GitHub Actions examples above.
Some services, like [Travis CI](https://travis-ci.com), do not accept multi-line values for environment variables in Settings. A workaround is to store the output of
```
$ openssl base64 -A -a < mykey.pem && echo
```
in a variable, say `VESPA_MYAPP_API_KEY`, in the Travis settings. `VESPA_MYAPP_API_KEY` is then exported in the Travis environment - example output:
```
Setting environment variables from repository settings
$ export VESPA_MYAPP_API_KEY=[secure]
```
Then, before deploying to Vespa Cloud, reconstruct the key value:
```
$ MY_API_KEY=`echo $VESPA_MYAPP_API_KEY | openssl base64 -A -a -d`
```
and use `${MY_API_KEY}` in the deploy command.
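Alternatively, a hedged sketch passing the key to the Vespa CLI via the `VESPA_CLI_API_KEY` environment variable (supported by recent CLI versions):
```
$ export VESPA_CLI_API_KEY="$(echo $VESPA_MYAPP_API_KEY | openssl base64 -A -a -d)"
$ vespa prod deploy
```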
## Vespa Cloud upgrades
Vespa upgrades follow the same pattern as new application revisions in [CD tests](#cd-tests), and can be tracked via the version number in the Vespa Cloud Console.
System tests are run the same way as for deploying a new application package.
A staging test verifies the upgrade from application package `Appold` to `Appnew`, and from Vespa platform version `Vold` to `Vnew`. The staging test then consists of the following steps:
1. All production zone deployments are polled for the current `Vold` / `Appold` versions. As there can be multiple versions already being deployed (i.e. multiple `Vold` / `Appold`), there can be a series of staging test runs.
2. The application at revision `Appold` is deployed on platform version `Vold`, to a zone in the [staging environment](environments.html#staging).
3. The _staging setup_ test code is run, typically making the cluster reasonably similar to a production cluster.
4. The test deployment is then upgraded to application revision `Appnew` and platform version `Vnew`.
5. Finally, the _staging test_ test code is run, to verify the deployment works as expected after the upgrade.
Note that one or both of the application revision and the platform version may be upgraded during the staging test, depending on which upgrade scenario the test verifies. These changes are usually kept separate, but in some cases it is necessary to let them roll out together.
## Next steps
- Read more about [feature switches and bucket tests](../applications/testing.html#feature-switches-and-bucket-tests).
- A challenge with continuous deployment can be integration testing across multiple services: Another service depends on this Vespa application for its own integration testing. Use a separate [application instance](../reference/applications/deployment.html#instance) for such integration testing.
- Set up a deployment badge, available from the console's deployment view.
- Set up a [global query endpoint](../reference/applications/deployment.html#endpoints-global).
---
# Source: https://docs.vespa.ai/en/operations/autoscaling.html.md
# Autoscaling
Autoscaling lets you adjust the hardware resources allocated to application clusters automatically depending on actual usage. It will attempt to keep utilization of all allocated resources close to ideal, and will automatically reconfigure to the cheapest option allowed by the ranges when necessary.
You can turn it on by specifying _ranges_ in square brackets for the [nodes](../reference/applications/services/services.html#nodes) and/or [node resource](../reference/applications/services/services.html#resources) values in _services.xml_. Vespa Cloud will monitor the resource utilization of your clusters and automatically choose the cheapest resource allocation within ranges that produces close to optimal utilization.
You can see the status and recent actions of the autoscaler in the _Resources_ view under a deployment in the console.
Autoscaling does not consider latency differences between configurations. If some configurations give good throughput but too high latency for your application, do not include them in the autoscaling ranges.
Adjusting the allocation of a cluster may happen quickly for stateless container clusters, and much more slowly for content clusters with a lot of data. Autoscaling will adjust each cluster on the timescale it typically takes to rescale it (including any data redistribution).
The ideal utilization takes into account that a node may be down or failing, that another region may be down causing doubling of traffic, and that we need headroom for maintenance operations and handling requests with low latency. It acts on what it has observed on your system in the recent past. If you need much more capacity in the near future than you do currently, you may want to set the lower limit to take this into account. Upper limits should be set to the maximum size that makes business sense.
## When to use autoscaling
Autoscaling is useful in a number of scenarios. Some typical ones are:
- You have a new application which you can't benchmark with realistic data and usage, making you unsure what resources to allocate: Set wide ranges for all resource parameters and let the system choose a configuration. Once you gain experience you can consider tightening the configuration space.
- You have load that varies quickly during the day, or that may suddenly increase quickly due to some event, and want container cluster resources to quickly adjust to the load: Set a range for the number of nodes and/or vcpu on containers.
- You expect your data volume to grow over time, but you don't want to allocate resources prematurely, nor constantly worry about whether it is time to increase: Configure ranges for content nodes and/or node resources such that the size of the system grows with the data.
## Resource tradeoffs
Some other considerations when deciding resources:
- Making changes to resources/nodes is easy and safe - one of Vespa Cloud's strengths. We advise making controlled changes and observing the effect on latency, data migration, and cost. Everything is automated - just deploy a new application package. This is useful learning for later load peaks and capacity requirement changes.
- Node resources cannot be chosen freely in all zones; CPU and memory often come in 2x increments. Make sure the resource configuration is a good fit.
- CPU is the most expensive resource - optimize for it in most applications.
- Having few nodes means more overcapacity, as Vespa requires that the system handles one node being down (or one group, in content clusters with multiple groups). A minimum of 4-5 nodes is a good rule of thumb. Whether 4-5 nodes, or 9-10 nodes of half the size, is better depends on the tradeoff between quicker upgrade cycles and smoother autoscaling curves. Latencies can be better or worse, depending on static vs. dynamic query cost.
- Changing a node resource may mean allocating a new node, so it may be faster to scale container nodes by changing the number of nodes.
- As a consequence, during resource shortage (say almost full disk), add nodes and keep the rest unchanged.
- It is easiest to reason over capacity when changing one thing at a time.
It is often safe to follow the _suggested resources_ advice when shown in the console. Feel free to contact us if you have questions.
## Mixed load
A Vespa application must handle a combination of reads and writes, from multiple sources. User load often resembles a sine-like curve. Machine-generated load, like a batch job, can be spiky and abrupt.
In the default Vespa configuration, all kinds of load use _one_ default container cluster. Example: an application where daily batch jobs update the corpus at a high rate:

Autoscaling scales _up_ much quicker than _down_, as the probability of a new spike is higher after one has been observed. In this example, see the rapid cluster growth for the daily load spike - followed by a slow decay.
The best solution in this case is to slow down the batch job, as it is of short duration. When slowing down jobs is not feasible, setting up multiple [container clusters](../applications/containers.html) can be a smart alternative - optimize each cluster for its load characteristics. This could be a combination of autoscaled clusters and clusters with a fixed size. Autoscaling often works best for user-generated load, whereas machine-generated load can either be tuned or routed to a different cluster in the same Vespa application.
## Examples
Below is an example of node resources with autoscaling that would work well for a container cluster:
```
<!-- hedged sketch - a container cluster with ranges for node count and vcpu -->
<nodes count="[2, 8]">
    <resources vcpu="[2, 16]" memory="8Gb" disk="50Gb"/>
</nodes>
```
The above would in general **not be recommended for a content cluster.** Changing cpu, memory or disk usually leads to allocating new nodes to fulfil the new node resource spec. When that happens, documents are redistributed between the old and new nodes, which might impact service quality to some degree. For a content cluster, it is usually better to stick to the same node resources and add or remove nodes, e.g. something like:
```
<!-- hedged sketch - fixed node resources, autoscaling the node count only -->
<nodes count="[4, 8]">
    <resources vcpu="8" memory="32Gb" disk="300Gb"/>
</nodes>
```
If a content cluster is configured to autoscale on node resources (not just the number of nodes or groups), this will work fine, but note that using paged attributes or HNSW indexes makes it more expensive and time-consuming to redistribute documents when scaling up or down. When doing the initial feeding of a cluster, it is best to avoid autoscaling, as changing the topology requires redistribution of documents, possibly several times.
When using groups in a content cluster it's possible to scale the number of groups instead of the number of nodes, e.g. with a fixed group size and a range for the number of groups:
```
<!-- hedged sketch - fixed group size; the number of groups scales with the node count -->
<nodes count="[8, 24]" group-size="4">
    <resources vcpu="8" memory="32Gb" disk="300Gb"/>
</nodes>
```
Note that at the moment it is not possible to autoscale GPU resources.
## Related reading
- [Feed sizing](../performance/sizing-feeding.html)
- [Query sizing](../performance/sizing-search.html)
---
# Source: https://docs.vespa.ai/en/operations/enclave/aws-architecture.html.md
# Vespa Cloud Enclave AWS Architecture
Each Vespa Cloud Enclave in the tenant AWS account corresponds to a Vespa Cloud [zone](../zones.html). Inside the tenant AWS account, each enclave is contained within a single [VPC](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html).

#### EC2 Instances, Load Balancers, and S3 buckets
Configuration Servers inside the Vespa Cloud zone make the decision to create or destroy EC2 instances ("Vespa Hosts" in the diagram) based on the Vespa applications that are deployed. The Configuration Servers also set up the Network Load Balancers needed to communicate with the deployed Vespa application.
Each Vespa Host will periodically sync its logs to an S3 bucket ("Log Archive"). This bucket is "local" to the enclave and provisioned by the Terraform module inside the tenant's AWS account.
#### Networking
The enclave VPC is heavily network-restricted. Vespa Hosts do not have public IPv4 addresses, and there is no [NAT gateway](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) available in the VPC. Vespa Hosts have public IPv6 addresses and can make outbound connections; inbound connections are not allowed. Outbound IPv6 connections are used to bootstrap communication with the Configuration Servers and to report operational metrics back to Vespa Cloud.
When a Vespa Host is booted it will set up an encrypted tunnel back to the Configuration Servers. All communication between Configuration Servers and the Vespa Hosts will be run over this tunnel after it is set up.
### Security
The Vespa Cloud operations team does _not_ have any direct access to the resources that are part of the customer account. The only possible access is through the management APIs needed to run Vespa itself. If needed for e.g. incident debugging, direct access can only be granted to the Vespa team by the tenant itself. For further details, see the documentation for the [`ssh` submodule](https://registry.terraform.io/modules/vespa-cloud/enclave/aws/latest/submodules/ssh).
All communication between the enclave and the Vespa Cloud configuration servers is encrypted, authenticated and authorized using [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) with identities embedded in the certificate. mTLS communication is facilitated with the [Athenz](https://www.athenz.io/) service.
All data stored is encrypted at rest using [KMS](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html). All keys are managed by the tenant in the tenant's AWS account.
The resources provisioned in the tenant AWS account are either provisioned by the Terraform module executed by the tenant, or by the orchestration services inside a Vespa Cloud Zone.
Resources are provisioned by the Vespa Cloud configuration servers, using the [`provision_policy`](https://github.com/vespa-cloud/terraform-aws-enclave/blob/main/modules/provision/main.tf) AWS IAM policy document defined in the Terraform module.
The tenant that registered the AWS account is the only tenant that can deploy applications targeting the enclave.
For more general information about security in Vespa Cloud, see the [whitepaper](../../security/whitepaper).
---
# Source: https://docs.vespa.ai/en/operations/enclave/aws-getting-started.html.md
# Getting started with Vespa Cloud Enclave in AWS
Setting up Vespa Cloud Enclave requires:
1. Registration at [Vespa Cloud](https://console.vespa-cloud.com), or use a pre-existing tenant.
2. Registration of the AWS account ID in Vespa Cloud
3. Running a [Terraform](https://www.terraform.io/) configuration to provision AWS resources in the account. Go through the [AWS tutorial](https://developer.hashicorp.com/terraform/tutorials/aws-get-started) as needed.
4. Deployment of a Vespa application.
### 1. Vespa Cloud Tenant setup
Register at [Vespa Cloud](https://console.vespa-cloud.com) or use an existing tenant. Note that the tenant must be on a [paid plan](https://vespa.ai/pricing/).
### 2. Onboarding
Contact [support@vespa.ai](mailto:support@vespa.ai), stating which tenant should be on-boarded to use Vespa Cloud Enclave. Also include the [AWS account ID](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindAccountId) to associate with the tenant.
**Note:** We recommend using a _dedicated_ account for your Vespa Cloud Enclave. Vespa Cloud will manage resources in the Enclave VPCs created in the AWS resource provisioning step - primarily EC2 instances, load balancers, and service endpoints.
One account can host all your Vespa applications - there is no need for multiple tenants or accounts.
### 3. Configure AWS Account
The same AWS account used in step two must be prepared for deploying Vespa applications, using either _Terraform_ or _CloudFormation_.
#### Terraform
Use [Terraform](https://www.terraform.io/) to set up the necessary resources, using the [modules](https://registry.terraform.io/modules/vespa-cloud/enclave/aws/latest) published by the Vespa team.
Modify the [multi-region Terraform files](https://github.com/vespa-cloud/terraform-aws-enclave/blob/main/examples/multi-region/main.tf) for your deployment.
If you are unfamiliar with Terraform: it is a tool to manage resources and their configuration in various cloud providers, like AWS and GCP. Terraform has published an [AWS](https://developer.hashicorp.com/terraform/tutorials/aws-get-started) tutorial, and we strongly encourage enclave users to read and follow the Terraform recommendations for [CI/CD](https://developer.hashicorp.com/terraform/tutorials/automation/automate-terraform).
The Terraform module we provide is regularly updated to add new required resources or extra permissions for Vespa Cloud to automate the operations of your applications. For your enclave applications to use the new features, you must re-apply your Terraform templates with the latest release. The [notification system](../notifications) will let you know when a new release is available.
#### CloudFormation
Vespa also supports CloudFormation if you prefer the AWS-native solution. Download the CloudFormation stacks from our [GitHub repository](https://github.com/vespa-cloud/cloudformation-aws-enclave) and refer to the README for stack-specific instructions.
### 4. Deploy a Vespa application
By default, all applications are deployed on resources in Vespa Cloud accounts. To deploy in your enclave account, update [deployment.xml](../../reference/applications/deployment.html) to reference the account used in step two:
```
<!-- hedged sketch - use the AWS account ID registered in step 2; region is illustrative -->
<deployment version="1.0" cloud-account="aws:123456789012">
    <prod>
        <region>aws-us-east-1c</region>
    </prod>
</deployment>
```
Useful resources are [getting started](../../basics/deploy-an-application-java.html) and [migrating to Vespa Cloud](../../learn/migrating-to-cloud) - put _deployment.xml_ next to _services.xml_.
## Next steps
After a successful deployment to the [dev](../environments.html#dev) environment, iterate on the configuration to implement your application on Vespa. The _dev_ environment is ideal for this, with rapid deployment cycles.
For production serving, deploy to the [prod](../environments.html#prod) environment - follow the steps in [production deployment](../production-deployment.html).
## Enclave teardown
To tear down a Vespa Cloud Enclave system, do the steps above in reverse order:
1. [Undeploy the application(s)](../deleting-applications.html)
2. Undeploy the Terraform changes
It is important to undeploy the Vespa application(s) first: once the Terraform changes are removed, Vespa Cloud can no longer manage the allocated resources, and you must clean these up yourself.
---
# Source: https://docs.vespa.ai/en/operations/enclave/azure-architecture.html.md
# Architecture for Vespa Cloud Enclave in Azure
### Architecture
With Vespa Cloud Enclave, all Azure resources associated with your Vespa Cloud applications are in your enclave Azure subscription, as opposed to a shared Vespa Cloud subscription.
Each Vespa Cloud [zone](../zones.html) has an associated zone resource group (RG) in the enclave subscription, which contains all the resources for that zone. For instance, it has one Virtual Network (VNet, aka [VPC](https://cloud.google.com/vpc/)).

#### Virtual Machines, Load Balancers, and Blob Storage
Configuration Servers inside the Vespa Cloud subscription make the decision to create or destroy virtual machines ("Vespa Hosts" in the diagram) based on the Vespa applications that are deployed. The Configuration Servers also set up the Container Load Balancers needed to communicate with the deployed Vespa application.
Each Vespa Host will periodically sync its logs to a Blob Storage container ("Log Archive") in a Storage Account in the zone RG. This storage account is "local" to the enclave and provisioned by the Terraform module inside your Azure subscription.
#### Networking
The zone Virtual Network (VNet, aka VPC) is heavily network-restricted. Vespa Hosts do not have public IPv4 addresses, but your application can connect to external IPv4 services using a [NAT gateway](https://learn.microsoft.com/en-us/azure/nat-gateway/nat-overview). Vespa Hosts have public IPv6 addresses and can make outbound connections; inbound connections are not allowed. Outbound IPv6 connections are used to bootstrap communication with the Configuration Servers and to report operational metrics back to Vespa Cloud.
When a Vespa Host is booted, it will set up an encrypted tunnel back to the Configuration Servers. All communication between Configuration Servers and the Vespa Hosts will be run over this tunnel after it is set up.
### Security
The Vespa Cloud operations team does _not_ have any direct access to the resources in your subscription. The only possible access is through the management APIs needed to run Vespa itself. If needed for e.g. incident debugging, direct access can only be granted to the Vespa team by you. Enable direct access by setting the `enable_ssh` input to true in the enclave module. For further details, see the documentation for the [enclave module inputs](https://registry.terraform.io/modules/vespa-cloud/enclave/azure/latest/?tab=inputs).
All communication between the enclave and the Vespa Cloud Configuration servers is encrypted, authenticated, and authorized using [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) with identities embedded in the certificate. mTLS communication is facilitated with the [Athenz](https://www.athenz.io/) service.
All data stored is encrypted at rest using [Encryption At Host](https://learn.microsoft.com/en-us/azure/virtual-machines/disk-encryption-overview). All keys are managed automatically by the Azure platform.
The resources provisioned in your Azure subscription are either provisioned by the Vespa Cloud Enclave Terraform module you apply, or by the orchestration services inside a Vespa Cloud zone.
Resources are provisioned by the Vespa Cloud Configuration servers, using the [`id-provisioner`](https://github.com/vespa-cloud/terraform-azure-enclave/blob/main/provisioner.tf) user-assigned managed identity defined in the Terraform module.
Only your Vespa tenant (that registered this Azure subscription) can deploy applications targeting your enclave.
For more general information about security in Vespa Cloud, see the [whitepaper](../../security/whitepaper).
---
# Source: https://docs.vespa.ai/en/operations/enclave/azure-getting-started.html.md
# Getting started with Vespa Cloud Enclave in Azure
Setting up Vespa Cloud Enclave requires:
1. Registration at [Vespa Cloud](https://console.vespa-cloud.com), or use a pre-existing Vespa tenant.
2. Running a [Terraform](https://www.terraform.io/) configuration to provision necessary Azure resources in the subscription.
3. Registration of the Azure subscription in Vespa Cloud.
4. Deployment of a Vespa application.
### 1. Vespa Cloud Tenant setup
Register at [Vespa Cloud](https://console.vespa-cloud.com) or use an existing Vespa tenant. Note that the tenant must be on a [paid plan](https://vespa.ai/pricing/).
### 2. Configure Azure subscription
Choose an Azure subscription to use for Vespa Cloud Enclave.
**Note:** We recommend using a _dedicated_ subscription for your Vespa Cloud Enclave. Resources in this subscription will be fully managed by Vespa Cloud.
One subscription can host all your Vespa applications, there is no need for multiple Vespa tenants or Azure subscriptions.
The subscription must be prepared for deploying Vespa applications. Use [Terraform](https://www.terraform.io/) to set up the necessary resources, using the [modules](https://registry.terraform.io/modules/vespa-cloud/enclave/azure/latest) published by the Vespa team.
Feel free to use the [example](https://github.com/vespa-cloud/terraform-azure-enclave/blob/main/examples/basic/main.tf) to get started.
If you are unfamiliar with Terraform: it is a tool to manage resources and their configuration in various cloud providers, like AWS, Azure, and GCP. Terraform has published a [Get Started - Azure](https://developer.hashicorp.com/terraform/tutorials/azure-get-started) tutorial, and we strongly encourage enclave users to read and follow the Terraform recommendations for [CI/CD](https://developer.hashicorp.com/terraform/tutorials/automation/automate-terraform).
The Terraform module we provide is regularly updated to add new required resources or extra permissions for Vespa Cloud to automate the operations of your applications. For your enclave applications to use the new features, you must re-apply your Terraform templates with the latest release. The [notification system](../notifications.html) will let you know when a new release is available.
### 3. Onboarding
Contact [support@vespa.ai](mailto:support@vespa.ai) and provide the `enclave_config` output after applying the Terraform - see [Outputs](https://github.com/vespa-cloud/terraform-azure-enclave?tab=readme-ov-file#outputs). The `enclave_config` includes which Vespa tenant should be on-boarded to use Vespa Cloud Enclave, as well as the Azure tenant ID, the subscription ID, and the client ID of an Athenz identity the Terraform created.
### 4. Deploy a Vespa application
By default, all applications are deployed on resources in Vespa Cloud accounts. To deploy in your Azure enclave subscription instead, update [deployment.xml](../../reference/applications/deployment.html) to reference the subscription ID from step 2:
```
<!-- hedged sketch - use the Azure subscription ID registered in step 2; account format and region are illustrative -->
<deployment version="1.0" cloud-account="azure:00000000-0000-0000-0000-000000000000">
    <prod>
        <region>azure-eastus-az1</region>
    </prod>
</deployment>
```
Useful resources are [getting started](../../basics/deploy-an-application.html) and [migrating to Vespa Cloud](../../learn/migrating-to-cloud) - put _deployment.xml_ next to _services.xml_.
## Next steps
After a successful deployment to the [dev](../environments.html#dev) environment, iterate on the configuration to implement your application on Vespa. The _dev_ environment is ideal for this, with rapid deployment cycles.
For production serving, deploy to the [prod](../environments.html#prod) environment - follow the steps in [production deployment](../production-deployment.html).
## Enclave teardown
To tear down a Vespa Cloud Enclave system, do the steps above in reverse order:
1. [Undeploy the application(s)](../deleting-applications.html)
2. Undeploy the Terraform changes
It is important to undeploy the Vespa application(s) first: once the Terraform changes are removed, Vespa Cloud can no longer manage the allocated resources, and you must clean these up yourself.
---
# Source: https://docs.vespa.ai/en/writing/batch-delete.html.md
# Batch delete
Options for batch deleting documents:
1. Use [vespa feed](../clients/vespa-cli.html#documents) with a feed file of remove operations - see the _deletes.json_ sketch after this list:
```
$ vespa feed -t my-endpoint deletes.json
```
2. Find documents using a query, delete them using [/document/v1](../reference/api/document-v1.html#delete), repeat. Pseudocode:
```
while true; do
    # query and read document ids - exit when the result is empty
    # delete the document ids using /document/v1
    sleep 1  # optional, add a wait to reduce load while deleting
done
```
3. Use a [document selection](../schemas/documents.html#document-expiry) to expire documents. This deletes all documents _not_ matching the expression. It is possible to use parent documents and imported fields for expiry of a document set. The content node will iterate over the corpus and delete documents (that are later compacted out):
```
<!-- hedged sketch in services.xml - document type and selection expression are illustrative -->
<documents garbage-collection="true">
    <document type="my_doctype"
              mode="index"
              selection="my_doctype.timestamp &gt; now() - 86400" />
</documents>
```
4. Use [/document/v1](../reference/api/document-v1.html#delete) to delete documents identified by a [document selection](../reference/writing/document-selector-language.html) - example dropping all documents from the _my\_doctype_ schema. The _cluster_ value is the ID of the content cluster in _services.xml_, e.g., `<content id="my_cluster" version="1.0">`:
```
$ curl -X DELETE \
"$ENDPOINT/document/v1/my_namespace/my_doctype/docid?selection=true&cluster=my_cluster"
```
5. It is possible to drop a schema, with all its content, by removing the schema's mapping to the content cluster in _services.xml_ and deploying the change.
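A hedged sketch of a _deletes.json_ feed file for option 1 - the document IDs are illustrative; each operation is a `remove` of one document ID:
```
[
    { "remove": "id:my_namespace:my_doctype::doc-1" },
    { "remove": "id:my_namespace:my_doctype::doc-2" }
]
```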
## Example
This is an end-to-end example on how to track number of documents, and delete a subset using a [selection string](../reference/writing/document-selector-language.html).
### Feed sample documents
Feed a batch of documents, e.g. using the [vector-search](https://github.com/vespa-cloud/vector-search) sample application:
```
$ vespa feed <(python3 feed.py 100000 3)
```
See number of documents for a node using the [content.proton.documentdb.documents.total](../reference/operations/metrics/searchnode.html#content_proton_documentdb_documents_total) metric (here 100,000):
```
$ docker exec vespa curl -s http://localhost:19092/prometheus/v1/values | grep ^content.proton.documentdb.documents.total
content_proton_documentdb_documents_total_max{metrictype="standard",instance="searchnode",documenttype="vector",clustername="vectors",vespa_service="vespa_searchnode",} 100000.0 1695383025000
content_proton_documentdb_documents_total_last{metrictype="standard",instance="searchnode",documenttype="vector",clustername="vectors",vespa_service="vespa_searchnode",} 100000.0 1695383025000
```
Using the metric above is useful while feeding this example. An alternative is [visiting](visiting.html) all documents to print the IDs:
```
$ vespa visit --field-set "[id]" | wc -l
100000
```
At this point, there are 100,000 documents in the index.
### Define selection
Define the subset of documents to delete, e.g. by age or other criteria. In this example, select a random 1%. Do a test run:
```
$ vespa visit --field-set "[id]" --selection 'id.hash().abs() % 100 == 0' | wc -l
1016
```
Hence, the selection string `id.hash().abs() % 100 == 0` hits 1,016 documents.
### Delete documents
Delete documents, see the number of documents deleted in the response:
```
$ curl -X DELETE \
"http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors"
{
"pathId":"/document/v1/mynamespace/vector/docid",
"documentCount":1016
}
```
In case of a large result set, a continuation token might be returned in the response, too:
```
"continuation": "AAAAEAAAA"
```
If so, add the token and redo the request:
```
$ curl -X DELETE \
"http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors&continuation=AAAAEAAAA"
```
Repeat as long as there are tokens in the output. The token changes in every response.
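A hedged shell sketch that repeats the request until no continuation token is returned (same endpoint and URL-encoded selection as above; requires jq):
```
$ URL="http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors"
$ CONT=""
$ while :; do
      RESPONSE=$(curl -s -X DELETE "${URL}${CONT:+&continuation=$CONT}")
      # read the continuation token, if any - empty means done
      CONT=$(echo "$RESPONSE" | jq -r '.continuation // empty')
      [ -z "$CONT" ] && break
  done
```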
### Validate
Check that all documents matching the selection criterion are deleted:
```
$ vespa visit --selection 'id.hash().abs() % 100 == 0' --field-set "[id]" | wc -l
0
```
List remaining documents:
```
$ vespa visit --field-set "[id]" | wc -l
98984
```
---
# Source: https://docs.vespa.ai/en/performance/benchmarking-cloud.html.md
# Benchmarking
This is a step-by-step guide to get started with benchmarking on Vespa Cloud, based on the [Vespa benchmarking guide](benchmarking.html), using the [sample app](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation). Overview:

## Set up a performance test instance
Use an instance in a [dev zone](../operations/environments.html#dev) for benchmarks. To deploy an instance there, use the [getting started](../basics/deploy-an-application.html) guide, and make sure to specify the resources using a `deploy:environment="dev"` attribute:
```
<!-- hedged sketch in services.xml - resources apply to the dev deployment only -->
<nodes deploy:environment="dev" count="1">
    <resources vcpu="2" memory="8Gb" disk="50Gb"/>
</nodes>
```
```
$ vespa deploy --wait 600
```
Feed documents:
```
$ vespa feed ext/documents.jsonl
```
Query documents to validate the feed:
```
$ vespa query "select * from music where true"
```
Query documents using curl:
```
$ curl \
--cert ~/.vespa/mytenant.myapp.default/data-plane-public-cert.pem \
--key ~/.vespa/mytenant.myapp.default/data-plane-private-key.pem \
-H "Content-Type: application/json" \
--data '{"yql" : "select * from music where true"}' \
https://baaae1db.b68ddc0d.z.vespa-app.cloud/search/
```
At this point, the instance is ready, with data, and can be queried using data-plane credentials.
## Test using vespa-fbench
The rest of the guide assumes the data-plane credentials are in the working directory:
```
$ ls -1 *.pem
data-plane-private-key.pem
data-plane-public-cert.pem
```
Prepare a query file:
```
$ echo "/search/?yql=select+*+from+music+where+true" > query001.txt
```
Test using [vespa-fbench](../reference/operations/tools.html#vespa-fbench) running in a docker container:
```
$ docker run -v $(pwd):/files -w /files \
--entrypoint /opt/vespa/bin/vespa-fbench \
vespaengine/vespa \
-C data-plane-public-cert.pem \
-K data-plane-private-key.pem \
-T /etc/ssl/certs/ca-bundle.crt \
-n 1 -q query001.txt -s 1 -c 0 \
-o output.txt \
baaae1db.b68ddc0d.z.vespa-app.cloud 443
```
`-o output.txt` is useful when validating the test - remove this option when load testing. Make sure there are no `SSL_do_handshake` errors in the output. Expect HTTP status code 200:
```
Starting clients...
Stopping clients
Clients stopped.
.
Clients Joined.
***HTTP keep-alive statistics***
connection reuse count -- 4
*****************Benchmark Summary*****************
clients: 1
ran for: 1 seconds
cycle time: 0 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 5
cycles not held: 5
minimum response time: 128.17 ms
maximum response time: 515.35 ms
average response time: 206.38 ms
25 percentile: 128.70 ms
50 percentile: 129.60 ms
75 percentile: 130.20 ms
90 percentile: 361.32 ms
95 percentile: 438.36 ms
99 percentile: 499.99 ms
actual query rate: 4.80 Q/s
utilization: 99.03 %
zero hit queries: 5
http request status breakdown:
200 : 5
```
At this point, running queries using _vespa-fbench_ works well from the local laptop.
## Run queries inside data center
The next step is to run this from the same location (data center) as the dev zone - in this example, an AWS [zone](../operations/zones.html). Deduce the AWS zone from the Vespa Cloud zone name. Below is an example using a host with an Amazon Linux 2023 AMI (HVM) image:
1. Create the host - here, assume the key pair is named _key.pem_. Defaults are fine otherwise.
2. Log in, update, and install docker.
3. Copy credentials for endpoint access, log in, and validate the docker setup.
4. Make a dummy query.
5. Run vespa-fbench and verify a 200 response.
At this point, you are able to benchmark using _vespa-fbench_ in the same zone as the Vespa Cloud dev instance.
## Run benchmark
Use the [Vespa Benchmarking Guide](../performance/benchmarking.html) to plan and run benchmarks. Also see [sizing](#sizing) below. Make sure the client running the benchmark tool has sufficient resources.
Export [metrics](../operations/metrics.html):
```
$ curl \
--cert data-plane-public-cert.pem \
--key data-plane-private-key.pem \
https://baaae1db.b68ddc0d.z.vespa-app.cloud/prometheus/v1/values
```
Notes:
- Periodically dump all metrics using `consumer=Vespa`.
- Make sure you will not exhaust the serving threads on your container nodes while in production. Verify that this expression stays well below 100% (typically below 50%) for the traffic you expect, for each `threadpool` value: `100 * (jdisc.thread_pool.active_threads.sum / jdisc.thread_pool.active_threads.count) / jdisc.thread_pool.size.max`. Increase the number of threads in the pools by using larger container nodes, more container nodes, or by tuning the number of threads as described in [services-search](../reference/applications/services/search.html#threadpool). If you do exhaust a thread pool and its queue, requests rejected by the container will get HTTP 503 responses. See the sketch below for inspecting these metrics.
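A hedged sketch for inspecting the thread pool metrics via the Prometheus endpoint (metric names follow the underscore-converted form seen in the examples above - the grep pattern is an assumption):
```
$ curl -s \
    --cert data-plane-public-cert.pem \
    --key data-plane-private-key.pem \
    "https://baaae1db.b68ddc0d.z.vespa-app.cloud/prometheus/v1/values?consumer=Vespa" | \
  grep jdisc_thread_pool
```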
## Making changes
Whenever deploying configuration changes, track progress in the Deployment dashboard. Some changes, like changing [requestthreads](../reference/applications/services/content.html#requestthreads), will restart content nodes; this is done in sequence and takes time. Wait for successful completion in _Wait for services and endpoints to come online_.
When changing node type/count, wait for auto data redistribution to complete, watching the `vds.idealstate.merge_bucket.pending.average` metric:
```
$ while true; do curl -s \
--cert data-plane-public-cert.pem \
--key data-plane-private-key.pem \
https://baaae1db.b68ddc0d.z.vespa-app.cloud/prometheus/v1/values?consumer=Vespa | \
grep idealstate.merge_bucket.pending.average; \
sleep 10; done
```
Notes:
- Dump all metrics using `consumer=Vespa`.
- After changing the number of content nodes, this metric will jump, then decrease (not necessarily linearly) - speed depending on data volume.
## Sizing
Using Vespa Cloud enables the Vespa Team to assist you in optimizing the application to reduce resource spend. Based on 150 applications running on Vespa Cloud today, savings are typically 50%. Cost optimization is hard without domain knowledge - but few teams are experts in both their application and its serving platform. Sizing means finding both the right node size and the right cluster topology:

Applications use Vespa for their primary business use cases. Availability and performance vs. cost are business decisions. The best sized application can handle all expected load situations, and is configured to degrade quality gracefully for the unexpected.
Even though Vespa is cost-efficient out of the box, Vespa experts can usually spot over/under-allocations in CPU, memory and disk space/IO, and discuss trade-offs with the application team.
With [automated deployments](../operations/automated-deployments.html), applications go live with little risk. After launch, right-size the application based on observed load, using Vespa's elasticity features with automated data migration.
Use the [Vespa sizing guide](../performance/sizing-search.html) to size the application and find the metrics used there. Pro tips:
- 60% is a good max memory allocation
- 50% is a good max CPU allocation, although application dependent.
- 70% is a good max disk allocation
Rules of thumb:
- Memory and disk scale approximately linearly for indexed fields' data - attributes have a fixed cost for empty fields.
- Data variance will impact memory usage.
- Undersized instances will [block writes](../writing/feed-block.html).
- It is often a good idea to use the `dev` zone to test the memory impact of adding large fields, e.g. adding an embedding.
## Notes
- The user running benchmarks must have read access to the endpoint - if you already have this, skip this section. Refer to the [Vespa security guide](../security/guide).
- [Monitoring](../operations/monitoring.html) is useful to track metrics when benchmarking.
---
# Source: https://docs.vespa.ai/en/performance/benchmarking.html.md
# Vespa Benchmarking
Benchmarking a Vespa application is essential to get an idea of how well the test configuration performs, and is thus an essential part of sizing a search cluster. Benchmarking a cluster can answer the following questions:
- What throughput and latency to expect from a search node?
- Which resource is the bottleneck in the system?
These in turn indirectly answer other questions, such as how many nodes are needed, and whether it will help to upgrade disk or CPU. Benchmarking thus helps find the optimal Vespa configuration, using all resources optimally, which in turn lowers costs.
A good rule is to benchmark whenever the workload changes. Benchmarking should also be done when adding new features to queries.
Having an understanding of the query mix and SLA will help to set the test parameters. Before benchmarking, consider:
- What is the expected query mix? Having a representative query mix to test with is essential in order to get valid results. Splitting up in different types of queries is also a useful way to get an idea of which query classes are resource intensive.
- What is the expected SLA, both in terms of latency and query throughput?
- How important is real-time behavior? What is the rate of incoming documents, if any?
- Timeout: in a benchmarking scenario, is it OK for requests to time out? The default [timeout](/en/reference/querying/yql.html#timeout) is 500 ms, and [softtimeout](/en/reference/api/query.html#ranking.softtimeout.enable) is enabled. If the full cost of all queries is to be considered:
- Disable soft timeout, either in a [query profile](../querying/query-profiles.html) or by appending `&ranking.softtimeout.enable=false` using the [vespa-fbench](#vespa-fbench) `-a` option
- Set the timeout to e.g. 5 seconds
- Note that `timeout` in YQL takes precedence - replace the timeout in YQL, or use the request parameter [timeout](/en/reference/api/query.html#timeout) as above
If benchmarking using Vespa Cloud, see [Vespa Cloud Benchmarking](https://cloud.vespa.ai/en/benchmarking).
## vespa-fbench
Vespa provides a query load generator tool, [vespa-fbench](/en/reference/operations/tools.html#vespa-fbench), to run queries and generate statistics - much like a traditional web server load generator. It allows running any number of _clients_ (the more clients, the higher the load) for any length of time, with an adjustable client wait time before issuing the next query. It outputs throughput, max, min, and average latency, as well as the 25, 50, 75, 90, 95, 99 and 99.9 latency percentiles. This provides quite accurate information about how well the system manages the workload.
**Disclaimer:** _vespa-fbench_ is a tool to drive load for benchmarking and tuning. It is not a tool for finding the maximum load or latencies in a production setting, due to the way it is implemented: it runs with `-n` clients per run. It is good for testing, as proton can be observed at different levels of concurrency. In the real world, the number of clients and query arrivals will follow a different distribution, which impacts the 95p / 99p latency percentiles.
### Prepare queries
vespa-fbench uses _query files_ for GET and POST queries - see the [reference](/en/reference/operations/tools.html#vespa-fbench). Example _HTTP GET_ request format:
```
/search/?yql=select%20%2A%20from%20sources%20%2A%20where%20true
```
_HTTP POST_ request format:
```
/search/
{"yql" : "select * from sources * where true"}
```
### Run queries
A typical vespa-fbench command looks like:
```
$ vespa-fbench -n 8 -q queries.txt -s 300 -c 0 myhost.mydomain.com 8080
```
This starts 8 clients, using requests read from `queries.txt`. The `-s` parameter indicates that the benchmark will run for 300 seconds. The `-c` parameter states that each client thread should wait 0 milliseconds between queries. The last two parameters are the container hostname and port; multiple hosts and ports can be provided, and the clients will be uniformly distributed to query the containers round-robin.
A more complex example, using docker, hitting a Vespa Cloud endpoint:
```
$ docker run -v /Users/myself/tmp:/testfiles \
-w /testfiles --entrypoint '' vespaengine/vespa \
/opt/vespa/bin/vespa-fbench \
-C data-plane-public-cert.pem -K data-plane-private-key.pem -T /etc/ssl/certs/ca-bundle.crt \
-n 10 -q queries.txt -o result.txt -s 300 -c 0 \
myapp.mytenant.aws-us-east-1c.z.vespa-app.cloud 443
```
When using a query file with HTTP POST requests (the `-P` option), one also needs to pass the _Content-Type_ header using the `-H` header option:
```
$ docker run -v /Users/myself/tmp:/testfiles \
-w /testfiles --entrypoint '' vespaengine/vespa \
/opt/vespa/bin/vespa-fbench \
-C data-plane-public-cert.pem -K data-plane-private-key.pem -T /etc/ssl/certs/ca-bundle.crt \
-n 10 -P -H "Content-Type: application/json" -q queries_post.txt -o output.txt -s 300 -c 0 \
myapp.mytenant.aws-us-east-1c.z.vespa-app.cloud 443
```
### Post Processing
After each run, a summary is written to stdout (and possibly an output file from each client) - example:
```
*****************Benchmark Summary*****************
clients: 30
ran for: 1800 seconds
cycle time: 0 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 12169514
cycles not held: 12169514
minimum response time: 0.82 ms
maximum response time: 3010.53 ms
average response time: 4.44 ms
25 percentile: 3.00 ms
50 percentile: 4.00 ms
75 percentile: 6.00 ms
90 percentile: 7.00 ms
95 percentile: 8.00 ms
99 percentile: 11.00 ms
actual query rate: 6753.90 Q/s
utilization: 99.93 %
```
Take note of the number of _failed requests_, as a high number here can indicate that the system is overloaded, or that the queries are invalid.
- In some modes of operation, vespa-fbench waits before sending the next query. "Utilization" represents the fraction of time vespa-fbench spends sending queries and waiting for responses. For example, a utilization of 50% means that vespa-fbench is stress-testing the system 50% of the time, and doing nothing the remaining 50% of the time.
- vespa-fbench latency results include network latency between the client and the Vespa instance. Measure and subtract network latency to obtain the true vespa query latency.
## Benchmark
Strategy: find the optimal _requestthreads_ number, then find capacity by increasing the number of parallel test clients:
1. Test with a single client (n=1) and a single thread to find a _latency baseline_. For each test run, increase the number of [threads](../reference/applications/services/content.html#requestthreads) - see the sketch below.
2. Use the thread count sweet spot, then increase the number of clients, and observe latency and CPU.
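A hedged services.xml sketch for sweeping the request thread configuration (element path per the content reference; ids and values are illustrative):
```
<content id="mysearchcluster" version="1.0">
    <!-- documents, nodes, ... -->
    <engine>
        <proton>
            <tuning>
                <searchnode>
                    <requestthreads>
                        <!-- number of search threads, and threads used per search -->
                        <search>64</search>
                        <persearch>1</persearch>
                    </requestthreads>
                </searchnode>
            </tuning>
        </proton>
    </engine>
</content>
```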
### Metrics
The _container_ nodes expose the [/metrics/v2/values](../operations/metrics.html) interface - use it to dump metrics during benchmarks. Example - output all metrics from a content node:
```
$ curl http://localhost:8080/metrics/v2/values | \
jq '.nodes[] | select(.role=="content/mysearchcluster/0/0") | .node.metrics[].values'
```
Output CPU util:
```
$ curl http://localhost:8080/metrics/v2/values | \
jq '.nodes[] | select(.role=="content/mysearchcluster/0/0") | .node.metrics[].values."cpu.util"'
```
---
# Source: https://docs.vespa.ai/en/rag/binarizing-vectors.html.md
# Binarizing Vectors
Binarization in this context is mapping numbers in a vector (embedding) to bits (reducing the value range), and representing the vector of bits efficiently using the `int8` data type. Examples:
| input vector | binarized floats | pack\_bits (to INT8) |
| --- | --- | --- |
| [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | -1 |
| [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 |
| [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 |
| [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -128 |
| [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | -127 |
Binarization is key to reducing memory requirements and, therefore, cost. Binarization can also improve feeding performance, as the memory bandwidth requirements go down accordingly.
Refer to [embedding](../rag/embedding.html) for more details on how to create embeddings from text.
## Summary
This guide covers all the steps required to run a successful binarization project using Vespa only - there is no need to re-feed data. This makes such a project feasible with limited incremental resource usage and man-hours.
Approximate Nearest Neighbor vector operations are run using an HNSW index in Vespa, with online data structures. The cluster is operational during the procedure, gradually building the required data structures.
This guide is useful to map the steps and tradeoffs made for a successful vector binarization. Other relevant articles on how to reduce vector size in memory are:
- [Exploring the potential of OpenAI Matryoshka 🪆 embeddings with Vespa](https://blog.vespa.ai/matryoshka-embeddings-in-vespa/)
- [Matryoshka 🤝 Binary vectors: Slash vector search costs with Vespa](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/)
Adding to this, using algorithms like SPANN can solve problems with huge vector data sizes, read more in [Billion-scale vector search using hybrid HNSW-IF](https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-search/).
A binarization project normally involves iterating over different configuration settings, measuring quality loss for each iteration - this procedure is built with that in mind.
## Converters
Vespa's built-in indexing language [converters](../reference/writing/indexing-language.html#converters) `binarize` and `pack_bits` let you easily generate binarized vectors. Example schema definitions used to generate the vectors in the table above:
```
schema doc {
    document doc {
        field doc_embedding type tensor<float>(x[8]) {
            indexing: summary | attribute
        }
    }
    field doc_embedding_binarized_floats type tensor<float>(x[8]) {
        indexing: input doc_embedding | binarize | attribute
    }
    field doc_embedding_binarized type tensor<int8>(x[1]) {
        indexing: input doc_embedding | binarize | pack_bits | attribute
    }
}
```
We see that the `binarize` function itself does not compress vectors to a smaller size, as the output cell type is the same as the input - only the values are mapped to 0 or 1. Above, the vectors are binarized using a threshold value of 0, the Vespa default - any number > 0 maps to 1. This threshold is configurable.
`pack_bits` reads binarized vectors and represents them using int8. In the example above:
- `tensor<float>(x[8])` is 8 x sizeof(float) = 8 x 32 bits = 256 bits = 32 bytes
- `tensor<int8>(x[1])` is 1 x sizeof(int8) = 1 x 8 bits = 8 bits = 1 byte
In other words, a compression factor of 32, which is expected, mapping a 32-bit float into 1 bit.
As memory usage often is the cost driver for applications, this has huge potential. However, there is a loss of precision, so the tradeoff must be evaluated. Read more in [billion-scale-knn](https://blog.vespa.ai/billion-scale-knn/) and[combining-matryoshka-with-binary-quantization-using-embedder](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/).
## Binarizing an existing embedding field
In the example above, we see that `doc_embedding` has the original embedding data, and the fields `doc_embedding_binarized_floats` and `doc_embedding_binarized` are generated from `doc_embedding`. This is configured through the `indexing: input …` statement, and defining the generated fields outside the `document { … }` block.
**Note:** The `doc_embedding_binarized_floats` field is just for illustration purposes, as input to the `doc_embedding_binarized` field, which is the target binarized and packed field with low memory requirements. From here, we will call this the binarized embedding.
This is a common case for many applications - how to safely binarize and evaluate the binarized data for subsequent use. The process can be broken down into:
- Pre-requisites.
- Define the new binarized embedding, normally as an addition to the original field.
- Deploy and re-index the data to populate the binarized embedding.
- Create new ranking profiles with the binarized embeddings.
- Evaluate the quality of the binarized embedding.
- Remove the original embedding field from memory to save cost.
## Pre-requisites
Adding a new field takes resources, on disk and in memory. A new binarized embedding field is smaller - above, it is 1/32 of the original field. Also note that embedding fields often have an index configured, like:
```
field doc_embedding type tensor<float>(x[8]) {
    indexing: summary | attribute | index
    attribute {
        distance-metric: angular
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 100
        }
    }
}
```
The index is used for approximate nearest neighbor (ANN) searches, and also consumes memory.
Use the Vespa Cloud console to evaluate the size of original fields and size of indexes to make sure that there is room for the new embedding field, possibly with an index.
**Note:** The size of an index is a function of the number of documents, regardless of tensor type. In this context, this means that when adding a new field with an index, the new index will have the same size as the index of the existing embedding field.
Use status pages to find the index size in memory - example:
```
https://api-ctl.vespa-cloud.com/application/v4/tenant/TENANT_NAME/application/APP_NAME/instance/INSTANCE_NAME/environment/prod/region/REGION/service/searchnode/NODE_HOSTNAME/state/v1/custom/component/documentdb/SCHEMA/subdb/ready/attribute/ATTRIBUTE_NAME
```
### Example
```
tensor: {
    compact_generation: 33946879,
    ref_vector: {
        memory_usage: {
            used: 1402202052,
            dead: 0,
            allocated: 1600126976,
            onHold: 0
        }
    },
    tensor_store: {
        memory_usage: {
            used: 205348904436,
            dead: 10248636768,
            allocated: 206719921232,
            onHold: 0
        }
    },
    nearest_neighbor_index: {
        memory_usage: {
            all: {
                used: 10452397992,
                dead: 360247164,
                allocated: 13346516304,
                onHold: 0
            }
        }
    }
}
```
In this example, the index is 13G and the tensor data is 206G, so the index is about 6.3% of the tensor data. The original tensor is of type `bfloat16`; a binarized version is 1/16 of this and hence 13G. As an extra index is another 13G, the temporary incremental memory usage is approximately 26G during the procedure.
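As a sanity check, these estimates can be reproduced from the status page numbers above - illustrative arithmetic only:
```
# Numbers from the status page example above (bytes)
tensor_store_allocated = 206_719_921_232  # bfloat16 tensor data, ~206G
index_allocated = 13_346_516_304          # HNSW index, ~13G

binarized_tensor = tensor_store_allocated / 16  # bfloat16 (16 bits) -> 1 bit per dimension
extra_index = index_allocated                   # index size depends on document count, not cell type

print(f"binarized tensor: ~{binarized_tensor / 1e9:.1f}G")
print(f"temporary incremental memory: ~{(binarized_tensor + extra_index) / 1e9:.1f}G")
```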
## Define the binarized embedding field
The new field is _added_ to the schema, example schema, before:
```
schema doc {
    document doc {
        field doc_embedding type tensor<float>(x[8]) {
            indexing: summary | attribute
        }
    }
}
```
After:
```
schema doc {
    document doc {
        field doc_embedding type tensor<float>(x[8]) {
            indexing: summary | attribute
        }
    }
    field doc_embedding_binarized type tensor<int8>(x[1]) {
        indexing: input doc_embedding | binarize | pack_bits | attribute
    }
}
```
The above are simple examples, with no ANN settings on the fields. Following is a more complex example - schema before:
```
schema doc {
    document doc {
        field doc_embedding type tensor<float>(x[8]) {
            indexing: summary | attribute | index
            attribute {
                distance-metric: angular
            }
            index {
                hnsw {
                    max-links-per-node: 16
                    neighbors-to-explore-at-insert: 200
                }
            }
        }
    }
}
```
Schema after:
```
schema doc {
    document doc {
        field doc_embedding type tensor<float>(x[8]) {
            indexing: summary | attribute | index
            attribute {
                distance-metric: angular
            }
            index {
                hnsw {
                    max-links-per-node: 16
                    neighbors-to-explore-at-insert: 200
                }
            }
        }
    }
    field doc_embedding_binarized type tensor<int8>(x[1]) {
        indexing: input doc_embedding | binarize | pack_bits | attribute | index
        attribute {
            distance-metric: hamming
        }
        index {
            hnsw {
                max-links-per-node: 16
                neighbors-to-explore-at-insert: 200
            }
        }
    }
}
```
Note that we replicate the index settings to the new field, but change the distance metric to `hamming`, which matches binarized vectors.
## Deploy and reindex the binarized embedding field
Deploying the field will trigger a reindexing on Vespa Cloud to populate the binarized embedding, fully automated.
On self-managed systems, the `deploy` operation will output the below - [trigger a reindex](../operations/reindexing.html):
```
$ vespa deploy
Uploading application package... done
Success: Deployed '.' with session ID 3
WARNING Change(s) between active and new application that may require re-index:
reindexing: Consider re-indexing document type 'doc' in cluster 'doc' because:
1) Document type 'doc': Non-document field 'doc_embedding_binarized' added; this may be populated by reindexing
```
Depending on the size of the corpus and resources configured, the reindexing process takes time.
## Create new ranking profiles and queries using the binarized embeddings
After reindexing, you can query using the new, binarized embedding field. Assuming a query using the `doc_embedding` field:
```
$ vespa query \
'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \
'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
'ranking=app_ranking'
```
The same query, with a binarized query vector, to the binarized field:
```
$ vespa query \
'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
'input.query(q_bin)=[-119]' \
'ranking=app_ranking_bin'
```
See [tensor-hex-dump](../reference/schemas/document-json-format.html#tensor-hex-dump) for more information about how to create the int8-typed tensor.
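To illustrate, the int8 value -119 used above can be computed from the full-precision query vector with numpy - a minimal sketch:
```
import numpy as np

q = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
bits = np.where(np.array(q) > 0, 1, 0)    # binarize with threshold 0
packed = np.packbits(bits).view(np.int8)  # pack 8 bits into one int8
print(packed)  # [-119]
```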
### Quick Hamming distance intro
Example embeddings:
| document embedding | binarized floats | pack\_bits (to INT8) |
| --- | --- | --- |
| [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 |
| **query embedding** | **binarized floats** | **to INT8** |
| [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0] | -119 |
Use [matchfeatures](../reference/schemas/schemas.html#match-features) to debug ranking (see ranking profile `app_ranking_bin` below):
```
"matchfeatures": {
"attribute(doc_embedding_binarized)": {
"type": "tensor(x[1])",
"values": [0]
},
"distance(field,doc_embedding_binarized)": 3.0,
"query(q_bin)": {
"type": "tensor(x[1])",
"values": [-119]
}
}
```
The distance is calculated to 3.0 - the number of differing bits between the binarized vectors, i.e. the hamming distance.
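The same number can be verified outside Vespa by counting the differing bits between the two packed int8 values - a minimal numpy sketch:
```
import numpy as np

doc = np.array([0], dtype=np.int8)       # packed document vector
query = np.array([-119], dtype=np.int8)  # packed query vector
diff = np.bitwise_xor(doc.view(np.uint8), query.view(np.uint8))  # XOR: bits that differ
print(np.unpackbits(diff).sum())         # 3 -> the hamming distance
```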
## Rank profiles and queries
Assuming a rank profile like:
```
rank-profile app_ranking {
    match-features {
        distance(field, doc_embedding)
        query(q)
        attribute(doc_embedding)
    }
    inputs {
        query(q) tensor<float>(x[8])
    }
    first-phase {
        expression: closeness(field, doc_embedding)
    }
}
```
Query:
```
$ vespa query \
'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \
'input.query(q)=[2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]' \
'ranking=app_ranking'
```
A binarized version is like:
```
rank-profile app_ranking_bin {
    match-features {
        distance(field, doc_embedding_binarized)
        query(q_bin)
        attribute(doc_embedding_binarized)
    }
    inputs {
        query(q_bin) tensor<int8>(x[1])
    }
    first-phase {
        expression: closeness(field, doc_embedding_binarized)
    }
}
```
Query:
```
$ vespa query \
'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
'input.query(q_bin)=[-119]' \
'ranking=app_ranking_bin'
```
Query with full-precision query vector, against a binarized vector - rank profile:
```
rank-profile app_ranking_bin_full {
    match-features {
        distance(field, doc_embedding_binarized)
        query(q)
        query(q_bin)
        attribute(doc_embedding_binarized)
    }
    function unpack_to_float() {
        expression: 2 * unpack_bits(attribute(doc_embedding_binarized), float) - 1
    }
    function dot_product() {
        expression: sum(query(q) * unpack_to_float)
    }
    inputs {
        query(q) tensor<float>(x[8])
        query(q_bin) tensor<int8>(x[1])
    }
    first-phase {
        expression: closeness(field, doc_embedding_binarized)
    }
    second-phase {
        expression: dot_product
    }
}
```
Notes:
- The first-phase ranking is the same as in the binarized query above.
- The second-phase ranking uses the full-precision query vector query(q) against the bit-precision document vector, unpacked to float for type match.
- Both query vectors must be supplied in the query.
Note the differences when using full values in the query tensor, see the relevance score for the results:
```
$ vespa query \
  'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
'input.query(q_bin)=[-119]' \
'ranking=app_ranking_bin_full'
...
"relevance": 3.0
```
```
$ vespa query \
  'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
'input.query(q)=[2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
'input.query(q_bin)=[-119]' \
'ranking=app_ranking_bin_full'
"relevance": 4.0
```
Read the [closeness](../reference/ranking/rank-features.html#closeness(dimension,name)) reference documentation.
### TargetHits for ANN
Given the lower precision with binarization, it might be a good idea to increase the `targetHits` annotation in the query, to generate more candidates for later ranking phases - example below.
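For example, retrieving 100 candidates instead of 5, with the same query as above:
```
$ vespa query \
  'yql=select * from doc where {targetHits:100}nearestNeighbor(doc_embedding_binarized, q_bin)' \
  'input.query(q_bin)=[-119]' \
  'ranking=app_ranking_bin'
```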
## Evaluate the quality of the binarized embeddings
This exercise is about evaluating a lower-precision retrieval phase, using the original full-precision (here float) query-result pairs as the reference. Experiments, by query-document precision:
1. float-float
2. binarized-binarized
3. float-binarized
4. float-float, with binarized retrieval
To evaluate the precision, compute the differences for each query @10, like:
```
def compute_list_differences(list1, list2):
    # Count the hits in list1 that are not present in list2
    set1 = set(list1)
    set2 = set(list2)
    return len(set1 - set2)

list1 = [1, 3, 5, 7, 9, 11, 13, 15, 17, 20]
list2 = [2, 3, 5, 7, 9, 11, 14, 15, 18, 20]
num_hits = compute_list_differences(list1, list2)
print(f"Hits different: {num_hits}")
```
## Remove the original embedding field from memory
The purpose of binarization is to reduce the memory footprint. Given the results of the evaluation above, store the full-precision embeddings on disk or remove them altogether. Example, paging the attribute to disk:
```
schema doc {
    document doc {
        field doc_embedding type tensor<float>(x[8]) {
            indexing: summary | attribute | index
            attribute: paged
        }
    }
    field doc_embedding_binarized type tensor<int8>(x[1]) {
        indexing: input doc_embedding | binarize | pack_bits | attribute | index
        attribute {
            distance-metric: hamming
        }
        index {
            hnsw {
                max-links-per-node: 16
                neighbors-to-explore-at-insert: 200
            }
        }
    }
}
```
This example only indexes the binarized embedding; the data is binarized before feeding, so the document field holds the packed tensor directly:
```
schema doc {
    document doc {
        field doc_embedding_binarized type tensor<int8>(x[1]) {
            indexing: attribute | index
            attribute {
                distance-metric: hamming
            }
            index {
                hnsw {
                    max-links-per-node: 16
                    neighbors-to-explore-at-insert: 200
                }
            }
        }
    }
}
```
## Appendix: Binarizing from text input
To generate the embedding from other data types, like text, use the [converters](../reference/writing/indexing-language.html#converters) - example:
```
field doc_embedding type tensor<int8>(x[1]) {
    indexing: (input title || "") . " " . (input content || "") | embed | attribute
    attribute {
        distance-metric: hamming
    }
}
```
Find examples in [Matryoshka 🤝 Binary vectors: Slash vector search costs with Vespa](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/).
## Appendix: conversion to int8
Find examples of how to binarize values in code:
```
import numpy as np

def floats_to_bits(floats):
    if len(floats) != 8:
        raise ValueError("Input must be a list of 8 floats.")
    # Threshold at 0 - any value > 0 maps to 1, the rest to 0
    bits = [1 if f > 0 else 0 for f in floats]
    return bits

def bits_to_int8(bits):
    bit_string = ''.join(str(bit) for bit in bits)
    int_value = int(bit_string, 2)
    # Reinterpret the 8 bits as a signed int8, e.g. 0b11111111 -> -1
    if int_value > 127:
        int_value -= 256
    return np.int8(int_value)

def floats_to_int8(floats):
    bits = floats_to_bits(floats)
    return bits_to_int8(bits)

floats = [0.5, -1.2, 3.4, 0.0, -0.5, 2.3, -4.5, 1.2]
int8_value = floats_to_int8(floats)
print(f"The int8 value is: {int8_value}")
```
```
import numpy as np
import torch

def binarize_tensor(tensor: torch.Tensor) -> str:
    """
    Binarize a floating-point 1-d tensor by thresholding at zero
    and packing the bits into bytes. Returns the hex string representation of the bytes.
    """
    if not tensor.is_floating_point():
        raise ValueError("Input tensor must be of floating-point type.")
    return (
        np.packbits(np.where(tensor > 0, 1, 0), axis=0).astype(np.int8).tobytes().hex()
    )
```
Multivector example, from [ColPali: Efficient Document Retrieval with Vision Language Models](https://vespa-engine.github.io/pyvespa/examples/colpali-document-retrieval-vision-language-models-cloud.html):
```
import numpy as np
import torch
from typing import Dict, List
from binascii import hexlify

def binarize_token_vectors_hex(vectors: List[torch.Tensor]) -> List[Dict]:
    vespa_tensor = list()
    for page_id in range(0, len(vectors)):
        page_vector = vectors[page_id]
        binarized_token_vectors = np.packbits(
            np.where(page_vector > 0, 1, 0), axis=1
        ).astype(np.int8)
        for patch_index in range(0, len(page_vector)):
            values = str(
                hexlify(binarized_token_vectors[patch_index].tobytes()), "utf-8"
            )
            # Skip empty vectors caused by padding of the batch
            if values == "00000000000000000000000000000000":
                continue
            vespa_tensor_cell = {
                "address": {"page": page_id, "patch": patch_index},
                "values": values,
            }
            vespa_tensor.append(vespa_tensor_cell)
    return vespa_tensor
```
---
# Source: https://docs.vespa.ai/en/ranking/bm25.html.md
# The BM25 rank feature
The [bm25 rank feature](../reference/ranking/rank-features.html#bm25) implements the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking function used to estimate the relevance of a text document given a search query. It is a pure text ranking feature which operates over an [indexed string field](../reference/schemas/schemas.html#indexing-index). The feature is cheap to compute, about 3-4 times faster than [nativeRank](nativerank.html), while still providing good rank score quality. It is a good candidate for a first-phase ranking function when ranking text documents.
## Ranking function
The _bm25_ feature calculates a score for how well a query with terms $q_1, \ldots, q_n$ matches an indexed string field _t_ in a document _D_. The score is calculated as follows:

$$\sum_{i}^{n} IDF(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{\text{field\_len}}{\text{avg\_field\_len}}\right)}$$
Where the components in the function are:
- $IDF(q_i)$: The [inverse document frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Inverse_document_frequency) (_IDF_) of query term _i_ in field _t_. This is calculated as $\log\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)$, where $N$ is the number of documents on the content node and $n(q_i)$ is the number of those documents containing term $q_i$.
- f(qi,D): The number of occurrences (term frequency) of query term _i_ in the field _t_ of document _D_. For multi-value fields we use the sum of occurrences over all elements.
- field\_len: The field length (in number of words) of field _t_ in document _D_. For multi-value fields we use the sum of field lengths over all elements.
- avg\_field\_len: The average field length of field _t_ among the documents on the content node. Can be configured using [rank-properties](../reference/ranking/rank-feature-configuration.html#bm25).
- k1: A parameter used to limit how much a single query term can affect the score for document _D_. With a higher value the score for a single term can continue to go up relatively more when more occurrences for that term exists. Default value is 1.2. Can be configured using [rank-properties](../reference/ranking/rank-feature-configuration.html#bm25).
- b: A parameter used to control the effect of the field length of field _t_ compared to the average field length. Default value is 0.75. Can be configured using [rank-properties](../reference/ranking/rank-feature-configuration.html#bm25).
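To make the formula concrete, here is an illustrative sketch of the per-term contribution in Python - not Vespa's implementation, and using the IDF variant described above:
```
import math

def bm25_term_score(tf, num_docs, docs_with_term, field_len, avg_field_len,
                    k1=1.2, b=0.75):
    # IDF of the query term, as defined above
    idf = math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Field-length normalization, controlled by b
    norm = 1 - b + b * (field_len / avg_field_len)
    # Term-frequency saturation, controlled by k1
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Example: a term occurring twice in a 100-word field, average field length 80
print(bm25_term_score(tf=2, num_docs=1000, docs_with_term=50,
                      field_len=100, avg_field_len=80))
```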
## Example
In the following example we have an indexed string field _content_, and a rank profile using the _bm25_ rank feature. Note that the field must be enabled for usage with the bm25 feature by setting the _enable-bm25_ flag in the[index](../reference/schemas/schemas.html#index)section of the field definition.
```
schema example {
document example {
field content type string {
indexing: index | summary
index: enable-bm25
}
}
rank-profile default {
first-phase {
expression {
bm25(content)
}
}
}
}
```
If the _enable-bm25_ flag is turned on after documents are already fed, then [proton](../content/proton.html) performs a [memory index flush](../content/proton.html#memory-index-flush) followed by a [disk index fusion](../content/proton.html#disk-index-fusion) to prepare the posting lists for use with _bm25_.
Use the [custom component state API](../content/proton.html#custom-component-state-api) on each content node and examine `pending_urgent_flush` to determine if the preparation is still ongoing:
```
/state/v1/custom/component/documentdb/mydoctype/subdb/ready/index
```
---
# Source: https://docs.vespa.ai/en/content/buckets.html.md
# Buckets
The content layer splits the document space into chunks called _buckets_, and algorithmically maps documents to buckets by their id. The cluster automatically splits and joins buckets to maintain a uniform distribution across all nodes and to keep bucket sizes within configurable limits.
Documents have string identifiers that map to a 58-bit numeric location. A bucket is defined as all the documents that share a given number of the least significant bits of the location. The number of bits used controls how many buckets exist. For instance, if a bucket contains all documents whose 8 LSBs are 0x01, the bucket can be split in two by using the 9th bit of the location to separate the documents. Similarly, buckets can be joined by requiring one less bit in common - see the sketch below.
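A minimal sketch of this mapping - illustrative only, not Vespa's implementation:
```
def bucket_of(location: int, used_bits: int) -> int:
    # A bucket is identified by the least significant bits of the 58-bit location
    return location & ((1 << used_bits) - 1)

loc = 0x3A5F01
print(hex(bucket_of(loc, 8)))  # 0x1 - all documents whose 8 LSBs are 0x01 share this bucket
print(hex(bucket_of(loc, 9)))  # 0x101 - splitting the bucket uses one more bit
```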
## Distribution
Distribution happens in several layers.
- Documents map to 58-bit numeric locations.
- Locations map to buckets.
- Buckets map to distributors responsible for handling requests related to those buckets.
- Buckets map to content nodes responsible for storing replicas of buckets.
### Document to location distribution
Document identifiers use [document identifier schemes](../schemas/documents.html) to map documents to locations. This makes it possible to co-locate data within buckets by enforcing common LSBs for selected documents. Specifying a group or numeric value with the n and g options overrides the 32 LSBs of the location. Only use this when required, e.g. when using streaming search for personal search.
### Location to bucket distribution
The cluster state contains a distribution bit count, which is the number of location bits used to generate the buckets that are mapped to distributors.
The cluster state may change the number of distribution bits to adjust the number of buckets distributed at this level. When adding more nodes to the cluster, the number of buckets increases in order for the distribution to remain uniform.
Altering the distribution bit count causes a redistribution of all buckets.
If locations have been overridden to co-localize documents into few units, the distribution of documents into these buckets may be skewed.
### Bucket to distributor distribution
Buckets are mapped to distributors using the ideal state algorithm.
### Bucket to content node distribution
Buckets are mapped to content nodes using the ideal state algorithm. As the content nodes persist data, changing bucket ownership takes more time/resources than on the distributors.
There is usually a replica of a bucket on the same content node as the distributor owning the bucket, as the same algorithm is used.
The distributors may split the buckets further than the distribution bit count indicates, allowing more units to be distributed among the content nodes to create a more even distribution, while not affecting routing from client to distributors.
## Maintenance operations
The content layer defines a set of maintenance operations to keep the cluster balanced. Distributors schedule maintenance operations and issue them to content nodes. Maintenance operations are typically not high priority requests. Scheduling a maintenance operation does not block any external operations.
| Split bucket | Split a bucket in two, by enforcing the documents within the new buckets to have more location bits in common. Buckets are split either because they have grown too big, or because the cluster wants to use more distribution bits. |
| Join bucket | Join two buckets into one. If a bucket has been previously split due to being large, but documents have now been deleted, the bucket can be joined again. |
| Merge bucket | If there are multiple replicas of a bucket, but they do not store the same set of versioned documents, _merge_ is used to synchronize the replicas. A special case of a merge is a one-way merge, which may be done if some of the replicas are to be deleted right after the merge. Merging is used not only to fix inconsistent bucket replicas, but also to move buckets between nodes. To move a bucket, an empty replica is created on the target node, a merge is executed, and the source bucket is deleted. |
| Create bucket | This operation exist merely for the distributor to notify a content node that it is now to store documents for this bucket too. This allows content nodes to refuse operations towards buckets it does not own. The ability to refuse traffic is a safeguard to avoid inconsistencies. If a client talks to a distributor that is no longer working correctly, we rather want its requests to fail than to alter the content cluster in strange ways. |
| Delete bucket | Drop stored state for a bucket and reject further requests for it |
| (De)activate bucket | Activate bucket for search results - refer to [bucket management](proton.html#bucket-management) |
| Garbage collections | If configured, documents are periodically garbage collected through background maintenance operations. |
### Bucket split size
The distributors may split existing buckets further to keep bucket sizes at manageable levels, or to provide more units to distribute among the backends and their partitions.
Using small buckets, the distribution will be more uniform and bucket operations will be smaller. Using large buckets, less memory is needed for metadata operations and bucket splitting and joining is less frequent.
The size limits may be altered by configuring [bucket splitting](../reference/applications/services/content.html#bucket-splitting).
## Document to bucket distribution
Each document has a document identifier following a document identifier [uri scheme](../schemas/documents.html). From this scheme a 58-bit numeric _location_ is generated. Typically, all the bits are created from an MD5 checksum of the whole identifier.
Schemes specifying a _groupname_ will have the LSBs of the location set to a hash of the _groupname_. Thus, all documents belonging to that group will have locations with similar least significant bits, which will put them in the same bucket. If buckets end up split far enough to use more bits than the hash bits overridden by the group, the data will be split into many buckets, but each will typically only contain data for that group.
MD5 checksums map document identifiers to effectively random locations. This creates a uniform bucket distribution and is the default. For some use cases it is better to co-locate documents, optimizing grouped access - an example is personal documents. By enforcing some documents to map to similar locations, these documents are likely to end up in the same buckets. There are several use cases where this may be useful:
- When migrating documents for some entity between clusters, the migration may be implemented more efficiently if the entity is contained in just a few buckets rather than having documents scattered around all the existing buckets.
- If operations to the cluster are clustered somehow, clustering the documents correspondingly in the backend may make better use of caches. For instance, if a service stores data for users, and traffic for a user typically comes in bursts while the user is actively using the service, clustering user data may allow a lot of the user traffic to be served by generic bucket caches.
If the `n=` option is specified, the 32 LSB bits of the given number overrides the 32 LSB bits of the location. If the `g=` option is specified, a hash is created of the group name, the hash value is then used as if it were specified with `n=`. When the location is calculated, it is mapped to a bucket. Clients map locations to buckets using[distribution bits](#location-to-bucket-distribution).
Distributors map locations to buckets by searching their bucket database, which is sorted in inverse location order. The common case is that exactly one bucket matches. If there are several, there is currently inconsistent bucket splitting. If there are none, the distributor will create a new bucket for the request if it is a request that may create new data. Typically, new buckets are generated split according to the distribution bit count.
Content nodes should rarely need to map documents to buckets, as distributors specify bucket targets for all requests. However, as external operations are not queued during bucket splits and joins, the content nodes remap operations to avoid having to fail them due to a bucket having recently been split or joined.
### Limitations
One basic limitation to the document to location mapping is that it may never change. If it changes, then documents will suddenly be in the wrong buckets in the cluster. This would violate a core invariant in the system, and is not supported.
To allow new functionality, document identifier schemes may be extended or created that maps to location in new ways, but the already existing ones must map the same way as they have always done.
Current document identifier schemes typically allow the 32 least significant bits to be overridden for co-localization, while the remaining 26 bits are reserved for bits created from the MD5 checksum.
### Splitting
When there are enough documents co-localized to the same bucket, causing the bucket to be split, it will typically need to split past the 32 LSBs. At this split-level and beyond, there is no longer a 1-1 relationship between the node owning the bucket and the nodes its replica data will be stored on.
The effect of this is that documents sharing a location will be spread across nodes in the entire cluster once they reach a certain size. This enables efficient parallel processing.
## Bucket space
Buckets exist in the _default_ or _global_ bucket space.
---
# Source: https://docs.vespa.ai/en/operations/self-managed/build-install.html.md
# Build / install Vespa
To develop with Vespa, follow the [guide](https://github.com/vespa-engine/vespa#building) to set up a development environment on AlmaLinux 8 using Docker.
Build Vespa Java artifacts with Java >= 17 and Maven >= 3.6.3. Once built, the Vespa Java artifacts are ready to be used, and one can build a Vespa application using the [bundle plugin](../../applications/bundles.html#maven-bundle-plugin).
```
$ export MAVEN_OPTS="-Xms128m -Xmx1024m"
$ ./bootstrap.sh java && mvn install
```
See [vespa.ai releases](../../learn/releases.html).
## Container images
| Image | Description |
| --- | --- |
| [docker.io/vespaengine/vespa](https://hub.docker.com/r/vespaengine/vespa)
[ghcr.io/vespa-engine/vespa](https://github.com/orgs/vespa-engine/packages/container/package/vespa) | Container image for running Vespa. |
| [docker.io/vespaengine/vespa-build-almalinux-8](https://hub.docker.com/r/vespaengine/vespa-build-almalinux-8) | Container image for building Vespa on AlmaLinux 8. |
| [docker.io/vespaengine/vespa-dev-almalinux-8](https://hub.docker.com/r/vespaengine/vespa-dev-almalinux-8) | Container image for development of Vespa on AlmaLinux 8. Used for incremental building and system testing. |
## RPMs
Dependency graph:

Installing Vespa on AlmaLinux 8:
```
$ dnf config-manager \
--add-repo https://raw.githubusercontent.com/vespa-engine/vespa/master/dist/vespa-engine.repo
$ dnf config-manager --enable powertools
$ dnf install -y epel-release
$ dnf install -y vespa
```
Package repository hosting is graciously provided by [Cloudsmith](https://cloudsmith.com), a fully hosted, cloud-native, universal package management solution.
**Important:** Please note that the retention of released RPMs in the repository is limited to the latest 50 releases. Use the Docker images (above) for installations of specific versions older than this. Any problems with released rpm packages will be fixed in subsequent releases, please [report any issues](https://vespa.ai/support/) - troubleshoot using the [install example](/en/operations/self-managed/multinode-systems.html#aws-ec2-singlenode).
Refer to [vespa.spec](https://github.com/vespa-engine/vespa/blob/master/dist/vespa.spec). Build RPMs for a given Vespa version X.Y.Z:
```
$ git clone https://github.com/vespa-engine/vespa
$ cd vespa
$ git checkout vX.Y.Z
$ docker run --rm -ti -v $(pwd):/wd:Z -w /wd \
docker.io/vespaengine/vespa-build-almalinux-8:latest \
make -f .copr/Makefile rpms outdir=/wd
$ ls *.rpm | grep -v debug
vespa-8.634.24-1.el8.src.rpm
vespa-8.634.24-1.el8.x86_64.rpm
vespa-ann-benchmark-8.634.24-1.el8.x86_64.rpm
vespa-base-8.634.24-1.el8.x86_64.rpm
vespa-base-libs-8.634.24-1.el8.x86_64.rpm
vespa-clients-8.634.24-1.el8.x86_64.rpm
vespa-config-model-fat-8.634.24-1.el8.x86_64.rpm
vespa-jars-8.634.24-1.el8.x86_64.rpm
vespa-libs-8.634.24-1.el8.x86_64.rpm
vespa-malloc-8.634.24-1.el8.x86_64.rpm
vespa-node-admin-8.634.24-1.el8.x86_64.rpm
vespa-tools-8.634.24-1.el8.x86_64.rpm
```
Find most utilities in the vespa-x.y.z*.rpm - other RPMs:
| RPM | Description |
| --- | --- |
| vespa-tools | Tools accessing Vespa endpoints for query or document operations:
- [vespa-destination](/en/reference/operations/self-managed/tools.html#vespa-destination)
- [vespa-fbench](/en/reference/operations/tools.html#vespa-fbench)
- [vespa-feeder](/en/reference/operations/self-managed/tools.html#vespa-feeder)
- [vespa-get](/en/reference/operations/self-managed/tools.html#vespa-get)
- [vespa-query-profile-dump-tool](/en/reference/operations/tools.html#vespa-query-profile-dump-tool)
- [vespa-stat](/en/reference/operations/self-managed/tools.html#vespa-stat)
- [vespa-summary-benchmark](/en/reference/operations/self-managed/tools.html#vespa-summary-benchmark)
- [vespa-visit](/en/reference/operations/self-managed/tools.html#vespa-visit)
- [vespa-visit-target](/en/reference/operations/self-managed/tools.html#vespa-visit-target)
|
| vespa-malloc | Vespa has its own memory allocator, _vespa-malloc_ - refer to _/opt/vespa/etc/vespamalloc.conf_ |
| vespa-clients | _vespa-feed-client.jar_ - see [vespa-feed-client](../../clients/vespa-feed-client.html) |
---
# Source: https://docs.vespa.ai/en/applications/bundles.html.md
# Bundles
The Container uses [OSGi](https://osgi.org) to provide a modular platform for developing applications that can be composed of many reusable components. The user can deploy, upgrade and remove these components at runtime.
## OSGi
OSGi is a framework for modular development of Java applications, where a set of resources called _bundles_ can be installed. OSGi allows the developer to control which resources (Java packages) in a bundle should be available to other bundles. Hence, you can explicitly declare a bundle's public API, and also ensure that internal implementation details remain hidden.
Unless you're already familiar with OSGi, we recommend reading Richard S. Hall's presentation [Learning to ignore OSGi](https://cwiki.apache.org/confluence/download/attachments/7956/Learning_to_ignore_OSGi.pdf), which explains the most important aspects that you must relate to as a bundle developer. There are other good OSGi tutorials available:
- [OSGi for Dummies](https://thiloshon.wordpress.com/2020/03/04/osgi-for-dummies/)
- [OSGi Modularity and Services - Tutorial](https://www.vogella.com/tutorials/OSGi/article.html) (You can ignore the part about OSGi services.)
JDisc uses OSGi's _module_ and _lifecycle_ layers, and does not provide any functionality from the _service_ layer.
## OSGi bundles
An OSGi bundle is a regular JAR file with a MANIFEST.MF file that describes its content, what the bundle requires (imports) from other bundles, and what it provides (exports) to other bundles. Below is an example of a typical bundle manifest with the most important headers:
```
Bundle-SymbolicName: com.yahoo.helloworld
Bundle-Description: A Hello World bundle
Bundle-Version: 1.0.0
Export-Package: com.yahoo.helloworld;version="1.0.0"
Import-Package: org.osgi.framework;version="1.3.0"
```
The meaning of the headers in this bundle manifest is as follows:
- `Bundle-SymbolicName` - The unique identifier of the bundle.
- `Bundle-Description` - A human-readable description of the bundle's functionality.
- `Bundle-Version` - Designates a version number to the bundle.
- `Export-Package` - Expresses which Java packages contained in a bundle will be made available to the outside world.
- `Import-Package` - Indicates which Java packages will be required from the outside world to fulfill the dependencies needed in a bundle.
Note that OSGi has a strict definition of version numbers that need to be followed for bundles to work correctly. See the [OSGi javadoc](https://docs.osgi.org/javadoc/r4v42/org/osgi/framework/Version.html#Version(java.lang.String)) for details. As a general advice, never use more than three numbers in the version (major, minor, micro).
## Building an OSGi bundle
As long as the project was created by following steps in the [Developer Guide](developer-guide.html), the code is already being packaged into an OSGi bundle by the [Maven bundle plugin](#maven-bundle-plugin). However, if migrating an existing Maven project, change the packaging statement to:
```
<packaging>container-plugin</packaging>
```
and add the plugin to the build instructions:
```
<plugin>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>bundle-plugin</artifactId>
    <version>8.634.24</version>
    <extensions>true</extensions>
    <configuration>
        <failOnWarnings>true</failOnWarnings>
    </configuration>
</plugin>
```
Because OSGi introduces a different runtime environment from what Maven provides when running unit tests, one will not observe any loading and linking errors until trying to deploy the application onto a running Container. Errors triggered at this stage will be the likes of `ClassNotFoundException` and `NoClassDefFoundError`. To debug these types of errors, inspect the stack traces in the [error log](../reference/operations/log-files.html), and refer to [troubleshooting](#troubleshooting).
[vespa-logfmt](../reference/operations/self-managed/tools.html#vespa-logfmt) with its _--nldequote_ option is useful when reading logs.
The test suite should cover deployment of the application bundle, so that dynamic loading and linking issues are caught.
## Depending on non-OSGi ready libraries
Unfortunately, many popular Java libraries have yet to be bundled with the appropriate manifest that makes them OSGi-compatible. The simplest solution to this is to set the scope of the problematic dependency to **compile** in your pom.xml file. This will cause the bundle plugin to package the whole library into your bundle's JAR file. Until the offending library becomes available as an OSGi bundle, it means that your bundle will be bigger (in number of bytes), and that classes of that library can not be shared across application bundles.
The practical implication of this feature is that the bundle plugin copies the compile-scoped dependency, and its transitive dependencies, into the final JAR file, and adds a `Bundle-ClassPath` instruction to its manifest that references those dependencies.
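For illustration, an embedded compile-scoped dependency typically shows up in the manifest along these lines - the jar name is hypothetical, and the exact path depends on the plugin:
```
Bundle-ClassPath: .,dependencies/mylib-1.0.jar
```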
Although this approach works for most non-OSGi libraries, it only works for libraries where the jar file is _self-contained_. If, on the other hand, the library depends on other installed files, it must be treated as if it was a [JNI library](#depending-on-JNI-libraries).
## Depending on JNI Libraries
This section details alternatives for using native code in the container.
### OSGi bundles containing native code
OSGi jars may contain .so files, which can be loaded in the standard way from Java code in the bundle. Note that since only one instance of an .so can be loaded at any time, it is not possible to hot swap a jar containing .so files - when such jars are changed the [new configuration will not take effect until the container is restarted](components.html#JNI-requires-restart). Therefore, it is often a good idea to package a .so file and its Java API into a separate bundle from the rest of your code to avoid having to restart the container on all code changes.
### Add JNI code to the global classpath
When the JNI dependency cannot be packaged in a bundle, and you run on an environment where you can install files locally on the container nodes, you can add the dependency to the container's classpath and explicitly export the packages to make them visible to OSGi bundles.
Add the following configuration in the top level _services_ element in [services.xml](../reference/applications/services/container.html):
```
<services version="1.0">
    <config name="search.config.qr-start">
        <jdisc>
            <classpath_extra>/lib/jars/foo.jar:/path/bar.jar</classpath_extra>
            <export_packages>com.foo,com.bar</export_packages>
        </jdisc>
    </config>
    ...
</services>
```
Adding the config at the top level ensures that it's applied to all jdisc clusters.
The packages are now available and visible, but they must still be imported by the application bundle that uses the library. Here is how to configure the bundle plugin to enforce an import of the packages to the bundle:
```
<plugin>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>bundle-plugin</artifactId>
    <extensions>true</extensions>
    <configuration>
        <importPackage>com.foo,com.bar</importPackage>
    </configuration>
</plugin>
```
When adding a library to the classpath it becomes globally visible, and exempt from the package visibility management of OSGi. If another bundle contains the same library, there will be class loading issues.
## Maven bundle plugin
The _bundle-plugin_ is used to build and package components for the [Vespa Container](components.html) with Maven. Refer to the [multiple-bundles sample app](https://github.com/vespa-engine/sample-apps/tree/master/examples/multiple-bundles) for a practical example.
The minimal Maven _pom.xml_ configuration is:
```
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.yahoo.example</groupId>
    <artifactId>basic-application</artifactId>
    <packaging>container-plugin</packaging>
    <version>8.634.24</version>
</project>
```
**Note:** If the requested document-summary only contains fields that are[attributes](../content/attributes.html), the summary store (and cache) is not used.
## Protocol phases caches
_ranking.queryCache_ and _groupingSessionCache_, described in the [Query API reference](../reference/api/query.html), only cache data between phases for a given query, so other queries do not benefit - but these caches save container - content node round-trips for a _given_ query.
---
# Source: https://docs.vespa.ai/en/applications/chaining.html.md
# Chained Components
[Processors](processing.html), [searcher plug-ins](searchers.html) and [document processors](document-processors.html) are chained components. They are executed serially, with each providing some service or transform, and others optionally depending on these. In other words, a chain is a set of components with dependencies. Javadoc: [com.yahoo.component.chain.Chain](https://javadoc.io/doc/com.yahoo.vespa/chain/latest/com/yahoo/component/chain/Chain.html)
It is useful to read the [federation guide](../querying/federation.html) before this document.
A chained component has three basic differences from a component in general:
- The named services it _provides_ to other components in the chain.
- The list of services or checkpoints which the component itself should be _before_ in a chain, in other words, its dependents.
- The list of services or checkpoints which the component itself should be _after_ in a chain, in other words, its dependencies.
What a component should be placed before, what it should be placed after and what itself provides, may be either defined using Java annotations directly on the component class, or it may be added specifically to the component declarations in [services.xml](../reference/applications/services/container.html). In general, the implementation should have as many of the necessary annotations as practical, leaving the application specific configuration clean and simple to work with.
## Ordering Components
The execution order of the components in a chain is not defined by the order of the components in the configuration. Instead, the order is defined by adding the _ordering constraints_ to the components:
- Any component may declare that it `@Provides` some named functionality (the names are just labels that have no meaning to the container).
- Any component may declare that it must be placed `@Before` some named functionality,
- or that it must be placed `@After` some functionality.
The container will pick any ordering of a chain consistent with the constraints of the components in the chain.
Dependencies can be added in two ways. Dependencies which are due to the code should be added as annotations in the code:
```
import com.yahoo.processing.*;
import com.yahoo.component.chain.dependencies.*;

@Provides("SourceSelection")
@Before("Federation")
@After("IntentModel")
public class SimpleProcessor extends Processor {

    @Override
    public Response process(Request request, Execution execution) {
        //TODO: Implement this
    }
}
```
Multiple functionality names may be specified using the syntax `@Provides/Before/After({"A", "B"})`.
Annotations which do not belong in the code may be added in the [configuration](../reference/applications/services/container.html):
```
<processor id="ai.vespa.examples.Processor1" ... />
```
For convenience, components always `Provides` their own fully qualified class name (the package and simple class name concatenated, e.g.`ai.vespa.examples.SimpleProcessor`) and their simple name (that is, only the class name, like`SimpleProcessor` in our searcher case), so it is always possible to declare that one must execute before or after some particular component. This goes for both general processors, searchers and document processors.
Finally, note that ordering constraints are just that; in particular they are not used to determine if a given search chain, or set of search chains, is “complete”.
## Chain Inheritance
As implied by examples above, chains may inherit other chains in _services.xml_ - see the sketch below.
A chain will include all components from the chains named in the optional `inherits` attribute, exclude from that set all components named in the also optional`excludes` attribute and add all the components listed inside the defining tag. Both `inherits` and`excludes` are space delimited lists of reference names.
For search chains, there are two built-in search chains which are especially useful to inherit from, `native` and `vespa`.`native` is a basic search chain, containing the basic functionality most systems will need anyway,`vespa` inherits from `native` and adds a few extra searchers which most installations containing Vespa backends will need.
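A hedged sketch of what chain inheritance can look like in _services.xml_ - the chain id, searcher id, bundle name, and excluded component are hypothetical:
```
<chain id="mychain" inherits="vespa" excludes="ExcludedSearcher">
    <searcher id="ai.vespa.examples.MySearcher" bundle="my-bundle"/>
</chain>
```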
## Unit Tests
A component should be unit tested in a chain containing the components it depends on. It is not necessary to run the dependency handling framework to achieve that, as the `com.yahoo.component.chain.Chain` class has several constructors which are easy to use while testing.
```
Chain c = new Chain(new UselessSearcher("first"),
new UselessSearcher("second"),
new UselessSearcher("third"));
Execution e = new Execution(c, Execution.Context.createContextStub(null));
Result r = e.search(new Query());
```
The above is a rather useless test, but it illustrates how the basic workflow can be simulated. The constructor will create a chain with supplied searchers in the given order (it will not analyze any annotations).
## Passing Information Between Components
When different searchers or document processors depend on shared classes or field names, it is good practice defining the name only in a single place. An [example](searchers.html#passing-information-between-searchers) in the searcher development introduction illustrates an easy way to do that.
## Invoking a Specific Search Chain
The search chain to use can be selected in the request, by adding the request parameter `searchChain=myChain`.
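For example, with the Vespa CLI (chain name as above):
```
$ vespa query 'query=foo' 'searchChain=myChain'
```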
If no chain is selected in the query, the chain called`default` will be used. If no chain called`default` has been configured, the chain called`native` will be used. The _native_ chain is always present and contains a basic set of searchers needed in most applications. Custom chains will usually inherit the native chain to include those searchers.
The search chain can also be set in a [query profile](../querying/query-profiles.html).
## Example: Configuration
Annotations which do not belong in the code may be added in the configuration - here is a simple example with [search chains](../reference/applications/services/search.html#chain):
```
<chain id="demo" inherits="vespa">
    <searcher id="SimpleTest">
        <before>Cache</before>
        <after>Statistics</after>
        <after>Logging</after>
    </searcher>
</chain>
```
And for [document processor chains](../reference/applications/services/docproc.html), it becomes:
```
<documentprocessor id="TextMetrics" ... />
```
For searcher plugins, the class [com.yahoo.search.searchchain.PhaseNames](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/searchchain/PhaseNames.html) defines a set of checkpoints third-party searchers may use to help order themselves when extending the Vespa search chains.
Note that ordering constraints are just that; in particular they are not used to determine if a given search chain, or set of search chains, is “complete”.
## Example: Cache with async write
Use case: In a search chain, do early return and do further search asynchronously using [ExecutorService](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/ExecutorService.html).
Pseudocode: If cache hit (e.g. using Redis), just return cached data. If cache miss, return null data and let the following searcher finish further query and write back to cache:
```
public Result search(Query query, Execution execution) {
    // cache lookup
    if (cache_hit) {
        return result;
    }
    else {
        execution.search(query); // invoke async cache update searcher next in chain
        return result;
    }
}
```
---
# Source: https://docs.vespa.ai/en/reference/rag/chunking.html.md
# Chunking Reference
Reference configuration for _chunkers_: components that split text into pieces in [chunk indexing expressions](../writing/indexing-language.html#chunk), as in
```
indexing: input myTextField | chunk fixed-length 500 | index
```
See also the [guide to working with chunks](../../rag/working-with-chunks.html).
## Built-in chunkers
Vespa provides these built-in chunkers:
| Chunker id | Arguments | Description |
| --- | --- | --- |
| sentence | - | Splits the text into chunks at sentence boundaries. |
| fixed-length | target chunk length in characters | Splits the text into chunks with roughly equal length. This will prefer to make chunks of similar length, and to split at reasonable locations over matching the target length exactly. |
## Chunker components
Chunkers are [components](../../applications/components.html), so you can also add your own.
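A minimal sketch of such a component declaration in _services.xml_ - the id, class, and bundle names are hypothetical:
```
<container version="1.0">
    <component id="my-chunker" class="ai.vespa.example.MyChunker" bundle="my-bundle"/>
</container>
```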
You create a chunker component by implementing the[com.yahoo.language.process.Chunker](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/process/Chunker.java)interface, see [these examples](https://github.com/vespa-engine/vespa/tree/master/linguistics/src/main/java/ai/vespa/language/chunker).
---
# Source: https://docs.vespa.ai/en/operations/cloning.html.md
# Cloning applications and data
This is a guide on how to replicate a Vespa application in different environments, with or without data. Use cases for cloning include:
- Get a copy of the application and (some) data on a laptop to work offline, or attach a debugger.
- Deploy local experiments to the `dev` environment to easily cooperate and share.
- Set up a copy of the application and (some) data to test a new major version of Vespa.
- Replicate a bug report in a non-production environment.
- Set up a copy of the application and (some) data in a `prod` environment to experiment with a CI/CD pipeline, without touching the current production serving.
- Onboard a new team member by setting up a copy of the application and test data in a `dev` environment.
- Clone to a `dev` environment for load testing.
This guide uses _applications_. One can also use _instances_, but that will not work across Vespa major versions on Vespa Cloud - refer to [tenant, applications, instances](../learn/tenant-apps-instances) for details.
Vespa Cloud has different environments `dev` and `prod`, with different characteristics -[details](environments.html). Clone to `dev` for short-lived experiments/development/benchmarking, use `prod` for serving applications with a [CI/CD pipeline](automated-deployments.html).
As some steps are similar, it is a good idea to read through all, as details are added only the first time for brevity. Examples are based on the[album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation) sample application.
**Note:** When done, it is easy to tear down resources in Vespa Cloud. E.g., use _https://console.vespa-cloud.com/tenant/mytenant/application/myapp/prod/deploy_ or _https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default_ to find a delete-link. Instances in `dev` environments are auto-expired ([details](environments.html)), so application cloning is a safe way to work with Vespa. Find more information in [deleting applications](deleting-applications).
## Cloning - self-hosted to Vespa Cloud
**Source setup:**
```
$ docker run --detach --name vespa1 --hostname vespa-container1 \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa
$ vespa deploy -t http://localhost:19071
```
**Target setup:**
[Create a tenant](../basics/deploy-an-application.html) in the Vespa Cloud console, in this guide using "mytenant".
**Export source application package:**
This gets the application package and copies it out of the container to the local file system:
```
$ vespa fetch -t http://localhost:19071 && \
unzip application.zip -x application.zip
```
**Deploy target application package**
The procedure differs a little depending on whether you deploy to the dev or prod [environment](environments.html). The `mvn -U clean package` step is only needed for applications with custom code. Configure application name and create data plane credentials:
```
$ vespa config set target cloud && \
vespa config set application mytenant.myapp
$ vespa auth login
$ vespa auth cert -f
$ mvn -U clean package
```
**Note:** When deploying to a new app, one will often want to generate a new data plane cert/key pair. To do this, use `vespa auth cert -f`. If reusing a cert/key pair, drop `-f` and make sure to put the pair in _.vespa_, to avoid errors like`Error: open /Users/me/.vespa/mytenant.myapp.default/data-plane-public-cert.pem: no such file or directory`in the subsequent deploy step.
Then deploy the application. Depending on the use case, deploy to `dev` or `prod`:
- `dev`:
```
$ vespa deploy
```
Expect something like:
```
Uploading application package ... done
Success: Triggered deployment of . with run ID 1
Use vespa status for deployment status, or follow this deployment at
https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default/job/dev-aws-us-east-1c/run/1
```
- Deployments to the `prod` environment requires [deployment.xml](/en/reference/applications/deployment.html) - select which [zone](https://cloud.vespa.ai/en/reference/zones) to deploy to:
```
$ cat << EOF > deployment.xml
<deployment version="1.0">
    <prod>
        <region>aws-us-east-1c</region>
    </prod>
</deployment>
EOF
```
`prod` deployments also require `resources` specifications in [services.xml](https://cloud.vespa.ai/en/reference/services) - use [vespa-documentation-search](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/application/services.xml) as an example and add/replace `nodes` elements for `container` and `content` clusters. If in doubt, just add a small config to start with, and change it later.
```
```
Deploy the application package:
```
$ vespa prod deploy
```
Expect something like:
```
Hint: See production deployment: production-deployment.html
Success: Deployed .
See https://console.vespa-cloud.com/tenant/mytenant/application/myapp/prod/deployment for deployment progress
```
A proper deployment to a `prod` zone should have automated tests - read more in [automated deployments](automated-deployments.html).
**Data copy**
Export documents from the local instance and feed to the Vespa Cloud instance:
```
$ vespa visit -t http://localhost:8080 | vespa feed -
```
Add more parameters as needed to `vespa feed` for other endpoints.
**Get access log from source:**
```
$ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default
```
## Cloning - Vespa Cloud to self-hosted
**Download application from Vespa Cloud**
Validate the endpoint, and fetch the application package:
```
$ vespa config get application
application = mytenant.myapp.default
$ vespa fetch
Downloading application package... done
Success: Application package written to application.zip
```
The application package can also be downloaded from the Vespa Cloud Console:
- dev: Navigate to _https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default_ and click _Application_ to download it.
- prod: Navigate to _https://console.vespa-cloud.com/tenant/mytenant/application/myapp/prod/deployment?tab=builds_ and select the version of the application to download.
**Target setup:**
Note the name of the application package .zip-file just downloaded. If changes are needed, unzip it and use `vespa deploy -t http://localhost:19071` to deploy from the current directory:
```
$ docker run --detach --name vespa1 --hostname vespa-container1 \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa
$ vespa config set target local
$ vespa deploy -t http://localhost:19071 mytenant.myapp.default.dev.aws-us-east-1c.zip
```
**Data copy**
Set the config target to cloud for `vespa visit`, and pipe the JSONL output into `vespa feed` targeting the local instance:
```
$ vespa config set target cloud
$ vespa visit | vespa feed - -t http://localhost:8080
```
**Data copy - minimal**
For use cases requiring only a few documents, limit the visit:
```
$ vespa visit --chunk-count 10
```
**Get access log from source:**
Use the Vespa Cloud Console to get access logs.
## Cloning - Vespa Cloud to Vespa Cloud
This is a combination of the procedures above. Download the application package from dev or prod, make note of the source name, like mytenant.myapp.default. Then use `vespa deploy` or `vespa prod deploy` as above to deploy to dev or prod.
If cloning from `dev` to `prod`, pay attention to changes in _deployment.xml_ and _services.xml_as in [cloning to Vespa Cloud](#cloning---self-hosted-to-vespa-cloud).
**Data copy**
Set the feed endpoint name / paths, e.g. mytenant.myapp-new.default:
```
$ vespa config set target cloud
$ vespa visit | vespa feed - -t https://default.myapp-new.mytenant.aws-us-east-1c.dev.z.vespa-app.cloud
```
**Data copy 5%**
Set the `--selection` argument to `vespa visit` to select a subset of the documents.
## Cloning - self-hosted to self-hosted
Creating a copy from one self-hosted application to another. Self-hosted means running [Vespa](https://vespa.ai/) on a laptop or a [multinode system](self-managed/multinode-systems.html).
This example sets up a source app and deploys the [application package](../basics/applications.html) - use [album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation) as an example. The application package is then exported from the source and deployed to a new target app. Steps:
**Source setup:**
```
$ vespa config set target local
$ docker run --detach --name vespa1 --hostname vespa-container1 \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa
$ vespa deploy -t http://localhost:19071
```
**Target setup:**
```
$ docker run --detach --name vespa2 --hostname vespa-container2 \
--publish 8081:8080 --publish 19072:19071 \
vespaengine/vespa
```
**Export source application package**
Export files:
```
$ vespa fetch -t http://localhost:19071
```
**Deploy application package to target**
Before deploying, one can make changes to the application package files as needed. Deploy to target:
```
$ vespa deploy -t http://localhost:19072 application.zip
```
**Data copy from source to target**
This pipes the source data directly into `vespa feed` - another option is to save the data to files temporarily and feed these individually:
```
$ vespa visit -t http://localhost:8080 | vespa feed - -t http://localhost:8081
```
**Data copy 5%**
This is an example of how to use a [selection](../reference/writing/document-selector-language.html) to specify a subset of the documents - here a "random" 5% selection:
```
$ vespa visit -t http://localhost:8080 --selection 'id.hash().abs() % 20 = 0' | \
vespa feed - -t http://localhost:8081
```
**Get access log from source**
Get the current query access log from the source application (there might be more files there):
```
$ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default
```
---
# Source: https://docs.vespa.ai/en/security/cloudflare-workers.html.md
# Using Cloudflare Workers with Vespa Cloud
This guide describes how you can access mutual TLS protected Vespa Cloud endpoints using[Cloudflare Workers](https://workers.cloudflare.com/).
## Writing and reading from Vespa Cloud Endpoints
Vespa Cloud's endpoints are protected using mutual TLS. This means the client must present a TLS certificate that the Vespa application trusts. The application knows which certificate to trust because the certificate is included in the Vespa application package.
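As a quick connectivity check, one can call an endpoint directly with `curl` and a client certificate - a sketch, assuming the certificate/key pair created below and a hypothetical endpoint URL:
```
$ curl --cert $HOME/.vespa/samples.vsearch.default/data-plane-public-cert.pem \
    --key $HOME/.vespa/samples.vsearch.default/data-plane-private-key.pem \
    https://vsearch.samples.aws-us-east-1c.dev.z.vespa-app.cloud
```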
### mTLS Configuration
Mutual TLS certificates can be created using the [Vespa CLI](../clients/vespa-cli.html):
For example, for tenant `samples` with application `vsearch` and instance `default`:
```
$ vespa auth cert --application samples.vsearch.default
Success: Certificate written to security/clients.pem
Success: Certificate written to $HOME/.vespa/samples.vsearch.default/data-plane-public-cert.pem
Success: Private key written to $HOME/.vespa/samples.vsearch.default/data-plane-private-key.pem
```
Refer to the [security guide](guide) for details.
### Creating a Cloudflare Worker to interact with mTLS Vespa Cloud endpoints
In March 2023, Cloudflare announced [Mutual TLS available for Workers](https://blog.cloudflare.com/mtls-workers/), see also [Workers Runtime API mTLS](https://developers.cloudflare.com/workers/runtime-apis/mtls/).
Install wrangler and create a worker project. Wrangler is the Cloudflare command line interface (CLI), refer to the [Workers: Get started guide](https://developers.cloudflare.com/workers/get-started/guide/). Once configured and authenticated, one can upload the Vespa Cloud data plane certificates to Cloudflare.
Upload Vespa Cloud mTLS certificates to Cloudflare:
```
$ npx wrangler mtls-certificate upload \
--cert $HOME/.vespa/samples.vsearch.default/data-plane-public-cert.pem \
--key $HOME/.vespa/samples.vsearch.default/data-plane-private-key.pem \
--name vector-search-dev
```
The output will look something like this:
```
Uploading mTLS Certificate vector-search-dev...
Success! Uploaded mTLS Certificate vector-search-dev
ID: 63316464-1404-4462-baf7-9e9f81114d81
Issuer: CN=cloud.vespa.example
Expires on 3/11/2033
```
Notice the `ID` in the output; this is the `certificate_id` of the uploaded mTLS certificate. To use the certificate in the worker code, add an `mtls_certificates` variable to the `wrangler.toml` file in the project to bind a name to the certificate id. In this case, bind to `VESPA_CERT`:
```
mtls_certificates = [
{ binding = "VESPA_CERT", certificate_id = "63316464-1404-4462-baf7-9e9f81114d81" }
]
```
With the above binding in place, you can access the `VESPA_CERT` in Worker code like this:
```
export default {
async fetch(request, env) {
return await env.VESPA_CERT.fetch("https://vespa-cloud-endpoint");
}
}
```
Notice that `env` is a variable passed by the Cloudflare worker infrastructure.
### Worker example
The following worker example forwards POST and GET HTTP requests to the `/search/` path of the Vespa Cloud endpoint. It rejects other paths and other HTTP methods.
```
/**
* Simple Vespa proxy that forwards read (POST and GET) requests to the
* /search/ endpoint
* Learn more at https://developers.cloudflare.com/workers/
*/
export default {
async fetch(request, env, ctx) {
//Change to your endpoint url, obtained from the Vespa Cloud Console.
//Use global endpoint if you have global routing with multiple Vespa regions
const vespaEndpoint = "https://vsearch.samples.aws-us-east-1c.dev.z.vespa-app.cloud";
async function MethodNotAllowed(request) {
return new Response(`Method ${request.method} not allowed.`, {
status: 405,
headers: {
Allow: 'GET,POST',
}
});
}
async function NotAcceptable(request) {
return new Response(`Path not Acceptable.`, {
status: 406,
});
}
if (request.method !== 'GET' && request.method !== 'POST') {
return MethodNotAllowed(request);
}
let url = new URL(request.url)
const { pathname, search } = url;
if (!pathname.startsWith("/search/")) {
return NotAcceptable(request);
}
const destinationURL = `${vespaEndpoint}${pathname}${search}`;
let new_request = new Request(destinationURL, request);
return await env.VESPA_CERT.fetch(new_request)
},
};
```
To deploy the above to Cloudflare's global edge network, use:
```
$ npx wrangler publish
```
To start a local instance, use:
```
$ npx wrangler dev
```
Test using `curl`:
```
$ curl --json '{"yql": "select * from sources * where true"}' http://127.0.0.1:8787/search/
```
After publishing to Cloudflare production:
```
$ curl --json '{"yql": "select * from sources * where true"}' https://your-worker-name.workers.dev/search/
```
## Data plane access control permissions
Vespa Cloud supports having multiple certificates to separate `read` and `write` access.
This way, one can upload the read-only certificate to a Cloudflare worker to limit write access.
See [Data plane access control permissions](guide#permissions).
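A sketch of how such a split could look in the application package's _services.xml_, assuming a separate read-only certificate file has been added under _security/_:
```
<container id="default" version="1.0">
    <clients>
        <!-- Read-only certificate, e.g. for the Cloudflare worker -->
        <client id="data-plane-read" permissions="read">
            <certificate file="security/read-clients.pem"/>
        </client>
        <!-- Full access for feeding clients -->
        <client id="data-plane-read-write" permissions="read,write">
            <certificate file="security/clients.pem"/>
        </client>
    </clients>
</container>
```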
---
# Source: https://docs.vespa.ai/en/reference/api/cluster-v2.html.md
# /cluster/v2 API reference
The cluster controller has a /cluster/v2 API for viewing and modifying a content cluster state. To find the URL to access this API, identify the [cluster controller services](../../content/content-nodes.html#cluster-controller) in the application. Only the master cluster controller will be able to respond. The master cluster controller is the live cluster controller with the lowest index. Thus, one will typically use cluster controller 0; if contacting it fails, try number 1, and so on. Using [vespa-model-inspect](/en/reference/operations/self-managed/tools.html#vespa-model-inspect):
```
$ vespa-model-inspect service -u container-clustercontroller
container-clustercontroller @ hostname.domain.com : admin
admin/cluster-controllers/0
http://hostname.domain.com:19050/ (STATE EXTERNAL QUERY HTTP)
http://hostname.domain.com:19117/ (EXTERNAL HTTP)
tcp/hostname.domain.com:19118 (MESSAGING RPC)
tcp/hostname.domain.com:19119 (ADMIN RPC)
```
In this example there is only one cluster controller, and the State REST API is available on the port marked STATE and HTTP - here 19050. This information can also be retrieved through the model config in the config server.
Find examples of API usage in [content nodes](../../content/content-nodes.html#cluster-v2-API-examples).
## HTTP requests

| HTTP request | cluster/v2 operation | Description |
| --- | --- | --- |
| GET | `/cluster/v2/` | List content clusters |
| GET | `/cluster/v2/<cluster>` | Get cluster state and list service types within cluster |
| GET | `/cluster/v2/<cluster>/<service-type>` | List nodes per service type for cluster |
| GET | `/cluster/v2/<cluster>/<service-type>/<node>` | Get node state |
| PUT | `/cluster/v2/<cluster>/<service-type>/<node>` | Set node user state |
## Node state
Content and distributor nodes have state:

| State | Description |
| --- | --- |
| `Up` | The node is up and available to keep buckets and serve requests. |
| `Down` | The node is not available, and can not be used. |
| `Stopping` | This node is stopping and is expected to be down soon. This state is typically only exposed to the cluster controller to tell why the node stopped. The cluster controller will expose the node as down or in maintenance mode for the rest of the cluster. This state is thus not seen by the distribution algorithm. |
| `Maintenance` | This node is temporarily unavailable. The node is available for bucket placement, so redundancy is lower. In this mode, new replicas of the documents stored on this node will not be created, allowing the node to be down with less of a performance impact on the rest of the cluster. This mode is typically used to mask a down state during controlled node restarts, or by an administrator who needs to do short maintenance work, like upgrading software or restarting the node. |
| `Retired` | A retired node is available and serves requests. This state is used to remove nodes while keeping redundancy. Buckets are moved to other nodes (with low priority), until empty. Special considerations apply when using [grouped distribution](../../content/elasticity.html#grouped-distribution) as buckets are not necessarily removed. |

Distributor nodes start / transfer buckets quickly and are hence not in `maintenance` or `retired`.
Refer to [examples](../../content/content-nodes.html#cluster-v2-API-examples) of manipulating states.
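As a sketch, setting a node's user state to `maintenance` with `curl` could look like this, assuming a content cluster named `music` and the cluster controller port from the example above:
```
$ curl -X PUT -H "Content-Type: application/json" --data '
  {
      "state": {
          "user": {
              "state": "maintenance",
              "reason": "short-lived maintenance work"
          }
      }
  }' \
  http://hostname.domain.com:19050/cluster/v2/music/storage/0
```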
## Types

| Type | Spec | Description |
| --- | --- | --- |
| cluster | `<identifier>` | The name given to a content cluster in a Vespa application. |
| description | `.*` | Description can contain anything that is valid JSON. However, as the information is presented in various interfaces, some of which may present reasons for all the states in a cluster or similar, keeping it short and to the point makes it easier to fit the information neatly into a table and get a better cluster overview. |
| group-spec | `<identifier>(.<identifier>)*` | The hierarchical group assignment of a given content node. This is a dot-separated list of identifiers given in the application services.xml configuration. |
| node | `[0-9]+` | The index or distribution key identifying a given node within the context of a content cluster and a service type. |
| service-type | `(distributor\|storage)` | The type of the service to look at state for, within the context of a given content cluster. |
| state-disk | `(up\|down)` | One of the valid disk states. |
| state-unit | [up](#up) \| [stopping](#stopping) \| [down](#down) | The cluster controller fetches states from all nodes, called _unit states_. States reported from the nodes are either `up` or `stopping`. If the node can not be reached, a `down` state is assumed. This means the cluster controller detects failed nodes; the subsequent _generated states_ will have these nodes as `down`, and the [ideal state algorithm](../../content/idealstate.html) will redistribute [buckets](../../content/buckets.html) of documents. |
| state-user | [up](#up) \| [down](#down) \| [maintenance](#maintenance) \| [retired](#retired) | Use tools for [user state management](/en/operations/self-managed/admin-procedures.html#cluster-state): use `retired` to retire a node from a cluster and move buckets to other nodes; use `maintenance` for short-lived maintenance work, to avoid merging buckets to other nodes; have the cluster controller or an operator set a bad node `down`. |
| state-generated | [up](#up) \| [down](#down) \| [maintenance](#maintenance) \| [retired](#retired) | The cluster controller generates the cluster state from the `unit` and `user` states, over time. The generated state is called the _cluster state_. |
## Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| recursive | number | Number of levels, or `true` for all levels. Examples: use `recursive=1` for a node request to also see all data; use `recursive=2` to see all the node data within each service type. In recursive mode, you will see the same output as found in the spec below. However, where there is a `{ "link" : "" }` element, this element will be replaced by the content of that request, given a recursive value of one less than the request above. |
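For example, to fetch the full state of a hypothetical cluster `music` in one request:
```
$ curl 'http://hostname.domain.com:19050/cluster/v2/music?recursive=true'
```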
## HTTP status codes
Non-exhaustive list of status codes:

| Code | Description |
| --- | --- |
| 200 | OK. |
| 303 | Cluster controller not master - master known. This means communicating with the wrong cluster controller. A standard HTTP redirect is returned, so the HTTP client can automatically redo the request on the correct cluster controller. As the cluster controller available with the lowest index will be the master, the cluster controllers are normally queried in index order. Hence, it is unlikely to ever get this error - one will rather fail to connect to the cluster controller if it is not the current master. |
| 503 | Cluster controller not master - unknown or no master. Used if the cluster controller asked is not master, and it doesn't know who the master is. This can happen, e.g. in a network split, where cluster controller 0 no longer can reach cluster controllers 1 and 2, in which case cluster controller 0 knows it is not master, as it can't see the majority, and cluster controllers 1 and 2 will elect 1 as master. |

Example 303 response:
```
HTTP/1.1 303 See Other
Location: http://<master-host>:<port>/<path>
Content-Type: application/json

{
    "message" : "Cluster controller <index> not master. Use master at index <index>."
}
```
Example 503 response:
```
HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
    "message" : "No known master cluster controller currently exist."
}
```
## Response format
Responses are in JSON format, with the following fields:
| Field | Description |
| --- | --- |
| message | An error message - included for failed requests. |
---
# Source: https://docs.vespa.ai/en/reference/operations/metrics/clustercontroller.html.md
# ClusterController Metrics
| Name | Unit | Description |
| --- | --- | --- |
| cluster-controller.down.count | node | Number of content nodes down |
| cluster-controller.initializing.count | node | Number of content nodes initializing |
| cluster-controller.maintenance.count | node | Number of content nodes in maintenance |
| cluster-controller.retired.count | node | Number of content nodes that are retired |
| cluster-controller.stopping.count | node | Number of content nodes currently stopping |
| cluster-controller.up.count | node | Number of content nodes up |
| cluster-controller.cluster-state-change.count | node | Number of nodes changing state |
| cluster-controller.nodes-not-converged | node | Number of nodes not converging to the latest cluster state version |
| cluster-controller.stored-document-count | document | Total number of unique documents stored in the cluster |
| cluster-controller.stored-document-bytes | byte | Combined byte size of all unique documents stored in the cluster (not including replication) |
| cluster-controller.cluster-buckets-out-of-sync-ratio | fraction | Ratio of buckets in the cluster currently in need of syncing |
| cluster-controller.busy-tick-time-ms | millisecond | Time busy |
| cluster-controller.idle-tick-time-ms | millisecond | Time idle |
| cluster-controller.work-ms | millisecond | Time used for actual work |
| cluster-controller.is-master | binary | 1 if this cluster controller is currently the master, or 0 if not |
| cluster-controller.remote-task-queue.size | operation | Number of remote tasks queued |
| cluster-controller.node-event.count | operation | Number of node events |
| cluster-controller.resource\_usage.nodes\_above\_limit | node | The number of content nodes above resource limit, blocking feed |
| cluster-controller.resource\_usage.max\_memory\_utilization | fraction | Current memory utilisation, for the content node with the highest value |
| cluster-controller.resource\_usage.max\_disk\_utilization | fraction | Current disk space utilisation, for the content node with the highest value |
| cluster-controller.resource\_usage.memory\_limit | fraction | Memory space limit as a fraction of available memory |
| cluster-controller.resource\_usage.disk\_limit | fraction | Disk space limit as a fraction of available disk space |
| reindexing.progress | fraction | Re-indexing progress |
---
# Source: https://docs.vespa.ai/en/reference/applications/components.html.md
# Component reference
A component is any Java class whose lifetime is controlled by the container, see the [Developer Guide](../../applications/developer-guide.html) for an introduction. Components are specified and configured in services.xml and can have other components, and config (represented by generated "Config" classes) [injected](../../applications/dependency-injection.html) at construction time, and in turn be injected into other components.
Whenever a component or a resource your component depends on is changed by a redeployment, your component is reconstructed. Once all changed components are reconstructed, new requests are atomically switched to use the new set and the old ones are destructed.
If you have multiple constructors in your component, annotate the one to use for injection by `@com.yahoo.component.annotation.Inject`.
Identifiable components must implement `com.yahoo.component.Component`, and components that need to destruct resources at removal must subclass `com.yahoo.component.AbstractComponent` and implement `deconstruct()`.
See the [example](../../operations/metrics.html#example-qa) for common questions about component uniqueness / lifetime.
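For illustration, a minimal component using these mechanisms could look like the sketch below - the injected [Linguistics](#injectable-components) is one of the injectable components listed further down:
```
import com.yahoo.component.AbstractComponent;
import com.yahoo.component.annotation.Inject;
import com.yahoo.language.Linguistics;

public class MyComponent extends AbstractComponent {

    private final Linguistics linguistics;

    @Inject  // marks the constructor to use when the component has several
    public MyComponent(Linguistics linguistics) {
        this.linguistics = linguistics;
    }

    @Override
    public void deconstruct() {
        // Called when the component is removed after redeployment -
        // release resources such as native memory and threads here.
    }
}
```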
## Component Types
Vespa defines various component types (superclasses) for common tasks:
| Component type | Description |
| --- | --- |
| Request handler | [Request handlers](../../applications/request-handlers.html) allow applications to implement arbitrary HTTP APIs. A request handler accepts a request and returns a response. Custom request handlers are subclasses of [ThreadedHttpRequestHandler](https://javadoc.io/doc/com.yahoo.vespa/container-disc/latest/com/yahoo/container/jdisc/ThreadedHttpRequestHandler.html). |
| Processor | The [processing framework](../../applications/processing.html) can be used to create general composable synchronous request-response systems. Searchers and search chains are an instantiation (through subclasses) of this general framework for a specific domain. Processors are invoked synchronously and the response is a tree of arbitrary data elements. Custom output formats can be defined by adding [renderers](#renderers). |
| Renderer | Renderers convert a Response (or query Result) into a serialized form sent over the network. Renderers are subclasses of [com.yahoo.processing.rendering.Renderer](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/processing/rendering/Renderer.java). |
| Searcher | Searchers process Queries and their Results. Since they are synchronous, they can issue multiple queries serially or in parallel to e.g. implement federation, or decorate queries with information fetched from a content cluster. Searchers are composed into _search chains_ defined in services.xml. A query request selects a particular search chain, which implements the logic of that query. [Read more](../../applications/searchers.html). |
| Document processor | Document processors process incoming document operations. Similar to Searchers and Processors, they can be composed in chains, but document processors are asynchronous. [Read more](../../applications/document-processors.html). |
| Binding | A binding matches a request URI to the correct [filter chain](#filter) or [request handler](#request-handlers), and routes outgoing requests to the correct [client](#client). For instance, the binding _http://\*/\*_ would match any HTTP request, while _http://\*/processing_ would only match that specific path. If several bindings match, the most specific one is chosen. |
| Server binding | A server binding is a rule for matching incoming requests to the correct request handler - basically the JDisc building block for implementing RESTful APIs. |
| Client binding | A client binding is a pattern which is used to match requests originating inside the container, e.g. when doing federation, to a client provider. That is, it is a rule which determines what code should handle a given outgoing request. |
| Filter | A filter is a lightweight request checker. It may set some specific request property, or it may do security checking and simply block requests missing some mandatory property or header. |
| Client | Clients, or client providers, are implementations of clients for different protocols, or special rules for given protocols. When a JDisc application acts as a client, e.g. fetches a web page from another host, it is a client provider that handles the transaction. Bindings are used, as with request handlers and filters, to choose the correct client, matching protocol, server, etc., and then hand off the request to the client provider. There is no problem in using arbitrary other types of clients for external services in processors and request handlers. |
## Component configurations
This illustrates a typical component configuration set up by the Vespa container.
The network layer associates a Request with a _response handler_ and routes it to the correct type of [request handler](#request-handlers) (typically based on URI binding patterns).
If an application needs lightweight request-response processing using decomposition by a series of chained logical units, the [processing framework](../../applications/processing.html) is the correct family of components to use. The request will be routed from ProcessingHandler through one or more chains of [Processor](#processors) instances. The exact format of the output is customizable using a [Renderer](#renderers).
If doing queries, SearchHandler will create a Query object, route that to the pertinent chain of [Searcher](#searchers) instances, and associate the returned Result with the correct [Renderer](#renderers) instance for optional customization of the output format.
The DocumentProcessingHandler is usually invoked from messagebus, and used for feeding documents into an index or storage. The incoming data is used to build a Document object, which is then fed through a chain of [DocumentProcessor](#document-processors) instances.
If building an application with custom HTTP APIs, for instance arbitrary REST APIs, the easiest way is building a custom [RequestHandler](#request-handlers). This gets the Request, which is basically a set of key-value pairs, and returns a stream of arbitrary data back to the network.
## Injectable Components
These components are available from Vespa for [injection](../../applications/dependency-injection.html) into applications in various contexts:
| Component | Description |
| --- | --- |
| Always available |
| --- |
| [AthenzIdentityProvider](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/jdisc/athenz/AthenzIdentityProvider.java) | Provides the application's Athenz-identity and gives access to identity/role certificate and tokens. |
| [BertBaseEmbedder](https://github.com/vespa-engine/vespa/blob/master/model-integration/src/main/java/ai/vespa/embedding/BertBaseEmbedder.java) | A BERT-Base compatible embedder, see [BertBase embedder](../../rag/embedding.html#bert-embedder). |
| [ConfigInstance](https://github.com/vespa-engine/vespa/blob/master/config-lib/src/main/java/com/yahoo/config/ConfigInstance.java) | Configuration is injected into components as `ConfigInstance` components - see [configuring components](../../applications/configuring-components.html). |
| [Executor](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html) | Default threadpool for processing requests in threaded request handler |
| [Linguistics](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/Linguistics.java) | Inject a Linguistics component like [SimpleLinguistics](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/simple/SimpleLinguistics.java) or provide a custom implementation - see [linguistics](../../linguistics/linguistics.html). |
| [Metric](https://github.com/vespa-engine/vespa/blob/master/jdisc_core/src/main/java/com/yahoo/jdisc/Metric.java) | Jdisc core interface for metrics. Required by all subclasses of ThreadedRequestHandler. |
| [MetricReceiver](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/metrics/simple/MetricReceiver.java) | Use to emit metrics from a component. Find an example in the [metrics](../../operations/metrics.html#metrics-from-custom-components) guide. |
| [ModelsEvaluator](https://github.com/vespa-engine/vespa/blob/master/model-evaluation/src/main/java/ai/vespa/models/evaluation/ModelsEvaluator.java) | Evaluates machine-learned models added to Vespa applications, made available in config form. |
| [SentencePieceEmbedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/sentencepiece/SentencePieceEmbedder.java) | A native Java implementation of SentencePiece, see [SentencePiece embedder](../rag/embedding.html#sentencepiece-embedder). |
| [VespaCurator](https://github.com/vespa-engine/vespa/blob/master/zkfacade/src/main/java/com/yahoo/vespa/curator/api/VespaCurator.java) |
A client for ZooKeeper. For use in container clusters that have ZooKeeper enabled. See [using ZooKeeper](../../applications/using-zookeeper).
|
| [VipStatus](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/handler/VipStatus.java) | Use this to gain control over the service status (up/down) to be emitted from this container. |
| [WordPieceEmbedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/wordpiece/WordPieceEmbedder.java) | An implementation of the WordPiece embedder, usually used with BERT models. Refer to [WordPiece embedder](../rag/embedding.html#wordpiece-embedder). |
| [SystemInfo](https://github.com/vespa-engine/vespa/blob/master/hosted-zone-api/src/main/java/ai/vespa/cloud/SystemInfo.java) | Vespa Cloud: Provides information about the environment the component is running in. [Read more](/en/applications/components.html#the-systeminfo-injectable-component). |
| Available in containers having `search` |
| --- |
| [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java) | To use the [Document API](../../writing/document-api-guide.html). |
| [ExecutionFactory](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/searchchain/ExecutionFactory.java) | To execute new queries from code. [Read more](../../applications/web-services.html#queries). |
| [Map&lt;String, Model&gt;](https://github.com/vespa-engine/vespa/blob/master/model-evaluation/src/main/java/ai/vespa/models/evaluation/Model.java) | Use to inject a set of Models, see [Stateless Model Evaluation](../../ranking/stateless-model-evaluation.html). |
| Available in containers having `document-api` or `document-processing` |
| --- |
| [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java) | To use the [Document API](../../writing/document-api-guide.html). |
## Component Versioning
Components as well as many other artifacts in the container can be versioned. This document explains the format and semantics of these versions and how they are referred.
### Format
Versions are on the form:
```
version ::= major ["." minor [ "." micro [ "." qualifier]]]
```
Where `major`, `minor`, and `micro` are integers and `qualifier` is any string.
A version is appended to an id separated by a colon. In cases where a file is created for each component version, the colon is replaced by a dash in the file name.
### Ordering
Versions are ordered first by major, then minor, then micro and then by doing a lexical ordering on the qualifier. This means that `a:1 < a:1.0 < a:1.0.0 < a:1.1 < a:1.1.0 < a:2`
### Referencing a versioned Component
Whenever a component is referenced by id (in code or configuration), a fully or partially specified version may be included in the reference by using the form `id:versionSpecification`. Such references are resolved using the following rules:
- An id without any version specification resolves to the highest version not having a qualifier.
- A partial or full version specification resolves to the highest version not having a qualifier which matches the specification.
- Versions with qualifiers are matched only by exact match.
Example: Given a component with id `a` having these versions: `[1.1, 1.2, 1.3.test, 2.0]`
- The reference `a` will resolve to `a:2.0`
- The reference `a:1` will resolve to `a:1.2`
- The only way to resolve to the "test" qualified version is by using the exact reference `a:1.3.test`
- These references will not resolve: `a:1.3`, `a:3`, `1.2.3`
### Merging specifications for chained Components
In some cases, there is a need for merging multiple references into one. An example is inheritance of chains of version references, where multiple inherited chains may reference the same component.
Two version references are said to be _compatible_ if one is a prefix of the other. In this case, the most specific version is used. If they are not compatible, they are _conflicting_. For example, `a:1` and `a:1.1` are compatible and merge to `a:1.1`, while `a:1.1` and `a:1.2` are conflicting.
---
# Source: https://docs.vespa.ai/en/schemas/concrete-documents.html.md
# Concrete documents
In [document processing](../applications/document-processors.html), `setFieldValue()` and `getFieldValue()` are used to access fields in a `Document`. The data for each of the fields in the document instance is wrapped in field values. If the documents use structs, they are handled the same way. Example:
```
book.setFieldValue("title", new StringFieldValue("Moby Dick"));
```
Alternatively, use code generation to get a _concrete document type_, a `Document` subclass that represents the exact document type (defined for example in the file `book.sd`). To generate, include it in the build, plugins section in _pom.xml_:
```
<plugin>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>vespa-documentgen-plugin</artifactId>
    <version>8.634.24</version>
    <configuration>
        <schemasDirectory>etc/schemas</schemasDirectory>
    </configuration>
    <executions>
        <execution>
            <id>document-gen</id>
            <goals>
                <goal>document-gen</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```
`schemasDirectory` contains the [schemas](../reference/schemas/schemas.html). Generated classes will be in _target/generated-sources_. The document type `book` will be represented as the Java class `Book`, and it will have native methods for data access, so the code example above becomes:
```
book.setTitle("Moby Dick");
```
| Configuration | Description |
| --- | --- |
| Java package | Specify the Java package of the generated types in the plugin `configuration` element, e.g. `<packageName>com.yahoo.mypackage</packageName>`. |
| User provided annotation types | To provide the Java implementation of a given annotation type, yielding _behaviour of annotations_ (implementing additional interfaces may be one scenario), map each annotation type name to the implementing class in the plugin configuration - here `NodeImpl` to `com.yahoo.vespa.document.NodeImpl` and `DocumentImpl` to `com.yahoo.vespa.document.DocumentImpl`. The plugin will then not generate a type for `NodeImpl` and `DocumentImpl`, but the `ConcreteDocumentFactory` will support them, so that code depending on this will work. |
| Abstract annotation types | Make a generated annotation type abstract by listing its name in the plugin configuration - here `myabstractannotationtype`. |
## Inheritance
If input document types use single inheritance, the generated Java types will inherit accordingly. However, if a document type inherits from more than one type (example: `document myDoc inherits base1, base2`), the Java type for `myDoc` will just inherit from `Document`, since Java has single inheritance. Refer to [schema inheritance](inheritance-in-schemas.html) for examples.
## Feeding
Concrete types are often used in a docproc, used for feeding data into stateful clusters. To make Vespa use the correct type during feeding and serialization, include a `<document>` element in `<container>` in [services.xml](../reference/applications/services/services.html):
```
<document type="book" bundle="the name in <artifactId> in your pom.xml" class="com.yahoo.mypackage.Book"/>
```
Vespa will make the type `Book` and all other concrete document, annotation and struct types from the bundle available to the docproc(s) in the container. The specified bundle must be the `Bundle-SymbolicName`. It will also use the given Java type when feeding through a docproc chain. If the class is not in the specified bundle, the container will emit an error message about not being able to load`ConcreteDocumentFactory` as a component, and not start. There is no need to `Export-Package` the concrete document types from the bundle, a `package-info.java` is generated that does that.
## Factory and copy constructor
Along with the actual types, the Maven plugin will also generate a class `ConcreteDocumentFactory`, which holds information about the actual concrete types present. It can be used to initialize an object given the document type:
```
Book b = (Book) ConcreteDocumentFactory.getDocument("book", new DocumentId("id:book:book::0"));
```
This can be done for example during deserialization, when a document is created. The concrete types also have copy constructors that can take a generic`Document` object of the same type. The contents will be deep-copied:
```
Document bookGeneric;
// …
Book book = new Book(bookGeneric, bookGeneric.getId());
```
All the accessor and mutator methods on `Document` will work as expected on concrete types. Note that `getFieldValue()` will _generate_ an ad-hoc `FieldValue` _every time_, since concrete types don't use them to store data.`setFieldValue()` will pack the data into the native Java field of the type.
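A small sketch of this behaviour, reusing the `Book` type and the factory from above:
```
Book book = (Book) ConcreteDocumentFactory.getDocument("book", new DocumentId("id:book:book::1"));

book.setTitle("Moby Dick");  // stored directly in a native Java field

// Works as on any Document, but generates an ad-hoc FieldValue on every call:
StringFieldValue title = (StringFieldValue) book.getFieldValue("title");

// Packs the data into the native Java field of the type:
book.setFieldValue("title", new StringFieldValue("Moby Dick; or, The Whale"));
```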
## Document processing
In a document processor, cast the incoming document base into the concrete document type before accessing it. Example:
```
public class ConcreteDocDocProc extends DocumentProcessor {
public Progress process(Processing processing) {
DocumentPut put = (DocumentPut) processing.getDocumentOperations().get(0);
Book b = (Book) (put.getDocument());
b.setTitle("The Title");
return Progress.DONE;
}
}
```
Concrete document types are not supported for document updates or removes.
---
# Source: https://docs.vespa.ai/en/reference/applications/config-files.html.md
# Custom Configuration File Reference
This is the reference for config file definitions. It is useful for developing applications that have [configurable components](../../applications/configuring-components.html) for the [Vespa Container](../../applications/containers.html), where configuration for individual components may be provided by defining [`<config>`](#generic-configuration-in-services-xml) elements within the component's scope in services.xml.
## Config definition files
Config definition files are part of the source code of your application and have a _.def_ suffix. Each file defines and documents the content and semantics of one configuration type. Vespa's builtin _.def_ files are found in`$VESPA_HOME/share/vespa/configdefinitions/`.
### Package
Package is a mandatory statement that is used to define the package for the java class generated to represent the file. For [container component](../../applications/components.html) developers, it is recommended to use a separate package for each bundle that needs to export config classes, to avoid conflicts between bundles that contain configurable components. Package must be the first non-comment line, and can only contain lower-case characters and dots:
```
package=com.mydomain.mypackage
```
### Parameter names
Config definition files contain lines on the form:
```
parameterName type [default=value] [range=[min,max]]
```
camelCase in parameter names is recommended for readability.
### Parameter types
Supported types for variables in the _.def_ file:
| Type | Description |
| --- | --- |
| int | 32-bit signed integer value |
| long | 64-bit signed integer value |
| double | 64-bit IEEE float value |
| enum | Enumerated type - a set of strings representing the valid values for the parameter, e.g. `foo enum {BAR, BAZ, QUUX} default=BAR` |
| bool | A boolean (true/false) value |
| string | A String value. Default values must be enclosed in quotation marks (" "), and any internal quotation marks must be escaped by backslash. Likewise, newlines must be escaped to `\n` |
| path | A path to a physical file or directory in the application package. This makes it possible to access files from the application package in container components. The path is relative to the root of the [application package](../../basics/applications.html). A path parameter cannot have a default value, but may be optional (using the _optional_ keyword after the type). An optional path does not have to be set, in which case it will be an empty value. The content will be available as a `java.nio.file.Path` instance when the component accessing this config is constructed, or an `Optional` if the _optional_ keyword is used. |
| url | Similar to `path`, an arbitrary URL of a file that should be downloaded and made available to container components. The file content will be available as a java.io.File instance when the component accessing this config is constructed. Note that if the file takes a long time to download, it will also take a long time for the container to come up with the configuration referencing it. See also the [note about changing contents for such a url](../../applications/configuring-components.html#adding-files-to-the-component-configuration). |
| model | A pointer to a machine-learned model. This can be a model-id, url or path, and multiple of these can be specified as a single config value, where one is used depending on the deployment environment: if a model-id is specified and the application is deployed on Vespa Cloud, the model-id is used; otherwise, if a URL is specified, it is used; otherwise, path is used. You may also use remote URLs protected by bearer-token authentication by supplying the optional `secret-ref` attribute. See [using private Huggingface models](../rag/embedding.html#private-model-hub). On the receiving side, this config value is simply represented as a file path regardless of how it is resolved. This makes it easy to refer to models in multiple ways such that the appropriate one is used depending on the context. The special syntax for setting these config values is documented in [adding files to the configuration](../../applications/configuring-components.html#adding-files-to-the-component-configuration). |
| reference | A config id to another configuration (only for internal Vespa usage) |
### Structs
Structs are used to group a number of parameters that naturally belong together. A struct is declared by adding a '.' between the struct name and each member's name:
```
basicStruct.foo string
basicStruct.bar int
```
### Arrays
Arrays are declared by appending square brackets to the parameter name. Arrays can either contain simple values, or have children. Children can be simple parameters and/or structs and/or other arrays. Arbitrarily complex structures can be built to any depth. Examples:
```
intArr[] int # Integer value array
row[].column[] int # Array of integer value arrays
complexArr[].foo string # Complex array that contains
complexArr[].bar double # … two simple parameters
complexArr[].coord.x int # … and a struct called 'coord'
complexArr[].coord.y int
complexArr[].coord.depths[] double # … that contains a double array
```
Note that arrays cannot have default values, even for simple value arrays. An array that has children cannot contain simple values, and vice versa. In the example above, `intArr` and `row.column` could not have children, while `row` and `complexArr` are not allowed to contain values.
### Maps
Maps are declared by appending curly brackets to the parameter name. Arbitrarily complex structures are supported also here. Examples:
```
myMap{} int
complexMap{}.nestedMap{}.id int
complexMap{}.nestedMap{}.name string
```
## Generic configuration in services.xml
`services.xml` has four types of elements:

| Element type | Description |
| --- | --- |
| individual service elements | (e.g. _searcher_, _handler_, _searchnode_) - creates a service, but has no child elements that create services |
| service group elements | (e.g. _content_, _container_, _document-processing_) - creates a group of services and can have all types of child elements |
| dedicated config elements | (e.g. _accesslog_) - configures a service or a group of services and can only have other dedicated config elements as children |
| generic config elements | always named _config_ |
Generic config elements can be added to most elements that lead to one or more services being created - i.e. service group elements and individual service elements. The config is then applied to all services created by that element and all descendant elements.
For example, by adding _config_ for _container_, the config will be applied to all container components in that cluster. Config at a deeper level has priority, so this config can be overridden for individual components by setting the same config values in e.g. _handler_ or _server_ elements.
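For illustration, a sketch with a hypothetical config `com.mydomain.my-config` set on the cluster level and overridden for a single handler:
```
<container version="1.0">
    <!-- Applies to all components in this container cluster -->
    <config name="com.mydomain.my-config">
        <myVal>cluster-default</myVal>
    </config>
    <handler id="com.mydomain.MyHandler">
        <!-- Deeper level has priority - overrides the cluster-level value -->
        <config name="com.mydomain.my-config">
            <myVal>handler-specific</myVal>
        </config>
    </handler>
</container>
```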
Given the following config definition, let's say its name is `type-examples.def`:
```
package=com.mydomain
stringVal string
myArray[].name string
myArray[].type enum {T1, T2, T3} default=T1
myArray[].intArr[] int
myMap{} string
basicStruct.foo string
basicStruct.bar int default=0 range=[-100,100]
boolVal bool
myFile path
myUrl url
myOptionalPath path optional
```
To set all the values for this config in `services.xml`, add the following xml at the desired element (the config name is the package followed by the definition file name, here `com.mydomain.type-examples`):
```
<config name="com.mydomain.type-examples">
    <stringVal>val</stringVal>
    <myArray>
        <item>
            <name>elem_0</name>
            <type>T2</type>
            <intArr>
                <item>0</item>
                <item>1</item>
            </intArr>
        </item>
        <item>
            <name>elem_1</name>
            <type>T3</type>
            <intArr>
                <item>0</item>
                <item>1</item>
            </intArr>
        </item>
    </myArray>
    <myMap>
        <item key="key1">val1</item>
        <item key="key2">val2</item>
    </myMap>
    <basicStruct>
        <foo>str</foo>
        <bar>3</bar>
    </basicStruct>
    <boolVal>true</boolVal>
    <myFile>components/file1.txt</myFile>
    <myUrl>https://docs.vespa.ai/en/reference/query-api-reference.html</myUrl>
</config>
```
Note that each '.' in the parameter's definition corresponds to a child element in the xml. It is not necessary to set values that already have a default in the _.def_ file, if you want to keep the default value. Hence, in the example above, `basicStruct.bar` and `myArray[].type`could have been omitted in the xml without generating any errors when deploying the application.
### Configuring arrays
Assigning values to _arrays_ is done by using the `<item>` element. This ensures that the given config values do not overwrite any existing array elements from higher-level xml elements in services, or from Vespa itself.
---
# Source: https://docs.vespa.ai/en/operations/self-managed/config-proxy.html.md
# Configuration proxy
Read [application packages](../../basics/applications.html) for an overview of the cloud config system. The _config proxy_ runs on every Vespa node. It has a set of config sources, defined in [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables).
The config proxy will act as a proxy for config clients on the same machine, so that all clients can ask for config on _localhost:19090_. The _config source_ that the config proxy uses is set in [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) and consists of one or more config sources (the addresses of [config servers](configuration-server.html)).
The proxy has a memory cache that is used to serve configs if it is possible. In default mode, the proxy will have an outstanding request to the config server that will return when the config has changed (a new generation of config). This means that every time config changes on the config server, the proxy will get a response, update its cache and respond to all its clients with the changed config.
The config proxy has two modes:
| Mode | Description |
| --- | --- |
| default | Gets config from server and stores in memory cache. The config proxy will always be started in _default_ mode. Serves from cache if possible. Always uses a config source. If restarted, it will lose all configs that were cached in memory. |
| memorycache | Serves config from memory cache only. Never uses a config source. A restart will lose all cached configs. Setting the mode to _memorycache_ will make all applications on the node work as before (given that they have previously been running and requested config), since the config proxy will serve config from cache and work without connection to any config server. Applications on this node will not work if the config proxy stops, is restarted or crashes. |
Use [vespa-configproxy-cmd](../../reference/operations/self-managed/tools.html#vespa-configproxy-cmd) to inspect cached configs, mode, config sources etc.; there are also commands to change some of the settings. Run:
```
$ vespa-configproxy-cmd -m
```
to see all possible commands.
## Detaching from config servers
To detach a node from the config servers, set the config proxy mode to _memorycache_ - the proxy will then serve config from its memory cache only:
```
$ vespa-configproxy-cmd -m setmode memorycache
```
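To return to normal operation, set the mode back to _default_ (assuming the config servers are reachable again):
```
$ vespa-configproxy-cmd -m setmode default
```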
## Inspecting config
To inspect the configuration for a service, in this example a searchnode (proton) instance, do:
1. Find the active config generation used by the service, using [/state/v1/config](../../reference/api/state-v1.html#state-v1-config) - example for _http://localhost:19110/state/v1/config_, here the generation is 2:
```
{
    "config": {
        "generation": 2,
        "proton": {
            "generation": 2
        },
        "proton.documentdb.music": {
            "generation": 2
        }
    }
}
```
2. Find the relevant _config definition name_, _config id_ and _config generation_ using [vespa-configproxy-cmd](../../reference/operations/self-managed/tools.html#vespa-configproxy-cmd) - e.g.:
```
$ vespa-configproxy-cmd | grep proton
vespa.config.search.core.proton,music/search/cluster.music/0,2,MD5:40087d6195cedb1840721b55eb333735,XXHASH64:43829e79cea8e714
```
`vespa.config.search.core.proton` is the _config definition name_ for this particular config, `music/search/cluster.music/0` is the _config id_ used by the proton service instance on this node, and `2` is the active config generation. This means the service is using the correct config generation, as it matches the /state/v1/config response (a restart can be required for some config changes).
3. Get the generated config using [vespa-get-config](../../reference/operations/self-managed/tools.html#vespa-get-config) - e.g.:
```
$ vespa-get-config -n vespa.config.search.core.proton -i music/search/cluster.music/0
basedir "/opt/vespa/var/db/vespa/search/cluster.music/n0"
rpcport 19106
httpport 19110
...
```
**Important:** Omitting `-i` will return the default configuration, meaning not generated for the active service instance.
---
# Source: https://docs.vespa.ai/en/operations/self-managed/config-sentinel.html.md
# Config sentinel
The config sentinel starts and stops services - and restarts failed services unless they are manually stopped. All nodes in a Vespa system have at least these running processes:
| Process | Description |
| --- | --- |
| [config-proxy](config-proxy.html) | Proxies config requests between Vespa applications and the configserver node. All configuration is cached locally so that this node can maintain its current configuration, even if the configserver shuts down. |
| config-sentinel | Registers itself with the _config-proxy_ and subscribes to and enforces node configuration, meaning the configuration of what services should be run locally, and with what parameters. |
| [vespa-logd](../../reference/operations/log-files.html#logd) | Monitors _$VESPA\_HOME/logs/vespa/vespa.log_, which is used by all other services, and relays everything to the [log-server](../../reference/operations/log-files.html#log-server). |
| [metrics-proxy](monitoring.html#metrics-proxy) | Provides APIs for metrics access to all nodes and services. |

Start sequence:
1. _config server(s)_ are started and application config is deployed to them - see [config server operations](configuration-server.html).
2. _config-proxy_ is started. The environment variables [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) and [VESPA\_CONFIGSERVER\_RPC\_PORT](files-processes-and-ports.html#environment-variables) are used to connect to the [config-server(s)](configuration-server.html). It will retry all config servers in case some are down.
3. _config-sentinel_ is started, and subscribes to node configuration (i.e. a service list) from _config-proxy_ using its hostname as the [config id](../../applications/configapi-dev.html#config-id). See [Node and network setup](node-setup.html) for details about how the hostname is detected and how to override it. The config for the config-sentinel (the service list) lists the processes to be started, along with the _config id_ to assign to each, typically the logical name of that service instance.
4. _config-proxy_ subscribes to node configuration from _config-server_, caches it, and returns the result to _config-sentinel_
5. _config-sentinel_ starts the services given in the node configuration, with the config id as argument. See example output below, like _id="search/qrservers/qrserver.0"_. _logd_ and _metrics-proxy_ are always started, regardless of configuration. Each service:
1. Subscribes to configuration from _config-proxy_.
2. _config-proxy_ subscribes to configuration from _config-server_, caches it and returns result to the service.
3. The service runs according to its configuration, logging to _$VESPA\_HOME/logs/vespa/vespa.log_. The processes instantiate internal components, each assigned the same or another config id, and instantiating further components.
Also see [cluster startup](#cluster-startup) for a minimum nodes-up start setting.
When new config is deployed to _config-servers_ they propagate the changed configuration to nodes subscribing to it. In turn, these nodes reconfigure themselves accordingly.
## User interface
The config sentinel runs an RPC service which can be used to list, start and stop the services supposed to run on that node. This can be useful for testing and debugging. Use [vespa-sentinel-cmd](../../reference/operations/self-managed/tools.html#vespa-sentinel-cmd) to trigger these actions. Example output from `vespa-sentinel-cmd list`:
```
vespa-sentinel-cmd 'sentinel.ls' OK.
container state=RUNNING mode=AUTO pid=27993 exitstatus=0 id="default/container.0"
container-clustercontroller state=RUNNING mode=AUTO pid=27997 exitstatus=0 id="admin/cluster-controllers/0"
distributor state=RUNNING mode=AUTO pid=27996 exitstatus=0 id="search/distributor/0"
logd state=RUNNING mode=AUTO pid=5751 exitstatus=0 id="hosts/r6-3/logd"
logserver state=RUNNING mode=AUTO pid=27994 exitstatus=0 id="admin/logserver"
searchnode state=RUNNING mode=AUTO pid=27995 exitstatus=0 id="search/search/cluster.search/0"
slobrok state=RUNNING mode=AUTO pid=28000 exitstatus=0 id="admin/slobrok.0"
```
To learn more about the processes and services, see [files and processes](files-processes-and-ports.html). Use [vespa-model-inspect host _hostname_](../../reference/operations/self-managed/tools.html#vespa-model-inspect) to list services running on a node.
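For example, to stop and start a single service on the node (service names as shown by `list` above):
```
$ vespa-sentinel-cmd stop searchnode
$ vespa-sentinel-cmd start searchnode
```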
## Cluster startup
The config sentinel will not start services on a node unless it has verified connectivity to a minimum fraction of the other nodes, by default 50%. Find an example of this feature in the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA#start-the-admin-server) example application. Example configuration:
```
<config name="cloud.config.sentinel">
    <connectivity>
        <minOkPercent>20</minOkPercent>
        <maxBadCount>1</maxBadCount>
    </connectivity>
</config>
```
Example: `minOkPercent 10` means that services will be started only if at least 10% of the nodes are up. If there are 11 nodes in the application, the first node started will not start its services; when the second node is started, services will be started on both.
`maxBadCount` applies to connectivity checks where the other node is up, but proper two-way connectivity has not been established. Normally, one-way connectivity means the network configuration is broken and needs looking into, so this may be set low (1 or even 0 are the recommended values). If there are temporary problems (in the example below, non-responding DNS, which leads to various issues at startup), the config sentinel will loop and retry, so service startup will just be slightly delayed.
Example log:
```
[2021-06-15 14:33:25] EVENT : starting/1 name="sbin/vespa-config-sentinel -c hosts/le40808.ostk (pid 867)"
[2021-06-15 14:33:25] EVENT : started/1 name="config-sentinel"
[2021-06-15 14:33:25] CONFIG : Sentinel got 4 service elements [tenant(footest), application(bartest), instance(default)] for config generation 1001
[2021-06-15 14:33:25] CONFIG : Booting sentinel 'hosts/le40808.ostk' with [stateserver port 19098] and [rpc port 19097]
[2021-06-15 14:33:25] CONFIG : listening on port 19097
[2021-06-15 14:33:25] CONFIG : Sentinel got model info [version 7.420.21] for 35 hosts [config generation 1001]
[2021-06-15 14:33:25] CONFIG : connectivity.maxBadCount = 3
[2021-06-15 14:33:25] CONFIG : connectivity.minOkPercent = 40
[2021-06-15 14:33:28] INFO : Connectivity check details: 2086533.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le01287.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le23256.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le23267.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le23297.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le23312.ostk -> connect OK, but reverse check FAILED
[2021-06-15 14:33:28] INFO : Connectivity check details: le23317.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le23319.ostk -> connect OK, but reverse check FAILED
[2021-06-15 14:33:28] INFO : Connectivity check details: le30550.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le30553.ostk -> connect OK, but reverse check FAILED
[2021-06-15 14:33:28] INFO : Connectivity check details: le30556.ostk -> unreachable from me, but up
[2021-06-15 14:33:28] INFO : Connectivity check details: le30560.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le30567.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40387.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40389.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40808.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40817.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40833.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40834.ostk -> unreachable from me, but up
[2021-06-15 14:33:28] INFO : Connectivity check details: le40841.ostk -> connect OK, but reverse check FAILED
[2021-06-15 14:33:28] INFO : Connectivity check details: le40858.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40860.ostk -> unreachable from me, but up
[2021-06-15 14:33:28] INFO : Connectivity check details: le40863.ostk -> connect OK, but reverse check FAILED
[2021-06-15 14:33:28] INFO : Connectivity check details: le40873.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40892.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40900.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40905.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: le40914.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: sm02318.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: sm02324.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: sm02340.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: zt40672.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: zt40712.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: zt40728.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] INFO : Connectivity check details: zt41329.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:28] WARNING : 8 of 35 nodes up but with network connectivity problems (max is 3)
[2021-06-15 14:33:28] WARNING : Bad network connectivity (try 1)
[2021-06-15 14:33:30] WARNING : slow resolve time: 'le30556.ostk' -> '1234:5678:90:123::abcd' (5.00528 s)
[2021-06-15 14:33:30] WARNING : slow resolve time: 'le40834.ostk' -> '1234:5678:90:456::efab' (5.00527 s)
[2021-06-15 14:33:30] WARNING : slow resolve time: 'le40860.ostk' -> '1234:5678:90:789::cdef' (5.00459 s)
[2021-06-15 14:33:31] INFO : Connectivity check details: le23312.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:31] INFO : Connectivity check details: le23319.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:31] INFO : Connectivity check details: le30553.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:31] INFO : Connectivity check details: le30556.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:31] INFO : Connectivity check details: le40834.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:31] INFO : Connectivity check details: le40841.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:31] INFO : Connectivity check details: le40860.ostk -> connect OK, but reverse check FAILED
[2021-06-15 14:33:31] INFO : Connectivity check details: le40863.ostk -> OK: both ways connectivity verified
[2021-06-15 14:33:31] INFO : Enough connectivity checks OK, proceeding with service startup
[2021-06-15 14:33:31] EVENT : starting/1 name="searchnode"
...
```
---
# Source: https://docs.vespa.ai/en/applications/config-system.html.md
# The Config System
The config system in Vespa is responsible for turning the application package into live configuration of all the nodes, processes and components that realizes the running system. Here we deep dive into various aspects of how this works.
## Node configuration
The problem of configuring nodes can be divided into three parts, each addressed by different solutions:
- **Node system level configuration:** Configure OS level settings such as time zone as well as user privileges on the node.
- **Package management**: Ensure that the correct set of software packages is installed on the nodes. This functionality is provided by three tools working together.
- **Vespa configuration:** Starts the configured set of processes on each node with their configured startup parameters and provides dynamic configuration to the modules run by these services. _Configuration_ here is any data which:
- cannot be fixed at compile time
- is static most of the time
Note that by these definitions, all nodes can have the same software packages (disregarding version differences, discussed later), as variations in which services run on each node, and in their behavior, are achieved entirely by using Vespa Configuration. This keeps the complexity of node variations within the configuration system, rather than spreading it across multiple systems.
Configuring a system can be divided into:
- **Configuration assembly:** Assembly of a complete set of configurations for delivery from the inputs provided by the parties involved in configuring the system
- **Configuration delivery:** Definition of individual configurations, APIs for requesting and accessing configuration, and the mechanism for delivering configurations from their source to the receiving components
This division allows the problem of reliable configuration delivery in large distributed systems to be addressed in configuration delivery, while the complexities of assembling complete configurations can be treated as a vm-local design problem.
An important feature of Vespa Configuration is the nature of the interface between the delivery and assembly subsystems. The assembly subsystem creates as output a (Java) object model of the distributed system. The delivery subsystem queries this model to obtain concrete configurations of all the components of the system. This allows the assembly subsystem to accept higher level, and simpler to use, abstractions as input and automatically derive detailed configurations with the correct interdependencies. This division insulates the external interface and the components being configured from changes in each other. In addition, the system model provides the home for logic implementing node/component instance variations of configuration.
## Configuration assembly
Config assembly is the process of turning the configuration input sources into an object model of the desired system, which can respond to queries for configs given a name and config id. Config assembly for Vespa systems can become complex, because it involves merging information owned by multiple parties:
- **Vespa operations** own the nodes and control assignment of nodes to services/applications
- **Vespa service providers** own services which host multiple applications running on Vespa
- **Vespa applications** define the final applications running on nodes and shared services
The current config model assembly procedure uses a single source - the _application package_. The application package is a directory structure containing defined files and subdirectories which together completely define the system - including which nodes belong in the system, which services they should run, and the configuration of these services and their components. When the application deployer wants to change the application, [vespa prepare](../reference/clients/vespa-cli/vespa_prepare) is issued to a config server, with the application package as argument.
At this point the system model is assembled and validated and any feedback is issued to the deployer. If the deployer decides to make the new configuration active, a [vespa activate](../reference/clients/vespa-cli/vespa_activate) is then issued, causing the config server cluster to switch to the new system model and respond with new configs on any active subscriptions where the new system model caused the config to change. This ensures that subscribers get new configs in a timely manner on changes, and that the changes propagated are the minimal set, such that small changes to an application package cause correspondingly small changes to the system.

The config model itself is pluggable, so that service providers may write plugins for assembling a particular service. The plugins are written in Java and are installed together with the Vespa Configuration. Service plugins define their own syntax for specifying services that may be configured by Vespa applications. This allows the applications to be specified in an abstract manner, decoupled from the configuration that is delivered to the components.
## Configuration delivery
Configuration delivery encompasses the following aspects:
- Definition of configurations
- The component view (API) of configuration
- Configuration delivery mechanism
These aspects work together to realize the following goals:
- Eliminate inconsistency between code and configuration.
- Eliminate inconsistency between the desired configuration and the state on each node.
- Limit temporary inconsistencies after reconfiguration.
The next three subsections discuss the three aspects above, followed by subsections on two special concerns - bootstrapping and system upgrades.
### Configuration definitions
A _configuration_ is a set of simple or array key-values with a name and a type, which can possibly be nested - example:
```
myProperty "myvalue"
myArray[1]
myArray[0].key1 "someValue"
myArray[0].key2 1337
```
The _type definition_ (or class) of a configuration object defines and documents the set of fields a configuration may contain with their types and default values. It has a name as well as a namespace. For example, the above config instance may have this definition:
```
namespace=foo.bar
# Documentation of this key
myProperty string default="foo"
# etc.
myArray[].key1 string
myArray[].key2 int default=0
```
An individual config typically contains a coherent set of settings regarding some topic, such as _logging_ or _indexing_. A complete system consists of many instances of many config types.
### Component view
Individual components of a system consume one or more such configs and use their values to influence their behavior. APIs are needed for _requesting_ configs and for _accessing_ the values of those configs as they are provided.
_Access_ to configs happens through a (Java or C++) class generated from the config definition file. This ensures that any inconsistency between the fields declared in a config type and the expectations of the code accessing it are caught at compile time. The config definition is best viewed as another class with an alternative form of source syntax belonging to the components consuming it. A Maven target is provided for generating such classes from config definition types.
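To make this concrete, here is a minimal sketch of what access through such a generated class can look like, assuming a hypothetical `MyConfig` class generated from a definition like the one above, and assuming the generated accessors follow the field names in the def-file:
```
// Sketch only: MyConfig is assumed generated from a def-file with
// myProperty and myArray[] fields, like the example definition above.
void consume(MyConfig config) {
    String prop = config.myProperty();           // simple value accessor
    for (int i = 0; i < config.myArray().size(); i++) {
        String key1 = config.myArray(i).key1();  // array entry accessors
        int key2 = config.myArray(i).key2();
    }
}
```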
Components may use two different methods for _requesting_ configurations - subscription and dependency injection:
**Subscription:** The component sets up a _ConfigSubscriber_, then subscribes to one or more configs. This is the simple approach; there are [other ways of getting configs](configapi-dev.html) too:
```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MyConfig> handle = subscriber.subscribe(MyConfig.class, "myId");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
if (handle.isChanged()) {
    String message = handle.getConfig().myKey();
    // ... consume the rest of this config
}
```
**Dependency injection:** The component declares its config dependencies in the constructor and subscriptions are set up on its behalf. When changed configs are available a new instance of the component is created. The advantage of this method is that configs are immutable throughout the lifetime of the component such that no thread coordination is required. This method is currently only available in Java using the [Container](containers.html).
```
public MyComponent(MyConfig config) {
    String myKey = config.myKey();
    // ... consume the rest of this config
}
```
For unit testing, [configs can be created with Builders](configapi-dev.html#unit-testing) and submitted directly to components.
### Delivery mechanism
The config delivery mechanism is responsible for ensuring that a new config instance is delivered to subscribing components, each time there is a change to the system model causing that config instance to change. A config subscription is identified by two parameters: the _config definition name and namespace_, and the [config id](configapi-dev.html#config-id) used to identify the particular component instance making the subscription.
The in-process config library will forward these subscription requests to a node local[config proxy](../operations/self-managed/config-proxy.html), which provides caching and fan-in from processes to node. The proxy in turn issues these subscriptions to a node in the configuration server cluster, each of which hosts a copy of the system model and resolves config requests by querying the system model.
To provide config server failover, the config subscriptions are implemented as long-timeout gets, which are immediately resent when they time out, but conceptually this is best understood as push subscriptions:

As configs are not stored as files locally on the nodes, there is no possibility of inconsistencies due to local edits, or of nodes coming out of maintenance with a stale configuration. As configuration changes are pushed as soon as the config server cluster allows, time inconsistencies during reconfigurations are minimized, although not avoided as there is no global transaction.
Application code and config is generally pulled from the config server - it is however possible to use the [url](../reference/applications/config-files.html#url)config type to refer to any resource to download to nodes.
### Bootstrapping
Each Vespa node runs a [config-sentinel](../operations/self-managed/config-sentinel.html) process which starts and maintains the services run on the node.
### System upgrades
The configuration server will up/downgrade between config versions on the fly on minor upgrades, which cause discrepancies between the config definitions requested and those produced by the configuration model. Major upgrades, which involve incompatible changes to the configuration protocol or the system model, require a [procedure](../operations/self-managed/config-proxy.html).
## Notes
Find more information for using the Vespa config API in the [reference doc](configapi-dev.html).
Vespa Configuration makes the following assumptions about the nodes using it:
- All nodes have the software packages needed to run the configuration system and any services which will be configured to run on the node. This usually means that all nodes have the same software, although this is not a requirement
- All nodes have [VESPA\_CONFIGSERVERS](../operations/self-managed/files-processes-and-ports.html#environment-variables) set
- All nodes know their fully qualified domain name
Reading this document is not necessary in order to use Vespa or to develop Java components for the Vespa container - for this purpose, refer to [Configuring components](configuring-components.html).
## Further reads
- [Configuration server operations](../operations/self-managed/configuration-server.html) is a good resource for troubleshooting.
- Refer to the [bundle plugin](bundles.html#maven-bundle-plugin) for how to build an application package with Java components.
- During development on a local instance it can be handy to just wipe the state completely and start over:
1. [Delete all config server state](../operations/self-managed/configuration-server.html#zookeeper-recovery) on all config servers
2. Run [vespa-remove-index](../reference/operations/self-managed/tools.html#vespa-remove-index) to wipe content nodes
---
# Source: https://docs.vespa.ai/en/reference/api/config-v2.html.md
# Config API
Vespa provides a REST API for listing and retrieving config - an alternative is the [programmatic Java API](../../applications/configapi-dev.html#the-java-config-api). The Config API provides a way to inspect and retrieve all the config that can be generated by the config model for a given [tenant's active application](deploy-v2.html). Some, but not necessarily all, of those configs are used by services by [subscribing](../../applications/configapi-dev.html) to them.
The response format is JSON. The current API version is 2. All config servers provide the REST API. The API port is 19071 - use [vespa-model-inspect](../operations/self-managed/tools.html#vespa-model-inspect) `service configserver` to find config server hosts. Example: `http://myconfigserver.mydomain.com:19071/config/v2/tenant/msbe/application/articlesearch/`
The API is available after an application has been [deployed and activated](../../basics/applications.html#deploying-applications).
## The application id
The API provides two ways to identify your application, given a tenant: one using only an application name, and one using application name, environment, region and instance. For the former, "short" form, a default environment, region and instance is used.
More formally, an _application id_ is a tuple of the form (_application_, _environment_, _region_, _instance_). The system currently provides shorthand to the id (_application_, "default", "default", "default").
**Note:** Multiple environments, regions and instances are not currently supported for application deployments; _default_ is always used.
Example URL using only application name: `http://myconfigserver.mydomain.com:19071/config/v2/tenant/media/application/articlesearch/media.config.server-list/clusters/0`
| Part | Description |
| --- | --- |
| media | Tenant |
| articlesearch | Application |
| media.config | Namespace of the requested config |
| server-list | Name of the requested config |
| clusters/0 | Config id of the requested config |
Example URL using full application id: `http://myconfigserver.mydomain.com:19071/config/v2/tenant/media/application/articlesearch/environment/test/region/us/instance/staging/media.config.server-list/clusters/0`
| Part | Description |
| --- | --- |
| media | Tenant |
| articlesearch | Name of the application |
| test | Environment |
| us | Region |
| staging | Instance |
| media.config | Namespace of the requested config |
| server-list | Name of the requested config |
| clusters/0 | Config id of the requested config |
In this API specification, the short form of the application id, i.e. only the application name, is used. The tenant `mytenant` and the application name `myapplication` are used throughout in examples.
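Since the API is plain HTTP returning JSON, any HTTP client can be used to explore it. As a minimal sketch (hostnames and ids taken from the examples on this page, not a definitive client), the listing endpoint can be queried from Java like this:
```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConfigV2Example {
    public static void main(String[] args) throws Exception {
        // Example config server host from this page - substitute your own.
        String url = "http://myconfigserver.mydomain.com:19071"
                + "/config/v2/tenant/mytenant/application/myapplication/";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON with "children" and "configs" arrays
    }
}
```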
## GET /config/v2/tenant/mytenant/application/myapplication/
List the configs in the model, as [config id](../../applications/configapi-dev.html#config-id) specific URLs.
**Parameters:**

| Parameter | Default | Description |
| --- | --- | --- |
| recursive | false | If true, include each config id in the model which produces the config, and list only the links to the config payload. If false, include the first level of the config ids in the listing of new list URLs, as explained above. |

**Request body:** None

**Response:** A list response includes two arrays:
- List-links to descend one level down in the config id hierarchy, named `children`.
- [Config payload](#payload) links for the current (top) level, named `configs`.

**Error response:** N/A
Examples:
`GET /config/v2/tenant/mytenant/application/myapplication/`
```
{
    "children": [
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/myconfigserver.mydomain.com/",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/hosts/",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.model/admin/",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/container.components/search/"
    ],
    "configs": [
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.model",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/container.components"
    ]
}
```
`GET /config/v2/tenant/mytenant/application/myapplication/?recursive=true`
```
{
    "configs": [
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/myconfigserver.mydomain.com",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/hosts/myconfigserver.mydomain.com"
    ]
}
```
## GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/
**Parameters:** Same as above.

**Request body:** None

**Response:** List the configs in the model with the given namespace and name. List semantics as above.

**Error response:** 404 if the given namespace.name is not known to the config model.
Examples:
`GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/`
```
{
    "children": [
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/docproc/"
    ],
    "configs": [
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder"
    ]
}
```
`GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/?recursive=true`
```
{
    "configs": [
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/default",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/gateways",
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/gateways/gateway"
    ]
}
```
## GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/subid]/
**Parameters:** Same as above.

**Request body:** None

**Response:** List the configs in the model with the given namespace and name, and for which the given config id segment is a prefix.

**Error response:**
- 404 if the given namespace.name is not known to the config model.
- 404 if the given config id is not in the model.
Examples:
`GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/`
```
{
    "children": [
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/"
    ],
    "configs": [
        "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search"
    ]
}
```
`GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/?recursive=true`
```
{
    "configs": [
        "http://myhost.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/default"
    ]
}
```
## GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/id]
**Parameters:** None

**Request body:** None

**Response:** Returns the config payload of the given `namespace.name/config/id`, formatted as JSON.

**Error response:** Same as above.
Example:
`GET /config/v2/tenant/mytenant/application/myapplication/container.core.container-http/search/qrsclusters/default/qrserver.0`
```
{
    "enabled": "true",
    "requestbuffersize": "65536",
    "port": {
        "search": "8080",
        "host": ""
    },
    "fileserver": {
        "throughsearch": "true"
    }
}
```
---
# Source: https://docs.vespa.ai/en/applications/configapi-dev.html.md
# Cloud Config API
This document describes how to use the C++ and Java versions of the Cloud config API (the 'config API'). This API is used internally in Vespa, and reading this document is not necessary in order to use Vespa or to develop Java components for the Vespa container. For this purpose, please refer to[Configuring components](configuring-components.html) instead.
Throughout this document, we will use as example an application serving up a configurable message.
## Creating a Config Definition
The first thing to do when deciding to use the config API is to define the config you want to use in your application. This is described in the [configuration file reference](../reference/applications/config-files.html). Here we will use the definition `motd.def` from the complete example at the end of the document:
```
namespace=myproject
message string default="NO MESSAGE"
port int default=1337
```
## Generating Source Code and Accessing Config in Code
Before you can access config in your program, you will need to generate source code for the config definition. Simple steps for how to generate API code and use the API are provided for [Java](#the-java-config-api). See also the [javadoc](https://javadoc.io/doc/com.yahoo.vespa/config-lib).
We also recommend that you read the [general guidelines](#guidelines) for examples of advanced usage and recommendations for how to use the API.
## Config ID
The config id specified when requesting config is essentially an identifier of the component requesting config. The config server contains a config object model, which maps a request for a given config name and config id to the correct configproducer instance, which will merge default values from the config definition with config from the object model and config set in `services.xml` to produce the final config instance.
The config id is given to a service via the VESPA\_CONFIG\_ID environment variable. The [config sentinel](/en/operations/self-managed/config-sentinel.html) sets the environment variable to the id given by the config model. This id should then be used by the service to subscribe for config. If you are running multiple services, each of them will be assigned a **unique config id** for that service, and a service should not subscribe using any config id other than its own.
If you need to get config for a service that is not part of the model (i.e. it is not specified in services.xml), but that you want to specify values for in services.xml, use the config id `client`.
## Schema Compatibility Rules
A schema incompatibility occurs if the config class (for example `MotdConfig` in the C++ and Java sections below) was built from a different def-file than the one the server is seeing and using to serve config. Some such incompatibilities are handled automatically by the config system; others lead to errors. This is useful to know during development/testing of a config schema.
Let _S_ denote a config definition called _motd_ which the server is using, and _C_ denote a config definition also called _motd_ which the client is using, i.e. the one from which the `MotdConfig` class used when subscribing was generated. The following is the system's behavior:
**Compatible changes** - these schema mismatches are handled automatically by the config server:
- C is missing a config value that S has: The server will omit that value from the response.
- C has an additional config value with a default value: The server will include that value in the response.
- C and S both have a config value, but the default values differ: The server will use C's default value.

**Incompatible changes** - these schema mismatches are not handled by the config server, and will typically lead to errors in the subscription API because of missing values (though in principle some consumers of config may tolerate them):
- C has an additional config value without a default value: The server will not include anything for that value.
- C has the type of a config value changed, for example from string to int: The server will print an error message, and not include anything for that value. The user must use an entirely new name for the config if such a change must be made.
As with any data schema, it is wise to be conservative about changing it if the system will have new versions in the future. For a `def` schema, removing a config value constitutes a semantic change that may lead to problems when an older version of some config subscriber asks for config. In large deployments, the risk associated with this increases, because of the higher cost of a full restart of everything.
Consequently, one should prefer creating a new config name, to removing a config value from a schema.
## Creating a Deployable Application Package
The application package consists of the following files:
```
app/services.xml
app/hosts.xml
```
The services file contains the services that are handled by the config model plugin. The hosts file contains:
```
<?xml version="1.0" encoding="utf-8" ?>
<hosts>
    <host name="localhost">
        <alias>node0</alias>
    </host>
</hosts>
```
## Setting Up a Running System
To get a running system, first install the cloudconfig package and start the config server, then deploy the application. Prepare the application:
```
$ vespa prepare /path/to/app/folder
```
Activate the application:
```
$ vespa activate /path/to/app/folder
```
Then, start vespa. This will start the application and pass it its config id via the VESPA\_CONFIG\_ID environment variable.
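As a minimal sketch of what this looks like from the service side (using the subscription API described in the Java section below, and assuming `MotdConfig` was generated from `motd.def`):
```
// Sketch: subscribe using the config id assigned by the config sentinel.
String configId = System.getenv("VESPA_CONFIG_ID");
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MotdConfig> handle = subscriber.subscribe(MotdConfig.class, configId);
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
System.out.println(handle.getConfig().message());
```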
## Advanced Usage of the Config API
For a simple application, having only one config may suffice. In a typical server application, however, the number of config settings can become large. Therefore, we **encourage** you to split the config settings into multiple logical classes. This section covers how you can use a ConfigSubscriber to subscribe to multiple configs and how you should group configs based on their dependencies. Configs can either be:
- Independent static configs
- Dependent static configs
- Dependent dynamic configs
We will give a few examples of how you can cope with these different scenarios. The code examples are given in a pseudo format common to C++ and Java, but they should be easy to convert to their language specific equivalents.
### Independent Static Configs
Independent configs mean that it does not matter if one of them is updated independently of the others. In this case, you might as well use one ConfigSubscriber for each of the configs, but it may become tedious to check all of them. Therefore, the recommended way is to manage all of these configs using one ConfigSubscriber. In this setup, it is also typical to split the subscription phase from the config check/retrieval part. The subscribing part:
**C++:**
```
ConfigSubscriber subscriber;
ConfigHandle<FooConfig>::UP fooHandle = subscriber.subscribe<FooConfig>(...);
ConfigHandle<BarConfig>::UP barHandle = subscriber.subscribe<BarConfig>(...);
ConfigHandle<BazConfig>::UP bazHandle = subscriber.subscribe<BazConfig>(...);
```

**Java:**
```
ConfigSubscriber subscriber;
ConfigHandle<FooConfig> fooHandle = subscriber.subscribe(FooConfig.class, ...);
ConfigHandle<BarConfig> barHandle = subscriber.subscribe(BarConfig.class, ...);
ConfigHandle<BazConfig> bazHandle = subscriber.subscribe(BazConfig.class, ...);
```
And the retrieval part:
```
if (subscriber.nextConfig()) {
if (fooHandle->isChanged()) {
// Reconfigure foo
}
if (barHandle->isChanged()) {
// Reconfigure bar
}
if (bazHandle->isChanged()) {
// Reconfigure baz
}
}
```
This allows you to perform the config fetch part either in its own thread or as part of some other event thread in your application.
### Dependent Static Configs
Dependent configs mean that one of your configs depends on the value in another config. The most common case is one config containing the config id to use when subscribing to a second config. In addition, your system may require that the configs are updated to the same **generation**.
**Note:** A generation is a monotonically increasing number which is increased each time an application is deployed with `vespa deploy`. Certain applications may require that all configs are of the same generation to ensure consistency, especially container-like applications. All configs subscribed to by a ConfigSubscriber are guaranteed to be of the same generation.
The configs are static in the sense that the config id used does not change. The recommended way to approach this is to use a two-phase setup, where you fetch the initial configs in the first phase, and then subscribe to both the initial and derived configs in order to ensure that they are of the same generation. Assume that the InitialConfig config contains two fields named _derived1_ and _derived2_:
**C++:**
```
ConfigSubscriber initialSubscriber;
ConfigHandle<InitialConfig>::UP initialHandle = initialSubscriber.subscribe<InitialConfig>(...);
while (!initialSubscriber.nextConfig()); // Ensure that we actually get initial config.
std::auto_ptr<InitialConfig> initialConfig = initialHandle->getConfig();
ConfigSubscriber subscriber;
... = subscriber.subscribe<InitialConfig>(...);
... = subscriber.subscribe<DerivedConfig>(initialConfig->derived1);
... = subscriber.subscribe<DerivedConfig>(initialConfig->derived2);
```

**Java:**
```
ConfigSubscriber initialSubscriber;
ConfigHandle<InitialConfig> initialHandle = initialSubscriber.subscribe(InitialConfig.class, ...);
while (!initialSubscriber.nextConfig()); // Ensure that we actually get initial config.
InitialConfig initialConfig = initialHandle.getConfig();
ConfigSubscriber subscriber;
... = subscriber.subscribe(InitialConfig.class, ...);
... = subscriber.subscribe(DerivedConfig.class, initialConfig.derived1);
... = subscriber.subscribe(DerivedConfig.class, initialConfig.derived2);
```
You can then check the configs in the same way as for independent static configs, and be sure that all your configs are of the same generation. The reason why you need to create a new ConfigSubscriber is that **once you have called nextConfig(), you cannot add or remove new subscribers**.
### Dependent Dynamic Configs
Dynamic configs mean that the set of configs that you subscribe for may change between each deployment. This is the hardest case to solve, and how hard it is depends on how many levels of configs you have. The most common one is to have a set of bootstrap configs, and another set of configs that may change depending on the bootstrap configs (typically in an application that has plugins). To cover this case, you can use a class named `ConfigRetriever`. Currently, it is **only available in the C++ API**.
The ConfigRetriever uses the same mechanisms as the ConfigSubscriber to ensure that you get a consistent set of configs. In addition, two more classes called `ConfigKeySet` and `ConfigSnapshot` are added. The ConfigRetriever takes in a set of configs used to bootstrap the system in its constructor. This set does not change. It then provides one method, `getConfigs(ConfigKeySet)`. The method returns a ConfigSnapshot of the next generation of bootstrap configs or derived configs.
To create the ConfigRetriever, you must first populate a set of bootstrap configs:
```
ConfigKeySet bootstrapKeys;
bootstrapKeys.add<BootstrapFooConfig>(configId);
bootstrapKeys.add<BootstrapBarConfig>(configId);
```
The bootstrap configs are typically configs that will always be needed by your application. Once you have defined your set, you can create the retriever and fetch a ConfigSnapshot of the bootstrap configs:
```
ConfigRetriever retriever(bootstrapKeys);
ConfigSnapshot bootstrapConfigs = retriever.getBootstrapConfigs();
```
The ConfigSnapshot contains the bootstrap config, and you may use that to fetch the individual configs. You need to provide the config id and the type in order for the snapshot to know which config to look for:
```
if (!bootstrapConfigs.empty()) {
    std::auto_ptr<BootstrapFooConfig> bootstrapFoo = bootstrapConfigs.getConfig<BootstrapFooConfig>(configId);
    std::auto_ptr<BootstrapBarConfig> bootstrapBar = bootstrapConfigs.getConfig<BootstrapBarConfig>(configId);
```
The snapshot returned is empty if the retriever was unable to get the configs. In that case, you can try calling the same method again.
Once you have the bootstrap configs, you know the config ids for the other components that you should subscribe for, and you can define a new key set. Let's assume that bootstrapFoo contains an array of config ids we should subscribe for.
```
ConfigKeySet pluginKeySet;
for (size_t i = 0; i < (*bootstrapFoo).pluginConfigId.size(); i++) {
    pluginKeySet.add<PluginConfig>((*bootstrapFoo).pluginConfigId[i]);
}
```
In this example we know the type of config requested, but this could be done in another way, letting the plugin add keys to the set.
Now that the derived configs have been added to the pluginKeySet, we can request a snapshot of them:
```
ConfigSnapshot pluginConfigs = retriever.getConfigs(pluginKeySet);
if (!pluginConfigs.empty()) {
// Configure each plugin with a config picked from the snapshot.
}
```
And that's it. When calling the method without any key parameters, the snapshot returned by this method may be empty if **the config could not be fetched within the timeout**, or **the generation of configs has changed**. To check if you should call getBootstrapConfigs() again, you can use the `bootstrapRequired()` method. If it returns true, you will have to call getBootstrapConfigs() again, because the plugin configs have been updated, and you need a new bootstrap generation to match it. If it returns false, you may call getConfigs() again to try and get a new generation of plugin configs.
We recommend that you use the retriever API if you have a use case like this. The alternative is to create your own mechanism using two ConfigSubscriber classes, but this is **not** recommended.
### Advice on Config Modelling
Regardless of which of these types of configs you have, it is recommended that you always fetch all the configs you need **before** you start configuring your system. This is because the user may deploy multiple different versions of the config, which may cause your components to get conflicting config values. A common pitfall is to treat dependent configs as independent, thereby causing inconsistency in your application when a config update for config A arrives before config B. The ConfigSubscriber was created to minimize the possibility of making this mistake, by ensuring that all of the configs come from the same config reload.
**Tip:** Set up your entire _tree_ of configs in one thread to ensure consistency, and configure your system once all of the configs have arrived. This also maps best to the ConfigSubscriber, since it is not thread safe.
## The Java config API
Assumption: a [def file](configapi-dev.html), which is the schema for one of your configs, is created and put in `src/main/resources/configdefinitions/`.
To generate source code for the def-file, invoke the `config-class-plugin` from _pom.xml_, in the `<build>`, `<plugins>` section:
```
<plugin>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>config-class-plugin</artifactId>
    <version>${vespa.version}</version>
    <executions>
        <execution>
            <id>config-gen</id>
            <goals>
                <goal>config-gen</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```
The generated classes will be saved to `target/generated-sources/vespa-configgen-plugin` when the `generate-sources` phase of the build is executed. The def-file [`motd.def`](configapi-dev.html) is used in this tutorial, and a class called `MotdConfig` was generated (in the package `myproject`). It is a subtype of `ConfigInstance`.
When using only the config system (and not other parts of Vespa or the JDisc container), pull it in by adding this dependency to pom.xml:
```
<dependency>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>config</artifactId>
    <version>${vespa.version}</version>
    <scope>provided</scope>
</dependency>
```
## Subscribing and getting config
To retrieve the config in the application, create a `ConfigSubscriber`. A `ConfigSubscriber` is capable of subscribing to one or more configs. The example shown here uses simplified error handling:
```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MotdConfig> handle = subscriber.subscribe(MotdConfig.class, "motdserver2/0");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
if (handle.isChanged()) {
    String message = handle.getConfig().message();
    int port = handle.getConfig().port();
}
```
Note that `isChanged()` will always be true after the first call to `nextConfig()`; it is included here to illustrate the API.
In many cases, one will do this from a thread which loops the `nextConfig()` call and reconfigures your application if `isChanged()` is true.
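A minimal sketch of such a fetcher loop, reusing `subscriber` and `handle` from the example above (`reconfigure()` is a hypothetical application hook, not part of the API):
```
Thread configThread = new Thread(() -> {
    while (true) {
        // nextConfig() blocks until a new config generation arrives or times out.
        if (subscriber.nextConfig() && handle.isChanged()) {
            reconfigure(handle.getConfig()); // hypothetical application hook
        }
    }
});
configThread.setDaemon(true);
configThread.start();
```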
The second parameter to `subscribe()`, _"motdserver2/0"_, is the [config id](configapi-dev.html#config-id).
If one `ConfigSubscriber` subscribes to multiple configs,`nextConfig()` will only return true if the configs are of the same generation, i.e. they are "in sync".
See the [com.yahoo.config](https://javadoc.io/doc/com.yahoo.vespa/config-lib) javadoc for details. Example:
```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MotdConfig> motdHandle = subscriber.subscribe(MotdConfig.class, "motdserver2/0");
ConfigHandle<AnotherConfig> anotherHandle = subscriber.subscribe(AnotherConfig.class, "motdserver2/0");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
// We now have a synchronized new generation for these two configs.
if (motdHandle.isChanged()) {
    String message = motdHandle.getConfig().message();
    int port = motdHandle.getConfig().port();
}
if (anotherHandle.isChanged()) {
    String myfield = anotherHandle.getConfig().getMyField();
}
```
## Simplified subscription
In cases like the first example above, where you only subscribe to one config, you may also subscribe using the `ConfigSubscriber.SingleSubscriber` interface. In this case, you define a `configure()` method from the interface, and call a special `subscribe()`. The method will start a dedicated config fetcher thread for you. It will throw an exception in the user thread if initial configuration fails, and print a warning in the config thread if it fails afterwards. Example:
```
public class MyConfigSubscriber implements ConfigSubscriber.SingleSubscriber<MotdConfig> {

    public MyConfigSubscriber(String configId) {
        new ConfigSubscriber().subscribe(this, MotdConfig.class, configId);
    }

    @Override
    public void configure(MotdConfig config) {
        // configuration logic here
    }
}
```
The disadvantage to using this is that one cannot implement custom error handling or otherwise track config changes. If needed, use the generic method above.
## Unit testing config
When instantiating a [ConfigSubscriber](https://javadoc.io/doc/com.yahoo.vespa/config/latest/com/yahoo/config/subscription/ConfigSubscriber.html), one can give it a [ConfigSource](https://javadoc.io/doc/com.yahoo.vespa/config/latest/com/yahoo/config/subscription/ConfigSource.html). One such source is a `ConfigSet`. It consists of a set of `Builder`s. This is an example of instantiating a subscriber using this - it uses two types of config, generated from the files `app.def` and `string.def`:
```
ConfigSet myConfigs = new ConfigSet();
AppConfig.Builder a0builder = new AppConfig.Builder().message("A message, 0").times(88);
AppConfig.Builder a1builder = new AppConfig.Builder().message("A message, 1").times(89);
myConfigs.add("app/0", a0builder);
myConfigs.add("app/1", a1builder);
myConfigs.add("bar", new StringConfig.Builder().stringVal("StringVal"));
ConfigSubscriber subscriber = new ConfigSubscriber(myConfigs);
```
To help with unit testing, each config type has a corresponding builder type. The `Builder` is mutable whereas the `ConfigInstance` is not. Use this to set up config fixtures for unit tests. The `ConfigSubscriber` has a `reload()` method which is used in tests to force the subscriptions into a new generation. It emulates a `vespa activate` operation after you have updated the `ConfigSet`.
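As a hedged sketch of a test using this, continuing from the `ConfigSet` example above (the exact `reload()` signature is assumed here to take the new generation number):
```
// Sketch: subscribe to one of the configs in the set, then force a new generation.
ConfigHandle<AppConfig> handle = subscriber.subscribe(AppConfig.class, "app/0");
subscriber.nextConfig();                  // initial generation
a0builder.message("An updated message");  // mutate the fixture builder
subscriber.reload(2);                     // emulates 'vespa activate' (assumed signature)
if (subscriber.nextConfig() && handle.isChanged()) {
    // handle.getConfig().message() now returns "An updated message"
}
```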
A full example can be found in[ConfigSetSubscriptionTest.java](https://github.com/vespa-engine/vespa/blob/master/config/src/test/java/com/yahoo/config/subscription/ConfigSetSubscriptionTest.java).
---
# Source: https://docs.vespa.ai/en/reference/operations/metrics/configserver.html.md
# ConfigServer Metrics
| Name | Unit | Description |
| --- | --- | --- |
| configserver.requests | request | Number of requests processed |
| configserver.failedRequests | request | Number of requests that failed |
| configserver.latency | millisecond | Time to complete requests |
| configserver.cacheConfigElems | item | Number of config elements in the cache |
| configserver.cacheChecksumElems | item | Number of checksum elements in the cache |
| configserver.hosts | node | The number of nodes being served configuration from the config server cluster |
| configserver.tenants | instance | The number of tenants being served configuration from the config server cluster |
| configserver.applications | instance | The number of applications being served configuration from the config server cluster |
| configserver.delayedResponses | response | Number of delayed responses |
| configserver.sessionChangeErrors | session | Number of session change errors |
| configserver.unknownHostRequests | request | Config requests from unknown hosts |
| configserver.newSessions | session | New config sessions |
| configserver.preparedSessions | session | Prepared config sessions |
| configserver.activeSessions | session | Active config sessions |
| configserver.inactiveSessions | session | Inactive config sessions |
| configserver.addedSessions | session | Added config sessions |
| configserver.removedSessions | session | Removed config sessions |
| configserver.rpcServerWorkQueueSize | item | Number of elements in the RPC server work queue |
| maintenanceDeployment.transientFailure | operation | Number of maintenance deployments that failed with a transient failure |
| maintenanceDeployment.failure | operation | Number of maintenance deployments that failed with a permanent failure |
| maintenance.successFactorDeviation | fraction | Configserver: Maintenance Success Factor Deviation |
| maintenance.duration | millisecond | Configserver: Maintenance Duration |
| maintenance.congestion | failure | Configserver: Maintenance Congestion |
| configserver.zkConnectionLost | connection | Number of ZooKeeper connections lost |
| configserver.zkReconnected | connection | Number of ZooKeeper reconnections |
| configserver.zkConnected | node | Number of ZooKeeper nodes connected |
| configserver.zkSuspended | node | Number of ZooKeeper nodes suspended |
| configserver.zkZNodes | node | Number of ZooKeeper nodes present |
| configserver.zkAvgLatency | millisecond | Average latency for ZooKeeper requests |
| configserver.zkMaxLatency | millisecond | Max latency for ZooKeeper requests |
| configserver.zkConnections | connection | Number of ZooKeeper connections |
| configserver.zkOutstandingRequests | request | Number of ZooKeeper requests in flight |
| orchestrator.lock.acquire-latency | second | Time to acquire zookeeper lock |
| orchestrator.lock.acquire-success | operation | Number of times zookeeper lock has been acquired successfully |
| orchestrator.lock.acquire-timedout | operation | Number of times zookeeper lock couldn't be acquired within timeout |
| orchestrator.lock.acquire | operation | Number of attempts to acquire zookeeper lock |
| orchestrator.lock.acquired | operation | Number of times zookeeper lock was acquired |
| orchestrator.lock.hold-latency | second | Time zookeeper lock was held before it was released |
| nodes.active | node | The number of active nodes in a cluster |
| nodes.nonActive | node | The number of non-active nodes in a cluster |
| nodes.nonActiveFraction | node | The fraction of non-active nodes vs total nodes in a cluster |
| nodes.exclusiveSwitchFraction | fraction | The fraction of nodes in a cluster on exclusive network switches |
| nodes.emptyExclusive | node | The number of exclusive hosts that do not have any nodes allocated to them |
| nodes.expired.deprovisioned | node | The number of deprovisioned nodes that have expired |
| nodes.expired.dirty | node | The number of dirty nodes that have expired |
| nodes.expired.inactive | node | The number of inactive nodes that have expired |
| nodes.expired.provisioned | node | The number of provisioned nodes that have expired |
| nodes.expired.reserved | node | The number of reserved nodes that have expired |
| cluster.cost | dollar\_per\_hour | The cost of the nodes allocated to a certain cluster, in $/hr |
| cluster.load.ideal.cpu | fraction | The ideal cpu load of a certain cluster |
| cluster.load.ideal.memory | fraction | The ideal memory load of a certain cluster |
| cluster.load.ideal.disk | fraction | The ideal disk load of a certain cluster |
| cluster.load.peak.cpu | fraction | The peak cpu load in the period considered of a certain cluster |
| cluster.load.peak.memory | fraction | The peak memory load in the period considered of a certain cluster |
| cluster.load.peak.disk | fraction | The peak disk load in the period considered of a certain cluster |
| zone.working | binary | The value 1 if zone is considered healthy, 0 if not. This is decided by considering the number of non-active nodes vs the number of active nodes in a zone |
| cache.nodeObject.hitRate | fraction | The fraction of cache hits vs cache lookups for the node object cache |
| cache.nodeObject.evictionCount | item | The number of cache elements evicted from the node object cache |
| cache.nodeObject.size | item | The number of cache elements in the node object cache |
| cache.curator.hitRate | fraction | The fraction of cache hits vs cache lookups for the curator cache |
| cache.curator.evictionCount | item | The number of cache elements evicted from the curator cache |
| cache.curator.size | item | The number of cache elements in the curator cache |
| wantedRestartGeneration | generation | Wanted restart generation for tenant node |
| currentRestartGeneration | generation | Current restart generation for tenant node |
| wantToRestart | binary | One if node wants to restart, zero if not |
| wantedRebootGeneration | generation | Wanted reboot generation for tenant node |
| currentRebootGeneration | generation | Current reboot generation for tenant node |
| wantToReboot | binary | One if node wants to reboot, zero if not |
| retired | binary | One if node is retired, zero if not |
| wantedVespaVersion | version | Wanted vespa version for the node, in the form MINOR.PATCH. Major version is not included here |
| currentVespaVersion | version | Current vespa version for the node, in the form MINOR.PATCH. Major version is not included here |
| wantToChangeVespaVersion | binary | One if node wants to change Vespa version, zero if not |
| hasWireguardKey | binary | One if node has a WireGuard key, zero if not |
| wantToRetire | binary | One if node wants to retire, zero if not |
| wantToDeprovision | binary | One if node wants to be deprovisioned, zero if not |
| failReport | binary | One if there is a fail report for the node, zero if not |
| suspended | binary | One if the node is suspended, zero if not |
| suspendedSeconds | second | The number of seconds the node has been suspended |
| activeSeconds | second | The number of seconds the node has been active |
| numberOfServicesUp | instance | The number of services confirmed to be running on a node |
| numberOfServicesNotChecked | instance | The number of services supposed to run on a node that have not been checked |
| numberOfServicesDown | instance | The number of services confirmed to not be running on a node |
| someServicesDown | binary | One if one or more services have been confirmed to not run on a node, zero if not |
| numberOfServicesUnknown | instance | The number of services the config server does not know are running on a node |
| nodeFailerBadNode | binary | One if the node is failed due to being bad, zero if not |
| downInNodeRepo | binary | One if the node is registered as being down in the node repository, zero if not |
| numberOfServices | instance | Number of services supposed to run on a node |
| lockAttempt.acquireMaxActiveLatency | second | Maximum duration for keeping a lock, ending during the metrics snapshot, or still being kept at the end of this snapshot period |
| lockAttempt.acquireHz | operation\_per\_second | Average number of locks acquired per second during the snapshot period |
| lockAttempt.acquireLoad | operation | Average number of locks held concurrently during the snapshot period |
| lockAttempt.lockedLatency | second | Longest lock duration in the snapshot period |
| lockAttempt.lockedLoad | operation | Average number of locks held concurrently during the snapshot period |
| lockAttempt.acquireTimedOut | operation | Number of locking attempts that timed out during the snapshot period |
| lockAttempt.deadlock | operation | Number of lock grab deadlocks detected during the snapshot period |
| lockAttempt.errors | operation | Number of other lock related errors detected during the snapshot period |
| hostedVespa.docker.totalCapacityCpu | vcpu | Total number of VCPUs on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.docker.totalCapacityMem | gigabyte | Total amount of memory on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.docker.totalCapacityDisk | gigabyte | Total amount of disk space on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.docker.freeCapacityCpu | vcpu | Total number of free VCPUs on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.docker.freeCapacityMem | gigabyte | Total amount of free memory on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.docker.freeCapacityDisk | gigabyte | Total amount of free disk space on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.docker.allocatedCapacityCpu | vcpu | Total number of allocated VCPUs on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.docker.allocatedCapacityMem | gigabyte | Total amount of allocated memory on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.docker.allocatedCapacityDisk | gigabyte | Total amount of allocated disk space on tenant hosts managed by hosted Vespa in a zone |
| hostedVespa.pendingRedeployments | task | The number of hosted Vespa re-deployments pending |
| hostedVespa.docker.skew | fraction | A number in the range 0..1 indicating how well allocated resources are balanced with availability on hosts |
| hostedVespa.activeHosts | host | The number of managed hosts that are in state "active" |
| hostedVespa.breakfixedHosts | host | The number of managed hosts that are in state "breakfixed" |
| hostedVespa.deprovisionedHosts | host | The number of managed hosts that are in state "deprovisioned" |
| hostedVespa.dirtyHosts | host | The number of managed hosts that are in state "dirty" |
| hostedVespa.failedHosts | host | The number of managed hosts that are in state "failed" |
| hostedVespa.inactiveHosts | host | The number of managed hosts that are in state "inactive" |
| hostedVespa.parkedHosts | host | The number of managed hosts that are in state "parked" |
| hostedVespa.provisionedHosts | host | The number of managed hosts that are in state "provisioned" |
| hostedVespa.readyHosts | host | The number of managed hosts that are in state "ready" |
| hostedVespa.reservedHosts | host | The number of managed hosts that are in state "reserved" |
| hostedVespa.activeNodes | host | The number of managed nodes that are in state "active" |
|
hostedVespa.breakfixedNodes
| host | The number of managed nodes that are in state "breakfixed" |
|
hostedVespa.deprovisionedNodes
| host | The number of managed nodes that are in state "deprovisioned" |
|
hostedVespa.dirtyNodes
| host | The number of managed nodes that are in state "dirty" |
|
hostedVespa.failedNodes
| host | The number of managed nodes that are in state "failed" |
|
hostedVespa.inactiveNodes
| host | The number of managed nodes that are in state "inactive" |
|
hostedVespa.parkedNodes
| host | The number of managed nodes that are in state "parked" |
|
hostedVespa.provisionedNodes
| host | The number of managed nodes that are in state "provisioned" |
|
hostedVespa.readyNodes
| host | The number of managed nodes that are in state "ready" |
|
hostedVespa.reservedNodes
| host | The number of managed nodes that are in state "reserved" |
|
overcommittedHosts
| host | The number of hosts with over-committed resources |
|
spareHostCapacity
| host | The number of spare hosts |
|
throttledHostFailures
| host | Number of host failures stopped due to throttling |
|
throttledNodeFailures
| host | Number of node failures stopped due to throttling |
|
nodeFailThrottling
| binary | Metric indicating when node failure throttling is active. The value 1 means active, 0 means inactive |
|
clusterAutoscaled
| operation | Number of times a cluster has been rescaled by the autoscaler |
|
clusterAutoscaleDuration
| second | The currently predicted duration of a rescaling of this cluster |
|
deployment.prepareMillis
| millisecond | Duration of deployment preparations |
|
deployment.activateMillis
| millisecond | Duration of deployment activations |
|
throttledHostProvisioning
| binary | Value 1 if host provisioning is throttled, 0 if not |
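Depending on the deployment, the current values of these metrics can typically be inspected on a running config server through the state API, by default on port 19071:
```
$ curl http://localhost:19071/state/v1/metrics
```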
---
# Source: https://docs.vespa.ai/en/operations/self-managed/configuration-server.html.md
# Configuration Servers
Vespa Configuration Servers host the endpoint where application packages are deployed - and serve generated configuration to all services - see the [overview](../../learn/overview.html) and [application packages](../../basics/applications.html) for details. In other words, one cannot configure Vespa without config servers, and services cannot run without them.
It is useful to understand the [Vespa start sequence](config-sentinel.html). Refer to the sample applications [multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) and [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) for practical examples of multi-configserver configuration.
Vespa configuration is set up using one or more configuration servers (config servers). A config server uses [Apache ZooKeeper](https://zookeeper.apache.org/) as a distributed data storage for the configuration system. In addition, each node runs a config proxy to cache configuration data - find an overview at [services start](config-sentinel.html).
## Status and config generation
Check the health of a running config server (replace localhost with the config server hostname):
```
$ curl http://localhost:19071/state/v1/health
```
Note that the config server is a service in itself, and runs with file-based configuration. Deployed application packages do not change the config server - the config server serves this configuration to all other Vespa nodes. Its config generation will hence always be 0:
```
$ curl http://localhost:19071/state/v1/config
```
Details in [start-configserver](https://github.com/vespa-engine/vespa/blob/master/configserver/src/main/sh/start-configserver).
## Redundancy
The config servers are defined in [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables), [services.xml](../../reference/applications/services/services.html) and [hosts.xml](/en/reference/applications/hosts.html):
```
$ VESPA_CONFIGSERVERS=myserver0.mydomain.com,myserver1.mydomain.com,myserver2.mydomain.com
```
In _services.xml_:
```
<admin version="2.0">
    <configservers>
        <configserver hostalias="admin0" />
        <configserver hostalias="admin1" />
        <configserver hostalias="admin2" />
    </configservers>
</admin>
```
In _hosts.xml_:
```
<hosts>
    <host name="myserver0.mydomain.com">
        <alias>admin0</alias>
    </host>
    <host name="myserver1.mydomain.com">
        <alias>admin1</alias>
    </host>
    <host name="myserver2.mydomain.com">
        <alias>admin2</alias>
    </host>
</hosts>
```
[VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) must be set on all nodes. This is a comma- or whitespace-separated list with the hostname of all config servers, like _myhost1.mydomain.com,myhost2.mydomain.com,myhost3.mydomain.com_.
When there are multiple config servers, the [config proxy](config-proxy.html) will pick a config server randomly (to achieve load balancing between config servers). The config proxy is fault-tolerant and will switch to another config server (if there is more than one) if the one it is using becomes unavailable or there is an error in the configuration it receives.
For the system to tolerate _n_ failures, [ZooKeeper](#zookeeper) by design requires _(2\*n)+1_ nodes. Consequently, only an odd number of nodes is useful, and a minimum of 3 nodes is needed for a fault-tolerant config system.
Even when using just one config server, the application will work if the server goes down (but deploying application changes will not work). Since the _config proxy_ runs on every node and caches configs, it will continue to serve config to the services on that node. However, restarting a node while the config servers are unavailable means the services on that node will be unable to start, since the cache is destroyed when the config proxy restarts.
Refer to the [admin model reference](../../reference/applications/services/admin.html#configservers) for more details on _services.xml_.
## Start sequence
To bootstrap a Vespa application instance, the high-level steps are:
- Start config servers
- Deploy config
- Start Vespa nodes
[multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) is a great guide on how to start a multinode Vespa application instance - try this first. Detailed steps for config server startup:
1. Set [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) on all nodes, using fully qualified hostnames and the same value on all nodes, including the config servers.
2. Start the config server on the nodes configured in _services.xml_ and _hosts.xml_. Make sure the startup is successful by inspecting [/state/v1/health](../../reference/api/state-v1.html#state-v1-health), served on port 19071 by default:
```
$ curl http://localhost:19071/state/v1/health
```
```
{
"time" : 1651147368066,
"status" : {
"code" : "up"
},
"metrics" : {
"snapshot" : {
"from" : 1.651147308063E9,
"to" : 1.651147367996E9
}
}
}
```
If there is no response on the health API, one of two things has happened:
- The config server process did not start - inspect logs using `vespa-logfmt`, or check _$VESPA\_HOME/logs/vespa/vespa.log_, normally _/opt/vespa/logs/vespa/vespa.log_.
- The config server process started, and is waiting for [ZooKeeper quorum](#zookeeper):
```
$ vespa-logfmt -S configserver
```
```
configserver Container.com.yahoo.vespa.zookeeper.ZooKeeperRunner Starting ZooKeeper server with /opt/vespa/var/zookeeper/conf/zookeeper.cfg. Trying to establish ZooKeeper quorum (members: [node0.vespanet, node1.vespanet, node2.vespanet], attempt 1)
configserver Container.com.yahoo.container.handler.threadpool.ContainerThreadpoolImpl Threadpool 'default-pool': min=12, max=600, queue=0
configserver Container.com.yahoo.vespa.config.server.tenant.TenantRepository Adding tenant 'default', created 2022-04-28T13:02:24.182Z. Bootstrapping in PT0.175576S
configserver Container.com.yahoo.vespa.config.server.rpc.RpcServer Rpc server will listen on port 19070
configserver Container.com.yahoo.container.jdisc.state.StateMonitor Changing health status code from 'initializing' to 'up'
configserver Container.com.yahoo.jdisc.http.server.jetty.Janitor Creating janitor executor with 2 threads
configserver Container.com.yahoo.jdisc.http.server.jetty.JettyHttpServer Threadpool size: min=22, max=22
configserver Container.org.eclipse.jetty.server.Server jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 11.0.14.1+1-
configserver Container.org.eclipse.jetty.server.handler.ContextHandler Started o.e.j.s.ServletContextHandler@341c0dfc{19071,/,null,AVAILABLE}
configserver Container.org.eclipse.jetty.server.AbstractConnector Started configserver@3cd6d147{HTTP/1.1, (http/1.1, h2c)}{0.0.0.0:19071}
configserver Container.org.eclipse.jetty.server.Server Started @21955ms
configserver Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components. Application config generation: 0
```
Startup will hang until quorum is reached, and the log line changing the health status from 'initializing' to 'up' is emitted. Root causes for missing quorum can be:
- No connectivity between the config servers. ZooKeeper logs the members like `(members: [node0.vespanet, node1.vespanet, node2.vespanet], attempt 1)`. Verify that the nodes running config servers can reach each other on port 2181.
- Missing connectivity can be caused by wrong network configuration. [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) uses a Docker network; make sure there are no underscores in the hostnames.
3. Once all config servers return `up` on _state/v1/health_, an application package can be deployed. Conversely, if deployment fails, it is always a good idea to verify the config server health first - if the config servers are up and deployment still fails, it is most likely an issue with the application package - if so, refer to [application packages](../../basics/applications.html).
4. A successful deployment logs the following, for the _prepare_ and _activate_ steps:
```
Container.com.yahoo.vespa.config.server.ApplicationRepository Session 2 prepared successfully.
Container.com.yahoo.vespa.config.server.deploy.Deployment Session 2 activated successfully using no host provisioner. Config generation 2. File references: [file '9cfc8dc57f415c72']
Container.com.yahoo.vespa.config.server.session.SessionRepository Session activated: 2
```
5. Start the Vespa nodes. Technically, they can be started at any time. When troubleshooting, it is easier to make sure the config servers started successfully and the deployment succeeded before starting any other nodes. Refer to the [Vespa start sequence](config-sentinel.html) and [Vespa start / stop / restart](admin-procedures.html#vespa-start-stop-restart).
Make sure to look for logs on all config servers when debugging.
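To scan the logs for problems on each config server, something like the following can be used, filtering by log level with the _-l_ option:
```
$ vespa-logfmt -l warning,error
```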
## Scaling up
Add a config server node for increased fault tolerance or when replacing a node. Read up on [ZooKeeper configuration](#zookeeper-configuration) before continuing. Although it is _possible_ to add more than one config server at a time, doing it one by one is recommended, to keep the ZooKeeper quorum intact.
Due to the ZooKeeper majority vote, use one or three config servers.
1. Install _vespa_ on the new config server node.
2. Append the new config server node's hostname to VESPA\_CONFIGSERVERS on all nodes, then (re)start all config servers in sequence to update the ZooKeeper config (see the sketch after this list). By appending, the current config server nodes keep their current ZooKeeper index. Restart the existing config server(s) first. At startup, a config server logs the configured set of servers to the vespa log.
3. Update _services.xml_ and _hosts.xml_ with the new set of config servers, then _vespa prepare_ and _vespa activate_.
4. Restart the other nodes one by one to make them use the updated set of config servers.
The config servers will automatically redistribute the application data to new nodes.
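As a sketch, step 2 on each config server could look like the following, assuming three existing servers and a new _myserver3_ (set the variable in the environment Vespa is started from):
```
$ export VESPA_CONFIGSERVERS=myserver0.mydomain.com,myserver1.mydomain.com,myserver2.mydomain.com,myserver3.mydomain.com
$ vespa-stop-configserver
$ vespa-start-configserver
```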
## Scaling down
This is the inverse of scaling up, and the procedure is the same: remove config servers from the end of _VESPA\_CONFIGSERVERS_. Here, two nodes can be removed in one go when going from three to one.
## Replacing nodes
- Make sure to replace only one node at a time.
- If you have only one config server, first scale up with a new node, then scale down by removing the old node.
- If you have 3 or more, replace one of the old nodes in VESPA\_CONFIGSERVERS with the new one instead of adding one; otherwise, the procedure is the same as in [Scaling up](#scaling-up). Repeat for each node to replace.
## Tools
Tools to access config:
- [vespa-get-config](../../reference/operations/self-managed/tools.html#vespa-get-config)
- [vespa-configproxy-cmd](../../reference/operations/self-managed/tools.html#vespa-configproxy-cmd)
- [Config API](../../reference/api/config-v2.html)
## ZooKeeper
[ZooKeeper](https://zookeeper.apache.org/) handles data consistency across multiple config servers. The config server Java application runs a ZooKeeper server, embedded with an RPC frontend that the other nodes use. ZooKeeper stores data internally in _nodes_ that can have _sub-nodes_, similar to a file system.
At [vespa prepare](../../reference/clients/vespa-cli/vespa_prepare), the application's files, along with global configurations, are stored in ZooKeeper. The application data is stored under _/config/v2/tenants/default/sessions/[sessionid]/userapp_. At [vespa activate](../../reference/clients/vespa-cli/vespa_activate), the newest application is activated _live_ by writing the session id into _/config/v2/tenants/default/applications/default:default:default_. It is at that point the other nodes get configured.
Use _vespa-zkcli_ to inspect state, replacing _sessionid_ with the actual session id:
```
$ vespa-zkcli ls /config/v2/tenants/default/sessions/sessionid/userapp
$ vespa-zkcli get /config/v2/tenants/default/sessions/sessionid/userapp/services.xml
```
The ZooKeeper server logs to _$VESPA\_HOME/logs/vespa/zookeeper.configserver.0.log_ (files are rotated with a sequence number).
### ZooKeeper configuration
The members of the ZooKeeper cluster are generated based on the contents of [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables). _$VESPA\_HOME/var/zookeeper/conf/zookeeper.cfg_ is written when (re)starting the config server. Hence, all config servers must be restarted when `VESPA_CONFIGSERVERS` changes.
The order of the nodes is used to create the indexes in _zookeeper.cfg_, so do not change the node order.
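As an illustration only, the generated member list in _zookeeper.cfg_ looks roughly like the following for the three-server example above; the exact ports and other settings are managed by Vespa and may differ:
```
server.0=myserver0.mydomain.com:2182:2183
server.1=myserver1.mydomain.com:2182:2183
server.2=myserver2.mydomain.com:2182:2183
```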
### ZooKeeper recovery
If the config server(s) experience data corruption, for instance due to a hardware failure, use the following recovery procedure. One example of such a scenario is _$VESPA\_HOME/logs/vespa/zookeeper.configserver.0.log_ saying _java.io.IOException: Negative seek offset at java.io.RandomAccessFile.seek(Native Method)_, which indicates that ZooKeeper has not been able to recover after a full disk. There is no need to restart Vespa on other nodes during the procedure:
1. [vespa-stop-configserver](../../reference/operations/self-managed/tools.html#vespa-stop-configserver)
2. [vespa-configserver-remove-state](../../reference/operations/self-managed/tools.html#vespa-configserver-remove-state)
3. [vespa-start-configserver](../../reference/operations/self-managed/tools.html#vespa-start-configserver)
4. [vespa](../../clients/vespa-cli.html#deployment) prepare
5. [vespa](../../clients/vespa-cli.html#deployment) activate
This procedure completely cleans out ZooKeeper's internal data snapshots and deploys from scratch.
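Collected as a command sequence, the procedure could look like the following sketch, where _myapp_ is a placeholder for the application package directory:
```
$ vespa-stop-configserver
$ vespa-configserver-remove-state
$ vespa-start-configserver
$ vespa prepare myapp
$ vespa activate
```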
Note that by default, the [cluster controller](../../content/content-nodes.html#cluster-controller) that maintains the state of the content cluster uses the same shared ZooKeeper instance, so the content cluster state is also reset when removing state. Manually set state will be lost (e.g. a node with user state _down_). It is possible to run cluster controllers in standalone ZooKeeper mode - see [standalone-zookeeper](../../reference/applications/services/admin.html#cluster-controllers).
### ZooKeeper barrier timeout
If the config servers are heavily loaded, or the applications being deployed are big, the internals of the server may time out when synchronizing with the other servers during deploy. To work around this, increase the timeout by setting [VESPA\_CONFIGSERVER\_ZOOKEEPER\_BARRIER\_TIMEOUT](files-processes-and-ports.html#environment-variables) to 600 (seconds) or higher, and restart the config servers.
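For example, as a sketch (the variable must be set in the environment the config servers are started from):
```
$ export VESPA_CONFIGSERVER_ZOOKEEPER_BARRIER_TIMEOUT=600
$ vespa-stop-configserver
$ vespa-start-configserver
```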
## Configuration
To access config from a node not running the config system (e.g. doing feeding via the Document API), use the environment variable [VESPA\_CONFIG\_SOURCES](files-processes-and-ports.html#environment-variables):
```
$ export VESPA_CONFIG_SOURCES="myadmin0.mydomain.com:19071,myadmin1.mydomain.com:19071"
```
Alternatively, for Java programs, use the system property _configsources_ and set it programmatically or on the command line with the _-D_ option to Java. The syntax for the value is the same as for _VESPA\_CONFIG\_SOURCES_.
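For example, as a sketch where _my-feeder.jar_ is a placeholder for the Java program:
```
$ java -Dconfigsources="myadmin0.mydomain.com:19071,myadmin1.mydomain.com:19071" -jar my-feeder.jar
```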
### System requirements
The minimum heap size for the JVM the config server runs under is 128 MB, and the max heap size is 2 GB (which can be changed with a [setting](../../performance/container-tuning.html#config-server-and-config-proxy)). It writes a transaction log that is regularly purged of old items, so little disk space is required. Note that running on a server that has a lot of disk I/O will adversely affect performance and is not recommended.
### Ports
The config server RPC port can be changed by setting [VESPA\_CONFIGSERVER\_RPC\_PORT](files-processes-and-ports.html#environment-variables) on all nodes in the system.
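For example, as a sketch using an arbitrary unused port number:
```
$ export VESPA_CONFIGSERVER_RPC_PORT=19099
```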
Changing the HTTP port requires changing the port in _$VESPA\_HOME/conf/configserver-app/services.xml_:
```
<http>
    <server port="19079" id="configserver" />
</http>
```
When deploying, use the _-p_ option if the port is changed from the default.
## Troubleshooting
| Problem | Description |
| --- | --- |
| Health checks | Verify that a config server is up and running using [/state/v1/health](../../reference/api/state-v1.html#state-v1-health), see [start sequence](#start-sequence). Status code is `up` if the server is up and has finished bootstrapping. Alternatively, use [http://localhost:19071/status.html](http://localhost:19071/status.html), which returns response code 200 if the server is up and has finished bootstrapping. Metrics are found at [/state/v1/metrics](../../reference/api/state-v1.html#state-v1-metrics). Use [vespa-model-inspect](../../reference/operations/self-managed/tools.html#vespa-model-inspect) to find host and port number; the port is 19071 by default. |
| Consistency | When having more than one config server, consistency between the servers is crucial. [http://localhost:19071/status](http://localhost:19071/status) can be used to check that settings for config servers are the same on all servers. [vespa-config-status](../../reference/operations/self-managed/tools.html#vespa-config-status) can be used to check config on nodes. [http://localhost:19071/application/v2/tenant/default/application/default](http://localhost:19071/application/v2/tenant/default/application/default) displays the active config generation, which should be the same on all servers, and the same as in the response from running [vespa deploy](../../clients/vespa-cli.html#deployment). |
| Bad node | If running with more than one config server and one of these goes down or has a hardware failure, the cluster will still work and serve config as usual (clients will switch to one of the good servers). It is not necessary to remove a bad server from the configuration. Deploying applications will take longer, as [vespa deploy](../../clients/vespa-cli.html#deployment) will not be able to complete a deployment on all servers when one of them is down. If this is troublesome, lower the [barrier timeout](#zookeeper-barrier-timeout) (default value is 120 seconds). Note also that if [cluster controllers](../../reference/applications/services/admin.html#cluster-controller) are not configured explicitly, they run on the config server nodes and their operation might be affected. This is another reason not to manually remove a bad node from the config server setup. |
| Stuck file distribution | The config system distributes binary files (such as jar bundle files) using [file-distribution](../../applications/deployment.html#file-distribution) - use [vespa-status-filedistribution](../../reference/operations/self-managed/tools.html#vespa-status-filedistribution) to see detailed status if it gets stuck. |
| Memory | Insufficient memory on the host / in the container running the config server will cause startup or deploy / configuration problems - see [Docker containers](docker-containers.html). |
| ZooKeeper | The error `java.io.IOException: The accepted epoch, 10 is less than the current epoch, 48` (thrown from `com.yahoo.vespa.zookeeper.ZooKeeperRunner.startServer`) can be caused by a full disk on the config server, or clocks out of sync. Users have reported that copying _currentEpoch_ to _acceptedEpoch_ fixed the problem. |
---
# Source: https://docs.vespa.ai/en/applications/configuring-components.html.md
# Configuring Java components
Any Java component might require some sort of configuration, be it simple strings or integers, or more complex structures. Because of all the boilerplate code that commonly goes into classes holding such configuration, this often degenerates into a collection of key-value string pairs (e.g. [javax.servlet.FilterConfig](https://docs.oracle.com/javaee/6/api/javax/servlet/FilterConfig.html)). To avoid this, Vespa provides custom, type-safe configuration for all [Container](containers.html) components. Get started with the [Developer Guide](developer-guide.html), and try the [album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java) sample application.
Configurable components in short:
- Create a [config definition](../reference/applications/config-files.html#config-definition-files) file
- Use the Vespa [bundle plugin](bundles.html#maven-bundle-plugin) to generate a config class from the definition
- Inject config objects in the application code
The application code interfaces with config through the generated code — code and config are always in sync. This configuration should be used for all state which is assumed to stay constant for the _lifetime of the component instance_. Use [deploy](../basics/applications.html) to push and activate code and config changes.
## Config definition
Write a [config definition](../reference/applications/config-files.html#config-definition-files) file and place it in the application's `src/main/resources/configdefinitions/` directory, e.g. `src/main/resources/configdefinitions/my-component.def`:
```
package=com.mydomain.mypackage
myCode int default=42
myMessage string default=""
```
## Generating config classes
Generating config classes is done by the _bundle plugin_:
```
$ mvn generate-resources
```
The generated config classes are written to `target/generated-sources/vespa-configgen-plugin/`. In the example above, the config definition file is named _my-component.def_ and declares the package _com.mydomain.mypackage_, so the full name of the generated Java class is _com.mydomain.mypackage.MyComponentConfig_.
It is a good idea to generate the config classes first, _then_ resolve dependencies and compile in the IDE.
## Using config in code
The generated config class is now available for the component through [constructor injection](dependency-injection.html), which means that the component can declare the generated class as one of its constructor arguments:
```
package com.mydomain.mypackage;

import com.yahoo.component.annotation.Inject;

public class MyComponent {

    private final int code;
    private final String message;

    @Inject
    public MyComponent(MyComponentConfig config) {
        code = config.myCode();
        message = config.myMessage();
    }
}
```
The Container will create and inject the config instance. To override the default values of the config, [specify](../reference/applications/config-files.html#generic-configuration-in-services-xml) values in `src/main/application/services.xml`, like:
```
<component id="com.mydomain.mypackage.MyComponent">
    <config name="com.mydomain.mypackage.my-component">
        <myCode>132</myCode>
        <myMessage>Hello, World!</myMessage>
    </config>
</component>
```
and the deployed instance of `MyComponent` is constructed using a corresponding instance of `MyComponentConfig`.
## Unit testing configurable components
The generated config class provides a builder API that makes it easy to create config objects for unit testing. Example that sets up a unit test for the `MyComponent` class from the example above:
```
import static com.mydomain.mypackage.MyComponentConfig.*;
public class MyComponentTest {
@Test
public void requireThatMyComponentGetsConfig() {
MyComponentConfig config = new MyComponentConfig.Builder()
.myCode(668)
.myMessage("Neighbour of the beast")
.build();
MyComponent component = new MyComponent(config);
…
}
}
```
The config class used here is simple — see a separate example of [building a complex configuration object](unit-testing.html#unit-testing-configurable-components).
## Adding files to the component configuration
This section describes what to do if the component needs larger configuration objects that are stored in files, e.g. machine-learned models, [automata](../reference/operations/tools.html#vespa-makefsa) or large tables. Before proceeding, take a look at how to create [provider components](dependency-injection.html#special-components) — instead of integrating large objects into e.g. a searcher or processor, it might be better to split the resource-demanding part of the component's configuration into a separate provider component. The procedure described below can be applied to any component type.
Files can be transferred using either [file distribution](deployment.html#file-distribution) or URL download. File distribution is used when the files are added to the application package. If for some reason this is not convenient, e.g. due to size, origin of file or update frequency, Vespa can download the file and make it available for the component. Both types are set up in the config definition file. File distribution uses the `path` config type, and URL downloading the `url` type. You can also use the `model` type for machine-learned models that can be referenced by both model-id, used on Vespa Cloud, and url/path, used on self-hosted deployments. See [the config file reference](../reference/applications/config-files.html) for details.
In the following example we will show the usage of all three types. Assume this config definition, named `my-component.def`:
```
package=com.mydomain.mypackage
myFile path
myUrl url
myModel model
```
The file must reside in the application package, and the path (relative to the application package root) must be given in the component's configuration in `services.xml`:
```
<component id="com.mydomain.mypackage.MyComponent">
    <config name="com.mydomain.mypackage.my-component">
        <myFile>my-files/my-file.txt</myFile>
        <myUrl>https://docs.vespa.ai/en/reference/query-api-reference.html</myUrl>
    </config>
</component>
```
An example component that uses these files:
```
package com.mydomain.mypackage;

import java.io.File;
import java.nio.file.Path;

public class MyComponent {

    private final Path fileFromFileDistribution;
    private final File fileFromUrlDownload;
    private final Path modelFilePath;

    public MyComponent(MyComponentConfig config) {
        fileFromFileDistribution = config.myFile();
        fileFromUrlDownload = config.myUrl();
        modelFilePath = config.myModel();
    }
}
```
The `myFile()` and `myModel()` getters return a `java.nio.Path` object, while the `myUrl()` getter returns a `java.io.File` object. The container framework guarantees that these files are fully present at the given location before the component constructor is invoked, so they can always be accessed right away.
When the client asks for config that uses the `url` or `model` config type with a URL, the content will be downloaded and cached on the nodes that need it. If you want to change the content, the application package needs to be updated with a new URL for the changed content and the application [deployed](../basics/applications.html), otherwise the cached content will still be used. This avoids unintended changes to the application if the content of a URL changes.
---
# Source: https://docs.vespa.ai/en/content/consistency.html.md
# Vespa Consistency Model
Vespa offers configurable data redundancy with eventual consistency across replicas. It's designed for high efficiency under workloads where eventual consistency is an acceptable tradeoff. This document aims to go into some detail on what these tradeoffs are, and what you, as a user, can expect.
### Vespa and CAP
Vespa may be considered a limited subset of AP under the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem).
Under CAP, there is a fundamental limitation on whether any distributed system can offer guarantees on consistency (C) or availability (A) in scenarios where nodes are partitioned (P) from each other. Since there is no escaping that partitions can and will happen, we talk of systems that are either CP or AP.
Consistency (C) in CAP implies that reads and writes are strongly consistent, i.e. the system offers _linearizability_. Weaker forms such as causal consistency or "read your writes" consistency are _not_ sufficient. As mentioned initially, Vespa is an eventually consistent data store and therefore does not offer this property. In practice, consistency requires the use of a majority consensus algorithm, which Vespa does not currently use.
Availability (A) in CAP implies that _all requests_ receive a non-error response regardless of how the network may be partitioned. Vespa is dependent on a centralized (but fault tolerant) node health checker and coordinator. A network partition may take place between the coordinator and a subset of nodes. Operations to nodes in this subset aren't guaranteed to succeed until the partition heals. As a consequence, Vespa is not _guaranteed_ to be strongly available, so we treat this as a "limited subset" of AP (though this is not technically part of the CAP definition).
In _practice_, the best-effort semantics of Vespa have proven to be both robust and highly available in common datacenter networks.
### Write durability and consistency
When a client receives a successful [write](../writing/reads-and-writes.html) response, the operation has been written and synced to disk. The replication level is configurable. Operations are by default written on _all_ available replica nodes before sending a response. "Available" here means being Up in the [cluster state](content-nodes.html#cluster-state), which is determined by the fault-tolerant, centralized Cluster Controller service. If a cluster has a total of 3 nodes, 2 of these are available and the replication factor is 3, writes will be ACKed to the client if both the available nodes ACK the operation.
On each replica node, operations are persisted to a write-ahead log before being applied. The system will automatically recover after a crash by replaying logged operations. Writes are guaranteed to be synced to durable storage prior to sending a successful response to the client, so acknowledged writes are retained even in the face of sudden power loss.
If a client receives a failure response for a write operation, the operation may or may not have taken place on a subset of the replicas. If not all replicas could be written to, they are considered divergent (out of sync). The system detects and reconciles divergent replicas. This happens without any required user intervention.
Each document write assigns a new wall-clock timestamp to the resulting document version. As a consequence, configure servers with NTP to keep clock drift as small as possible. Large clock drifts may result in timestamp collisions or unexpected operation orderings.
Vespa has support for conditional writes for individual documents through test-and-set operations. Multi-document transactions are not supported.
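As an illustration, a conditional write through the Document V1 API could look like the following sketch; _mynamespace_, the _music_ document type and the _title_ field are placeholders, and the condition is a URL-encoded document selection (`music.title=="Old title"`):
```
$ curl -X PUT -H "Content-Type: application/json" \
    --data '{ "fields": { "title": { "assign": "New title" } } }' \
    'http://localhost:8080/document/v1/mynamespace/music/docid/1?condition=music.title%3D%3D%22Old%20title%22'
```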
After a successful response, changes to the search indexes are immediately visible by default.
### Read consistency
Reads are consistent on a best-effort basis and are not guaranteed to be linearizable.
When using a [Get](../reference/api/document-v1.html#get) or [Visit](../writing/visiting.html) operation, the client will never observe a partially updated document. For these read operations, writes behave as if they are atomic.
Searches may observe partial updates, as updates are not atomic across index structures. This can only happen _after_ a write has started, but _before_ it's complete. Once a write is complete, all index updates are visible.
Searches may observe transient loss of coverage when nodes go down. Vespa will restore coverage automatically when this happens. How fast this happens depends on the configured [searchable-copies](../reference/applications/services/content.html#searchable-copies) value.
If replicas diverge during a Get, Vespa performs a read-repair. This fetches the requested document from all divergent replicas. The client then receives the version with the newest timestamp.
If replicas diverge during a Visit, the behavior is slightly different between the Document V1 API and [vespa-visit](/en/reference/operations/self-managed/tools.html#vespa-visit):
- Document V1 will prefer immediately visiting the replica that contains the most documents. This means it's possible for a subset of documents in a bucket to not be returned.
- `vespa-visit` will by default retry visiting the bucket until it is in sync. This may take a long time if large parts of the system are out of sync.
The rationale for this difference in behavior is that Document V1 is usually called in a real-time request context, whereas `vespa-visit` is usually called in a background/batch processing context.
Visitor operations iterate over the document corpus in an implementation-specific order. Any given document is returned in the state it was in at the time the visitor iterated over the data bucket containing the document. This means there is _no snapshot isolation_—a document mutation happening concurrently with a visitor may or may not be reflected in the returned document set, depending on whether the mutation happened before or after iteration of the bucket containing the document.
### Replica reconciliation
Reconciliation is the act of bringing divergent replicas back into sync. This usually happens after a node restarts or fails. It will also happen after network partitions.
Unlike several other eventually consistent databases, Vespa doesn't use distributed replica operation logs. Instead, reconciling replicas involves exchanging sets of timestamped documents. Reconciliation is complete once the union set of documents is present on all replicas. Metadata is checksummed to determine whether replicas are in sync with each other.
When reconciling replicas, the newest available version of a document will "win" and become visible. This version may be a remove (tombstone). Tombstones are replicated in the same way as regular documents.
Reconciliation happens at the document level, not at the field level, i.e. there is no merging of individual fields across different versions.
If a test-and-set operation updates at least one replica, it will eventually become visible on the other replicas.
The reconciliation operation is referred to as a "merge" in the rest of the Vespa documentation.
Tombstone entries have a configurable time-to-live before they are compacted away. Nodes that have been partitioned away from the network for a longer period of time than this TTL should ideally have their indexes removed before being allowed back into the cluster. If not, there is a risk of resurrecting previously removed documents. Vespa does not currently detect or handle this scenario automatically.
See the documentation on [data-retention-vs-size](/en/operations/self-managed/admin-procedures.html#data-retention-vs-size).
### Q/A
#### How does Vespa perform read-repair for Get-operations, and how many replicas are consulted?
When the distributor process that is responsible for a particular data bucket receives a Get operation, it checks its locally cached replica metadata state for inconsistencies.
If all replicas have consistent metadata, the operation is routed to a single replica—preferably located on the same host as the distributor, if present. This is the normal case when the bucket replicas are in sync.
If there is at least one replica metadata mismatch, the distributor automatically initiates a read-repair process:
1. The distributor splits the bucket replicas into subsets based on their metadata, where all replicas in each subset have the same metadata. It then sends a lightweight metadata-only Get to one replica in each subset. The core assumption is that all these replicas have the same set of document versions, and that it suffices to consult one replica in the set. If a metadata read fails, the distributor will automatically fail over to another replica in the subset.
2. It then sends one full Get to a node in the replica subset that returned the _highest_ timestamp.
This means that if you have 100 replicas and 1 has different metadata from the remaining 99, only 2 nodes in total will be initially queried, and only 1 will receive the actual (full) Get read.
Similar algorithms are used by other operations that may trigger read/write-repair.
#### Since Vespa performs read-repair when inconsistencies are detected, does this mean replies are strongly consistent?
Unfortunately not. Vespa does not offer any cross-document transactions, so in this case strong consistency implies single-object _linearizability_ (as opposed to _strict serializability_ across multiple objects). Linearizability requires the ability to reach a majority consensus amongst a particular known and stable configuration of replicas (side note: replica sets can be reconfigured in strongly consistent algorithms like Raft and Paxos, but such a reconfiguration must also be threaded through the consensus machinery).
The active replica set for a given data bucket (and thus the documents it logically contains) is ephemeral and dynamic based on the nodes that are currently available in the cluster (as seen from the cluster controller). This precludes having a stable set of replicas that can be used for reaching majority decisions.
See also [Vespa and CAP](#vespa-and-cap).
#### In what kind of scenario might Vespa return a stale version of a document?
Stale document versions may be returned when all replicas containing the most recent document version have become unavailable.
Example scenario (for simplicity—but without loss of generality—assuming redundancy 1) in a cluster with two nodes {A, B}:
1. Document X is stored in a replica on node A with timestamp 100.
2. Node A goes down; node B takes over ownership.
3. A write request is received for document X; it is stored on node B with timestamp 200 and ACKed to the client.
4. Node B goes down.
5. Node A comes back up.
6. A read request arrives for document X. The only visible replica is on node A, which ends up serving the request.
7. The document version at timestamp 100 is returned to the client.
Since the write at `t=200` _happens-after_ the write at `t=100`, returning the version at `t=100` violates linearizability.
---
# Source: https://docs.vespa.ai/en/reference/ranking/constant-tensor-json-format.html.md
# Constant Tensor JSON Format
This document describes with examples the JSON formats accepted when reading tensor constants from a file. For convenience, compactness, and readability there are various formats that can be used depending on the detailed tensor type:
- [Dense tensors](#dense-tensors): indexed dimensions only
- [Sparse tensors](#sparse-tensors): mapped dimensions only
- [Mixed tensors](#mixed-tensors): both indexed and mapped dimensions
## Canonical type
A tensor type can be declared with its dimensions in any order, but internally they will always be sorted in alphabetical order. So the type "`tensor(category{}, brand{}, a[3], x[768], d0[1])`" has the canonical string representation "`tensor(a[3],brand{},category{},d0[1],x[768])`", and the "x" dimension with size 768 is the innermost. For constants, all indexed dimensions must have a known size.
## Dense tensors
Tensors using only indexed dimensions are used for storing a vector, a matrix, and so on, and are collectively known as "dense" tensors. These are particularly easy to handle, as they always have a known number of cells in a well-defined order. They can be input as nested arrays of numerical values. Example with a vector of size 5:
```
{
"type": "tensor(x[5])",
"values": [13.25, -22, 0.4242, 0, -17.0]
}
```
The "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present. This format is similar to "Indexed tensors short form" in the [document JSON format](../schemas/document-json-format.html#tensor-short-form-indexed).
Example of a 3x4 matrix; note that the dimension names will always be processed in [alphabetical order](#canonical-type) from outermost to innermost.
```
{
"type": "tensor(bar[3],foo[4])",
"values": [
[2.5, 1.0, 2.0, 3.0],
[1.0, 2.0, 3.0, 2.0],
[2.0, 3.0, 2.0, 1.5]
]
}
```
Note that the arrays must have exactly the declared number of elements for each dimension, and be correctly nested.
Example of an ONNX model input where we have an extra "batch" dimension which is unused (size 1) for this particular input, but still requires extra brackets:
```
{
"type": "tensor(d0[1],d1[5],d2[2])",
"values": [ [
[1.1, 1.2],
[2.1, 2.2],
[3.1, 3.2],
[4.1, 4.2],
[5.1, 5.2]
] ]
}
```
## Sparse tensors
Tensors using only mapped dimensions are collectively known as "sparse" tensors. JSON input for these lists the cells directly. Tensors with only one mapped dimension can use a simple JSON object as input:
```
{
"type": "tensor(category{})",
"cells": {
"tag": 2.5,
"another": 2.75
}
}
```
The "type" field is optional. This format is similar to "Short form for tensors with a single mapped dimension" in the [document JSON format](../schemas/document-json-format.html#tensor-short-form-mapped).
Tensors with multiple mapped dimensions must use an array of objects, where each object has an "address" containing the labels for all dimensions, and a "value" with the cell value:
```
{
"type": "tensor(category{},product{})",
"cells": [
{
"address": { "category": "foo", "product": "bar" },
"value": 1.5
},
{
"address": { "category": "qux", "product": "zap" },
"value": 3.5
},
{
"address": { "category": "pop", "product": "rip" },
"value": 6.5
}
]
}
```
Again, the "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present.
This format is also known as the [general verbose form](../schemas/document-json-format.html#tensor), and it's possible to use it for any tensor type.
## Mixed tensors
Tensors with both mapped and indexed dimensions can use a "blocks" format; this is similar to the "cells" formats for sparse tensors, but instead of a single cell value you get a block of values for each address. With one mapped dimension and two indexed dimensions:
```
{
"type": "tensor(a{},x[3],y[4])",
"blocks": {
"bar": [
[1.0, 2.0, 0.0, 3.0],
[2.0, 2.5, 2.0, 0.5],
[3.0, 6.0, 9.0, 9.0]
],
"foo": [
[1.0, 0.0, 2.0, 3.0],
[2.0, 2.5, 2.0, 0.5],
[3.0, 3.0, 6.0, 9.0]
]
}
}
```
The "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present. This format is similar to the first variant of "Mixed tensors short form" in the [document JSON format](../schemas/document-json-format.html#tensor-short-form-mixed).
With two mapped dimensions and one indexed dimension:
```
{
"type": "tensor(a{},b{},x[3])",
"blocks": [
{
"address": { "a": "qux", "b": "zap" },
"values": [2.5, 3.5, 4.5]
},
{
"address": { "a": "foo", "b": "bar" },
"values": [1.5, 2.5, 3.5]
},
{
"address": { "a": "pop", "b": "rip" },
"values": [3.5, 4.5, 5.5]
}
]
}
```
Again, the "type" field is optional. This format is similar to the second variant of "Mixed tensors short form" in the [document JSON format](../schemas/document-json-format.html#tensor-short-form-mixed).
---
# Source: https://docs.vespa.ai/en/performance/container-http.html.md
# HTTP Performance Testing of the Container using Gatling
For container testing, more flexibility and more detailed checking than simply saturating an interface with HTTP requests is often required. The stress test tool [Gatling](https://gatling.io/) provides such capabilities in a flexible manner, with the possibility of writing arbitrary plug-ins and a DSL for the most common cases. This document shows how to get started using Gatling with Vespa. Experienced Gatling users should find there is nothing special about testing Vespa versus other HTTP services.
## Install Gatling
Refer to Gatling's [documentation for getting started](https://gatling.io/docs/gatling/reference/current/), or simply get the newest version from the [Gatling front page](https://gatling.io/), unpack the tarball and jump straight in. The tool runs happily from the directory created when unpacking it. This tutorial is written with Gatling 2 in mind.
## Configure the First Test with a Query Log
Refer to the Gatling documentation on how to set up the recorder. This tool acts as a browser proxy, recording what you do in the browser, allowing you to replay that as a test scenario.
After running _bin/recorder.sh_ and setting the package to _com.vespa.example_ and the class name to _VespaTutorial_, running a simple query against your node _mynode_ (running e.g. [album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java)) should create a basic simulation looking something like the following in _user-files/simulations/com/vespa/example/VespaTutorial.scala_:
```
package com.vespa.example
import io.gatling.core.Predef._
import io.gatling.core.session.Expression
import io.gatling.http.Predef._
import io.gatling.jdbc.Predef._
import io.gatling.http.Headers.Names._
import io.gatling.http.Headers.Values._
import scala.concurrent.duration._
import bootstrap._
import assertions._
class VespaTutorial extends Simulation {
val httpProtocol = http
.baseURL("http://mynode:8080")
.acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.acceptEncodingHeader("gzip, deflate")
.connection("keep-alive")
.userAgentHeader("Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0")
val headers_1 = Map("""Cache-Control""" -> """max-age=0""")
val scn = scenario("Scenario Name")
.exec(http("request_1")
.get("""/search/?query=bad""")
.headers(headers_1))
setUp(scn.inject(atOnce(1 user))).protocols(httpProtocol)
}
```
Running a single query over and over again is not useful, so we have a tiny query log in a CSV file we want to run in our test, _user-files/data/userinput.csv_:
```
userinput
bad religion
bad
lucky oops
radiohead
bad jackson
```
As usual for CSV files, the first line names the parameters. A literal comma may be escaped with a backslash, as "\,". Gatling takes care of URL quoting; there is no need to e.g. encode a space as "%20".
Add a feeder:
```
package com.vespa.example
import io.gatling.core.Predef._
import io.gatling.core.session.Expression
import io.gatling.http.Predef._
import io.gatling.jdbc.Predef._
import io.gatling.http.Headers.Names._
import io.gatling.http.Headers.Values._
import scala.concurrent.duration._
import bootstrap._
import assertions._
class VespaTutorial extends Simulation {
val httpProtocol = http
.baseURL("http://mynode:8080")
.acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.acceptEncodingHeader("gzip, deflate")
.connection("keep-alive")
.userAgentHeader("Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0")
val headers_1 = Map("""Cache-Control""" -> """max-age=0""")
val scn = scenario("Scenario Name")
.feed(csv("userinput.csv").random)
.exec(http("request_1")
.get("/search/")
.queryParam("query", "${userinput}")
.headers(headers_1))
setUp(scn.inject(constantRate(100 usersPerSec) during (10 seconds)))
.protocols(httpProtocol)
}
```
Now, we have made a couple of changes to the original scenario. First, we added the feeder. Since we do not have enough queries to run a long test without reuse, we chose the "random" strategy: a random user input string is chosen for each invocation, and it might be reused. Second, we changed how the test is run, from just a single query to a constant rate of 100 users per second for 10 seconds. We should expect something as close as possible to 100 QPS in our test report.
## Running a Benchmark
We now have something we can run both on a headless node and on a personal laptop, sample run output:
```
$ ./bin/gatling.sh
GATLING_HOME is set to ~/tmp/gatling-charts-highcharts-2.0.0-M3a
Choose a simulation number:
[0] advanced.AdvancedExampleSimulation
[1] basic.BasicExampleSimulation
[2] com.vespa.example.VespaTutorial
2
Select simulation id (default is 'vespatutorial'). Accepted characters are a-z, A-Z, 0-9, - and _
Select run description (optional)
Simulation com.vespa.example.VespaTutorial started...
================================================================================
2014-04-09 11:54:33 0s elapsed
---- Scenario Name -------------------------------------------------------------
[-] 0%
waiting: 998 / running: 2 / done:0
---- Requests ------------------------------------------------------------------
> Global (OK=0 KO=0 )
================================================================================
================================================================================
2014-04-09 11:54:38 5s elapsed
---- Scenario Name -------------------------------------------------------------
[####################################] 49%
waiting: 505 / running: 0 / done:495
---- Requests ------------------------------------------------------------------
> Global (OK=495 KO=0 )
> request_1 (OK=495 KO=0 )
================================================================================
================================================================================
2014-04-09 11:54:43 10s elapsed
---- Scenario Name -------------------------------------------------------------
[#########################################################################] 99%
waiting: 8 / running: 0 / done:992
---- Requests ------------------------------------------------------------------
> Global (OK=992 KO=0 )
> request_1 (OK=992 KO=0 )
================================================================================
================================================================================
2014-04-09 11:54:43 10s elapsed
---- Scenario Name -------------------------------------------------------------
[##########################################################################]100%
waiting: 0 / running: 0 / done:1000
---- Requests ------------------------------------------------------------------
> Global (OK=1000 KO=0 )
> request_1 (OK=1000 KO=0 )
================================================================================
Simulation finished.
Generating reports...
Parsing log file(s)...
Parsing log file(s) done
================================================================================
---- Global Information --------------------------------------------------------
> numberOfRequests 1000 (OK=1000 KO=0 )
> minResponseTime 10 (OK=10 KO=- )
> maxResponseTime 30 (OK=30 KO=- )
> meanResponseTime 10 (OK=10 KO=- )
> stdDeviation 2 (OK=2 KO=- )
> percentiles1 10 (OK=10 KO=- )
> percentiles2 10 (OK=10 KO=- )
> meanNumberOfRequestsPerSecond 99 (OK=99 KO=- )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms 1000 (100%)
> 800 ms < t < 1200 ms 0 ( 0%)
> t > 1200 ms 0 ( 0%)
> failed 0 ( 0%)
================================================================================
Reports generated in 0s.
Please open the following file : ~/tmp/gatling-charts-highcharts-2.0.0-M3a/results/vespatutorial-20140409115432/index.html
```
The report gives graphs showing how the test progressed and summaries for failures and time spent.
---
# Source: https://docs.vespa.ai/en/performance/container-tuning.html.md
# Container Tuning
A collection of configuration parameters to tune the Container as used in Vespa. Some configuration parameters have native [services.xml](../application-packages.html) support while others are configured through [generic config overrides](../reference/applications/config-files.html#generic-configuration-in-services-xml).
## Container worker threads
The container uses multiple thread pools for its operations. Most components including request handlers use the container's [default thread pool](../reference/applications/services/container.html#threadpool), which is controlled by a shared executor instance. Any component can utilize the default pool by injecting a `java.util.concurrent.Executor` instance. Some built-in components have dedicated thread pools - such as the Jetty server, the [search handler](../reference/applications/services/search.html#threadpool) and [document-processing](../reference/applications/services/docproc.html#threadpool) chains. These thread pools are injected through special wiring in the config model and are not easily accessible from other components.
The thread pools are by default scaled on the system resources as reported by the JVM (`Runtime.getRuntime().availableProcessors()`). It's paramount that the `-XX:ActiveProcessorCount`/`jvm_availableProcessors` configuration is correct for the container to work optimally. The [default thread pool](../reference/applications/services/container.html#threadpool) configuration can be overridden through services.xml. We recommend you keep the default configuration as it's tuned to work across a variety of workloads. Note that the default configuration and pool usage may change between minor versions.
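For illustration, a minimal sketch of pinning the processor count via JVM options in services.xml - the flag value 8 is an arbitrary example, not a recommendation:
```
<container version="1.0">
    <nodes>
        <!-- Example only: size thread pools for 8 vCPU -->
        <jvm options="-XX:ActiveProcessorCount=8"/>
        <node hostalias="node1"/>
    </nodes>
</container>
```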
The container will pre-start the minimum number of worker threads, so even an idle container may report running several hundred threads. The [search handler](../reference/applications/services/search.html#threadpool) and [document processing handler](../reference/applications/services/docproc.html#threadpool) thread pools each pre-start the number of workers set in their configurations. Note that tuning the capacity upwards increases the risk of high GC pressure as concurrency becomes higher with more in-flight requests. The GC pressure is a function of number of in-flight requests, the time it takes to complete the request and the amount of garbage produced per request. Increasing the queue size will allow the application to handle shorter traffic bursts without rejecting requests, although increasing the average latency for those requests that are queued up. Large queues will also increase heap consumption in overload situations. For some thread pools, extra threads will be created once the queue is full (when [`max`](../reference/applications/services/search.html#threads.max) is specified), and are destroyed after an idle timeout. If all threads are occupied, requests are rejected with a 503 response.
The effective thread pool configuration and utilization statistics can be observed through the [Container Metrics](/en/operations/metrics.html#container-metrics). See [Thread Pool Metrics](/en/operations/metrics.html#thread-pool-metrics) for a list of metrics exported.
**Note:** If the queue size is set to 0, the metric measuring the queue size - `jdisc.thread_pool.work_queue.size` - will instead measure how many threads are active.
### Recommendation
A fixed size pool is preferable for stable latency during peak load, at the cost of a higher static memory load and increased context-switching overhead if an excessive number of threads is configured. A variable size pool is mostly beneficial for minimizing memory consumption during low-traffic periods, and in general when the size of peak load is somewhat unknown. The downside is that once all core threads are active, latency will increase as additional tasks are queued, and launching extra threads is relatively expensive as it involves system calls to the OS.
### Example
Consider a container host with 8 vCPU. Setting `<threads>4</threads>` on the [search handler threadpool](../reference/applications/services/search.html#threadpool) yields `4 * 8 = 32` worker threads, and adding `<queue>25</queue>` gives the pool a total queue capacity of `32 * 25 = 800` requests. The same thread calculation applies to the [document processing handler threadpool](../reference/applications/services/docproc.html#threadpool), which does not support queue configuration. The example below shows a consistent configuration where the default thread pool, the search handler threadpool, and the document processing handler threadpool are all kept fixed.
```
<!-- Sketch of the stripped example: element names assumed, values from the original -->
<container version="1.0">
    <threadpool><threads>5</threads><queue>25</queue></threadpool>
    <search>
        <threadpool><threads>4</threads><queue>25</queue></threadpool>
    </search>
    <document-processing>
        <threadpool><threads>2</threads></threadpool>
    </document-processing>
</container>
```
```
## Container memory usage
> Help, my container nodes are using more than 70% memory!
It's common to observe the container process utilizing its maximum configured heap size. This, by itself, is not necessarily an indication of a problem. The Java Virtual Machine (JVM) manages memory within the allocated heap, and it's designed to use as much of it as possible to reduce the frequency of garbage collection.
To understand whether enough memory is allocated, look at the garbage collection activity. If GC is running frequently and using significant CPU or causing long pauses, it might indicate that the heap size is too small for the workload. In such cases, consider increasing the maximum heap size. However, if the garbage collector is running infrequently and efficiently, it's perfectly normal for the container to utilize most or all of its allocated heap, and even more (as some memory will also be allocated outside the heap; e.g. direct buffers for efficient data transfer).
Vespa exports several metrics to allow you to monitor JVM GC performance, such as [jvm.gc.overhead](../reference/operations/metrics/container.html#jvm_gc_overhead) - if this exceeds 8-10% you should consider increasing heap memory and/or tuning GC settings.
## JVM heap size
Change the default JVM heap size settings used by Vespa to better suit the specific hardware settings or application requirements.
By setting the relative size of the total JVM heap in [percentage of available memory](../reference/applications/services/container.html#nodes), one does not know exactly what the heap size will be, but the configuration will be adaptable and ensure that the container can start even in environments with less available memory. The example below allocates 50% of available memory on the machine to the JVM heap:
```
<!-- Sketch of the stripped example, using the jvm element's allocated-memory attribute -->
<nodes>
    <jvm allocated-memory="50%"/>
    <node hostalias="node1"/>
</nodes>
```
## JVM Tuning
Use _gc-options_ for controlling GC-related parameters and _options_ for tuning other parameters. See the [reference documentation](../reference/applications/services/container.html#nodes). Example: running with a 4 GB heap using the G1 garbage collector, NewRatio = 1 (equal size of old and new generation) and verbose GC logging enabled (logged to stdout, ending up in the vespa.log file).
```
<!-- Sketch of the stripped example: exact flags assumed from the surrounding text -->
<nodes>
    <jvm options="-Xms4g -Xmx4g -verbose:gc" gc-options="-XX:+UseG1GC -XX:NewRatio=1"/>
    <node hostalias="node1"/>
</nodes>
```
The default heap size with the Docker image is 1.5 GB, which can be on the low side for high-throughput applications, causing frequent garbage collection. By default, the G1GC collector is used.
### Config Server and Config Proxy
The config server and proxy are not executed based on the model in _services.xml_; instead, they are used to bootstrap the services in that model. Consequently, one must use environment variables to set the JVM parameters for the config server and config proxy. They also need to be restarted (_services_ in the config proxy's case) after a change, but one does _not_ need to run _vespa prepare_ or _vespa activate_ first. Example:
```
VESPA_CONFIGSERVER_JVMARGS -Xlog:gc
VESPA_CONFIGPROXY_JVMARGS -Xlog:gc -Xmx256m
```
Refer to [Setting Vespa variables](/en/operations/self-managed/files-processes-and-ports.html#environment-variables).
## Container warmup
Some applications observe that the first queries made to a freshly started container take a long time to complete. This is typically due to some components performing lazy setup of data structures or connections. Lazy initialization should be avoided in favor of eager initialization in the component constructor, but this is not always possible.
A way to avoid problems with the first queries in such cases is to perform warmup queries at startup. This is done by issuing queries from the constructor of the Handler of regular queries. If using the default handler, [com.yahoo.search.handler.SearchHandler](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/handler/SearchHandler.java), subclass this and configure your subclass as the handler of query requests in _services.xml_.
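A sketch of wiring such a subclass as the handler of query requests in services.xml - the class name, bundle and binding below are placeholders for illustration:
```
<container version="1.0">
    <!-- Hypothetical subclass of SearchHandler; binding assumed to be the default search path -->
    <handler id="com.example.WarmupSearchHandler" bundle="my-app-bundle">
        <binding>http://*/search/*</binding>
    </handler>
</container>
```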
Add a call to a warmupQueries() method as the last line of your handler constructor. The method can look something like this:
```
private void warmupQueries() {
    // Placeholder URIs - consider adding metrics.ignore to keep warmup
    // requests out of metrics reporting (see below)
    String[] requestUris = new String[] {"warmupRequestUri1", "warmupRequestUri2"};
    int warmupIterations = 50;
    for (int i = 0; i < warmupIterations; i++) {
        for (String requestUri : requestUris) {
            handle(HttpRequest.createTestRequest(requestUri, com.yahoo.jdisc.http.HttpRequest.Method.GET));
        }
    }
}
```
Since these queries will be executed before the container starts accepting external queries, they will cause the first external queries to observe a warmed up container instance.
Use [metrics.ignore](../reference/api/query.html#metrics.ignore) in the warmup queries to eliminate them from being reported in metrics.
### Disabling warmups
Warmups can be disabled by adding the following container http config to the container section in services.xml:
```
false
```
---
# Source: https://docs.vespa.ai/en/operations/self-managed/container.html.md
# Container
This is the Container service operational guide.

Note that "container" is an overloaded concept in Vespa - in this guide it refers to service instance nodes in blue.
Refer to [container metrics](../metrics.html#container-metrics).
## Endpoints
The container service(s) host the query and feed endpoints - examples:
- [album-recommendation](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation/app/services.xml) configures _both_ query and feed in the same container cluster (i.e. service):
```
<!-- Abridged sketch: one cluster serving both endpoints -->
<container id="default" version="1.0">
    <document-api/>
    <search/>
    <nodes><node hostalias="node1"/></nodes>
</container>
```
- [multinode-HA](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml) configures query and feed in separate container clusters (i.e. services):
```
<!-- Abridged sketch: separate feed and query clusters -->
<container id="feed" version="1.0">
    <document-api/>
</container>
<container id="query" version="1.0">
    <search/>
</container>
```
Observe that `<document-api>` and `<search>` are located in separate clusters in the second example, and endpoints are therefore different.
**Important:** The first thing to validate when troubleshooting query errors is to make sure that the endpoint is correct, i.e. that query requests hit the correct nodes. A query will be written to the [access log](../access-logging.html) on one of the nodes in the container cluster.
## Inspecting Vespa Java Services using JConsole
Determine the state of each running Java Vespa service using JConsole. JConsole is distributed with the Java Development Kit. Start JConsole:
```
$ jconsole <host>:<port>
```
where the host and port determine which service to attach to. For security purposes the JConsole tool cannot directly attach to Vespa services from external machines.
### Connecting to a Vespa instance
To attach a JConsole to a Vespa service running on another host, create a tunnel from the JConsole host to the Vespa service host. This can for example be done by setting up two SSH tunnels as follows:
```
$ ssh -N -L:localhost: &
$ ssh -N -L:localhost: &
```
where port1 and port2 are determined by the type of service (see below). A JConsole can then be attached to the service as follows:
```
$ jconsole localhost:<port1>
```
Port numbers:
| Service | Port 1 | Port 2 |
| --- | --- | --- |
| QRS | 19015 | 19016 |
| Docproc | 19123 | 19124 |
Updated port information can be found by running [vespa-model-inspect](../../reference/operations/self-managed/tools.html#vespa-model-inspect):
```
$ vespa-model-inspect service <service-name>
```
where the resulting RMIREGISTRY and JMX lines determine port1 and port2, respectively.
### Examining thread states
The state of each container is available in JConsole by pressing the Threads tab and selecting the thread of interest in the threads list. Threads of interest include _search_, _connector_, _closer_, _transport_ and _acceptor_ (the latter four are used for backend communications).
---
# Source: https://docs.vespa.ai/en/applications/containers.html.md
# Container clusters
Vespa's Java container, JDisc, hosts all application components as well as the stateless logic of Vespa itself. Which particular components are hosted by a container cluster is configured in services.xml. The main features of JDisc are:
- HTTP serving out of the box from an embedded Jetty server, and support for plugging in other transport mechanisms.
- Integration with the config system of Vespa which allows components to [receive up-to-date config](configuring-components.html) (by constructor injection) resulting from application deployment.
- [Dependency injection based on Guice](dependency-injection.html), extended for configs and component collections.
- A component model based on [OSGi](bundles.html) (Felix) which allows components to be (re)deployed to running servers, and to control which APIs they expose to others.
- The features above combine to allow application package changes (changes to components, configuration or data) to be applied by Vespa without disrupting request serving or requiring restarts.
- Standard component types exist for
- [general request handling](request-handlers.html)
- [chained request-response processing](processing.html)
- [processing document writes](document-processors.html)
- [intercepting queries and results](searchers.html)
- [rendering responses](result-renderers.html)
Application components can be of any other type as well and do not need to reference any Vespa API to be loaded and managed by the container.
- A general [chain composition](chaining.html) mechanism for components.
## Developing Components
- The JDisc container provides a framework for processing requests and responses, named _Processing_ - its building blocks are:
- [Chains](chaining.html) of other components that are to be executed serially, with each providing some service or transform
- [Processors](processing.html) that change the request and / or the response. They may also make multiple forward requests, in series or parallel, or manufacture the response content themselves
- [Renderers](processing.html#response-rendering) that are used to serialize a Processor's response before returning it to a client
- Application Lifecycle and unit testing:
- [Configuring components](configuring-components.html) with custom configuration
- [Component injection](dependency-injection.html) allows components to access other application components
- Learn how to [build OSGi bundles](bundles.html) and how to [troubleshoot](bundles.html#troubleshooting) classloading issues
- Using [Libraries for Pluggable Frameworks](pluggable-frameworks.html) from a component may result in class loading issues that require extra setup in the application
- [Unit testing configurable components](unit-testing.html#unit-testing-configurable-components)
- Handlers and filters:
- [Http servers and security filters](http-servers-and-filters.html) for incoming connections on HTTP and HTTPS
- [Request handlers](request-handlers.html) to process incoming requests and generate responses
- Searchers and Document Processors:
- [Searcher](searchers.html) and [search result renderer](result-renderers.html) development
- [Document processing](document-processors.html)
## Reference documentation
- [services.xml](../reference/applications/services/container.html)
## Other related documents
- [Designing RESTful web services](web-services.html) as Vespa Components
- [healthchecks](../reference/operations/health-checks.html) - using the Container with a VIP
- [Vespa Component Reference](../reference/applications/components.html): The Container's request processing lifecycle
---
# Source: https://docs.vespa.ai/en/operations/self-managed/content-node-recovery.html.md
# Content node recovery
In exceptional cases, one or more content nodes may end up with corrupted data causing them to fail to restart. Possible reasons are
- the application configuring a higher memory or disk limit such that the node is allowed to accept more data than it can manage,
- hardware failure, or
- a bug in Vespa.
Normally a corrupted node can just be wiped of all data or removed from the cluster, but when this happens simultaneously to multiple nodes, or redundancy 1 is used, it may be necessary to recover the node(s) to avoid data loss. This document explains the procedure.
## Recovery steps
On each of the nodes needing recovery:
1. [Stop services](admin-procedures.html#vespa-start-stop-restart) on the node if running.
2. Repair the node:
- If the node cannot start due to needing more memory than available: Increase the memory available to the node, or if not possible stop all non-essential processes on the node using `vespa-sentinel-cmd list` and `vespa-sentinel-cmd stop [name]`, and (if necessary) start only the content node process using `vespa-sentinel-cmd start searchnode`. When the node is successfully started, issue delete operations or increase the cluster size to reduce the amount of data on the node if necessary.
- If the node cannot start due to needing more disk than available: Increase the disk available to the node, or if not possible delete non-essential data such as logs and cached packages. When the node is successfully started, issue delete operations or increase the cluster size to reduce the amount of data on the node if necessary.
- If the node cannot start for any other reason, repair the data manually as needed. This procedure will depend on the specific nature of the data corruption.
3. [Start services](admin-procedures.html#vespa-start-stop-restart) on the node.
4. Verify that the node is fully up before doing the next node - metrics/interfaces to be used to evaluate if the next node can be stopped:
- Check if a node is up using [/state/v1/health](../../reference/api/state-v1.html#state-v1-health).
- Check the `vds.idealstate.merge_bucket.pending.average` metric on content nodes. When 0, all buckets are in sync - see [example](../metrics.html).
---
# Source: https://docs.vespa.ai/en/content/content-nodes.html.md
# Content nodes, states and metrics

Content cluster processes are _distributor_, _proton_ and _cluster controller_.
The distributor calculates the correct content node using the distribution algorithm and the [cluster state](#cluster-state). With no known cluster state, the client library will send requests to a random node, which replies with the updated cluster state if the node was incorrect. Cluster states are versioned, such that clients hitting outdated distributors do not override updated states with old states.
The [distributor](#distributor) keeps track of which content nodes store replicas of each bucket (maximum one replica each), based on [redundancy](../reference/applications/services/content.html#redundancy) and information from the _cluster controller_. A bucket maps to one distributor only. A distributor keeps a bucket database with bucket metadata. The metadata holds which content nodes store replicas of the buckets, the checksum of the bucket content and the number of documents and meta entries within the bucket. Each document is algorithmically mapped to a bucket and forwarded to the correct content nodes. The distributors detect whether there are enough bucket replicas on the content nodes and add/remove as needed. Write operations wait for replies from every replica and fail if fewer than redundancy are persisted within the timeout.
The [cluster controller](#cluster-controller) manages the state of the distributor and content nodes. This _cluster state_ is used by the document processing chains to know which distributor to send documents to, as well as by the distributor to know which content nodes should have which bucket.
## Cluster state
There are three kinds of state: [unit state](../reference/api/cluster-v2.html#state-unit), [user state](../reference/api/cluster-v2.html#state-user) and [generated state](../reference/api/cluster-v2.html#state-generated) (a.k.a. _cluster state_).
For new cluster states, the cluster state version is incremented, and the new cluster state is broadcast to all nodes. There is a minimum time between each cluster state change.
It is possible to set a minimum capacity for the cluster state to be `up`. If a cluster has so many nodes unavailable that it is considered down, the state of each node is irrelevant, and thus new cluster states will not be created and broadcast before enough nodes are back for the cluster to come back up. A cluster state indicating the entire cluster is down may thus have outdated data on the node level.
## Cluster controller
The main task of the cluster controller is to maintain the [cluster state](#cluster-state). This is done by _polling_ nodes for state, _generating_ a cluster state, which is then _broadcast_ to all the content nodes in the cluster. Note that clients do not interface with the cluster controller - they get the cluster state from the distributors - [details](#distributor).
| Task | Description |
| --- | --- |
| Node state polling |
The cluster controller polls nodes, sending the current cluster state. If the cluster state is no longer correct, the node returns correct information immediately. If the state is correct, the request lingers on the node, such that the node can reply to it immediately if its state changes. After a while, the cluster controller will send a new state request to the node, even with one pending. This triggers a reply to the lingering request and makes the new one linger instead. Hence, nodes always have a pending state request.
During a controlled node shutdown, the node starts the shutdown process by responding to the pending state request that it is now stopping.
**Note:** As controlled restarts or shutdowns are implemented as TERM signals from the [config-sentinel](/en/operations/self-managed/config-sentinel.html), the cluster controller is not able to differentiate between controlled and other shutdowns.
|
| Cluster state generation |
The cluster controller translates unit and user states into the generated _cluster state_
|
| Cluster state broadcast |
When node unit states are received, a cluster controller internal cluster state is updated. New cluster states are distributed with a minimum interval between them. A grace period applies per unit state change too - e.g., distributors and content nodes on the same host often stop at the same time.
The version number is incremented, and the new cluster state is broadcast.
If cluster state version is [reset](../operations/self-managed/admin-procedures.html#cluster-state), distributors and content node processes may have to be restarted in order for the system to converge to the new state. Nodes will reject lower cluster state versions to prevent race conditions caused by overlapping cluster controller leadership periods.
|
See [cluster controller configuration](../operations/self-managed/admin-procedures.html#cluster-controller-configuration).
### Master election
Vespa can be configured with one cluster controller. Reads and writes will keep working if the cluster controller goes down, but other changes to the cluster (like a content node going down) will not be handled. It is hence recommended to configure a set of cluster controllers.
The cluster controller nodes elect a master, which does the node polling and cluster state broadcast. The other cluster controller nodes only exist to do master election and potentially take over if the master dies.
All cluster controllers will vote for the cluster controller with the lowest index that says it is ready. If a cluster controller has more than half of the votes, it will be elected master. As a majority vote is required, the number of cluster controllers should be an odd number of 3 or greater. A fresh master will not broadcast states before a transition time has passed, allowing an old master some time to realize it is no longer the master.
## Distributor
Buckets are mapped to distributors using the [ideal state algorithm](idealstate.html). As the cluster state changes, buckets are re-mapped immediately. The mapping does not overlap - a bucket is owned by one distributor.
Distributors do not persist the bucket database; the bucket-to-content-node mapping is kept in memory in the distributor. Document count, persisted size and a metadata checksum per bucket are stored as well. At distributor (re)start, content nodes are polled for bucket information, and return which buckets are owned by this distributor (using the ideal state algorithm). There is no centralized bucket directory node. Likewise, at any distributor cluster state change, content nodes are polled for bucket handover - a distributor will then handle a new set of buckets.
Document operations are mapped to content nodes based on bucket locations - each put/update/get/remove is mapped to a [bucket](buckets.html)and sent to the right content nodes. To manage the document set as it grows and nodes change, buckets move between content nodes.
Document API clients (i.e. container nodes with [\<document-api\>](../reference/applications/services/container.html#document-api)) do not communicate directly with the cluster controller, and do not know the cluster state at startup. Clients therefore start out by sending requests to a random distributor. If the document operation hits the wrong distributor, `WRONG_DISTRIBUTION` is returned, with the current cluster state in the response. `WRONG_DISTRIBUTION` is hence expected and normal at cold start / state change events.
### Timestamps
[Write operations](../writing/reads-and-writes.html)have a _last modified time_ timestamp assigned when passing through the distributor. The timestamp is guaranteed to be unique within the[bucket](buckets.html) where it is stored. The timestamp is used by the content layer to decide which operation is newest. These timestamps can be used when [visiting](../writing/visiting.html), to process/retrieve documents within a given time range. To guarantee unique timestamps, they are in microseconds - the microsecond part is generated to avoid conflicts with other documents.
If documents are migrated _between_ clusters, the target cluster will have new timestamps for their entries. Also, when [reprocessing documents](../applications/document-processors.html) _within_ a cluster, documents will have new timestamps, even if not modified.
### Ordering
The Document API uses the [document ID](../schemas/documents.html#document-ids) to order operations. A Document API client ensures that only one operation is pending at the same time. This ensures that if a client sends multiple operations for the same document, they will be processed in a defined order. This is done by queueing pending operations _locally_ at the client.
**Note:** If sending two write operations to the same document, and the first operation fails, the enqueued operation is sent. In other words, the client does not assume there exists any kind of dependency between separate operations to the same document. If you need to enforce this, use[test-and-set conditions](../writing/document-v1-api-guide.html#conditional-writes)for writes.
If _different_ clients have pending operations on the same document, the order is unspecified.
### Maintenance operations
Distributors track which content nodes have which buckets in their bucket database. Distributors then use the [ideal state algorithm](idealstate.html)to generate bucket _maintenance operations_. A stable system has all buckets located per the ideal state:
- If buckets have too few replicas, new ones are generated on other content nodes.
- If the replicas differ, a bucket merge is issued to get the replicas consistent.
- If a bucket has too many replicas, the superfluous ones are deleted. Buckets are merged, if inconsistent, before deletion.
- If two buckets exist such that both may contain the same document, the buckets are split or joined to remove such overlapping buckets. Read more on [inconsistent buckets](buckets.html).
- If buckets are too small/large, they will be joined or split.
The maintenance operations have different priorities. If no maintenance operations are needed, the cluster is said to be in the _ideal state_. The distributors synchronize maintenance load with user load, e.g. to remap requests to other buckets after bucket splitting and joining.
### Restart
When a distributor stops, it will try to respond to any pending cluster state request first. New incoming requests after shutdown has commenced will fail immediately, as the socket is no longer accepting requests. Cluster controllers will thus detect processes stopping almost immediately.
The cluster state will be updated with the new state internally in the cluster controller. Then the cluster controller will wait for maximum [min\_time\_between\_new\_systemstates](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) before publishing the new cluster state - this to reduce short-term state fluctuations.
The cluster controller has the option of setting states to make other distributors take over ownership of buckets, or mask the change, making the buckets owned by the distributor restarting unavailable for the time being.
If the distributor transitions from `up` to `down`, other distributors will request metadata from the content nodes to take over ownership of buckets previously owned by the restarting distributor. Until the distributors have gathered this new metadata from all the content nodes, requests for these buckets cannot be served, and will fail back to the client. When the restarting node comes back up and is marked `up` in the cluster state again, the other distributors will discard knowledge of the extra buckets they previously acquired.
For requests with timeouts of several seconds, the transition should be invisible due to automatic client resending. Requests with a lower timeout might fail, and it is up to the application whether to resend or handle failed requests.
Requests to buckets not owned by the restarting distributor will not be affected.
## Content node
The content node runs _proton_, which is the query backend.
### Restart
When a content node does a controlled restart, it marks itself in the `stopping` state and rejects new requests. It will process its pending request queue before shutting down, so client requests are typically unaffected by content node restarts. New copies of buckets will be created on other nodes, to store new requests with appropriate redundancy. This happens whether the node transitions through the `down` or `maintenance` state. The difference is that if transitioning through `maintenance`, the distributor will not start synchronizing the new copies with the existing copies; it will just store the new requests until the maintenance node comes back up.
When starting, a content node begins by gathering information on which buckets it has data stored for. While this is happening, the service layer reports the node as `down`.
## Metrics
| Metric | Description |
| --- | --- |
| .idealstate.idealstate\_diff | This metric tries to create a single value indicating distance to the ideal state. A value of zero indicates that the cluster is in the ideal state. Graphed values of this metric gives a good indication for how fast the cluster gets back to the ideal state after changes. Note that some issues may hide other issues, so sometimes the graph may appear to stand still or even go a bit up again, as resolving one issue may have detected one or several others. |
| .idealstate.buckets\_toofewcopies | Specifically lists how many buckets have too few copies. Compare to the _buckets_ metric to see how big a portion of the cluster this is. |
| .idealstate.buckets\_toomanycopies | Specifically lists how many buckets have too many copies. Compare to the _buckets_ metric to see how big a portion of the cluster this is. |
| .idealstate.buckets | The total number of buckets managed. Used by other metrics reporting bucket counts to know how big a part of the cluster they relate to. |
| .idealstate.buckets\_notrusted | Lists how many buckets have no trusted copies. Without trusted buckets operations against the bucket may have poor performance, having to send requests to many copies to try and create consistent replies. |
| .idealstate.delete\_bucket.pending | Lists how many buckets need to be deleted. |
| .idealstate.merge\_bucket.pending | Lists how many buckets there are, where we suspect not all copies store identical document sets. |
| .idealstate.split\_bucket.pending | Lists how many buckets are currently being split. |
| .idealstate.join\_bucket.pending | Lists how many buckets are currently being joined. |
| .idealstate.set\_bucket\_state.pending | Lists how many buckets are currently altered for active state. These are high priority requests which should finish fast, so these requests should seldom be seen as pending. |
Example, using the [quickstart](../basics/deploy-an-application-local.html) - find the distributor port (look for HTTP):
```
$ docker exec vespa vespa-model-inspect service distributor
distributor @ vespa-container : content
music/distributor/0
tcp/vespa-container:19112 (MESSAGING)
tcp/vespa-container:19113 (STATUS RPC)
tcp/vespa-container:19114 (STATE STATUS HTTP)
```
Get the metric value:
```
$ docker exec vespa curl -s http://localhost:19114/state/v1/metrics | jq . | \
grep -A 10 idealstate.merge_bucket.pending
"name": "vds.idealstate.merge_bucket.pending",
"description": "The number of operations pending",
"values": {
"average": 0,
"sum": 0,
"count": 1,
"rate": 0.016666,
"min": 0,
"max": 0,
"last": 0
},
```
## /cluster/v2 API examples
Examples of state manipulation using the [/cluster/v2 API](../reference/api/cluster-v2.html).
List content clusters:
```
$ curl http://localhost:19050/cluster/v2/
```
```
{
"cluster": {
"music": {
"link": "/cluster/v2/music"
},
"books": {
"link": "/cluster/v2/books"
}
}
}
```
Get cluster state and list service types within cluster:
```
$ curl http://localhost:19050/cluster/v2/music
```
```
{
"state": {
"generated": {
"state": "state-generated",
"reason": "description"
}
},
"service": {
"distributor": {
"link": "/cluster/v2/music/distributor"
},
"storage": {
"link": "/cluster/v2/music/storage"
}
}
}
```
List nodes per service type for cluster:
```
$ curl http://localhost:19050/cluster/v2/music/storage
```
```
{
"node": {
"0": {
"link": "/cluster/v2/music/storage/0"
},
"1": {
"link": "/cluster/v2/music/storage/1"
}
}
}
```
Get node state:
```
$ curl http://localhost:19050/cluster/v2/music/storage/0
```
```
{
"attributes": {
"hierarchical-group": "group0"
},
"state": {
"generated": {
"state": "up",
"reason": ""
},
"unit": {
"state": "up",
"reason": ""
},
"user": {
"state": "up",
"reason": ""
}
},
"metrics": {
"bucket-count": 0,
"unique-document-count": 0,
"unique-document-total-size": 0
}
}
```
Get all nodes, including topology information (see `hierarchical-group`):
```
$ curl http://localhost:19050/cluster/v2/music/?recursive=true
```
```
{
"state": {
"generated": {
"state": "up",
"reason": ""
}
},
"service": {
"storage": {
"node": {
"0": {
"attributes": {
"hierarchical-group": "group0"
},
"state": {
"generated": {
"state": "up",
"reason": ""
},
"unit": {
"state": "up",
"reason": ""
},
"user": {
"state": "up",
"reason": ""
}
},
"metrics": {
"bucket-count": 0,
"unique-document-count": 0,
"unique-document-total-size": 0
}
```
Set node user state:
```
$ curl -X PUT -H "Content-Type: application/json" --data '
{
"state": {
"user": {
"state": "retired",
"reason": "This node will be removed soon"
}
}
}' \
http://localhost:19050/cluster/v2/music/storage/0
```
```
{
"wasModified": true,
"reason": "ok"
}
```
## Further reading
- Refer to [administrative procedures](../operations/self-managed/admin-procedures.html) for configuration and state monitoring / management.
- Try the [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample app to get familiar with interfaces and behavior.
---
# Source: https://docs.vespa.ai/en/reference/applications/services/content.html.md
# services.xml - 'content'
```
content
    documents (document, document-processing)
    min-redundancy
    redundancy
    coverage-policy
    nodes (node)
    group (distribution, node, group)
    engine
        proton
            searchable-copies
            tuning
                searchnode
                    lidspace (max-bloat-factor)
                    requestthreads (search, persearch, summary)
                    flushstrategy
                        native
                            total (maxmemorygain, diskbloatfactor)
                            component (maxmemorygain, diskbloatfactor, maxage)
                            transactionlog (maxsize)
                            conservative (memory-limit-factor, disk-limit-factor)
                    initialize (threads)
                    feeding (concurrency, niceness)
                    index
                        io (search)
                        warmup (time, unpack)
                    removed-db
                        prune (age, interval)
                    summary
                        io (read)
                        store
                            cache (maxsize, maxsize-percent, compression (type, level))
                            logstore (maxfilesize, chunk (maxsize, compression (type, level)))
            sync-transactionlog
            flush-on-shutdown
            resource-limits (disk, memory)
    search
        query-timeout
        visibility-delay
        coverage (minimum, min-wait-after-coverage-factor, max-wait-after-coverage-factor)
    tuning
        bucket-splitting
        min-node-ratio-per-group
        distribution
        maintenance
        max-document-size
        merges
        persistence-threads
        resource-limits
        visitors (max-concurrent)
        dispatch (max-hits-per-partition, dispatch-policy, prioritize-availability, min-active-docs-coverage, top-k-probability)
        cluster-controller (init-progress-time, transition-time, max-premature-crashes, stable-state-period, min-distributor-up-ratio, min-storage-up-ratio, groups-allowed-down-ratio)
```
## content
The root element of a content cluster definition. Creates a content cluster. A content cluster stores and/or indexes documents. services.xml may have zero or more such elements.
Contained in [services](services.html).
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| version | required | number | | 1.0 in this version of Vespa |
| id | required for multiple clusters | string | |
Name of the content cluster. If none is supplied, the cluster name will be `content`. Cluster names must be unique within the application; if multiple clusters are configured, the id must be set for all but at most one of them.
**Note:** Renaming a cluster is the same as dropping the current cluster and adding a new one. This makes data unavailable or lost, depending on hosting model. Deploying with a changed cluster id will therefore fail with a validation override requirement: `Content cluster 'music' is removed. This will cause loss of all data in this cluster.
To allow this add content-cluster-removal to validation-overrides.xml,
see https://docs.vespa.ai/en/reference/validation-overrides.html`.
|
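For illustration, the override named in the note above can be allowed in validation-overrides.xml like this (the date is a placeholder):
```
<validation-overrides>
    <allow until="2026-01-31">content-cluster-removal</allow>
</validation-overrides>
```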
Subelements:
- [documents](#documents) (required)
- [min-redundancy](#min-redundancy)
- [redundancy](#redundancy)
- [coverage-policy](#coverage-policy)
- [nodes](services.html#nodes)
- [group](#group)
- [engine](#engine)
- [search](#search)
- [tuning](#tuning)
## documents
Contained in [content](#content). Defines which document types should be routed to this content cluster using the default route, and which documents should be kept in the cluster if the garbage collector runs. Read more on [expiring documents](../../../schemas/documents.html#document-expiry). It also has some backend-specific configuration for whether documents should be searchable.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| selection | optional | string | |
A [document selection](../../writing/document-selector-language.html), restricting documents that are routed to this cluster. Defaults to a selection expression matching everything.
This selection can be specified to match document identifier specifics that are _independent_ of document types. For restrictions that apply only to a _specific_ document type, this must be done within that particular document type's [document](#document) element. Trying to use document type references in this selection causes an error during deployment. The selection given here will be merged with per-document type selections specified within document tags, if any, meaning that any document in the cluster must match _both_ selections to be accepted and kept.
This feature is primarily used to [expire documents](../../../schemas/documents.html#document-expiry).
|
| garbage-collection | optional | true / false | false |
If true, regularly verify the documents stored in the cluster to see if they belong in the cluster, and delete them if not. If false, garbage collection is not run.
|
| garbage-collection-interval | optional | integer | 3600 |
Time (in seconds) between garbage collection cycles. Note that the deletion of documents is spread over this interval, so more resources will be used for deleting a set of documents with a small interval than with a larger interval.
|
Subelements:
- [document](#document) (required)
- [document-processing](#document-processing) (optional)
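A minimal sketch of these attributes used for document expiry - the document type and timestamp field are assumptions for illustration:
```
<documents garbage-collection="true" garbage-collection-interval="3600">
    <!-- Keep only documents fed within the last day -->
    <document type="music" mode="index"
              selection="music.timestamp &gt; now() - 86400"/>
</documents>
```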
## document
Contained in [documents](#documents). The document type to be routed to this content cluster.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| type | required | string | |
[Document type name](../../schemas/schemas.html#document)
|
| mode | required | index / store-only / streaming | |
The mode of storing and indexing. Refer to [streaming search](../../../performance/streaming-search.html) for _store-only_, as documents are stored the same way for both cases.
Changing mode requires an _indexing-mode-change_ [validation override](../validation-overrides.html), and documents must be re-fed.
|
| selection | optional | string | |
A [document selection](../../writing/document-selector-language.html), restricting documents that are routed to this cluster. Defaults to a selection expression matching everything.
This selection must apply to fields in _this document type only_. Selection will be merged together with selection for other types and global selection from [documents](#documents) to form a full expression for what documents belong to this cluster.
|
| global | optional | true / false | false |
Set to _true_ to distribute all documents of this type to all nodes in the content cluster in which it is defined.
Fields in global documents can be imported into documents to implement joins - read more in [parent/child](../../../schemas/parent-child.html). Vespa will detect when a new (or outdated) node is added to the cluster and prevent it from taking part in searches until it has received all global documents.
Changing from _false_ to _true_ or vice versa requires a _global-document-change_ [validation override](../validation-overrides.html). First, [stop services](/en/operations/self-managed/admin-procedures.html#vespa-start-stop-restart) on all content nodes. Then, deploy with the validation override. Finally, [start services](/en/operations/self-managed/admin-procedures.html#vespa-start-stop-restart) on all content nodes.
Note: _global_ is only supported for _mode="index"_.
|
## document-processing
Contained in [documents](#documents). Vespa Search-specific configuration of which document processing cluster and chain to use for index preprocessing.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| cluster | optional | string | Container cluster on content node |
Name of a [document-processing](docproc.html) container cluster that does index preprocessing. Use cluster to specify an alternative cluster, other than the default cluster on content nodes.
|
| chain | optional | string | `indexing` chain |
A document processing chain in the container cluster specified by _cluster_ to use for index preprocessing. The chain must inherit the `indexing` chain.
|
Example - the container cluster enables [document-processing](docproc.html), referred to by the content cluster:
```
<!-- Abridged sketch: container cluster with document-processing, referenced from content -->
<container id="dp-cluster" version="1.0">
    <document-processing/>
</container>
<content id="music" version="1.0">
    <documents>
        <document type="music" mode="index"/>
        <document-processing cluster="dp-cluster"/>
    </documents>
</content>
```
To add document processors either before or after the indexer, declare a chain (inherit _indexing_) in a _document-processing_ container cluster and add document processors. Annotate document processors with `before=indexingStart` or `after=indexingEnd`. Configure this cluster and chain as the indexing chain in the content cluster - example:
```
<!-- Sketch: processor ids are placeholders; before/after reconstructed from the text -->
<chain id="my-chain" inherits="indexing">
    <documentprocessor id="com.example.FirstProcessor">
        <before>indexingStart</before>
    </documentprocessor>
    <documentprocessor id="com.example.LastProcessor">
        <after>indexingEnd</after>
    </documentprocessor>
</chain>
```
**Important:** Note the [document-api](container.html#document-api) configuration. Set up this API on the same nodes as `document-processing` - find details in [indexing](../../../writing/indexing.html).
## min-redundancy
Contained in [content](#content). The minimum total data copies the cluster will maintain. This can be set instead of (or in addition to) redundancy to ensure that a minimum number of copies are always maintained regardless of other configuration.
`min-redundancy` can be changed without node restart - replicas will be added or removed automatically.
### min-redundancy and groups
A group will always have at least one copy of each document in the cluster. This is also the most commonly used configuration; increase the replica level with more groups to improve query capacity.
- Example 1: If _min-redundancy_ is 2 and there is 1 content group, there will be 2 data copies in the group (2 copies for the cluster). If the number of groups is changed to 2 there will be 1 data copy in each group (still 2 copies for the cluster).
- Example 2: A cluster is configured to [autoscale](../../../operations/autoscaling.html) using `groups="[2,3]"`. Here, configure min-redundancy to 2, as each group will have 1 replica irrespective of number of groups, here 2 or 3 - see [replicas](../../../content/elasticity.html#replicas). Setting the lower bound ensures correct replica level for 2 groups.
For self-managed Vespa: Read more about the actual number of replicas when using [groups](#group) in [topology change](/en/content/elasticity.html#changing-topology).
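A minimal sketch of example 1 with one group - the cluster name, document type and host aliases are placeholders:
```
<content id="music" version="1.0">
    <min-redundancy>2</min-redundancy>
    <documents>
        <document type="music" mode="index"/>
    </documents>
    <nodes>
        <node hostalias="node0" distribution-key="0"/>
        <node hostalias="node1" distribution-key="1"/>
    </nodes>
</content>
```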
## redundancy
Contained in [content](#content).
**Note:** Use [min-redundancy](#min-redundancy) instead of `redundancy`.
Vespa Cloud: The number of data copies _per group_.
Self-managed: The total data copies the cluster will maintain to avoid data loss.
Example: with a redundancy of 2, the system tolerates 1 node failure before data becomes unavailable (until the system has managed to create new replicas on other online nodes).
Redundancy can be changed without node restart - replicas will be added or removed automatically.
## coverage-policy
Contained in [content](#content).
Specifies the coverage policy for the content cluster. Valid values are `group` or `node`. The default value is `group`.
If the policy is `group`, coverage is maintained per group, meaning that when doing maintenance, upgrades etc., one group is allowed to be down at a time. If there is only one group in the cluster, coverage will be the same as with policy `node`.
If the policy is `node`, coverage is maintained on a node level, so in practice only 1 node in the whole cluster is allowed to be down at a time.
With several groups, the common reason for changing away from the default `group` policy is that the load added to the remaining groups becomes too high when a whole group is allowed to go down. In that case the `node` policy is better, as taking down one node at a time gives just a minor increase in load.
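A minimal sketch, assuming `coverage-policy` as a direct child element of `content`:
```
<content id="music" version="1.0">
    <coverage-policy>node</coverage-policy>
    <!-- documents, redundancy, nodes ... -->
</content>
```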
## node
Contained in [nodes](services.html#nodes) or [group](#group). Configures a content node in the cluster, see [node](services.html#node) in the general services.xml documentation.
Additional node attributes for content nodes:
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| distribution-key | required | integer | |
The unique data distribution id of this node. This **must** remain unchanged for the host's lifetime. Distribution keys of a fresh system should be contiguous and start from zero.
Distribution keys are used to identify nodes and groups for the [distribution algorithm](../../../content/idealstate.html). If a node changes distribution key, the distribution algorithm regards it as a new node, so buckets are redistributed.
|
| capacity | optional | double | 1 |
**Deprecated:** Capacity of this node, relative to other nodes. A node with capacity 2 will get double the data and feed requests of a node with capacity 1. This feature is deprecated and expert mode only. Don't use in production; Vespa assumes homogeneous cluster capacity.
|
| baseport | optional | integer | |
The first port in the port range allocated by this node.
|
## group
Contained in [content](#content) or [group](#group) - groups can be nested. Defines the [hierarchical structure](../../../content/elasticity.html#grouped-distribution) of the cluster. Cannot be used in conjunction with the [nodes](services.html#nodes) element. Groups can contain other groups or nodes, but not both. There can only be a single level of leaf groups under the top group.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| distribution-key | required | integer | |
Sets the distribution key of a group. It is not allowed to change this for a given group. Group distribution keys only need to be unique among groups that share the same parent group.
|
| name | required | string | |
The name of the group, used for access from status pages and the like.
|
**Important:** There is no deployment-time verification that the distribution key remains unchanged for any given node or group. Consequently, take great care when modifying the set of nodes in a content cluster. Assigning a new distribution key to an existing node is undefined behavior; Best case, the existing data will be temporarily unavailable until the error has been corrected. Worst case, risk crashes or data loss.
See [Vespa Serving Scaling Guide](../../../performance/sizing-search.html)for when to consider using grouped distribution and [Examples](../../../performance/sizing-examples.html) for example deployments using flat and grouped distribution.
## distribution (in group)
Contained in [group](#group). Defines the data distribution to subgroups of this group. _distribution_ should not be in the lowest level group containing storage nodes, as there the ideal state algorithm is used directly. In higher level groups, _distribution_ is mandatory.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| partitions | required if there are subgroups in the group | string | | String conforming to the partition specification:
| Partition specification | Description |
| --- | --- |
| \* | Distribute all copies over 1 of N groups |
| 1\|\* | Distribute all copies over 2 of N groups |
| 1\|1\|\* | Distribute all copies over 3 of N groups |
|
The partition specification is used to evenly distribute content copies across groups. Set a number or `*` per group separated by pipes (e.g. `1|*` for two groups). See [sample deployment configurations](../../../performance/sizing-examples.html).
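A sketch of a grouped distribution with two leaf groups - group names and host aliases are placeholders:
```
<group>
    <distribution partitions="1|*"/>
    <group name="group0" distribution-key="0">
        <node hostalias="node0" distribution-key="0"/>
    </group>
    <group name="group1" distribution-key="1">
        <node hostalias="node1" distribution-key="1"/>
    </group>
</group>
```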
## engine
Contained in [content](#content). Specify the content engine to use, and/or adjust tuning parameters for the engine. Allowed engines are `proton` and `dummy`, the latter being used for debugging purposes. If no engine is given, proton is used. Sub-element: [proton](#proton).
## proton
Contained in [engine](#engine). If specified, the content cluster will use the Proton content engine. This engine supports storage, indexed search and secondary indices. Optional sub-elements are [searchable-copies](#searchable-copies),[tuning](#tuning-proton),[sync-transactionlog](#sync-transactionlog),[flush-on-shutdown](#flush-on-shutdown), and[resource-limits (in proton)](#resource-limits-proton).
## searchable-copies
Contained in [proton](#proton). Default value is 2, or [redundancy](#redundancy), if lower. If set to less than redundancy, only some of the stored copies are ready for searching at any time. This means that node failures cause temporary data unavailability while the alternate copies are being indexed for search. The benefit is using less memory, trading off availability during transitions. Refer to [bucket move](../../../content/proton.html#bucket-move).
If updating documents or using [document selection](#documents) for garbage collection, consider setting [fast-access](../../schemas/schemas.html#attribute)on the subset of attribute fields used for this to make sure that these attributes are always kept in memory for fast access. Note that this is only useful if `searchable-copies` is less than `redundancy`. Read more in [proton](../../../content/proton.html).
`searchable-copies` can be changed without node restart. Note that when reducing `searchable-copies` resource usage will not be reduced until content nodes are restarted.
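A minimal sketch trading availability for memory by keeping only one searchable copy:
```
<content id="music" version="1.0">
    <min-redundancy>2</min-redundancy>
    <engine>
        <proton>
            <searchable-copies>1</searchable-copies>
        </proton>
    </engine>
    <!-- documents, nodes ... -->
</content>
```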
## tuning
Contained in [proton](#proton), optional. Tune settings for the search nodes in a content cluster - sub-element:
| Element | Required | Quantity |
| --- | --- | --- |
| [searchnode](#searchnode) | No | Zero or one |
## searchnode
Contained in [tuning](#tuning-proton), optional. Tune settings for search nodes in a content cluster - sub-elements:
| Element | Required | Quantity |
| --- | --- | --- |
| [lidspace](#lidspace) | No | Zero or one |
| [requestthreads](#requestthreads) | No | Zero or one |
| [flushstrategy](#flushstrategy) | No | Zero or one |
| [initialize](#initialize) | No | Zero or one |
| [feeding](#feeding) | No | Zero or one |
| [index](#index) | No | Zero or one |
| [removed-db](#removed-db) | No | Zero or one |
| [summary](#summary) | No | Zero or one |
## requestthreads
Contained in [searchnode](#searchnode), optional. Tune the number of request threads used on a content node, see [thread-configuration](../../../performance/sizing-search.html#thread-configuration) for details. Sub-elements:
| Element | Required | Default | Description |
| --- | --- | --- | --- |
| search | Optional | 64 | Number of search threads. |
| persearch | Optional | 1 | Number of search threads used per search. See the [Vespa serving scaling guide](../../../performance/sizing-search.html) for an introduction to using multiple threads per search per node to reduce query latency. The number of threads per search can be adjusted down per _rank-profile_ using [num-threads-per-search](../../schemas/schemas.html#num-threads-per-search). |
| summary | Optional | 16 | Number of summary threads. |

Example:

```
<requestthreads>
  <search>64</search>
  <persearch>1</persearch>
  <summary>16</summary>
</requestthreads>
```
## flushstrategy
Contained in [searchnode](#searchnode), optional. Tune the _native_-strategy for flushing components to disk - a smaller number means more frequent flush:
- _Memory gain_ is how much memory can be freed by flushing a component
- _Disk gain_ is how much disk space can be freed by flushing a component (typically by using compaction)
Refer to [Proton maintenance jobs](../../../content/proton.html#proton-maintenance-jobs). Optional sub-elements:
- `native`
  - `total`
    - `maxmemorygain`: The total maximum memory gain (in bytes) for _all_ components before running flush, default 4294967296 (4 GB)
    - `diskbloatfactor`: Trigger flush if the total disk gain (in bytes) for _all_ components is larger than the factor times current total disk usage, default 0.25
  - `component`
    - `maxmemorygain`: The maximum memory gain (in bytes) by _a single_ component before running flush, default 1073741824 (1 GB)
    - `diskbloatfactor`: Trigger flush if the disk gain (in bytes) by _a single_ component is larger than the given factor times the current disk usage by that component, default 0.25
    - `maxage`: The maximum age (in seconds) of unflushed content for a single component before running flush, default 111600 (31h)
  - `transactionlog`
    - `maxsize`: The total maximum size (in bytes) of [transaction logs](../../../content/proton.html#transaction-log) for all document types before running flush, default 21474836480 (20 GB)
  - `conservative`
    - `memory-limit-factor`: When [resource-limits (in proton)](#resource-limits-proton) for memory is reached, flush more often by downscaling `total.maxmemorygain` and `component.maxmemorygain`, default 0.5
    - `disk-limit-factor`: When [resource-limits (in proton)](#resource-limits-proton) for disk is reached, flush more often by downscaling `transactionlog.maxsize`, default 0.5
Example:

```
<flushstrategy>
  <native>
    <total>
      <maxmemorygain>4294967296</maxmemorygain>
      <diskbloatfactor>0.2</diskbloatfactor>
    </total>
    <component>
      <maxmemorygain>1073741824</maxmemorygain>
      <diskbloatfactor>0.2</diskbloatfactor>
      <maxage>111600</maxage>
    </component>
    <transactionlog>
      <maxsize>21474836480</maxsize>
    </transactionlog>
    <conservative>
      <memory-limit-factor>0.5</memory-limit-factor>
      <disk-limit-factor>0.5</disk-limit-factor>
    </conservative>
  </native>
</flushstrategy>
```
## initialize
Contained in [searchnode](#searchnode), optional. Tune settings related to how the search node (proton) is initialized. Optional sub-elements:
- `threads`: The number of initializer threads used for loading structures from disk at proton startup. The threads are shared between document databases when the value is larger than 0. Default value is the number of document databases + 1.
  - When set to larger than 1, document databases are initialized in parallel
  - When set to 1, document databases are initialized in sequence
  - When set to 0, 1 separate thread is used per document database, and they are initialized in parallel
Example:

```
<initialize>
  <threads>2</threads>
</initialize>
```
## lidspace
Contained in [searchnode](#searchnode), optional. Tune settings related to how lidspace is managed. Optional sub-elements:
- `max-bloat-factor`: Maximum bloat allowed before lidspace compaction is started. Compaction moves a document from a high lid to a lower lid; the cost is similar to feeding a document and removing it. Also see the description in the [lidspace compaction maintenance job](../../../content/proton.html#lid-space-compaction). Default value is 0.01, i.e. 1% of total lidspace; in the example below it is increased to 0.5 (50%).
Example:

```
<lidspace>
  <max-bloat-factor>0.5</max-bloat-factor>
</lidspace>
```
## feeding
Contained in [searchnode](#searchnode), optional. Tune [proton](../../../content/proton.html) settings for feed operations. Optional sub-elements:
- `concurrency`: A number between 0.0 and 1.0 that specifies the concurrency used when handling feed operations, default 0.5. When set to 1.0, all CPU cores can be used for feeding. Changing this value requires a node restart to take effect.
- `niceness`: A number between 0.0 and 1.0 that specifies the niceness of the feeding threads, default 0.0, i.e. not any nicer than anything else. Increasing this number reduces the priority of feeding compared to search. The real-world effect is hard to predict, as it depends on the OS-level scheduler. Changing this value requires a node restart to take effect.
Example:

```
<feeding>
  <concurrency>0.8</concurrency>
  <niceness>0.5</niceness>
</feeding>
```
## index
Contained in [searchnode](#searchnode), optional. Tune various aspects of the handling of disk and memory indexes. Optional sub-elements:
- `io`
  - `search`: Controls I/O read options used during search. Values={mmap, populate}, default `mmap`. Using `populate` will eagerly touch all pages when the index is loaded (after restart or after index fusion completes).
- `warmup`
  - `time`: Specifies, in seconds, how long the index shall be warmed up before being switched in for serving. During warmup the index receives queries and posting lists are iterated, but the results are ignored, as they are duplicates of those from the live index. This pulls the most important pages into the cache. However, as warming up an index occupies more memory, do not turn it on unless you suspect you need it, and always benchmark to see if it is worth it.
  - `unpack`: Controls whether all posting features are pulled into the cache, or only the most important ones. Values={true, false}, default false.
Example:

```
<index>
  <io>
    <search>mmap</search>
  </io>
  <warmup>
    <unpack>true</unpack>
  </warmup>
</index>
```
## removed-db
Contained in [searchnode](#searchnode), optional. Tune various aspect of the db of removed documents. Optional sub-elements:
- `prune`
  - `age`: Specifies how long (in seconds) removed documents must be remembered before they can be pruned away. Default is 2 weeks. This sets the upper limit on how long a node can be down and still be accepted back into the system without having its index wiped. There is no point in setting this higher than the age of the documents; if the corpus is re-fed every day, there is no point in keeping it longer than 24 hours.
  - `interval`: Specifies how often (in seconds) to prune old documents. Default is 3.36 hours (prune age / 100). There is normally no need to change the default; it is exposed here for reference and testing.
Example:

```
<removed-db>
  <prune>
    <age>86400</age>
  </prune>
</removed-db>
```
## summary
Contained in [searchnode](#searchnode), optional. Tune various aspects of document summary handling. Optional sub-elements:
- `io`
  - `read`: Controls I/O read options used when reading stored documents. Values are `directio`, `mmap`, and `populate`. Default is `mmap`. `populate` will do an eager mmap and touch all pages.
- `store`
  - `cache`: Used to tune the cache used by the document store. Enabled by default, using up to 5% of available memory.
    - `maxsize`: The maximum size of the cache in bytes. If set, it takes precedence over [maxsize-percent](#summary-store-cache-maxsize-percent). Default is unset.
    - `maxsize-percent`: The maximum size of the cache in percent of available memory. Default is 5%.
    - `compression`
      - `type`: The compression type of the documents while in the cache. Possible values are `none`, `lz4`, and `zstd`. Default is `lz4`.
      - `level`: The compression level of the documents while in cache. Default is 6.
  - `logstore`: Used to tune the actual document store implementation (log-based).
    - `maxfilesize`: The maximum size (in bytes) per summary file on disk. Default value is 1 GB. See [document-store-compaction](../../../content/proton.html#document-store-compaction).
    - `chunk`
      - `maxsize`: Maximum size (in bytes) of a chunk. Default value is 64 KB.
      - `compression`
        - `type`: Compression type for the documents, `none`, `lz4`, or `zstd`. Default is `zstd`.
        - `level`: Compression level for the documents. Default is 3.
Example:

```
<summary>
  <io>
    <read>directio</read>
  </io>
  <store>
    <cache>
      <maxsize-percent>5</maxsize-percent>
      <compression>
        <type>none</type>
      </compression>
    </cache>
    <logstore>
      <chunk>
        <maxsize>16384</maxsize>
        <compression>
          <type>zstd</type>
          <level>3</level>
        </compression>
      </chunk>
    </logstore>
  </store>
</summary>
```
## flush-on-shutdown
Contained in [proton](#proton). Default value is true. If set to true, search nodes will flush a set of components (e.g. memory index, attributes) to disk before shutting down, such that the time it takes to flush these components plus the time it takes to replay the [transaction log](../../../content/proton.html#transaction-log) after restart is as low as possible. The time it takes to replay the transaction log depends on the amount of data to replay, so by flushing some components before restart, the transaction log is pruned and the replay time is reduced significantly. Refer to [Proton maintenance jobs](../../../content/proton.html#proton-maintenance-jobs).
## sync-transactionlog
Contained in [proton](#proton). Default value is true. If true, the transaction log is synced to disk after every write. This enables the transaction log to survive power failures and kernel panics. The sync cost is amortized over multiple feed operations; the faster you feed, the more operations it is amortized over, so with a local disk this is not known to be a performance issue. However, when using NAS (Network Attached Storage), like EBS on AWS, one can see a significant feed performance impact. In one particular case, turning off sync-transactionlog for EBS gave a 60x improvement.
With sync-transactionlog turned off, the risk of losing data depends on the kernel's [sysctl settings.](https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#dirty-background-bytes) For example, this is a common default:
```
# sysctl -a
...
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
...
```
With this configuration, the worst-case scenario is to lose 35 seconds' worth of transaction log, but no more than 20% of the free memory: kernel flusher threads wake up every 5 s (dirty\_writeback\_centisecs) and write data older than 30 s (dirty\_expire\_centisecs) from memory to disk, and if un-synced data exceeds 20% of the free memory (dirty\_ratio), the Vespa process will sync it.
The above also assumes that all copies of the data are lost at the same time **and** that the kernels on all these nodes flush at the same time: a realistic scenario only when there is a single copy.
Adjust these [sysctl settings](https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#dirty-background-bytes) to manage the trade-off between data loss and performance. The kernel docs describe further options: for example, thresholds can also be expressed in bytes.
## resource-limits (in proton)
Contained in [proton](#proton). Specifies resource limits used by proton to reject both external and internal write operations (on this content node) when a limit is reached.
**Warning:** These proton limits should almost never be changed directly. Instead, change [resource-limits](#resource-limits)that controls when external write operations are blocked in the entire content cluster. Be aware of the risks of tuning resource limits as seen in the link.
The local proton limits are derived from the cluster limits if not specified, using this formula:
$$L_{proton} = L_{cluster} + \frac{1 - L_{cluster}}{2}$$
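For example, with the default cluster disk limit of 0.75, the derived proton disk limit becomes $0.75 + \frac{1 - 0.75}{2} = 0.875$, and the default cluster memory limit of 0.8 yields $0.9$ - the defaults listed below.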
| Element | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| disk | optional | float [0, 1] | 0.875 | Fraction of total space on the disk partition used before put and update operations are rejected |
| memory | optional | float [0, 1] | 0.9 | Fraction of physical memory that can be resident memory in anonymous mapping by proton before put and update operations are rejected |
Example:
```
<resource-limits>
  <disk>0.83</disk>
  <memory>0.82</memory>
</resource-limits>
```
## search
Contained in [content](#content), optional. Declares search configuration for this content cluster. Optional sub-elements are[query-timeout](#query-timeout),[visibility-delay](#visibility-delay) and[coverage](#coverage).
## query-timeout
Contained in [search](#search). Specifies the query timeout in seconds for queries against the search interface on the content nodes. The default is 0.5 (500ms), the max is 600.0. For query timeout also see the request parameter [timeout](../../api/query.html#timeout).
**Note:** One cannot override this value using the [timeout](../../api/query.html#timeout) request parameter.
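A minimal sketch (the value is illustrative) raising the content-node query timeout to 2 seconds:

```
<search>
  <query-timeout>2.0</query-timeout>
</search>
```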
## visibility-delay
Contained in [search](#search). Default 0, max 1 (seconds).
This setting controls the TTL caching for [parent-child](../../../schemas/parent-child.html) imported fields. See [feature tuning](../../../performance/feature-tuning.html#parent-child-and-search-performance).
## coverage
Contained in [search](#search). Declares search coverage configuration for this content cluster. Optional sub-elements are[minimum](#minimum),[min-wait-after-coverage-factor](#min-wait-after-coverage-factor) and[max-wait-after-coverage-factor](#max-wait-after-coverage-factor). Search coverage configuration controls how many nodes the query dispatcher process should wait for, trading search coverage versus search performance.
## minimum
Contained in [coverage](#coverage). Declares the minimum search coverage required before returning the results of a query. This number is in the range `[0, 1]`, with 0 being no coverage and 1 being full coverage.
The default is 1; unless configured otherwise a query will not return until all search nodes have responded within the specified timeout.
## min-wait-after-coverage-factor
Contained in [coverage](#coverage). Declares the minimum time for a query to wait for full coverage once the declared[minimum](#minimum) has been reached. This number is a factor that is multiplied with the time remaining at the time of reaching minimum coverage.
The default is 0; unless configured otherwise, a query will return as soon as the minimum coverage has been reached, if the remaining search nodes appear to be lagging.
## max-wait-after-coverage-factor
Contained in [coverage](#coverage). Declares the maximum time for a query to wait for full coverage once the declared[minimum](#minimum) has been reached. This number is a factor that is multiplied with the time remaining at the time of reaching minimum coverage.
The default is 1; unless configured otherwise a query is allowed to wait its full timeout for full coverage even after reaching the minimum.
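A minimal sketch (values are illustrative) combining the three coverage elements above, accepting 95% coverage and waiting only a fraction of the remaining time for the rest:

```
<search>
  <coverage>
    <minimum>0.95</minimum>
    <min-wait-after-coverage-factor>0.2</min-wait-after-coverage-factor>
    <max-wait-after-coverage-factor>0.3</max-wait-after-coverage-factor>
  </coverage>
</search>
```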
## tuning
Contained in [content](#content), optional. Optional tuning parameters are:[bucket-splitting](#bucket-splitting),[min-node-ratio-per-group](#min-node-ratio-per-group),[cluster-controller](#cluster-controller),[dispatch](#dispatch-tuning),[distribution](#distribution_type),[maintenance](#maintenance),[max-document-size](#max-document-size),[merges](#merges),[persistence-threads](#persistence-threads) and[visitors](#visitors).
## bucket-splitting
Contained in [tuning](#tuning). The [bucket](../../../content/buckets.html) is the fundamental unit of distribution and management in a content cluster. Buckets are auto-split, no need to configure for most applications.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| max-documents | optional | integer | 1024 | Maximum number of documents per content bucket. Buckets are split in two if they have more documents than this. Keep this value below 16K. |
| max-size | optional | integer | 32 MiB | Maximum size (in bytes) of a bucket: the sum of the serialized size of all documents kept in the bucket. Buckets are split in two if they are larger than this. Keep this value below 100 MiB. |
| minimum-bits | optional | integer | | Overrides the ideal distribution bit count configured for this cluster. Prefer the [distribution type](#distribution_type) setting instead if the default distribution bit count does not fit the cluster. This variable is intended for testing and to work around possible distribution bit issues. Most users should not need this option. |
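If tuning is needed anyway, a minimal sketch (shown with the default values, 32 MiB = 33554432 bytes) looks like:

```
<tuning>
  <bucket-splitting max-documents="1024" max-size="33554432"/>
</tuning>
```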
## min-node-ratio-per-group
**Important:** This is configuration for the cluster controller. Most users are normally looking for[min-active-docs-coverage](#min-active-docs-coverage)which controls how many nodes can be down before query load is routed to other groups.
Contained in [tuning](#tuning). States a lower bound requirement on the ratio of nodes within _individual_ [groups](#group)that must be online and able to accept traffic before the entire group is automatically taken out of service. Groups are automatically brought back into service when the availability of its nodes has been restored to a level equal to or above this limit.
Elastic content clusters are often configured to use multiple groups for the sake of horizontal traffic scaling and/or data availability. The content distribution system will try to ensure a configured number of replicas is always present within a group in order to maintain data redundancy. If the number of available nodes in a group drops too far, it is possible for the remaining nodes in the group to not have sufficient capacity to take over storage and serving for the replicas they now must assume responsibility for. Such situations are likely to result in increased latencies and/or feed rejections caused by resource exhaustion. Setting this tuning parameter allows the system to instead automatically take down the remaining nodes in the group, allowing feed and query traffic to fail completely over to the remaining groups.
Valid parameter is a decimal value in the range [0, 1]. Default is 0, which means that the automatic group out-of-service functionality will _not_ automatically take effect.
Example: assume a cluster has been configured with _n_ groups of 4 nodes each and the following tuning config:
```
<tuning>
  <min-node-ratio-per-group>0.75</min-node-ratio-per-group>
</tuning>
```
This tuning allows for 1 node in a group to be down. If 2 or more nodes go down, all nodes in the group will be marked as down, letting the _n-1_ remaining groups handle all the traffic.
This configuration can be changed live as the system is running and altered limits will take effect immediately.
## distribution (in tuning)
Contained in [tuning](#tuning). Tune the distribution algorithm used in the cluster.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| type | optional | `loose` \| `strict` \| `legacy` | loose | When the number of nodes configured in a system changes over certain limits, the system will automatically trigger major redistributions of documents, to ensure that the number of buckets is appropriate for the number of nodes in the cluster. This enum value specifies how aggressively the system should trigger such distribution changes. The default of `loose` strikes a balance between rarely altering the distribution of the cluster and keeping the skew in document distribution low. Use the default mode unless you have empirically observed that it causes too much skew in load or document distribution. Note that specifying `minimum-bits` under [bucket-splitting](#bucket-splitting) overrides this setting and effectively "locks" the distribution in place. |
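A minimal sketch (shown with the default value) of selecting the distribution type:

```
<tuning>
  <distribution type="loose"/>
</tuning>
```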
## maintenance
Contained in [tuning](#tuning). Controls the running time of the bucket maintenance process. Bucket maintenance verifies bucket content for corruption. Most users should not need to tweak this.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| start | required | HH:MM | | Start of daily maintenance window, e.g. 02:00 |
| stop | required | HH:MM | | End of daily maintenance window, e.g. 05:00 |
| high | required | day of week | | Day of week for starting the full file verification cycle, e.g. monday. The full cycle is more costly than partial file verification |
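A minimal sketch using the example values from the table above:

```
<tuning>
  <maintenance start="02:00" stop="05:00" high="monday"/>
</tuning>
```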
## max-document-size
Contained in [tuning](#tuning). Specifies the max document size in the content cluster, measured as the uncompressed size of a document operation arriving over the wire, as seen by the distributor service. The limit is used for all document types. A document larger than this limit will be rejected by the distributor. Note that some document operations that don't contain the entire document, like [document updates](../../../writing/document-api-guide.html#document-updates), might increase the size of a document above this limit.
Valid values are numbers including a unit (e.g. _10MiB_), and the value must be between 1 MiB and 2048 MiB (inclusive). Values are rounded to the nearest MiB, so using MiB as the unit is preferable. It is strongly recommended not to set this too high: 10 MiB is a reasonable setting for most use cases, and setting it above 100 MiB is not recommended, as allowing large documents might impact operations, e.g. when restarting nodes or moving documents between nodes. Default value is 128 MiB.
Example:
```
<tuning>
  <max-document-size>10MiB</max-document-size>
</tuning>
```
## merges
Contained in [tuning](#tuning). Defines throttling parameters for bucket merge operations.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| max-per-node | optional | number | | Maximum number of parallel active bucket merge operations. |
| max-queue-size | optional | number | | Maximum size of the merge bucket queue, before reporting BUSY back to the distributors. |
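A minimal sketch (values are illustrative) of throttling merges:

```
<tuning>
  <merges max-per-node="16" max-queue-size="1024"/>
</tuning>
```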
## persistence-threads
Contained in [tuning](#tuning). Defines the number of persistence threads per partition on each content node. A content node executes bucket operations against the persistence engine synchronously in each of these threads. 8 threads are used by default. Override with the **count** attribute.
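A minimal sketch (the count is illustrative) of overriding the default:

```
<tuning>
  <persistence-threads count="12"/>
</tuning>
```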
## visitors
Contained in [tuning](#tuning). Tuning parameters for visitor operations. Might contain [max-concurrent](#max-concurrent).
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| thread-count | optional | number | | The maximum number of threads in which to execute visitor operations. A higher number of threads may increase performance, but may use more memory. |
| max-queue-size | optional | number | | Maximum size of the pending visitor queue, before reporting BUSY back to the distributors. |
## max-concurrent
Contained in [visitors](#visitors). Defines how many visitors can be active concurrently on each storage node. The number allowed depends on priority - lower priority visitors should not block higher priority visitors completely. To implement this, specify a fixed and a variable number. The maximum active is calculated by adjusting the variable component using the priority, and adding the fixed component.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| fixed | optional | number | [16](https://github.com/vespa-engine/vespa/blob/master/storage/src/vespa/storage/visiting/stor-visitor.def) | The fixed component of the maximum active count |
| variable | optional | number | [64](https://github.com/vespa-engine/vespa/blob/master/storage/src/vespa/storage/visiting/stor-visitor.def) | The variable component of the maximum active count |
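A minimal sketch of tuning visitors (the thread-count and max-queue-size values are illustrative; the max-concurrent values shown are the defaults):

```
<tuning>
  <visitors thread-count="2" max-queue-size="1024">
    <max-concurrent fixed="16" variable="64"/>
  </visitors>
</tuning>
```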
## resource-limits
Contained in [tuning](#tuning). Specifies resource limits used to decide whether external write operations should be blocked in the entire content cluster, based on the reported resource usage by content nodes. See [feed block](../../../writing/feed-block.html) for more details.
**Warning:** The content nodes require resource headroom to handle extra documents as part of re-distribution during node failure, and spikes when running[maintenance jobs](../../../content/proton.html#proton-maintenance-jobs). Tuning these limits should be done with extreme care, and setting them too high might lead to permanent data loss. They are best left untouched, using the defaults, and cannot be set in Vespa Cloud.
| Element | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| disk | optional | float [0, 1] | 0.75 | Fraction of total space on the disk partition used on a content node before feed is blocked |
| memory | optional | float [0, 1] | 0.8/0.75 | Fraction of physical memory that can be resident memory in anonymous mapping on a content node before feed is blocked. Total physical memory is sampled as the minimum of `sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE)` and the cgroup (v1 or v2) memory limit. Nodes with 8 GiB or less memory in Vespa Cloud have a limit of 0.75. |
Example - in the content tag:
```
<tuning>
  <resource-limits>
    <disk>0.78</disk>
    <memory>0.77</memory>
  </resource-limits>
</tuning>
```
## dispatch
Contained in [tuning](#tuning). Tune the query dispatch behavior - child elements:
| Element | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| max-hits-per-partition | optional | Integer | No capping: return all | Maximum number of hits to return from a content node. By default, a query returns the requested number of hits + offset from every content node to the container. The container orders the hits globally according to the query, then discards all hits beyond the number requested. In a system with a large fan-out, this consumes network bandwidth, the container nodes easily become network saturated, and containers sort and discard more hits than optimal. When there are sufficiently many search nodes, assuming an even distribution of the hits, it suffices to return only a fraction of the requested number of hits from each node. Note that changing this number has global ordering impact. See _top-k-probability_ below for improving performance with fewer hits. |
| dispatch-policy | optional | `best-of-random-2` / `adaptive` | adaptive | Configures the policy for choosing which group shall receive the next query request. However, multiphase requests that either require or benefit from hitting the same group in all phases are always hashed. Relevant only for [grouped distribution](../../../performance/sizing-search.html#data-distribution). `best-of-random-2` selects 2 random groups and uses the one with the lowest latency; `adaptive` measures latency, preferring lower-latency groups, selecting a group with probability latency/(sum of latency over all groups). |
| prioritize-availability | optional | Boolean | true | With [grouped distribution](../../../performance/sizing-search.html#data-distribution): if true (the default), all groups that are within min-active-docs-coverage of the **median** of the document count of the other groups will be used to service queries. If set to false, only groups within min-active-docs-coverage of the **max** document count will be used, with the consequence that full coverage is prioritized over availability when multiple groups are lacking content, since the remaining groups may not be able to service the full query load. |
| min-active-docs-coverage | optional | A float percentage | 97 | With [grouped distribution](../../../performance/sizing-search.html#data-distribution): the percentage of active documents a group must have, relative to the median across all groups in the content cluster, to be considered active for serving queries. Because of measurement timing differences, it is not advisable to tune this above 99 percent. |
| top-k-probability | optional | Double | 0.9999 | Probability that the top K hits returned are the globally best. Based on this probability, the dispatcher fetches enough hits from each node to achieve it. The only way to guarantee a probability of 1.0 is to fetch K hits from each partition. However, by reducing the probability from 1.0 to 0.99999, one can significantly reduce the number of hits fetched, saving both bandwidth and latency. The number of hits to fetch from each partition is computed as $q = \frac{k}{n} + q_T(p, 30) \times \sqrt{k \times \frac{1}{n} \times \left(1 - \frac{1}{n}\right)}$, where $q_T$ is a Student's t-distribution. With n=10 partitions, k=200 hits and p=0.99999, only 45 hits per partition are needed, as opposed to 200 when p=1.0. Use this option to reduce network and container CPU/memory in clusters with many nodes per group - see the [Vespa Serving Scaling Guide](../../../performance/sizing-search.html). |
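A minimal sketch (values are illustrative) of dispatch tuning:

```
<tuning>
  <dispatch>
    <max-hits-per-partition>100</max-hits-per-partition>
    <dispatch-policy>adaptive</dispatch-policy>
    <top-k-probability>0.9999</top-k-probability>
  </dispatch>
</tuning>
```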
## cluster-controller
Contained in [tuning](#tuning). Tuning parameters for the cluster controller managing this cluster - child elements:
| Element | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| init-progress-time | optional | seconds | | If the initialization progress count has not changed for this many seconds, the node is assumed to have deadlocked and is set down. Note that initialization may be prioritized low, so setting a low value here might cause false positives. If a node is set down for the wrong reason, it will be set up again once it finishes initialization. |
| transition-time | optional | seconds | [storage\_transition\_time](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def), [distributor\_transition\_time](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The transition time states how long (in seconds) a node will be kept in maintenance mode during what looks like a controlled restart. Keeping a node in maintenance mode during a restart allows restarting without the cluster immediately trying to create new copies of all the data. If the node has not started or come back up within the transition time, the node is set down, in which case new full bucket copies will be created. Note the separate defaults for distributor and storage (i.e. search) nodes. |
| max-premature-crashes | optional | count | [max\_premature\_crashes](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The maximum number of crashes allowed before a content node is permanently set down by the cluster controller. If the node has a stable up or down state for more than the _stable-state-period_, the crash count is reset. However, resetting the count will not re-enable the node if it has been disabled - restart the cluster controller to reset. |
| stable-state-period | optional | seconds | [stable\_state\_time\_period](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | If a content node's state does not change for this many seconds, its state is considered _stable_, clearing the premature crash count. |
| min-distributor-up-ratio | optional | ratio | [min\_distributor\_up\_ratio](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The minimum ratio of distributors that are required to be _up_ for the cluster state to be _up_. |
| min-storage-up-ratio | optional | ratio | [min\_storage\_up\_ratio](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The minimum ratio of content nodes that are required to be _up_ for the cluster state to be _up_. |
| groups-allowed-down-ratio | optional | ratio | [groups-allowed-down-ratio](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The ratio of content groups that are allowed to be down simultaneously. A value of 0.5 means that 50% of the groups are allowed to be down. The default is to allow only one group to be down at a time. |
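A minimal sketch (values are illustrative) of cluster controller tuning:

```
<tuning>
  <cluster-controller>
    <transition-time>60</transition-time>
    <max-premature-crashes>4</max-premature-crashes>
    <stable-state-period>3600</stable-state-period>
  </cluster-controller>
</tuning>
```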
---
# Source: https://docs.vespa.ai/en/learn/contributing.html.md
# Contributing to Vespa
Contributions to [Vespa](http://github.com/vespa-engine/vespa)and the [Vespa documentation](http://github.com/vespa-engine/documentation)are welcome. This document tells you what you need to know to contribute.
## Open development
All work on Vespa happens directly on GitHub, using the [GitHub flow model](https://docs.github.com/en/get-started/quickstart/github-flow). We release the master branch a few times a week, and you should expect it to almost always work. In addition to the [builds seen on factory.vespa.ai](https://factory.vespa.ai)we have a large acceptance and performance test suite which is also run continuously.
### Pull requests
All pull requests are reviewed by a member of the Vespa Committers team. You can find a suitable reviewer in the OWNERS file upward in the source tree from where you are making the change (the OWNERS have a special responsibility for ensuring the long-term integrity of a portion of the code). If you want to become a committer/OWNER making some quality contributions is the way to start.
We require all pull request checks to pass.
## Versioning
Vespa uses semantic versioning - see [vespa versions](releases.html). Notice in particular that any Java API in a package having a @PublicAPI annotation in the package-info file cannot be changed in an incompatible way between major versions: Existing types and method signatures must be preserved (but can be marked deprecated).
## Issues
We track issues in [GitHub issues](https://github.com/vespa-engine/vespa/issues). It is fine to submit issues also for feature requests and ideas, whether you intend to work on them or not.
There is also a [ToDo list](https://github.com/vespa-engine/vespa/blob/master/TODO.md) for larger things which no one is working on yet.
## Community
If you have questions, want to share your experience or help others, please join our community on the [Vespa Slack](https://slack.vespa.ai), or see Vespa on [Stack Overflow](http://stackoverflow.com/questions/tagged/vespa).
---
# Source: https://docs.vespa.ai/en/operations/self-managed/cpu-support.html.md
# CPU Support
For maximum performance, the current version of Vespa for x86\_64 is compiled only for [Haswell (2013)](https://en.wikipedia.org/wiki/Haswell_(microarchitecture)) or later CPUs. If trying to run on an older CPU, you will likely see error messages like the following:
```
Problem running program /opt/vespa/bin/vespa-runserver => died with signal: illegal instruction (you probably have an older CPU than required)
```
or in older versions of Vespa, something like
```
/usr/local/bin/start-container.sh: line 67: 10 Illegal instruction /opt/vespa/bin/vespa-start-configserver
```
If you would like to run Vespa on an older CPU, we provide a [generic x86 container image](https://hub.docker.com/r/vespaengine/vespa-generic-intel-x86_64/). This image is slower, receives less testing than the regular image, and is less frequently updated.
**To start a Vespa Docker container using this image:**
```
$ docker run --detach --name vespa --hostname vespa-container \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa-generic-intel-x86_64
```
---
# Source: https://docs.vespa.ai/en/ranking/cross-encoders.html.md
# Ranking With Transformer Cross-Encoder Models
[Cross-Encoder Transformer](https://blog.vespa.ai/pretrained-transformer-language-models-for-search-part-4/) based text ranking models are generally more effective than [text embedding](../rag/embedding.html) models as they take both the query and the document as input with full cross-attention between all the query and document tokens.
The downside of cross-encoder models is the computational complexity. This document is a guide on how to export cross-encoder Transformer based models from [huggingface](https://huggingface.co/), and how to configure them for use in Vespa.
## Exporting cross-encoder models
For exporting models from HF to [ONNX](onnx), we recommend the [Optimum](https://huggingface.co/docs/optimum/main/en/index) library. Example usage for two relevant ranking models follows.
Export [intfloat/simlm-msmarco-reranker](https://huggingface.co/intfloat/simlm-msmarco-reranker), which is a BERT-based transformer model for English texts:
```
$ optimum-cli export onnx --task text-classification -m intfloat/simlm-msmarco-reranker ranker
```
Export [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base), which is a ROBERTA-based transformer model for English and Chinese texts (multilingual):
```
$ optimum-cli export onnx --task text-classification -m BAAI/bge-reranker-base ranker
```
These two example ranking models use different language model [tokenization](../reference/rag/embedding.html#huggingface-tokenizer-embedder) and also different transformer inputs.
After the above Optimum export command you have two important files that are needed when importing the model to Vespa:
```
├── ranker
│ └── model.onnx
└── tokenizer.json
```
The Optimum tool also supports various Transformer optimizations, including quantization to optimize the model for faster inference.
## Importing ONNX and tokenizer model files to Vespa
Add the generated `model.onnx` and `tokenizer.json` files from the `ranker` directory created by Optimum to the Vespa [application package](../basics/applications.html):
```
├── models
│   ├── model.onnx
│   └── tokenizer.json
├── schemas
│   └── doc.sd
└── services.xml
```
## Configure tokenizer embedder
To speed up inference, Vespa avoids re-tokenizing the document tokens, so we need to configure the [huggingface-tokenizer-embedder](../reference/rag/embedding.html#huggingface-tokenizer-embedder) in the `services.xml` file:
```
<container id="default" version="1.0">
  ...
  <component id="tokenizer" type="hugging-face-tokenizer">
    <model path="models/tokenizer.json"/>
  </component>
  ...
</container>
```
This allows us to use the tokenizer while indexing documents in Vespa and also at query time to map (embed) query text to language model tokens.
## Using tokenizer in schema
Assuming we have two fields that we want to index and use for re-ranking (title, body), we can use the `embed` indexing expression to invoke the tokenizer configured above:
```
schema my_document {
document my_document {
field title type string {..}
field body type string {..}
}
field tokens type tensor(d0[512]) {
indexing: (input title || "") . " " . (input body || "") | embed tokenizer | attribute
}
}
```
The above concatenates the title and body input document fields and feeds the result to the `hugging-face-tokenizer`, which stores the output token IDs as floats (e.g. 101.0).
To use the generated `tokens` tensor in ranking, the tensor field must be defined with [attribute](../content/attributes.html).
## Using the cross-encoder model in ranking
Cross-encoder models are not practical for _retrieval_ over large document volumes due to their complexity, so we configure them using [phased ranking](phased-ranking.html).
### Bert-based model
Bert-based models have three inputs:
- input\_ids
- token\_type\_ids
- attention\_mask
The [onnx-model](../reference/schemas/schemas.html#onnx-model) configuration specifies the input names of the model and how to calculate them. It also specifies the file `models/model.onnx`. Notice also the `gpu-device` setting; see [GPU](../operations/self-managed/vespa-gpu-container.html). GPU inference is not required, and Vespa will fall back to CPU if no GPU device is found. See the section on [performance](#performance).
```
rank-profile bert-ranker inherits default {
inputs {
query(q_tokens) tensor(d0[32])
}
onnx-model cross_encoder {
file: models/model.onnx
input input_ids: my_input_ids
input attention_mask: my_attention_mask
input token_type_ids: my_token_type_ids
gpu-device: 0
}
function my_input_ids() {
expression: tokenInputIds(256, query(q_tokens), attribute(tokens))
}
function my_token_type_ids() {
expression: tokenTypeIds(256, query(q_tokens), attribute(tokens))
}
function my_attention_mask() {
expression: tokenAttentionMask(256, query(q_tokens), attribute(tokens))
}
first-phase {
expression: #depends on the retriever used
}
# The output of this model is a tensor of size ["batch", 1]
global-phase {
rerank-count: 25
expression: onnx(cross_encoder){d0:0,d1:0}
}
}
```
The example above limits the sequence length to `256` using the built-in [convenience functions](../reference/ranking/rank-features.html#tokenInputIds(length,%20input_1,%20input_2,%20...)) for generating token sequence input to Transformer models. Note that `tokenInputIds` uses 101 as the start-of-sequence token and 102 as the separator token, which is only compatible with BERT-based tokenizers. See the section on [performance](#performance) about sequence length and its impact on inference performance.
### Roberta-based model
ROBERTA-based models only have two inputs (input\_ids and attention\_mask). In addition, the default tokenizer start-of-sequence token is 1 and end-of-sequence is 2. In this case we use the `customTokenInputIds` function in the `my_input_ids` function. See [customTokenInputIds](../reference/ranking/rank-features.html#customTokenInputIds(start_sequence_id, sep_sequence_id, length, input_1, input_2, ...)).
```
rank-profile roberta-ranker inherits default {
inputs {
query(q_tokens) tensor(d0[32])
}
onnx-model cross_encoder {
file: models/model.onnx
input input_ids: my_input_ids
input attention_mask: my_attention_mask
gpu-device: 0
}
function my_input_ids() {
expression: customTokenInputIds(1, 2, 256, query(q_tokens), attribute(tokens))
}
function my_attention_mask() {
expression: tokenAttentionMask(256, query(q_tokens), attribute(tokens))
}
first-phase {
expression: #depends on the retriever used
}
# The output of this model is a tensor of size ["batch", 1]
global-phase {
rerank-count: 25
expression: onnx(cross_encoder){d0:0,d1:0}
}
}
```
## Using the cross-encoder model at query time
At query time, we need to tokenize the user query using the [embed](../rag/embedding.html#embedding-a-query-text) support.
The `embed` of the query text sets the `query(q_tokens)` tensor that we defined in the ranking profile.
```
{
  "yql": "select title,body from doc where userQuery()",
  "query": "semantic search",
  "input.query(q_tokens)": "embed(tokenizer, \"semantic search\")",
  "ranking": "bert-ranker"
}
```
The retriever (query + first-phase ranking) can be anything, including [nearest neighbor search](../querying/nearest-neighbor-search) a.k.a. dense retrieval using bi-encoders.
## Performance
There are three major scaling dimensions:
- The number of hits that are re-ranked ([rerank-count](../reference/schemas/schemas.html#globalphase-rerank-count)). Complexity is linear with the number of hits re-ranked.
- The size of the transformer model used.
- The input sequence length. Transformer models scale quadratically with the input sequence length.
For models larger than 30-40M parameters, we recommend using a GPU to accelerate inference. Quantization of model weights can drastically improve serving efficiency on CPU; see [Optimum Quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization).
## Examples
The [MS Marco](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking)sample application demonstrates using cross-encoders.
## Using cross-encoders with multi-vector indexing
When using [multi-vector indexing](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/) we can do the following to feed the best (closest) paragraph using the [closest()](../reference/ranking/rank-features.html#closest(name)) feature into re-ranking with the cross-encoder model.
```
schema my_document {
document my_document {
field paragraphs type array<string> {..}
}
field tokens type tensor(p{}, d0[512]) {
indexing: input paragraphs | embed tokenizer | attribute
}
field embedding type tensor(p{}, x[768]) {
indexing: input paragraphs | embed embedder | attribute
}
}
```
Notice that both tensor fields use the same mapped dimension name `p`.
```
rank-profile max-paragraph-into-cross-encoder inherits default {
inputs {
query(tokens) tensor(d0[32])
query(q) tensor(x[768])
}
first-phase {
expression: closeness(field, embedding)
}
function best_input() {
expression: reduce(closest(embedding)*attribute(tokens), max, p)
}
function my_input_ids() {
expression: tokenInputIds(256, query(tokens), best_input)
}
function my_token_type_ids() {
expression: tokenTypeIds(256, query(tokens), best_input)
}
function my_attention_mask() {
expression: tokenAttentionMask(256, query(tokens), best_input)
}
match-features: best_input my_input_ids my_token_type_ids my_attention_mask
global-phase {
rerank-count: 25
expression: onnx(cross_encoder){d0:0,d1:0} #Slice
}
}
```
The `best_input` uses a tensor join between the `closest(embedding)` tensor and the `tokens` tensor, which then returns the tokens of the best-matching (closest) paragraph.
This tensor is used in the other Transformer-related functions (`tokenTypeIds`, `tokenAttentionMask`, `tokenInputIds`) as the document tokens.
---
# Source: https://docs.vespa.ai/en/operations/kubernetes/custom-overrides-podtemplate.html.md
# Provide Custom Overrides
While services.xml defines the Vespa application specification, it abstracts away the underlying Kubernetes infrastructure. Advanced users often need to configure Kubernetes-specific settings for the Vespa application Pods to integrate Vespa within their broader platform ecosystem.
The Pod Template mechanism allows you to inject custom configurations into the Vespa application pods created by the ConfigServer.
Common use cases for overriding the default pod configuration include:
- **Sidecar Injection**: Running auxiliary containers alongside Vespa for logging (e.g., Fluent Bit), monitoring (e.g., Datadog, Prometheus exporters), or service mesh proxies (e.g., Envoy, Istio).
- **Scheduling Constraints**: Using nodeSelector, affinity, or tolerations to pin Vespa pods to specific hardware (e.g., high-memory nodes, specific availability zones) or isolate them from other workloads.
- **Metadata Management**: Adding custom Labels or Annotations for cost allocation, team ownership, or integration with external inventory tools.
- **Security & Config**: Mounting Kubernetes Secrets or ConfigMaps that contain credentials or environment configurations required by custom sidecars.
## Configure Custom Overrides
Overrides are defined in the `VespaSet` Custom Resource under `spec.application.podTemplate` and `spec.configServer.podTemplate`. This field accepts a standard Kubernetes PodTemplateSpec.
The Operator and ConfigServer treat this template as an overlay. When creating a ConfigServer or Application Pod, the base template of the main `vespa` container is merged with your custom overlay.
Vespa on Kubernetes enforces an `Add-Only` merge strategy: core `vespa` container settings cannot be removed or downgraded, only augmented.
| Category | Allowed Actions | Restricted Actions |
| --- | --- | --- |
| **Containers** | Add new sidecar containers. Add env vars/mounts to the main container. | Cannot change the main container image, command, or args. Cannot override main container CPU/memory resources (these are locked to `services.xml`). |
| **Volumes** | Add new volumes (ConfigMap, Secret, EmptyDir). | Cannot modify operator-reserved volumes (e.g., `/data`). |
| **Metadata** | Add new labels and annotations. | Cannot overwrite operator-created labels and annotations. |
## Examples
### Example 1: Injecting a Logging Sidecar
This example adds a Fluent Bit sidecar to ship logs to a central system. It defines the sidecar container and mounts a shared volume that the Vespa container also writes to.
```
apiVersion: k8s.ai.vespa/v1
kind: VespaSet
metadata:
name: my-vespa-cluster
spec:
application:
image: vespaengine/vespa:8.200.15
# Define the Custom Overlay
podTemplate:
spec:
containers:
# 1. Define the Sidecar
- name: fluent-bit
image: fluent/fluent-bit:1.9
volumeMounts:
- name: vespa-logs
mountPath: /opt/vespa/logs/vespa
# 2. Define the Shared Volume
volumes:
- name: vespa-logs
emptyDir: {}
```
### Example 2: Pinning Pods to Specific Nodes
This example uses a nodeSelector to ensure Vespa pods only run on nodes labeled with workload=high-performance.
```
apiVersion: k8s.ai.vespa/v1
kind: VespaSet
metadata:
name: prod-vespa
spec:
application:
podTemplate:
spec:
# Schedule only on nodes with label 'workload: high-performance'
nodeSelector:
workload: high-performance
# Tolerate the 'dedicated' taint if those nodes are tainted
tolerations:
- key: "dedicated"
operator: "Equal"
value: "search-team"
effect: "NoSchedule"
```
### Example 3: Adding Cost Allocation Labels
This example adds custom labels that will appear on every tenant pod, enabling cost tracking by team.
```
apiVersion: k8s.ai.vespa/v1
kind: VespaSet
metadata:
name: shared-vespa
spec:
application:
podTemplate:
metadata:
labels:
cost-center: "engineering-search"
owner: "team-alpha"
annotations:
# Example annotation for an external monitoring system
monitoring.datadoghq.com/enabled: "true"
```
---
# Source: https://docs.vespa.ai/en/operations/data-management.html.md
# Data management and backup
This guide documents how to export data from a Vespa cloud application and how to do mass updates or removals. See [cloning applications and data](cloning) for how to copy documents from one application to another.
Prerequisite: Use the latest version of the [vespa](../clients/vespa-cli.html) command-line client.
## Export documents
To export documents, configure the application to export from, then select zone, container cluster and schema - example:
```
$ vespa config set application vespa-team.vespacloud-docsearch.default
$ vespa visit --zone prod.aws-us-east-1c --cluster default --selection doc | head
```
Some of the parameters above are redundant if unambiguous. Here, the application is set up using a template found in [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) with multiple container clusters. This example [visits](../writing/visiting.html) documents from the `doc` schema.
Use a [fieldset](../schemas/documents.html#fieldsets) to export document IDs only:
```
$ vespa visit --zone prod.aws-us-east-1c --cluster default --selection doc --field-set '[id]' | head
```
As the name implies, fieldsets are useful to select a subset of fields to export. Note that this normally does not speed up the exporting process, as the same amount of data is read from the index. The data transfer out of the Vespa application is smaller with fewer fields.
## Backup
Use the _visit_ operations above to extract documents for backup.
To back up documents to your own Google Cloud Storage, see [backup](https://github.com/vespa-engine/sample-apps/tree/master/examples/google-cloud/cloud-functions#backup---experimental) for a Google Cloud Function example.
## Feed
If a document feed is generated with `vespa visit` (above), it is already in [JSON Lines](https://jsonlines.org/) feed-ready format by default:
```
$ vespa visit | vespa feed - -t $ENDPOINT
```
Find more examples in [cloning applications and data](cloning).
A document export generated using [/document/v1](../writing/document-v1-api-guide.html) is slightly different from the .jsonl output from `vespa visit` (e.g., fields like a continuation token are added). Extract the `document` objects before feeding:
```
$ gunzip -c docs.gz | jq '.documents[]' | \
vespa feed - -t $ENDPOINT
```
## Delete
To remove all documents in a Vespa deployment—or a selection of them—run a _deletion visit_. Use the `DELETE` HTTP method, and fetch only the continuation token from the response:
```
#!/bin/bash
set -x
# The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/'
ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud"
NAMESPACE=open
DOCTYPE=doc
CLUSTER=documentation
# doc.path =~ "^/old/" -- all documents under the /old/ directory:
SELECTION='doc.path%3D~%22%5E%2Fold%2F%22'
continuation=""
while
token=$( curl -X DELETE -s \
--cert data-plane-public-cert.pem \
--key data-plane-private-key.pem \
"${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?selection=${SELECTION}&cluster=${CLUSTER}&${continuation}" \
| tee >( jq . > /dev/tty ) | jq -re .continuation )
do
continuation="continuation=${token}"
done
```
Each request will return a response after roughly one minute—change this by specifying _timeChunk_ (default 60).
To purge all documents in a document export (above), generate a feed with `remove`-entries for each document ID, like:
```
$ gunzip -c docs.gz | jq '[.documents[] | {remove: .id} ]' | head
[
{
"remove": "id:open:doc::open/documentation/schemas.html"
},
{
"remove": "id:open:doc::open/documentation/securing-your-vespa-installation.html"
},
```
Complete example for a single chunk:
```
$ gunzip -c docs.gz | jq '[.documents[] | {remove: .id} ]' | \
vespa feed - -t $ENDPOINT
```
## Update
To update all documents in a Vespa deployment—or a selection of them—run an _update visit_. Use the `PUT` HTTP method, and specify a partial update in the request body:
```
#!/bin/bash
set -x
# The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/'
ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud"
NAMESPACE=open
DOCTYPE=doc
CLUSTER=documentation
# doc.inlinks == "some-url" -- the weightedset inlinks has the key "some-url"
SELECTION='doc.inlinks%3D%3D%22some-url%22'
continuation=""
while
token=$( curl -X PUT -s \
--cert data-plane-public-cert.pem \
--key data-plane-private-key.pem \
--data '{ "fields": { "inlinks": { "remove": { "some-url": 0 } } } }' \
"${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?selection=${SELECTION}&cluster=${CLUSTER}&${continuation}" \
| tee >( jq . > /dev/tty ) | jq -re .continuation )
do
continuation="continuation=${token}"
done
```
Each request will return a response after roughly one minute—change this by specifying _timeChunk_ (default 60).
## Using /document/v1/ api
To get started with a document export, find the _namespace_ and _document type_ by listing a few IDs. Hit the [/document/v1/](../reference/api/document-v1.html) ENDPOINT. Restrict to one CLUSTER, see [content clusters](../reference/applications/services/content.html):
```
$ curl \
--cert data-plane-public-cert.pem \
--key data-plane-private-key.pem \
"$ENDPOINT/document/v1/?cluster=$CLUSTER"
```
For ID export only, use a [fieldset](../schemas/documents.html#fieldsets):
```
$ curl \
--cert data-plane-public-cert.pem \
--key data-plane-private-key.pem \
"$ENDPOINT/document/v1/?cluster=$CLUSTER&fieldSet=%5Bid%5D"
```
From an ID, like _id:open:doc::open/documentation/schemas.html_, extract
- NAMESPACE: open
- DOCTYPE: doc
Example script:
```
#!/bin/bash
set -x

# The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/'
ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud"
NAMESPACE=open
DOCTYPE=doc
CLUSTER=documentation

continuation=""
idx=0
while
    ((idx+=1))
    echo "$continuation"
    printf -v out "%05d" $idx
    filename=${NAMESPACE}-${DOCTYPE}-${out}.data.gz
    echo "Fetching data..."
    token=$( curl -s \
      --cert data-plane-public-cert.pem \
      --key data-plane-private-key.pem \
      "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?wantedDocumentCount=1000&concurrency=4&cluster=${CLUSTER}&${continuation}" \
      | tee >( gzip > ${filename} ) | jq -re .continuation )
do
    continuation="continuation=${token}"
done
```
If only a few documents are returned per response, _wantedDocumentCount_ (default 1, max 1024) can be specified for a lower bound on the number of documents per response, if that many documents still remain.
Specifying _concurrency_ (default 1, max 100) increases throughput, at the cost of resource usage. This also increases the number of documents per response, and _could_ lead to excessive memory usage in the HTTP container when many large documents are buffered to be returned in the same response.
---
# Source: https://docs.vespa.ai/en/reference/operations/metrics/default-metric-set.html.md
# Default Metric Set
This document provides reference documentation for the Default metric set, including suffixes present per metric. If the suffix column contains "N/A" then the base name of the corresponding metric is used with no suffix.
## ClusterController Metrics
| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| cluster-controller.down.count | node | last, max | Number of content nodes down |
| cluster-controller.maintenance.count | node | last, max | Number of content nodes in maintenance |
| cluster-controller.up.count | node | last, max | Number of content nodes up |
| cluster-controller.is-master | binary | last, max | 1 if this cluster controller is currently the master, or 0 if not |
| cluster-controller.resource\_usage.nodes\_above\_limit | node | last, max | The number of content nodes above resource limit, blocking feed |
| cluster-controller.resource\_usage.max\_memory\_utilization | fraction | last, max | Current memory utilisation, for the content node with the highest value |
| cluster-controller.resource\_usage.max\_disk\_utilization | fraction | last, max | Current disk space utilisation, for the content node with the highest value |
## Container Metrics
| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| http.status.1xx | response | rate | Number of responses with a 1xx status |
| http.status.2xx | response | rate | Number of responses with a 2xx status |
| http.status.3xx | response | rate | Number of responses with a 3xx status |
| http.status.4xx | response | rate | Number of responses with a 4xx status |
| http.status.5xx | response | rate | Number of responses with a 5xx status |
| jdisc.gc.ms | millisecond | average, max | Time spent in JVM garbage collection |
| jdisc.thread\_pool.work\_queue.capacity | thread | max | Capacity of the task queue |
| jdisc.thread\_pool.work\_queue.size | thread | count, max, min, sum | Size of the task queue |
| jdisc.thread\_pool.size | thread | max | Size of the thread pool |
| jdisc.thread\_pool.active\_threads | thread | count, max, min, sum | Number of threads that are active |
| jdisc.application.failed\_component\_graphs | item | rate | JDISC Application failed component graphs |
| jdisc.singleton.is\_active | item | last, max | JDISC Singleton is active |
| jdisc.http.ssl.handshake.failure.missing\_client\_cert | operation | rate | JDISC HTTP SSL handshake failures due to missing client certificate |
| jdisc.http.ssl.handshake.failure.incompatible\_protocols | operation | rate | JDISC HTTP SSL handshake failures due to incompatible protocols |
| jdisc.http.ssl.handshake.failure.incompatible\_chifers | operation | rate | JDISC HTTP SSL handshake failures due to incompatible ciphers |
| jdisc.http.ssl.handshake.failure.unknown | operation | rate | JDISC HTTP SSL handshake failures for unknown reasons |
| mem.heap.free | byte | average | Free heap memory |
| athenz-tenant-cert.expiry.seconds | second | last, max, min | Time remaining until the Athenz tenant certificate expires |
| feed.operations | operation | rate | Number of document feed operations |
| feed.latency | millisecond | count, sum | Feed latency |
| queries | operation | rate | Query volume |
| query\_latency | millisecond | average, count, max, sum | The overall query latency as seen by the container |
| failed\_queries | operation | rate | The number of failed queries |
| degraded\_queries | operation | rate | The number of degraded queries, e.g. due to some content nodes not responding in time |
| hits\_per\_query | hit\_per\_query | average, count, max, sum | The number of hits returned |
| docproc.documents | document | sum | Number of processed documents |
| totalhits\_per\_query | hit\_per\_query | average, count, max, sum | The total number of documents found to match queries |
| serverActiveThreads | thread | average | Deprecated. Use jdisc.thread\_pool.active\_threads instead. |
## Distributor Metrics
| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| vds.distributor.docsstored | document | average | Number of documents stored in all buckets controlled by this distributor |
| vds.bouncer.clock\_skew\_aborts | operation | count | Number of client operations that were aborted due to clock skew between sender and receiver exceeding the acceptable range |
## NodeAdmin Metrics
| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| endpoint.certificate.expiry.seconds | second | N/A | Time until the node endpoint certificate expires |
| node-certificate.expiry.seconds | second | N/A | Time until the node certificate expires |
## SearchNode Metrics
| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| content.proton.documentdb.documents.total | document | last, max | The total number of documents in this document db (ready + not-ready) |
| content.proton.documentdb.documents.ready | document | last, max | The number of ready documents in this document db |
| content.proton.documentdb.documents.active | document | last, max | The number of active / searchable documents in this document db |
| content.proton.documentdb.disk\_usage | byte | last | The total disk usage (in bytes) for this document db |
| content.proton.documentdb.memory\_usage.allocated\_bytes | byte | last | The number of allocated bytes |
| content.proton.search\_protocol.query.latency | second | average, count, max, sum | Query request latency (seconds) |
| content.proton.search\_protocol.docsum.latency | second | average, count, max, sum | Docsum request latency (seconds) |
| content.proton.search\_protocol.docsum.requested\_documents | document | rate | Total requested document summaries |
| content.proton.resource\_usage.disk | fraction | average | The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller |
| content.proton.resource\_usage.memory | fraction | average | The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller |
| content.proton.resource\_usage.feeding\_blocked | binary | last, max | Whether feeding is blocked due to resource limits being reached (value is either 0 or 1) |
| content.proton.transactionlog.disk\_usage | byte | last | The disk usage (in bytes) of the transaction log |
| content.proton.documentdb.matching.docs\_matched | document | rate | Number of documents matched |
| content.proton.documentdb.matching.docs\_reranked | document | rate | Number of documents re-ranked (second phase) |
| content.proton.documentdb.matching.rank\_profile.query\_latency | second | average, count, max, sum | Total average latency (sec) when matching and ranking a query |
| content.proton.documentdb.matching.rank\_profile.query\_setup\_time | second | average, count, max, sum | Average time (sec) spent setting up and tearing down queries |
| content.proton.documentdb.matching.rank\_profile.rerank\_time | second | average, count, max, sum | Average time (sec) spent on 2nd phase ranking |
## Sentinel Metrics
| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| sentinel.totalRestarts | restart | last, max, sum | Total number of service restarts done by the sentinel since the sentinel was started |
## Storage Metrics
| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| vds.filestor.allthreads.put.count | operation | rate | Number of put requests processed |
| vds.filestor.allthreads.remove.count | operation | rate | Number of remove requests processed |
| vds.filestor.allthreads.update.count | request | rate | Number of update requests processed |
---
# Source: https://docs.vespa.ai/en/reference/querying/default-result-format.html.md
# Default JSON Result Format
The default Vespa query response format is used when [presentation.format](../api/query.html#presentation.format) is unset or set to `json`. An alternative binary [CBOR](https://cbor.io/) format is available by setting `format=cbor` or using the `Accept: application/cbor` header. CBOR is a drop-in replacement: when deserialized, the result is identical to JSON. CBOR is both more compact and faster to generate, especially for numeric data such as tensors and embeddings.

Results are rendered with one or more objects:
- `root`: mandatory object with the tree of returned data
- `timing`: optional object with query timing information
- `trace`: optional object for metadata about query execution
Refer to the [query API guide](../../querying/query-api.html#result-examples) for result and tracing examples.
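A sketch of requesting the CBOR format with curl (the endpoint and query are illustrative; add mTLS certificates as required by the deployment):
```
$ curl -H "Accept: application/cbor" \
    "$ENDPOINT/search/?yql=select+*+from+sources+*+where+true"
```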
All object names are literal strings; the node `root` is the map key "root" in the returned JSON object. In other words, only strings are used as map keys.
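To make the element tables below concrete, here is a minimal hand-written response sketch; all names and values are illustrative:
```
{
    "timing": {
        "querytime": 0.006,
        "summaryfetchtime": 0.002,
        "searchtime": 0.009
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": { "totalCount": 1 },
        "coverage": {
            "coverage": 100, "documents": 1000, "full": true,
            "nodes": 1, "results": 1, "resultsFull": 1
        },
        "children": [
            {
                "id": "id:open:doc::example",
                "relevance": 0.87,
                "source": "documentation",
                "fields": { "sddocname": "doc", "title": "An example hit" }
            }
        ]
    }
}
```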
## root

| Element | Parent | Mandatory | Type | Description |
| --- | --- | --- | --- | --- |
| root | | yes | Map of string to object | The root of the tree of returned data. |
| children | root | no | Array of objects | Array of JSON objects with the same structure as `root`. |
| fields | root | no | Map of string to object | |
| totalCount | fields | no | Integer | Number of documents matching the query. Not accurate when using the _nearestNeighbor_, _wand_ or _weakAnd_ query operators. The value is the number of hits after [first-phase dropping](../schemas/schemas.html#rank-score-drop-limit). |
| coverage | root | no | Map of string to string and number | Map of metadata about how much of the total corpus has been searched to return the given documents. |
| coverage | coverage | yes | Integer | Percentage of total corpus searched (when lower than 100, this is an approximation and a lower bound, as no info from nodes that are down is known). |
| documents | coverage | yes | Long | The number of active documents searched. |
| full | coverage | yes | Boolean | Whether the full corpus was searched. |
| nodes | coverage | yes | Integer | The number of search nodes returning results. |
| results | coverage | yes | Integer | The number of results merged to create the final rendered result. |
| resultsFull | coverage | yes | Integer | The number of full result sets merged, e.g. when there are several sources/clusters for the results. |
| degraded | coverage | no | Map of string to object | Map of match-phase degradation elements. |
| match-phase | degraded | no | Boolean | Whether [match-phase degradation](../schemas/schemas.html#match-phase) has occurred. |
| timeout | degraded | no | Boolean | Whether the query [timed out](../api/query.html#timeout) before completion. |
| adaptive-timeout | degraded | no | Boolean | Whether the query timed out with [adaptive timeout](../api/query.html#ranking.softtimeout.enable) before completion. |
| non-ideal-state | degraded | no | Boolean | Whether the content cluster is in [ideal state](../../content/idealstate.html). |
| errors | root | no | Array of objects | Array of error messages with the fields given below. [Example](../../querying/query-api.html#error-result). |
| code | errors | yes | Integer | Numeric identifier used by the container application. See [error codes](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/protect/Error.java) and [ErrorMessage.java](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/result/ErrorMessage.java) for a short description. |
| message | errors | no | String | Full error message. |
| source | errors | no | String | Which [data provider](../../querying/federation.html) logged the error condition. |
| stackTrace | errors | no | String | Stack trace if an exception was involved. |
| summary | errors | yes | String | Short description of the error. |
| transient | errors | no | Boolean | Whether the system is expected to recover from the faulty state on its own. If the flag is not present, this may or may not be the case, or the flag is not applicable. |
| fields | root | no | Map of string to object | The named document (schema) [fields](../schemas/schemas.html#field). Fields without a value are not rendered. In addition to the fields defined in the schema, the following might be returned: `sddocname` (schema name, returned in the [default document summary](../../querying/document-summaries.html)), `documentid` (document ID, returned in the default document summary), `summaryfeatures` (refer to [summary-features](../schemas/schemas.html#summary-features) and [observing values used in ranking](../../ranking/ranking-intro#observing-values-used-in-ranking)), and `matchfeatures` (refer to [match-features](../schemas/schemas.html#match-features) and [example use](../../querying/nearest-neighbor-search-guide#strict-filters-and-distant-neighbors)). |
| id | root | no | String | String identifying the hit, document or other data type. For document hits, this is the full string document id if the hit is filled with a document summary from disk. If it is not filled, or only filled with data from memory (attributes), it is an internally generated unique id on the form `index:[source]/[node-index]/[hex-gid]`. Also see the [/document/v1/ guide](../../writing/document-v1-api-guide.html#troubleshooting) and [receiving-responses-of-different-formats-for-the-same-query-in-vespa](https://stackoverflow.com/questions/74033383/receiving-responses-of-different-formats-for-the-same-query-in-vespa). |
| label | root | no | String | The label of a grouping list. |
| limits | root | no | Object | Used in grouping: the limits of a bucket in histogram-style data. |
| from | limits | no | String | Lower bound of a bucket group. |
| to | limits | no | String | Upper bound of a bucket group. |
| relevance | root | yes | Double | Double value representing the rank score. |
| source | root | no | String | Which data provider created this node. |
| types | root | no | Array of string | Metadata about what kind of document or other kind of node in the result set this object is. |
| value | root | no | String | Used in grouping for value groups: the argument for the grouping data which is in the fields. |
## timing

| Element | Parent | Mandatory | Type | Description |
| --- | --- | --- | --- | --- |
| timing | | no | Map of string to object | Query timing information, enabled by [presentation.timing](../api/query.html#presentation.timing). The [query performance guide](../../performance/practical-search-performance-guide#basic-text-search-query-performance) is a useful resource to understand the values in its child elements. |
| querytime | timing | no | Double | Time to execute the first protocol phase/matching phase, in seconds. |
| summaryfetchtime | timing | no | Double | [Document summary](../../querying/document-summaries.html) fetch time, in seconds. This is the time to execute the summary fill protocol phase for the globally ordered top-k hits. |
| searchtime | timing | no | Double | Approximately the sum of `querytime` and `summaryfetchtime`, and close to what a client will observe (except network latency). In seconds. |
## trace

**Note:** The tracing elements below are a subset of all elements. Refer to the [search performance guide](../../performance/practical-search-performance-guide#advanced-query-tracing) for examples.

| Element | Parent | Mandatory | Type | Description |
| --- | --- | --- | --- | --- |
| trace | | no | Map of string to object | Metadata about query execution. |
| children | trace | no | Array of object | Array of maps with exactly the same structure as `trace` itself. |
| timestamp | children | no | Long | Number of milliseconds since the start of query execution when this node was added to the trace. |
| message | children | no | String | Descriptive trace text regarding this step of query execution. |
| message | children | no | Array of objects | Array of messages. |
| start\_time | message | no | String | Timestamp, e.g. 2022-07-27 09:51:21.938 UTC. |
| traces | message or threads | no | Array of traces or objects | |
| distribution-key | message | no | Integer | The [distribution key](../applications/services/content.html#node) of the content node creating this span. |
| duration\_ms | message | no | Float | Duration of the span. |
| timestamp\_ms | traces | no | Float | Time since start of parent, see `start_time`. |
| event | traces | no | String | Description of the span. |
| tag | traces | no | String | Name of the span. |
| threads | traces | no | Array of objects | Array of objects that again have `traces` elements. |
## JSON Schema
Formal schema for the query API default result format:
```
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Result",
"description": "Schema for Vespa results",
"type": "object",
"properties": {
"root": {
"type": "document_node",
"required": true
},
"trace": {
"type": "trace_node",
"required": false
}
},
"definitions": {
"document_node": {
"properties": {
"children": {
"type": "array",
"items": {
"type": "document_node"
},
"required": false
},
"coverage": {
"type": "coverage",
"required": false
},
"errors": {
"type": "array",
"items": {
"type": "error"
},
"required": false
},
"fields": {
"type": "object",
"additionalProperties": true,
"required": false
},
"id": {
"type": "string",
"required": false
},
"relevance": {
"type": "number",
"required": true
},
"types": {
"type": "array",
"items": {
"type": "string"
},
"required": false
},
"source": {
"type": "string",
"required": false
},
"value": {
"type": "string",
"required": false
},
"limits": {
"type": "object",
"required": false
},
"label": {
"type": "string",
"required": false
}
},
"additionalProperties": true,
},
"trace_node": {
"properties": {
"children": {
"type": "array",
"items": {
"type": "trace_node"
},
"required": false
},
"timestamp": {
"type": "number",
"required": false
},
"message": {
"type": "string",
"required": false
}
}
},
"fields": {
"properties": {
"totalCount": {
"type": "number",
"required": true
}
}
},
"coverage": {
"properties": {
"coverage": {
"type": "number",
"required": true
},
"documents": {
"type": "number",
"required": true
},
"full": {
"type": "boolean",
"required": true
},
"nodes": {
"type": "number",
"required": true
},
"results": {
"type": "number",
"required": true
},
"resultsFull": {
"type": "number",
"required": true
}
}
},
"error": {
"properties": {
"code": {
"type": "number",
"required": true
},
"message": {
"type": "string",
"required": false
},
"source": {
"type": "string",
"required": false
},
"stackTrace": {
"type": "string",
"required": false
},
"summary": {
"type": "string",
"required": true
},
"transient": {
"type": "boolean",
"required": false
}
}
}
}
}
```
## Appendix: Legacy Vespa 7 JSON rendering
There were some inconsistencies between search results and document rendering in Vespa 7, which are fixed in Vespa 8. This appendix describes the old behavior, what the changes are, and how to configure to select a specific rendering.
### Inconsistent weightedset rendering
Fields with the various weightedset types have a JSON input representation (for feeding) as a JSON object; for example `{"one":1, "two":2, "three":3}` for the value of a `weightedset` field. The same format is used when rendering a document (for example when visiting).

In search results, however, there are intermediate processing steps during which the field value is represented as an array of item/weight pairs, so in a search result the field value would render as `[{"item":"one", "weight":1}, {"item":"two", "weight":2}, {"item":"three", "weight":3}]`.

In Vespa 8, the default JSON renderer for search results outputs the same format as document rendering. If you have code that depends on the old format, you can turn this off by setting `renderer.json.jsonWsets=false` in the query (usually via a [query profile](../../querying/query-profiles.html)).
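For example, the flag can be set directly as a query parameter with the Vespa CLI (a sketch; the query itself is illustrative):
```
$ vespa query 'select * from doc where true' 'renderer.json.jsonWsets=false'
```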
### Inconsistent map rendering
Fields with the various map types have a JSON input representation (for feeding) as a JSON object; for example `{"1001":1.0, "1002":2.0, "1003":3.0}` for the value of a `map` field. The same format is used when rendering a document (for example when visiting).

In search results, however, there are intermediate processing steps and the field value is represented as an array of key/value pairs, so in a search result the field value would (in some cases) render as `[{"key":1001, "value":1.0}, {"key":1002, "value":2.0}, {"key":1003, "value":3.0}]`.

In Vespa 8, the default JSON renderer for search results outputs the same format as document rendering. For code that depends on the old format, turn this off by setting `renderer.json.jsonMaps=false` in the query (usually via a [query profile](../../querying/query-profiles.html)).
### Geo position rendering
Fields with the type `position` would in Vespa 7 be rendered using the internal fields "x" and "y". These are integers representing microdegrees, i.e. geographical degrees times 1 million, of longitude (for x) and latitude (for y). Also, any field _foo_ of type `position` would trigger the addition of two extra synthetic summary fields, _foo.position_ and _foo.distance_ (see below for details).
In Vespa 8, positions are rendered with two JSON fields "lat" and "lng", both having a floating-point value. The "lat" field is latitude (going from -90.0 at the South Pole to +90.0 at the North Pole). The "lng" field is longitude (going from -180.0 at the dateline seen as extreme west, via 0.0 at the Greenwich meridian, to +180.0 at the dateline again, now as extreme east). The field names are chosen so the format is the same as used in the Google "places" API.
A closely related change is the removal of two synthetic summary fields which would be returned in search results. For example, with this in the schema:
```
field mainloc type position {
indexing: attribute | summary
}
```
Vespa 7 would include the _mainloc_ summary field, but also _mainloc.position_ and _mainloc.distance_; the latter only when the query actually had a position to take the distance from.
The first of these (_mainloc.position_ in this case) was mainly useful for producing XML output in older Vespa versions, and now contains just the same information as the _mainloc_ summary field. The second (_mainloc.distance_ in this case) would return a distance in internal units, and can be replaced by a summary feature - here `distance(mainloc)` would give the same number, while `distance(mainloc).km` would be the recommended replacement with suitable code changes.
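A sketch of that replacement, assuming the _mainloc_ field above (the profile name is an assumption): expose the distances as summary features in the rank profile used by the query:
```
rank-profile geo {
    summary-features {
        distance(mainloc)       # same internal units as the old mainloc.distance
        distance(mainloc).km    # distance in kilometers, the recommended replacement
    }
}
```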
### Summary-features wrapped in "rankingExpression"
In Vespa 7, if a rank profile wanted a function `foobar` returned in summary-features (or match-features), it would be rendered as `rankingExpression(foobar)` in the output.
In Vespa 8, the feature is present and rendered with just the original name as specified. For programmatic use, the `FeatureData` class has extra checking so that lookups with `getDouble("foobar")` or `getTensor("foobar")` work either way.
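A minimal Searcher sketch of such programmatic lookup, assuming a rank profile exposing `lengthScore` in summary-features (class and feature names are illustrative):
```
package com.yahoo.example;

import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.result.FeatureData;
import com.yahoo.search.result.Hit;
import com.yahoo.search.searchchain.Execution;

public class FeatureReaderSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);
        execution.fill(result);  // fill summaries so summaryfeatures are present
        for (Hit hit : result.hits().asList()) {
            FeatureData features = (FeatureData) hit.getField("summaryfeatures");
            if (features != null) {
                // Lookup by the plain function name works with or without
                // the legacy rankingExpression() wrapper:
                Double lengthScore = features.getDouble("lengthScore");
            }
        }
        return result;
    }
}
```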
If an application needs the JSON rendering to look exactly as in Vespa 7, this can be specified in the rank profile. For example, with this in the schema:
```
rank-profile whatever {
function lengthScore() { expression: matchCount(title)/fieldLength(title) }
summary-features {
matchCount(title)
lengthScore
...
```
could, in Vespa 7, yield JSON output containing:
```
summaryfeatures: {
matchCount(title): 1,
rankingExpression(lengthScore): 0.25,
...
```
In Vespa 8, you instead get the expected:
```
summaryfeatures: {
matchCount(title): 1,
lengthScore: 0.25,
...
```
To get the old behavior, specify:
```
rank-profile whatever {
function lengthScore() { expression: matchCount(title)/fieldLength(title) }
summary-features {
matchCount(title)
rankingExpression(lengthScore)
...
```
which gives you the same output as before.
---
# Source: https://docs.vespa.ai/en/operations/deleting-applications.html.md
# Deleting Applications
**Warning:** Following these steps will remove production instances or regions and all data within them. Data will be unrecoverable.
## Deleting an application
To delete an application, use the console:
- Navigate to the _application_ view at _https://console.vespa-cloud.com/tenant/tenant-name/application_, where you can find the trash can icon to the far right, as an `ACTION`.
- Navigate to the _deploy_ view at _https://console.vespa-cloud.com/tenant/tenant-name/application/app-name/prod/deploy_.

When the application deployments are deleted, delete the application in the [console](https://console.vespa-cloud.com). Remove the CI job that builds and deploys application packages, if any.
## Deleting an instance / region
To remove an instance or a deployment to a region from an application:
1. Remove the `region` from `prod`, or the `instance` from `deployment`, in [deployment.xml](../reference/applications/deployment.html#instance):
2. Add or modify [validation-overrides.xml](../reference/applications/validation-overrides.html), allowing Vespa Cloud to remove production instances:
3. Build and deploy the application package.
---
# Source: https://docs.vespa.ai/en/applications/dependency-injection.html.md
# Dependency injection
The Container (a.k.a. JDisc container) implements a dependency injection framework that allows components to declare arbitrary dependencies on configuration and other components in the application. This document explains how to write a container component that depends on another component. See the [reference](../reference/applications/components.html#injectable-components)for a list of injectable components.
The container relies on auto-injection instead of Guice modules. All components declared in the container cluster are available for injection, and the dependent component only needs to declare the dependency as a constructor parameter. In general, dependency injection involves at least three elements:
- a dependent consumer,
- a declaration of a component's dependencies,
- an injector that creates instances of classes that implement a given dependency on request.
Notes:
- The dependent object describes what software component it depends on to do its work. The injector decides what concrete classes satisfy the requirements of the dependent object, and provides them to it.
- The Container encapsulates the injector, and the consumer and all its dependencies are considered to be components.
- The Container only supports constructor injection (i.e. all dependencies must be declared in a component's constructor).
- Circular dependencies are not supported.
Refer to the [multiple-bundles sample app](https://github.com/vespa-engine/sample-apps/tree/master/examples/multiple-bundles) for a practical example.
## Depending on another component
A component that depends on another is considered to be a _consumer_. A component's dependencies are whatever its `@Inject`-annotated constructor declares as arguments. E.g. the component:
```
package com.yahoo.example;

import com.yahoo.component.annotation.Inject;

public class MyComponent {

    private final MyDependency dependency;

    @Inject
    public MyComponent(MyDependency dependency) {
        this.dependency = dependency;
    }
}
```
has a dependency on the class `com.yahoo.example.MyDependency`. To deploy `MyComponent`, register `MyDependency` in `services.xml` (the snippet below is an illustrative sketch):
```
<!-- Illustrative sketch: the bundle attribute must name the bundle (jar) containing the classes -->
<container version="1.0">
    <component id="com.yahoo.example.MyComponent" bundle="my-bundle" />
    <component id="com.yahoo.example.MyDependency" bundle="my-bundle" />
</container>
```
Upon deployment, the Container will first instantiate `MyDependency`, and then pass that instance to the constructor of `MyComponent`. Multiple consumers can take the same dependency. One can also [inject configuration](configuring-components.html) to components.
**Note:** A component will be reconstructed only when one of its dependencies, its configuration, or its class changes, all of which only occur when you re-deploy your application package. Reconstruction is transitive: if component A depends on component B, and component B depends on component C, then a reconstruction of component B causes a reconstruction of A, but not of C. Reconstruction of C causes a reconstruction of both A and B.
### Extending components
When injecting two components when one extends the other, the dependency injection code does not know which of the two to use as the argument for the parent class. To resolve this, inject a `ComponentRegistry` (see below), and look up its entries, like `getComponent(XXX.class.getName())`.
### Specify the bundle
The example above assumes the bundle name can be deduced from the class name. This is not always the case, and you will get class loading problems like:
```
Caused by: java.lang.IllegalArgumentException: Could not create a component with id
'com.yahoo.example.My'.
Tried to load class directly, since no bundle was found for spec:
com.yahoo.example.Dependency
```
To remedy, specify the jar file (i.e. bundle) with the component (the snippet below is an illustrative sketch):
```
<!-- Illustrative sketch: the bundle attribute names the jar (bundle) providing the class -->
<component id="com.yahoo.example.MyComponent" bundle="my-bundle" />
```
## Depending on all components of a specific type
Consider the use-case where a component chooses between various strategies, and each strategy is implemented as a separate component. Since the number and type of strategies are unknown when implementing the consumer, it is impossible to make a constructor that lists all of them. This is where the `ComponentRegistry` comes into play. E.g. the following component:
```
package com.yahoo.example;

import com.yahoo.component.annotation.Inject;
import com.yahoo.component.provider.ComponentRegistry;

public class MyComponent {

    private final ComponentRegistry<Strategy> strategies;

    @Inject
    public MyComponent(ComponentRegistry<Strategy> strategies) {
        this.strategies = strategies;
    }
}
```
declares a dependency on the set of all components registered in `services.xml` that are instances of the class `Strategy` (including subclasses). The `ComponentRegistry` class provides accessors for components based on their [component id](../reference/applications/services/container.html#component).
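For example, a consumer can look up a single strategy by component id, or iterate over all of them; a sketch, where `Strategy` and the component id are assumptions:
```
package com.yahoo.example;

import com.yahoo.component.provider.ComponentRegistry;
import java.util.List;

public class StrategySelector {

    private final ComponentRegistry<Strategy> strategies;

    public StrategySelector(ComponentRegistry<Strategy> strategies) {
        this.strategies = strategies;
    }

    // Look up one strategy by its component id (hypothetical id):
    Strategy preferred() {
        return strategies.getComponent("com.yahoo.example.MyStrategy");
    }

    // Iterate over all registered strategies:
    List<Strategy> all() {
        return strategies.allComponents();
    }
}
```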
## Special Components
There are cases where a component cannot be injected directly into its consumers, for example:
- The component must be instantiated via a factory method instead of its constructor
- Each consumer must have a unique instance of the dependency class
- The component uses native resources that must be cleaned up when the component goes out of scope
For these situations, JDisc supports injection, and optional deconstruction, via its `Provider` interface:
```
public interface Provider<T> {

    T get();

    void deconstruct();
}
```
`get()` is called by JDisc each time it needs to instantiate the specific component type. `deconstruct()` is only called after reconfiguring the system with a new application, where the current provider instance is either removed or replaced due to modified dependencies.
Following the earlier example, declare a provider for the `MyDependency` class that returns a new instance for each consumer:
```
package com.yahoo.example;

import com.yahoo.container.di.componentgraph.Provider;

public class MyDependencyProvider implements Provider<MyDependency> {

    @Override
    public MyDependency get() {
        return new MyDependency();
    }

    @Override
    public void deconstruct() { }
}
```
Using this provider, `services.xml` has two instances of `MyComponent`, each getting a unique instance of `MyDependency` (the snippet below is an illustrative sketch):
```
<!-- Illustrative sketch: two MyComponent instances plus the provider -->
<container version="1.0">
    <component id="myComponent-1" class="com.yahoo.example.MyComponent" bundle="my-bundle" />
    <component id="myComponent-2" class="com.yahoo.example.MyComponent" bundle="my-bundle" />
    <component id="com.yahoo.example.MyDependencyProvider" bundle="my-bundle" />
</container>
```
Upon deployment, the Container will first instantiate `MyDependencyProvider`, and then invoke `MyDependencyProvider.get()` for each instantiation of `MyComponent`.
A provider can declare constructor dependencies, just like any other component.
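A sketch of a provider with its own constructor dependency (`OtherComponent` is a hypothetical component):
```
package com.yahoo.example;

import com.yahoo.component.annotation.Inject;
import com.yahoo.container.di.componentgraph.Provider;

public class MyDependencyProvider implements Provider<MyDependency> {

    private final OtherComponent other;  // hypothetical injected component

    @Inject
    public MyDependencyProvider(OtherComponent other) {
        this.other = other;
    }

    @Override
    public MyDependency get() {
        return new MyDependency(other);  // assumes MyDependency has such a constructor
    }

    @Override
    public void deconstruct() { }
}
```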
---
# Source: https://docs.vespa.ai/en/basics/deploy-an-application-java.html.md
# Deploy an application having Java components
Follow these steps to deploy a Vespa application which includes Java components to the [dev zone](../operations/environments.html#dev) on Vespa Cloud (for free).
Alternative versions of this guide:
- [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers
- [Deploy an application without Java components](deploy-an-application.html)
- [Deploy an application without Vespa CLI](deploy-an-application-shell.html)
- [Deploy an application locally](deploy-an-application-local.html)
- [Deploy an application having Java components locally](deploy-an-application-local-java.html)
**Prerequisites:**
- [Java 17](https://openjdk.org/projects/jdk/17/).
- [Apache Maven](https://maven.apache.org/install.html) to build the application.
Steps:
1. **Create a [tenant](../learn/tenant-apps-instances.html) on Vespa Cloud:**
2. **Install the [Vespa CLI](../clients/vespa-cli.html)** using [Homebrew](https://brew.sh/):
3. **Configure the Vespa client:**
4. **Get Vespa Cloud control plane access:**
5. **Clone a sample [application](applications.html):**
6. **Add a certificate for [data plane access](../security/guide#data-plane) to the application:**
7. **Build the application:**
8. **[Deploy](applications.html#deploying-applications) the application:**
9. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):**
10. **Run [queries](../querying/query-api.html):**
Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](../operations/environments.html#dev) will by default keep running for 14 days after the last deployment. You can control this in the [console](https://console.vespa-cloud.com/).
#### Next: [Vespa applications](applications.html)
---
# Source: https://docs.vespa.ai/en/basics/deploy-an-application-local-java.html.md
# Deploy an application having Java components locally
Follow these steps to deploy a Vespa application having Java components on your own machine.
Alternative versions of this guide:
- [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers
- [Deploy an application](deploy-an-application.html)
- [Deploy an application having Java components](deploy-an-application-java.html)
- [Deploy an application without Vespa CLI](deploy-an-application-shell.html)
- [Deploy an application without Java components locally](deploy-an-application-local.html)
This guide is tested with the _vespaengine/vespa:8.634.24_ container image.
**Prerequisites:**
- Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with [Podman Desktop](https://podman.io/) or [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed, with an engine running.
- Alternatively, start the Podman daemon:
```
$ podman machine init --memory 6000
$ podman machine start
```
- See [Docker Containers](/en/operations/self-managed/docker-containers.html) for system limits and other settings.
- For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html).
- Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations/self-managed/node-setup.html#memory-settings).
- Disk: Avoid `NO_SPACE` - the vespaengine/vespa container image + headroom for data requires disk space. [Read more](/en/writing/feed-block.html).
- [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/clients/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases).
- [Java 17](https://openjdk.org/projects/jdk/17/).
- [Apache Maven](https://maven.apache.org/install.html) is used to build the application.
Steps:
1. **Validate the environment:**
2. **Install the [Vespa CLI](../clients/vespa-cli.html)** using [Homebrew](https://brew.sh/):
3. **Set local target:**
4. **Start a Vespa Docker container:**
5. **Clone a sample [application](applications.html):**
6. **Build it:**
7. **[Deploy](applications.html#deploying-applications) the application:**
8. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):**
9. **Run [queries](../querying/query-api.html):**
Congratulations, you have deployed your first Vespa application!
#### Next: [Vespa applications](applications.html)
Remove the container when done:
```
$ docker rm -f vespa
```
---
# Source: https://docs.vespa.ai/en/basics/deploy-an-application-local.html.md
# Deploy an application locally
Follow these steps to deploy a Vespa application on your own machine.
Alternative versions of this guide:
- [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers
- [Deploy an application](deploy-an-application.html)
- [Deploy an application having Java components](deploy-an-application-java.html)
- [Deploy an application without Vespa CLI](deploy-an-application-shell.html)
- [Deploy an application having Java components locally](deploy-an-application-local-java.html)
This guide is tested with the _vespaengine/vespa:8.634.24_ container image.
**Prerequisites:**
- Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with [Podman Desktop](https://podman.io/) or [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed, with an engine running.
- Alternatively, start the Podman daemon:
```
$ podman machine init --memory 6000
$ podman machine start
```
- See [Docker Containers](/en/operations/self-managed/docker-containers.html) for system limits and other settings.
- For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html).
- Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations/self-managed/node-setup.html#memory-settings).
- Disk: Avoid `NO_SPACE` - the vespaengine/vespa container image + headroom for data requires disk space. [Read more](/en/writing/feed-block.html).
- [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/clients/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases).
Steps:
1. **Validate the environment:**
2. **Install the [Vespa CLI](../clients/vespa-cli.html)** using [Homebrew](https://brew.sh/):
3. **Set local target:**
4. **Start a Vespa Docker container:**
5. **Clone a sample [application](applications.html):**
6. **[Deploy](applications.html#deploying-applications) the application:**
7. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):**
8. **Run [queries](../querying/query-api.html):**
9. **Get documents:**
Congratulations, you have deployed your first Vespa application!
#### Next: [Vespa applications](applications.html)
Remove the container when done:
```
$ docker rm -f vespa
```
---
# Source: https://docs.vespa.ai/en/basics/deploy-an-application-shell.html.md
# Deploy an application without Vespa CLI
This lets you deploy an application to the [dev zone](../operations/environments.html#dev) on Vespa Cloud (for free).
Alternative versions of this guide:
- [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers
- [Deploy an application](deploy-an-application.html)
- [Deploy an application having Java components](deploy-an-application-java.html)
- [Deploy an application locally](deploy-an-application-local.html)
- [Deploy an application with Java components locally](deploy-an-application-local-java.html)
**Prerequisites:**
- git - or download the files from [album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation)
- zip - or other tool to create a .zip file
- curl - or other tool to send HTTP requests with security credentials
- OpenSSL
Steps:
1. **Create a [tenant](../learn/tenant-apps-instances.html) on Vespa Cloud:**
2. **Clone a sample [application](applications.html):**
3. **Add a certificate for [data plane access](../security/guide#data-plane) to the application:**
4. **Create a deployable application package zip:**
5. **Deploy the application:**
6. **Verify the application endpoint:**
7. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):**
8. **Run [queries](../querying/query-api.html):**
Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](../operations/environments.html#dev) will by default keep running for 14 days after the last deployment. You can control this in the [console](https://console.vespa-cloud.com/).
#### Next: [Vespa applications](applications.html)
---
# Source: https://docs.vespa.ai/en/basics/deploy-an-application.html.md
# Deploy an application
Follow these steps to deploy a Vespa application to the [dev zone](../operations/environments.html#dev) on Vespa Cloud (for free).
Alternative versions of this guide:
- [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers
- [Deploy an application having Java components](deploy-an-application-java.html)
- [Deploy an application without Vespa CLI](deploy-an-application-shell.html)
- [Deploy an application locally](deploy-an-application-local.html)
- [Deploy an application having Java components locally](deploy-an-application-local-java.html)
Steps:
1. **Create a [tenant](../learn/tenant-apps-instances.html) on Vespa Cloud:**
2. **Install the [Vespa CLI](../clients/vespa-cli.html)** using [Homebrew](https://brew.sh/):
3. **Configure the Vespa client:**
4. **Get Vespa Cloud control plane access:**
5. **Clone a sample [application](applications.html):**
6. **Add a certificate for [data plane access](../security/guide#data-plane) to the application:**
7. **[Deploy](applications.html#deploying-applications) the application:**
8. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):**
9. **Run [queries](../querying/query-api.html):**
Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](../operations/environments.html#dev) will by default keep running for 14 days after the last deployment. You can control this in the [console](https://console.vespa-cloud.com/).
#### Next: [Vespa applications](applications.html)
---
# Source: https://docs.vespa.ai/en/reference/api/deploy-v2.html.md
# Deploy API
This is the API specification and some examples for the HTTP Deploy API that can be used to deploy an application:
- [upload](#create-session)
- [prepare](#prepare-session)
- [activate](#activate-session)
The response format is JSON. Examples are found in the [use-cases](#use-cases). Also see the [deploy guide](/en/basics/applications.html#deploying-applications).
**Note:** To build a multi-application system, use one or three config servers per application. Best practice is to use a [containerized](/en/operations/self-managed/docker-containers.html) architecture; also see [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA).
The current API version is 2. The API port is 19071; use [vespa-model-inspect](/en/reference/operations/self-managed/tools.html#vespa-model-inspect) `service configserver` to find the config server hosts. Example: `http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session`. Write operations return successfully after a majority of the config servers have persisted the changes (e.g. 2 out of 3 config servers).
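For example, listing the active applications for the default tenant (using the example config server host from this document):
```
$ curl http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/
```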
Entities:
| session-id |
The session-id used in this API is generated by the server and is required for all operations after [creating](#create-session) a session. A session-id is valid if the session is active, or if it was created within the [session lifetime](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/configserver.def), which defaults to 1 hour.
|
| path |
An application file path in a request URL or parameter refers to a relative path in the application package. A path ending with "/" refers to a directory.
|
Use [Vespa CLI](../../clients/vespa-cli.html) to deploy from the command line.
## POST /application/v2/tenant/default/prepareandactivate
Creates a new session with the application package that is included in the request, prepares it, and then activates it. See details in the steps later in this document.
| Parameters |
| Name | Default | Description |
| --- | --- | --- |
| | | |
|
| Request body |
| Required | Content | Note |
| --- | --- | --- |
| Yes | A compressed [application package](../applications/application-packages.html) (with gzip or zip compression) | Set `Content-Type` HTTP header to `application/x-gzip` or `application/zip`. |
|
| Response |
See [active](#activate-session).
|
Example:
```
$ (cd src/main/application && zip -r - .) | \
curl --header Content-Type:application/zip --data-binary @- \
localhost:19071/application/v2/tenant/default/prepareandactivate
```
```
{
"log": [
{
"time": 1619448107299,
"level": "WARNING",
"message": "Host named 'vespa-container' may not receive any config since it is not a canonical hostname. Disregard this warning when testing in a Docker container."
}
],
"tenant": "default",
"session-id": "3",
"url": "http://localhost:19071/application/v2/tenant/default/application/default/environment/prod/region/default/instance/default",
"message": "Session 3 for tenant 'default' prepared and activated.",
"configChangeActions": {
"restart": [],
"refeed": [],
"reindex": []
}
}
```
## POST /application/v2/tenant/default/session
Creates a new session with the application package that is included in the request.
| Parameters |
| Name | Default | Description |
| --- | --- | --- |
| from | N/A | Use when you want to create a new session based on an active application. The value supplied should be a URL to an active application. |
|
| Request body |
| Required | Content | Note |
| --- | --- | --- |
| Yes, unless `from` parameter is used | A compressed [application package](../applications/application-packages.html) (with gzip or zip compression) | It is required to set the `Content-Type` HTTP header to `application/x-gzip` or `application/zip`, unless the `from` parameter is used. |
|
| Response | The response contains:
- A [session-id](#session-id) to the application that was created.
- A [prepared](#prepare-session) URL for preparing the application.
|
Examples (both requests return the same response):
- `POST /application/v2/tenant/default/session`
- `POST /application/v2/tenant/default/session?from=http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default`
```
{
"tenant": "default",
"session-id": "1",
"prepared": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/session-id/prepared/",
"content": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/session-id/content/",
"message": "Session 1 for tenant 'default' created."
}
```
## PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)]
Writes the content to the given path, or creates a directory if the path ends with '/'.
| Parameters | None |
| Request body |
- If path is a directory, none.
- If path is a file, the contents of the file.
|
| Response |
- Any errors or warnings from writing the file/creating the directory.
|
## GET /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)]
Returns the content of the file at this path, or lists files and directories if `path` ends with '/'.
| Parameters |
| Name | Default | Description |
| --- | --- | --- |
| recursive | false | If _true_, directory content will be listed recursively. |
| return | content |
- If set to content and path refers to a file, the content will be returned.
- If set to content and path refers to a directory, the files and subdirectories in the directory will be listed.
- If set to status and path refers to a file, the file status and hash will be returned.
- If set to status and path refers to a directory, a list of file/subdirectory statuses and hashes will be returned.
|
|
| Request body | None. |
| Response |
- If path is a directory: a JSON array of URLs to the files and subdirectories of that directory.
- If path is a file: the contents of the file.
- If status parameter is set, the status and hash will be returned.
|
Examples:
`GET /application/v2/tenant/default/session/3/content/`
```
[
"http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml",
"http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/services.xml",
"http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/"
]
```
`GET /application/v2/tenant/default/session/3/content/?recursive=true`
```
[
"http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml",
"http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/services.xml",
"http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/",
"http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/music.sd",
"http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/video.sd"
]
```
`GET /application/v2/tenant/default/session/3/content/hosts.xml`
```
vespa1
vespa2
```
`GET /application/v2/tenant/default/session/3/content/hosts.xml?return=status`
```
{
"name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml",
"status": "new",
"md5": "03d7cff861fcc2d88db70b7857d4d452"
}
```
`GET /application/v2/tenant/default/session/3/content/schemas/?return=status`
```
[
{
"name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/music.sd",
"status": "new",
"md5": "03d7cff861fcc2d88db70b7857d4d452"
},
{
"name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/video.sd",
"status": "changed",
"md5": "03d7cff861fcc2d88db70b7857d4d452"
},
{
"name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/book.sd",
"status": "deleted",
"md5": "03d7cff861fcc2d88db70b7857d4d452"
}
]
```
## DELETE /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)]
Deletes the resource at the given path.
| Parameters | None |
| Request body |
None
|
| Response |
Any errors or warnings from deleting the resource.
|
## PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/prepared
Prepares an application with the [session-id](#session-id) given.
| Parameters |
| Parameter | Default | Description |
| --- | --- | --- |
| applicationName | N/A | Name of the application to be deployed |
| environment | default | Environment where application should be deployed |
| region | default | Region where application should be deployed |
| instance | default | Name of application instance |
| debug | false | If true, include stack trace in response if prepare fails. |
| timeout | 360 seconds | Timeout in seconds to wait for session to be prepared. |
|
| Request body |
None
|
| Response |
Returns a [session-id](#session-id) and a link to activate the session.
- Log with any errors or warnings from preparing the application.
- An [activate](#activate-session) URL for activating the application with this [session-id](#session-id), if there were no errors.
- A list of actions (possibly empty) that must be performed in order to apply config changes between the currently active application and the next prepared application. These actions are organized into three categories: _restart_, _reindex_, and _refeed_:
- _Restart_ actions are done after the application has been activated and are handled by restarting all listed services. See [schemas](../schemas/schemas.html#modifying-schemas) for details.
- _Reindex_ actions are special refeed actions that Vespa [handles automatically](../../operations/reindexing.html), if the [reindex](#reindex) endpoint below is used.
- _Refeed_ actions require several steps to handle. See [schemas](../schemas/schemas.html#modifying-schemas) for details.
|
Example:
`PUT /application/v2/tenant/default/session/3/prepared`
```
{
"tenant": "default",
"session-id": "3",
"activate": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/active",
"message": "Session 3 for tenant 'default' prepared.",
"log": [
{ "level": "WARNING",
"message": "Warning message 1",
"time": 1430134091319
},
{ "level": "WARNING",
"message": "Warning message 2",
"time": 1430134091320
}
],
"configChangeActions": {
"restart": [ {
"clusterName": "mycluster",
"clusterType": "search",
"serviceType": "searchnode",
"messages": ["Document type 'test': Field 'f1' changed: add attribute aspect"],
"services": [ {
"serviceName": "searchnode",
"serviceType": "searchnode",
"configId": "mycluster/search/cluster.mycluster/0",
"hostName": "myhost.mydomain.com"
} ]
} ],
"reindex": [ {
"documentType": "test",
"clusterName": "mycluster",
"messages": ["Document type 'test': Field 'f1' changed: add index aspect"],
"services": [ {
"serviceName": "searchnode",
"serviceType": "searchnode",
"configId": "mycluster/search/cluster.mycluster/0",
"hostName": "myhost.mydomain.com"
} ]
} ]
}
}
```
## GET /application/v2/tenant/default/session/[[session-id](#session-id)]/prepared
Returns the state of a prepared session. The response is the same as a successful [prepare](#prepare-session) operation (above), however the _configChangeActions_ element will be empty.
## PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/active
Activates an application with the [session-id](#session-id) given. The [session-id](#session-id) must be for a [prepared session](#prepare-session). The operation will make sure the session is activated on all config servers.
| Parameters |
| Parameter | Default | Description |
| --- | --- | --- |
| timeout | 60 seconds | Timeout in seconds to wait for session to be activated (when several config servers are used, they might need to sync before activate can be done). |
|
| Request body | None |
| Response |
Returns a [session-id](#session-id), a message and a URL to the activated application.
- [session-id](#session-id)
- Message
|
Example:
`PUT /application/v2/tenant/default/session/3/active`
```
{
"tenant": "default",
"session-id": "3",
"message": "Session 3 for tenant 'default' activated.",
"url": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default"
}
```
## GET /application/v2/tenant/default/application/
Returns a list of the currently active applications for the given tenant.
| Parameters | None |
| Request body | None |
Response: Returns an array of links to the active applications.
Example:
`GET /application/v2/tenant/default/application/`
```
["http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default"]
```
## GET /application/v2/tenant/default/application/default
Gets info about the application.
| Parameters | None |
| Request body | None |
Response: Returns information about the specified application, including the config generation.
Example:
`GET /application/v2/tenant/default/application/default`
```
{
"generation": 2
}
```
## GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindexing
Returns [reindexing](../../operations/reindexing.html) status for the given application.
| Parameters | N/A |
| Request body | N/A |
Response: JSON detailing the current reindexing status for the application, with all its clusters and document types:
- Status for each content cluster in the application, by name:
  - Status of each document type in the cluster, by name:
    - Last time reindexing was triggered for this document type.
    - Current status of reindexing.
    - Optional start time of reindexing.
    - Optional end time of reindexing.
    - Optional progress of reindexing, from 0 to 1.
    - Pseudo-speed of reindexing.
Example:
`GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindexing`
```
{
"clusters": {
"db": {
"ready": {
"test_artifact": {
"readyMillis": 1607937250998,
"startedMillis": 1607940060012,
"state": "running",
"speed": 1.0,
"progress": 0.04013824462890625
},
"test_result": {
"readyMillis": 1607688477294,
"startedMillis": 1607690520026,
"endedMillis": 1607709294236,
"speed": 0.1,
"state": "successful"
},
"test_run": {
"readyMillis": 1607937250998,
"state": "pending"
}
}
}
}
}
```
## POST /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex
Marks specified document types in specified clusters of an application as ready for [reindexing](../../operations/reindexing.html). Reindexing itself starts with the next redeployment of the application. To stop an ongoing reindexing, see [updating reindexing](#update-reindexing) below. All document types in all clusters are reindexed unless restricted, using parameters as specified:
| Parameters |
| Name | Description |
| --- | --- |
| clusterId | A comma-separated list of content clusters to limit reindexing to. All clusters are reindexed if this is not present. |
| documentType | A comma-separated list of document types to limit reindexing to. All document types are reindexed if this is not present. |
| indexedOnly | Boolean: whether to mark reindexing ready only for document types with indexing mode _index_ and at least one field with the indexing statement `index`. Default is `false`. |
| speed | Number in the range (0, 10], default 1: Indexing pseudo-speed - balances speed vs. resource use. Example: speed=0.1 |
| Request body | N/A |
| Response | A human-readable message indicating what reindexing was marked as ready. |
Example:
`POST /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex?clusterId=foo,bar&documentType=moo,baz&indexedOnly=true`
```
{
"message": "Reindexing document types [moo, baz] in 'foo', [moo] in 'bar' of application default.default"
}
```
## PUT /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex
Modifies [reindexing](../../operations/reindexing.html) of specified document types in specified clusters of an application. Specifically, this can be used to alter the pseudo-speed of the reindexing, optionally halting it by specifying a speed of `0`; reindexing for the specified types will remain dormant until either speed is increased again, or a new reindexing is triggered (see [trigger reindexing](#reindex)). Speed changes become effective with the next redeployment of the application. Reindexing for all document types in all clusters is affected if no other parameters are specified:
| Parameters |
| Name | Description |
| --- | --- |
| clusterId | A comma-separated list of content clusters to limit the changes to. Reindexing for all clusters is modified if this is not present. |
| documentType | A comma-separated list of document types to limit the changes to. Reindexing for all document types is modified if this is not present. |
| indexedOnly | Boolean: whether to modify reindexing only for document types with indexing mode _index_ and at least one field with the indexing statement `index`. Default is `false`. |
| speed | Number in the range [0, 10], required: Indexing pseudo-speed - balances speed vs. resource use. Example: speed=0.1 |
| Request body | N/A |
| Response | A human-readable message indicating what reindexing was modified. |
Example:
`PUT /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex?clusterId=foo,bar&documentType=moo,baz&speed=0.618`
```
{
"message": "Set reindexing speed to '0.618' for document types [moo, baz] in 'foo', [moo] in 'bar' of application default.default"
}
```
## GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/content/[[path](#path)]
Returns content at the given path for an application. See [getting content](#content-get) for usage and response.
## DELETE /application/v2/tenant/default/application/default
Deletes an active application.
| Parameters | None |
| Request body | None |
Response: Returns a message stating whether the operation was successful.
Example:
`DELETE /application/v2/tenant/default/application/default`
```
{
"message": "Application 'default' was deleted"
}
```
## GET /application/v2/host/[hostname]
Gets information about which tenant and application a hostname is used by.
| Parameters | None |
| Request body | None |
Response: Returns a message with tenant and application details.
Example:
`GET /application/v2/host/myhost.mydomain.com`
```
{
"tenant": "default",
"application": "default",
"environment": "default",
"region": "default",
"instance": "default"
}
```
## Error Handling
Errors are returned using standard HTTP status codes. Any additional info is included in the response body, JSON-formatted. The general format for an error response is:
```
{
"error-code": "ERROR_CODE",
"message": "An error message"
}
```
| HTTP status code | Error code | Description |
| --- | --- | --- |
| 400 | BAD\_REQUEST | Bad request. Client error. The error message should indicate the cause. |
| 400 | INVALID\_APPLICATION\_PACKAGE | There is an error in the application package. The error message should indicate the cause. |
| 400 | OUT\_OF\_CAPACITY | Not enough nodes available for the request to be fulfilled. |
| 401 | | Not authorized. The error message should indicate the cause. |
| 404 | NOT\_FOUND | Not found. E.g. when using a session-id that doesn't exist. |
| 405 | METHOD\_NOT\_ALLOWED | Method not implemented. E.g. using GET where only POST or PUT is allowed. |
| 409 | ACTIVATION\_CONFLICT | Conflict, returned when activating an application fails due to a conflict with other changes to the same application (in another session). Client should retry. |
| 500 | INTERNAL\_SERVER\_ERROR | Internal server error. Generic error. The error message should indicate the cause. |
## Access log
Requests are logged in the [access log](../../operations/access-logging.html) which can be found at _$VESPA\_HOME/logs/vespa/configserver/access-json.log_, example:
```
{
"ip": "172.17.0.2",
"time": 1655665104.751,
"duration": 1.581,
"responsesize": 230,
"requestsize": 0,
"code": 200,
"method": "PUT",
"uri": "/application/v2/tenant/default/session/2/prepared",
"version": "HTTP/2.0",
"agent": "vespa-deploy",
"host": "b614c9ff04d7:19071",
"scheme": "https",
"localport": 19071,
"peeraddr": "172.17.0.2",
"peerport": 47480,
"attributes": {
"http2-stream-id":"1"
}
}
```
## Use Cases
These use cases assume that the tenant _default_ is already created, and that the application package is in the _app_ directory.
### Create, prepare and activate an application
Create a session with the application package:
```
$ (cd app && zip -r - .) | \
curl -s --header Content-Type:application/zip --data-binary @- \
"http://host:19071/application/v2/tenant/default/session"
```
Prepare the application with the URL in the _prepared_ link from the response:
```
$ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/1/prepared?applicationName=default"
```
Activate the application with the URL in the _activate_ link from the response:
```
$ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/1/active"
```
### Modify the application package
Dump _services.xml_ from session 1:
```
$ curl -s -X GET "http://host:19071/application/v2/tenant/default/session/1/content/services.xml"
```
```
12345
```
Session 1 is activated and cannot be changed - create a new session based on the active session:
```
$ curl -s -X POST "http://host:19071/application/v2/tenant/default/session?from=http://host:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default"
```
Modify rpcport to 12346 in _services.xml_, then deploy the change:
```
$ curl -s -X PUT --data-binary @app/services.xml \
"http://host:19071/application/v2/tenant/default/session/2/content/services.xml"
```
Get _services.xml_ from session 2 to validate:
```
$ curl -s -X GET "http://host:19071/application/v2/tenant/default/session/2/content/services.xml"
```
```
12346
```
To add the file _files/test1.txt_, first create the directory, then add the file:
```
$ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/2/content/files/"
$ curl -s -X PUT --data-binary @app/files/test1.txt \
"http://host:19071/application/v2/tenant/default/session/2/content/files/test1.txt"
```
Prepare and activate the session:
```
$ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/2/prepared?applicationName=fooapp"
$ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/2/active"
```
### Rollback
If you need to roll back to a previous version of the application package, create a new session based on the previous known working version by passing the corresponding session-id in the _from_ argument. See [creating a session](#create-session).
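For example, assuming the _from_ argument also accepts a session URL (as it does an application URL in the use case above), rolling back to session 1 could look something like:
```
$ curl -s -X POST "http://host:19071/application/v2/tenant/default/session?from=http://host:19071/application/v2/tenant/default/session/1"
```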
Also see [rollback](/en/applications/deployment.html#rollback).
---
# Source: https://docs.vespa.ai/en/operations/deployment-patterns.html.md
# Deployment patterns
Vespa Cloud's [automated deployments](automated-deployments.html) let you design CD pipelines for staged rollouts and multi-zone deployments. This guide documents some of these patterns.
## Two regions, two AZs each, sequenced deployment
This is the simplest pattern: deploy to a set of zones/regions in a sequence:
```
<!-- deployment.xml - sketch: markup reconstructed around the zone names from the original page -->
<deployment version="1.0">
    <prod>
        <region>aws-us-east-1c</region>
        <region>aws-use1-az4</region>
        <region>aws-use2-az1</region>
        <region>aws-use2-az3</region>
    </prod>
</deployment>
```
## Two regions, two AZs each, parallel deployment
Same as above, but deploying all zones in parallel:
```
<!-- deployment.xml - sketch: markup reconstructed around the zone names from the original page -->
<deployment version="1.0">
    <prod>
        <parallel>
            <region>aws-us-east-1c</region>
            <region>aws-use1-az4</region>
            <region>aws-use2-az1</region>
            <region>aws-use2-az3</region>
        </parallel>
    </prod>
</deployment>
```
## Two regions, two AZs each, parallel deployment inside region
Deploy to the use1 region first, both AZs in parallel, then the use2 region, both AZs in parallel:
```
<!-- deployment.xml - sketch: markup reconstructed around the zone names from the original page -->
<deployment version="1.0">
    <prod>
        <parallel>
            <region>aws-us-east-1c</region>
            <region>aws-use1-az4</region>
        </parallel>
        <parallel>
            <region>aws-use2-az1</region>
            <region>aws-use2-az3</region>
        </parallel>
    </prod>
</deployment>
```
## Deploy to a test instance first
Deploy to a (downscaled) instance first, and add a delay before propagating to later instances and zones.
```
<!-- deployment.xml - sketch: instance ids and delay length are illustrative;
     the original page lists the same zone for both instances -->
<deployment version="1.0">
    <instance id="test">
        <prod>
            <region>aws-use2-az1</region>
        </prod>
    </instance>
    <delay hours="4"/>
    <instance id="default">
        <prod>
            <region>aws-use2-az1</region>
        </prod>
    </instance>
</deployment>
```
### Deployment variants
[Deployment variants](deployment-variants.html) are useful to set up a downscaled instance. In [services.xml](../reference/applications/services/services.html), override settings per instance.
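A minimal sketch, assuming a `test` instance that should run on one small node (names and resource values are illustrative):
```
<nodes deploy:instance="test" count="1">
    <resources vcpu="2" memory="8Gb" disk="50Gb"/>
</nodes>
```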
## Test and prod instances as separate applications
In the section before, we modeled the test and prod app as one pipeline. This lets users halt the pipeline (using the delay) before prod propagation.
In some cases, this is better modeled as different applications:
- The CI pipeline is multistep, with approvals and use of different branches
The below uses different _applications_ to model the flow; these are completely separate application instances. The application owner models the flow in their own tooling, and orchestrates deployments to Vespa Cloud as fits:
The important point is that these are two _separate_ deploy commands to Vespa Cloud:
```
$ vespa config set application kkraunetenant1.canaryapp
$ vespa prod deploy app
```
```
<!-- deployment.xml for the canary application - sketch: markup reconstructed around the zone name -->
<deployment version="1.0">
    <prod>
        <region>aws-use2-az1</region>
    </prod>
</deployment>
```
```
$ vespa config set application kkraunetenant1.prodapp
$ vespa prod deploy app
```
```
<!-- deployment.xml for the prod application - sketch: markup reconstructed around the zone name -->
<deployment version="1.0">
    <prod>
        <region>aws-use2-az1</region>
    </prod>
</deployment>
```
## services.xml structure
It is possible to split _services.xml_ into multiple files using includes.
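A minimal sketch, assuming the included cluster definitions live in a directory _services-fragments/_ inside the application package (the directory name is illustrative):
```
<services version="1.0">
    <include dir="services-fragments"/>
</services>
```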
**Note:** The include feature cannot be used in combination with [deployment variants](#deployment-variants).
## Next reads
- [Environments](environments.html)
- [Zones](zones.html)
- [Routing](endpoint-routing.html)
---
# Source: https://docs.vespa.ai/en/operations/deployment-variants.html.md
# Instance, region, cloud and environment variants
Sometimes it is useful to create configuration that varies depending on properties of the deployment, for example to set region specific endpoints of services used by [Searchers](/en/applications/searchers.html), or use smaller clusters for a "beta" instance.
This is supported both for [services.xml](#services.xml-variants) and [query profiles](#query-profile-variants).
## services.xml variants
[services.xml](../reference/applications/services/services.html) files support different configuration settings for different _tags_, _instances_, _environments_, _clouds_ and _regions_. To use this, import the _deploy_ namespace.
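A minimal sketch of the namespace declaration on the _services.xml_ root element (the `deploy` namespace is bound to `vespa`):
```
<services version="1.0" xmlns:deploy="vespa" xmlns:preprocess="properties">
    <!-- cluster definitions with deploy:... attributes go here -->
</services>
```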
Deploy directives are used to specify with which tags, and in which instance, environment, cloud and/or [region](https://cloud.vespa.ai/en/reference/zones) an XML element should be included:
```
<!-- Sketch: markup reconstructed; the original page shows the value 2, here read as redundancy.
     Cluster id and node counts are illustrative. -->
<content id="music" version="1.0">
    <redundancy>2</redundancy>
    <nodes count="3"/>
    <nodes deploy:environment="dev" count="1"/>
    <nodes deploy:region="aws-us-west-2a" count="5"/>
</content>
```
The example above configures different node counts/configurations depending on the deployment target. Deploying the application in the _dev_ environment gives:
```
<!-- Sketch of the resolved config in the dev environment -->
<content id="music" version="1.0">
    <redundancy>2</redundancy>
    <nodes count="1"/>
</content>
```
Whereas in `aws-us-west-2a` it is:
```
<!-- Sketch of the resolved config in aws-us-west-2a -->
<content id="music" version="1.0">
    <redundancy>2</redundancy>
    <nodes count="5"/>
</content>
```
This can be used to modify any config by deployment target.
The `deploy` directives have a set of override rules:
- A directive specifying more conditions will override one specifying fewer.
- Directives are inherited in child elements.
- When multiple XML elements with the same name are specified (e.g. when specifying search or docproc chains), the _id_ attribute or the _idref_ attribute of the element is used together with the element name when applying directives.
Some overrides are applied by default in some environments, see [environments](/en/operations/environments.html). Any override made explicitly for an environment will override the defaults for it.
### Specifying multiple targets
More than one tag, instance, region or environment can be specified in the attribute, separated by space.
Note that `tags` by default only apply in production instances, and are matched whenever the tags of the element and the tags of the instance intersect. To match tags in other environments, an explicit `deploy:environment` directive for that environment must also match. Use tags if you have a complex instance structure which you want config to vary by.
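A sketch of how tags connect _deployment.xml_ and _services.xml_ (ids and tag names are illustrative):
```
<!-- deployment.xml: an instance carrying a tag -->
<instance id="canary" tags="canary"/>
<!-- services.xml: element included only where the instance's tags match -->
<nodes count="1" deploy:tags="canary"/>
```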
The namespace can be applied to any element. Example:
```
<!-- Sketch: markup reconstructed around the config values from the original page;
     config and field names are illustrative -->
<container id="default" version="1.0" deploy:environment="test staging prod">
    <config name="mynamespace.my-config">
        <mymessage>Hello from application config</mymessage>
        <mymessage deploy:region="aws-us-east-1c">Hello from east colo!</mymessage>
    </config>
</container>
```
Above, the `container` element is configured for the 3 environments only (it will not apply to `dev`) - and in region `aws-us-east-1c`, the config is different.
## Query profile variants
[Query profiles](/en/querying/query-profiles.html) support different configuration settings for different _instances_, _environments_ and _regions_ through [query profile variants](/en/querying/query-profiles.html#query-profile-variants). This allows you to set different query parameters for a query type depending on these deployment attributes.
To use this feature, create a regular query profile variant with any of `instance`, `environment` and `region` as dimension names and let your query profile vary by that. For example:
```
<!-- Sketch: markup reconstructed around the values from the original page; the field name is illustrative -->
<query-profile id="default">
    <dimensions>instance, environment, region</dimensions>
    <field name="mymessage">My default value</field>
    <query-profile for="beta">
        <field name="mymessage">My beta value</field>
    </query-profile>
    <query-profile for="*, dev">
        <field name="mymessage">My dev value</field>
    </query-profile>
    <query-profile for="default, prod">
        <field name="mymessage">My main instance prod value</field>
    </query-profile>
</query-profile>
```
You can pick and combine these dimensions in any way you want with other dimensions sent as query parameters, e.g.:
```
<!-- Sketch: combining deployment dimensions with a regular query-parameter dimension -->
<query-profile id="default">
    <dimensions>device, instance, usecase</dimensions>
</query-profile>
```
---
# Source: https://docs.vespa.ai/en/reference/applications/deployment.html.md
# deployment.xml reference
_deployment.xml_ controls how an application is deployed.
_deployment.xml_ is placed in the root of the [application package](../../basics/applications.html) and specifies which environments and regions the application is deployed to during [automated application deployment](../../operations/automated-deployments.html), and as which application instances.
Deployment progresses through the `test` and `staging` environments to the `prod` environments listed in _deployment.xml_.
Simple example:
```
<!-- Sketch: markup reconstructed around the region names from the original page -->
<deployment version="1.0">
    <prod>
        <region>aws-us-east-1c</region>
        <region>aws-us-west-2a</region>
    </prod>
</deployment>
```
More complex example:
```
aws-us-east-1c
aws-us-east-1c
aws-us-west-1c
aws-eu-west-1a
aws-us-west-2a
aws-us-east-1c
beta
```
Some of the elements can be declared _either_ under the `<deployment>` root, **or**, if one or more `<instance>` tags are listed, under these. These have a bold **or** when listing where they may be present.
## deployment
The root element.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| version | Yes | 1.0 |
| major-version | No | The major version number this application is valid for. |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). |
## instance
In `<deployment>` or `<parallel>` (which must be a direct descendant of the root). An instance of the application; several of these may be simultaneously deployed in the same zone. If no `<instance>` is specified, all children of the root are implicitly children of an `<instance>` with `id="default"`, as in the simple example at the top.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| id | Yes | The unique name of the instance. |
| tags | No | Space-separated tags which can be referenced to make [deployment variants](../../operations/deployment-variants.html). |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |
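A minimal sketch of an explicit instance (the id, tag and region are illustrative):
```
<deployment version="1.0">
    <instance id="beta" tags="beta">
        <prod>
            <region>aws-us-east-1c</region>
        </prod>
    </instance>
</deployment>
```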
## block-change
In `<deployment>`, **or** `<instance>`. This blocks changes from being deployed to production in the matching time interval. Changes are nevertheless tested while blocked.
By default, both application revision changes and Vespa platform changes (upgrades) are blocked. It is possible to block just one kind of change using the `revision` and `version` attributes.
Any combination of the attributes below can be specified. Changes on a given date will be blocked if all conditions are met. Invalid `<block-change>` tags (i.e. tags that contain conditions that never match an actual date) are rejected by the system.
This tag must be placed after any `<test>` and `<staging>` tags, and before `<prod>`. It can be declared multiple times.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| revision | No, default `true` | Set to `false` to allow application deployments |
| version | No, default `true` | Set to `false` to allow Vespa platform upgrades |
| days | No, default `mon-sun` | List of days this block is effective - a comma-separated list of single days or day intervals where the start and end day are separated by a dash and are inclusive. Each day is identified by its English name or three-letter abbreviation. |
| hours | No, default `0-23` | List of hours this block is effective - a comma-separated list of single hours or hour intervals where the start and end hour are separated by a dash and are inclusive. Each hour is identified by a number in the range 0 to 23. |
| time-zone | No, default UTC | The name of the time zone used to interpret the hours attribute. Time zones are full names or short forms, when the latter is unambiguous. See [ZoneId.of](https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#of-java.lang.String-) for the full spec of acceptable values. |
| from-date | No | The inclusive starting date of this block (ISO-8601, `YYYY-MM-DD`). |
| to-date | No | The inclusive ending date of this block (ISO-8601, `YYYY-MM-DD`). |
The below example blocks all changes on weekends, and blocks revisions outside working hours, in the PST time zone:
```
<!-- Sketch reconstructed from the description; the exact working hours are illustrative -->
<block-change days="sat,sun" time-zone="America/Los_Angeles"/>
<block-change version="false" days="mon-fri" hours="0-8,16-23" time-zone="America/Los_Angeles"/>
```
The below example blocks:
- all changes on Sundays starting on 2022-03-01
- all changes in the hours 16-23 between 2022-02-10 and 2022-02-15
- all changes until 2022-01-05
```
<!-- Sketch reconstructed from the description above -->
<block-change days="sun" from-date="2022-03-01"/>
<block-change hours="16-23" from-date="2022-02-10" to-date="2022-02-15"/>
<block-change to-date="2022-01-05"/>
```
## upgrade
In `<deployment>`, or `<instance>`. Determines the strategy for upgrading the application, or one of its instances. By default, application revision changes and Vespa platform changes are deployed separately. The exception is when an upgrade fails; then, the latest application revision is deployed together with the upgrade, as these may be necessary to fix the upgrade failure.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| rollout | No, default `separate` |
- `separate` is the default. When a revision catches up to a platform upgrade, it stays behind, unless the upgrade alone fails.
- `simultaneous` favors revision roll-out. When a revision catches up to a platform upgrade, it joins, and then passes the upgrade.
|
| revision-target | No, default `latest` |
- `latest` is the default. When rolling out a new revision to an instance, the latest available revision is chosen.
- `next` trades speed for smaller changes. When rolling out a new revision to an instance, the next available revision is chosen.
The available revisions for an instance are revisions which are not yet deployed, or revisions which have rolled out in previous instances. |
| revision-change | No, default `when-failing` |
- `always` is the most aggressive setting. A new, available revision may always replace the one which is currently rolling out.
- `when-failing` is the default. A new, available revision may replace the one which is currently rolling out if this is failing.
- `when-clear` is the most conservative setting. A new, available revision may never replace one which is currently rolling out.
Revision targets will never automatically change inside [revision block window](#block-change), but may be set by manual intervention at any time. |
| max-risk | No, default `0` | May only be used with `revision-change="when-clear"` and `revision-target="next"`. The maximum amount of _risk_ to roll out per new revision target. The default of `0` results in the next build always being chosen, while a higher value allows skipping intermediate builds, as long as the cumulative risk does not exceed what is configured here. |
| min-risk | No, default `0` | Must be less than or equal to the configured `max-risk`. The minimum amount of _risk_ to start rolling out a new revision. The default of `0` results in a new revision rolling out as soon as anything is ready, while a higher value lets the system wait until enough cumulative risk is available. This can be used to avoid blocking a lengthy deployment process with trivial changes. |
| max-idle-hours | No, default `8` | May only be used when `min-risk` is specified, and greater than `0`. The maximum number of hours to wait for enough cumulative risk to be available, before rolling out a new revision. |
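As a sketch (attribute values are illustrative), a conservative upgrade policy combining these attributes could look like:
```
<deployment version="1.0">
    <upgrade rollout="separate" revision-target="next" revision-change="when-clear"
             max-risk="3" min-risk="1" max-idle-hours="32"/>
    <prod>
        <region>aws-us-east-1c</region>
    </prod>
</deployment>
```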
## test
Meaning depends on where it is located:
| Parent | Description |
| --- | --- |
| `<deployment>` `<instance>` | If present, the application is deployed to the [`test`](../../operations/environments.html#test) environment, and system tested there, even if no prod zones are deployed to. Also, when specified, system tests _must_ be present in the application test package. See guides for [getting to production](../../operations/production-deployment.html).
If present in an `<instance>` element, system tests are run for that specific instance before any production deployments of the instance may proceed — otherwise, previous system tests for any instance are acceptable. |
| `<prod>` `<parallel>` `<steps>` | If present, production tests are run against the production region with id contained in this element. A test must be _after_ a corresponding [region](#region) element. When specified, production tests _must_ be present in the application test package. See guides for [getting to production](../../operations/production-deployment.html). |
| Attribute | Mandatory | Values |
| --- | --- | --- |
| cloud-account | No | For [system tests](../../operations/automated-deployments.html#system-tests) only: account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. Cloud account _must not_ be specified for [production tests](../../operations/automated-deployments.html#production-tests), which always run in the account of the corresponding deployment. |
## staging
In `<deployment>`, or `<instance>`. If present, the application is deployed to the [`staging`](../../operations/environments.html#staging) environment, and tested there, even if no prod zones are deployed to. If present in an `<instance>` element, staging tests are run for that specific instance before any production deployments of the instance may proceed — otherwise, previous staging tests for any instance are acceptable. When specified, staging tests _must_ be present in the application test package. See guides for [getting to production](../../operations/production-deployment.html).
| Attribute | Mandatory | Values |
| --- | --- | --- |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |
## prod
In `<deployment>`, **or** in `<instance>`. If present, the application is deployed to the production regions listed inside this element, under the specified instance, after deployments and tests in the `test` and `staging` environments.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |
## region
In `<prod>`, `<parallel>`, `<steps>`, or `<group>`. The application is deployed to the production [region](../../operations/zones.html) with id contained in this element.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| fraction | No | Only when this region is inside a group: The fractional membership in the group. |
| cloud-account | No | Account to deploy to with [Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |
## dev
In `<deployment>`. Optionally used to control deployment settings for the [dev environment](../../operations/environments.html). This can be used to specify a different cloud account, tags, and private endpoints.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| tags | No | Space-separated tags which can be referenced to make [deployment variants](../../operations/deployment-variants.html). |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |
## delay
In `<deployment>`, `<instance>`, `<prod>`, `<parallel>`, or `<steps>`. Introduces a delay which must pass after completion of all previous steps, before subsequent steps may proceed. This may be useful to allow some grace time to discover errors before deploying a change in additional zones, or to gather higher-level metrics for a production deployment for a while, before evaluating these in a production test. The maximum total delay for the whole deployment spec is 48 hours. The delay is specified by any combination of the `hours`, `minutes` and `seconds` attributes.
## parallel
In `<deployment>`, `<instance>`, or `<prod>`. Runs the contained steps in parallel: instances if in `<deployment>`, or primitive steps (deployments, tests or delays) or a series of these (see [steps](#steps)) otherwise. Multiple `<parallel>` elements are permitted. The following example will deploy to `us-west-1` first, then to `us-east-3` and `us-central-1` simultaneously, and finally to `eu-west-1`, once both parallel deployments have completed:
```
<!-- Sketch: markup reconstructed from the region names and the description above -->
<deployment version="1.0">
    <prod>
        <region>us-west-1</region>
        <parallel>
            <region>us-east-3</region>
            <region>us-central-1</region>
        </parallel>
        <region>eu-west-1</region>
    </prod>
</deployment>
```
## steps
In `<parallel>`. Runs the contained parallel or primitive steps (deployments, tests or delays) serially. The following example will, in parallel:
1. deploy to `us-east-3`,
2. deploy to `us-west-1`, then delay 1 hour, and run tests for `us-west-1`, and
3. delay for two hours.
Thus, the parallel block is complete when both deployments are complete, tests are successful for the second deployment, and at least two hours have passed since the block began executing.
```
<!-- Sketch: markup reconstructed from the region names and the description above -->
<deployment version="1.0">
    <prod>
        <parallel>
            <region>us-east-3</region>
            <steps>
                <region>us-west-1</region>
                <delay hours="1"/>
                <test>us-west-1</test>
            </steps>
            <delay hours="2"/>
        </parallel>
    </prod>
</deployment>
```
## tester
In `<test>`, `<staging>` and `<prod>`. Specifies container settings for the tester application container, which is used to run system, staging and production verification tests.
The allowed elements inside this are [`<nodes>`](../applications/services/services.html#nodes).
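A minimal sketch of a tester specification, assuming one tester node (resource values are illustrative):
```
<tester>
    <nodes count="1">
        <resources vcpu="2" memory="8Gb" disk="50Gb"/>
    </nodes>
</tester>
```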
## endpoints (global)
In `<deployment>`, without any `<instance>` declared, **or** in `<instance>`: This allows _global_ endpoints, via one or more [`<endpoint>`](#endpoint-global) elements; and [zone endpoint](#endpoint-zone) and [private endpoint](#endpoint-private) elements for cloud-native private network configuration.
## endpoints (dev)
In `<dev>`. This allows [zone endpoint](#endpoint-zone) elements for cloud-native private network configuration for [dev](../../operations/environments.html#dev) deployments. Note that [private endpoints](#endpoint-private) are only supported in `prod`.
## endpoint (global)
In `<endpoints>` or `<group>`. Specifies a global endpoint for this application. Each endpoint will point to the regions that are declared in the endpoint. If no regions are specified, the endpoint defaults to the regions declared in the `<prod>` element. The following example creates a default endpoint to all regions, and a _us_ endpoint pointing only to US regions.
```
<!-- Sketch: markup reconstructed from the description; the container id is illustrative -->
<endpoints>
    <endpoint container-id="my-container"/>
    <endpoint id="us" container-id="my-container">
        <region>aws-us-east-1c</region>
        <region>aws-us-west-2a</region>
    </endpoint>
</endpoints>
```
| Attribute | Mandatory | Values |
| --- | --- | --- |
| id | No | The identifier for the endpoint. This will be part of the endpoint name that is generated. If not specified, the endpoint will be the default global endpoint for the application. |
| container-id | Yes | The id of the [container cluster](/en/reference/applications/services/container.html) to which requests to the global endpoint are forwarded. |
Global endpoints are implemented using Route 53 and healthchecks, to keep active zones in rotation. See [BCP](#bcp) for advanced configurations.
## endpoint (zone)
In `<endpoints>` (global **or** [dev](#endpoints-dev)), with `type='zone'`. Used to disable public zone endpoints. _Non-public endpoints can not be used in global endpoints, which require that all constituent endpoints are public._ The example disables the public zone endpoint for the `my-container` container cluster in all regions, except where it is explicitly enabled, in `region-1`. Changing endpoint visibility will make the service unavailable for a short period of time.
```
<!-- Sketch: markup reconstructed from the description; the container id is illustrative -->
<endpoints>
    <endpoint type="zone" container-id="my-container" enabled="false"/>
    <endpoint type="zone" container-id="my-container">
        <region>region-1</region>
    </endpoint>
</endpoints>
```
| Attribute | Mandatory | Values |
| --- | --- | --- |
| type | Yes | Private endpoints are specified with `type='zone'`. |
| container-id | Yes | The id of the [container cluster](/en/reference/applications/services/container.html) to disable public endpoints for. |
| enabled | No | Whether a public endpoint for this container cluster should be enabled; default `true`. |
## endpoint (private)
In `<endpoints>`, with `type='private'`. Specifies a private endpoint service for this application. Each service will be launched in the regions that are declared in the endpoint. If no regions are specified, the service is launched in all regions declared in the `<prod>` element that support any of the declared [access types](#allow). The following example creates a private endpoint in two specific regions.
```
<!-- Sketch: markup reconstructed from the description; the container id is illustrative -->
<endpoints>
    <endpoint type="private" container-id="my-container">
        <region>aws-us-east-1c</region>
        <region>gcp-us-central1-f</region>
    </endpoint>
</endpoints>
```
| Attribute | Mandatory | Values |
| --- | --- | --- |
| type | Yes | Private endpoints are specified with `type='private'`. |
| container-id | Yes | The id of the [container cluster](/en/reference/applications/services/container.html) to which requests to the private endpoint service are forwarded. |
| auth-method | No | The authentication method to use with this [private endpoint](/en/operations/private-endpoints.html).
Must be either `mtls` or `token`. Defaults to mTLS if not included. |
## allow
In `<endpoint>`. Allows a principal identified by the URN to set up a connection to the declared private endpoint service. This element must be repeated for each additional URN. An endpoint service will only consider allowed URNs of a compatible type, and will only be created if at least one compatible access type-and-URN is given:
- For AWS deployments, specify `aws-private-link`, and an _ARN_.
- For GCP deployments, specify `gcp-service-connect`, and a _project ID_.
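A minimal sketch, assuming one AWS and one GCP consumer (the container id, ARN and project ID are placeholders):
```
<endpoint type="private" container-id="my-container">
    <allow with="aws-private-link" arn="arn:aws:iam::123456789012:root"/>
    <allow with="gcp-service-connect" project="my-project"/>
</endpoint>
```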
| Attribute | Mandatory | Values |
| --- | --- | --- |
| with | Yes | The private endpoint access type; must be `aws-private-link` or `gcp-service-connect`. |
| arn | Maybe | Must be specified with `aws-private-link`. See [AWS documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html) for more details. |
| project | Maybe | Must be specified with `gcp-service-connect`. See [GCP documentation](https://cloud.google.com/vpc/docs/configure-private-service-connect-services) for more details. |
## bcp
In `<deployment>` or `<instance>`. Defines the BCP (Business Continuity Planning) structure of this instance: which zones should take over for which others during the outage of a zone, and how fast they must have the capacity ready. Autoscaling uses this information to decide the ideal cpu load of a zone. If this element is not defined, it is assumed that all regions cover for an equal share of the traffic of all other regions, and must have that capacity ready at all times.
If a bcp element is specified at the root, and explicit instances are used, that bcp element becomes the default for all instances that do not contain a bcp element themselves. If a BCP element contains no group elements, it will implicitly define a single group of all the regions of the instance in which it is used.
See [BCP test](https://cloud.vespa.ai/en/reference/bcp-test.html) for a procedure to verify that your BCP configuration is correct.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| deadline | No |
The max time after a region becomes unreachable until the other regions in its BCP group must be able to handle its traffic, given as a number of minutes followed by 'm', 'h' or 'd' (for minutes, hours or days). The default deadline is 0: Regions must at all times have capacity to handle BCP traffic immediately.
By providing a deadline, autoscaling can avoid the cost of provisioning additional resources for BCP capacity if it predicts that it can grow to handle the traffic faster than the deadline in a given cluster.
This is the default deadline to be used for all groups that don't specify one themselves.
|
Example:
```
<!-- Sketch: markup reconstructed from the region names; us-central1 appears in both groups,
     so it is given fractional membership summing to 1. The deadline is illustrative. -->
<bcp>
    <group>
        <region>us-east1</region>
        <region>us-east2</region>
        <region fraction="0.5">us-central1</region>
    </group>
    <group deadline="30m">
        <region>us-west1</region>
        <region>us-west2</region>
        <region fraction="0.5">us-central1</region>
    </group>
</bcp>
```
## group
In `<bcp>`. Defines a bcp group: A set of regions whose members cover for each other during a regional outage.
Each region in a group will (as allowed, when autoscaling ranges are configured) provision resources sufficient to handle the case where any other single region in the group goes down. The traffic of the region is assumed to be rerouted in equal amount to the remaining regions in the group. That is, if a group has one member, no resources will be provisioned to handle an outage in that member. If a group has two members, each will aim to provision sufficient resources to handle the actual traffic of the other. If a group has three members, each will provision to handle half of the traffic observed in the region among the two others which receives the most traffic.
A region may have fractional membership in multiple groups, meaning it will handle just that fraction of the traffic of the remaining members, and vice versa. A region's total membership across groups must always sum to exactly 1.
A group may also define global endpoints for the region members in the group. This is exactly the same as defining the endpoint separately and repeating the regions of the group under the endpoint. Endpoints under a group cannot contain explicit region sub-elements.
| Attribute | Mandatory | Values |
| --- | --- | --- |
| deadline | No |
The deadline of this BCP group. See deadline on the BCP element.
|
---
# Source: https://docs.vespa.ai/en/applications/developer-guide.html.md
# Developer Guide
This document explains how to develop applications, including basic terminology, tips on using the Vespa Cloud Console, and how to benchmark and size your application. See [deploy a sample application](../basics/deploy-an-application.html) to deploy a basic sample application, and [automated deployments](../operations/automated-deployments.html) on making production deployments safe routine occurrences.
## Manual deployments
Developers will typically deploy their application to the `dev` [zone](../operations/zones.html) during development. Each deployment is owned by a _tenant_, and each specified _instance_ is a separate copy of the application; this lets developers work on independent copies of the same application, or collaborate on a shared one, as they prefer—more details [here](../learn/tenant-apps-instances.html). These values can be set in the Vespa Cloud UI when deploying, or with each of the build and deploy tools, as shown in the respective getting-started guides.
Additionally, a deployment may specify a different [zone](../operations/zones.html) to deploy to, instead of the default `dev` zone.
### Auto downsizing
Deployments to `dev` are downscaled to one small node by default, so that applications can be deployed there without changing `services.xml`. See [performance testing](#performance-testing) for how to disable auto downsizing using `deploy:environment="dev"`.
### Availability
The `dev` zone is a sandbox and not for production serving; it has no uptime guarantees.
An automated Vespa software upgrade can be triggered at any time, and this may lead to some downtime if you have only one node per cluster (as with the default [auto downsizing](#auto-downsizing)).
## Performance testing
For performance testing, to avoid auto downsizing, lock the [resources](../reference/applications/services/services.html) using `deploy:environment="dev"`.
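A minimal sketch, assuming a cluster whose node count and resources should be pinned also in `dev` (values are illustrative):
```
<nodes deploy:environment="dev" count="2">
    <resources vcpu="4" memory="16Gb" disk="125Gb"/>
</nodes>
```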
Read more in [benchmarking](../performance/benchmarking-cloud.html) and [variants in services.xml](../operations/deployment-variants.html).
## Component overview
Application packages can contain Java components to be run in container clusters. The most common component types are:
- [Searchers](searchers.html), which can modify or build the query, modify the result, implement workflows issuing multiple queries etc.
- [Document processors](document-processors.html) that can modify incoming write operations.
- [Handlers](request-handlers.html) that can implement custom web service APIs.
- [Renderers](result-renderers.html) that are used to define custom result formats.
Components are constructed by dependency injection and are reloaded safely on deployment without restarts. See the [container documentation](containers.html) for more details.
See [deploy an application having Java components](../basics/deploy-an-application-java.html), and [troubleshooting](../operations/self-managed/admin-procedures.html#troubleshooting).
## Developing Components
The development cycle consists of creating the component, deploying the application package to Vespa, writing tests, and iterating. These steps refer to files in [album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java):
**Build:** All the Vespa sample applications use the [bundle plugin](bundles.html#maven-bundle-plugin) to build the components.
**Configure:** A key Vespa feature is code and configuration consistency, deployed using an [application package](../basics/applications.html). This ensures that code and configuration is in sync, and loaded atomically when deployed. This is done by generating config classes from config definition files. In Vespa and application code, configuration is therefore accessed through generated config classes.
The Maven target `generate-sources` (invoked by `mvn install`) uses [metal-names.def](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation-java/src/main/resources/configdefinitions/metal-names.def) to generate `target/generated-sources/vespa-configgen-plugin/com/mydomain/example/MetalNamesConfig.java`.
After generating config classes, they will resolve in tools like [IntelliJ IDEA](https://www.jetbrains.com/idea/download/).
**Tests:** Example unit tests are found in [MetalSearcherTest.java](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation-java/src/test/java/ai/vespa/example/album/MetalSearcherTest.java). `testAddedOrTerm1` and `testAddedOrTerm2` illustrate two ways of doing the same test:
- The first sets up the minimal search chain for [YQL](../querying/query-language.html) programmatically.
- The second uses `com.yahoo.application.Application`, which sets up the application package and simplifies testing.
Read more in [unit testing](unit-testing.html).
## Debugging Components
**Important:** The debugging procedure only works for endpoints with an open debug port - most managed services don't do this for security reasons.
Vespa Cloud does not allow debugging over the _Java Debug Wire Protocol (JDWP)_ due to the protocol's inherent lack of security measures. If you need interactive debugging, deploy your application to a self-hosted Vespa installation (below) and manually [add the _JDWP_ agent to JVM options](#debugging-components).
You may debug your Java code by requesting either a JVM heap dump or a Java Flight Recorder recording through the [Vespa Cloud Console](https://console.vespa-cloud.com/). Go to your application's cluster overview and select _export JVM artifact_ on any _container_ node. The process will take up to a few minutes. You'll find the steps to download the dump on the Console once it's completed. Extract the files from the downloaded Zstandard-compressed archive, and use the free [JDK Mission Control](https://www.oracle.com/java/technologies/jdk-mission-control.html) utility to inspect the dump/recording.
To debug a [Searcher](searchers.html) / [Document Processor](document-processors.html) / [Component](components.html) running in a self-hosted container, set up a remote debugging configuration in the IDEA - IntelliJ example:
1. Run -\> Edit Configurations...
2. Click `+` to add a new configuration.
3. Select the "Remote JVM Debug" option in the left-most pane.
4. Set hostname to the host running the container, change the port if needed.
5. Set the container's [jvm options](../reference/applications/services/container.html#jvm) to the value in "Command line arguments for remote JVM":
```
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
```
6. Re-deploy the application, then restart Vespa on the node that runs the container. Make sure the port is published if using a Docker/Podman container, e.g.:
```
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 --publish 127.0.0.1:5005:5005 \
vespaengine/vespa
```
7. Start debugging! Check _vespa.log_ for errors.
**Vespa videos:** Find [_Debugging a Vespa Searcher_](https://www.youtube.com/embed/dUCLKtNchuE) in the vespaengine [youtube channel](https://www.youtube.com/@vespaai)!
## Developing system and staging tests
When using Vespa Cloud, system and staging tests are most easily developed using a test deployment in a `dev` zone to run the tests against. Refer to the [general testing guide](testing.html) for a discussion of the different test types, and the [basic HTTP tests](../reference/applications/testing.html) or [Java JUnit tests](../reference/applications/testing-java.html) reference for how to write the relevant tests.
If using the [Vespa CLI](../clients/vespa-cli.html) to deploy and run [basic HTTP tests](../reference/applications/testing.html), the same commands as in the test reference will just work, provided the CLI is configured to use the `cloud` target.
### Running Java tests
With Maven, and [Java Junit tests](../reference/applications/testing-java.html), some additional configuration is required, to infuse the test runtime on the local machine with API and data plane credentials:
```
$ mvn test \
-D test.categories=system \
-D dataPlaneKeyFile=data-plane-private-key.pem -D dataPlaneCertificateFile=data-plane-public-cert.pem \
-D apiKey="$API_KEY"
```
The `apiKey` is used to fetch the _dev_ instance's endpoints. The data plane key and certificate pair is used by [ai.vespa.hosted.cd.Endpoint](https://github.com/vespa-engine/vespa/blob/master/tenant-cd-api/src/main/java/ai/vespa/hosted/cd/Endpoint.java) to access the application endpoint. Note that the `-D vespa.test.config` argument is gone; this configuration is automatically fetched from the Vespa Cloud API—hence the need for the API key.
When running Vespa self-hosted like in the [sample application](../basics/deploy-an-application-local.html), no authentication is required by default, to either API or container, and specifying a data plane key and certificate will instead cause the test to fail, since the correct SSL context is the Java default in this case.
Make sure `TestRuntime` is able to start: it initializes an SSL context, so remove this configuration when running locally, in order to use the default context. Remove the properties from _pom.xml_ and the IDE debug configuration.
Developers can also set these parameters in the IDE run configuration to debug system tests:
```
-D test.categories=system
-D tenant=my_tenant
-D application=my_app
-D instance=my_instance
-D apiKeyFile=/path/to/myname.mytenant.pem
-D dataPlaneCertificateFile=data-plane-public-cert.pem
-D dataPlaneKeyFile=data-plane-private-key.pem
```
## Tips and troubleshooting
- Vespa Cloud upgrades daily, and applications in `dev` also have their Vespa platform upgraded. This usually happens at the opposite time of day of when deployments are made to each instance, and takes some minutes. Deployments without redundancy will be unavailable during the upgrade.
- Failure to deploy, due to authentication (HTTP code 401) or authorization (HTTP code 403), is most often due to wrong configuration of `tenant` and/or `application`, when using command line tools to deploy. Ensure the values set with Vespa CLI or in `pom.xml` match what is configured in the UI.
- In case of data plane failure, remember to copy the public certificate to `src/main/application/security/clients.pem` before building and deploying. This is handled by the Vespa CLI `vespa auth cert` command.
- To run Java [system and staging tests](../reference/applications/testing-java.html) in an IDE, ensure all API and data plane keys and certificates are configured in the IDE as well; not all IDEs pick up all settings from `pom.xml` correctly.
---
# Source: https://docs.vespa.ai/en/reference/operations/metrics/distributor.html.md
# Distributor Metrics
| Name | Unit | Description |
| --- | --- | --- |
| vds.idealstate.buckets\_rechecking | bucket | The number of buckets that we are rechecking for ideal state operations |
| vds.idealstate.idealstate\_diff | bucket | A number representing the current difference from the ideal state. This is a number that decreases steadily as the system is getting closer to the ideal state |
| vds.idealstate.buckets\_toofewcopies | bucket | The number of buckets the distributor controls that have less than the desired redundancy |
| vds.idealstate.buckets\_toomanycopies | bucket | The number of buckets the distributor controls that have more than the desired redundancy |
| vds.idealstate.buckets | bucket | The number of buckets the distributor controls |
| vds.idealstate.buckets\_notrusted | bucket | The number of buckets that have no trusted copies. |
| vds.idealstate.bucket\_replicas\_moving\_out | bucket | Bucket replicas that should be moved out, e.g. retirement case or node added to cluster that has higher ideal state priority. |
| vds.idealstate.bucket\_replicas\_copying\_out | bucket | Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge |
| vds.idealstate.bucket\_replicas\_copying\_in | bucket | Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for |
| vds.idealstate.bucket\_replicas\_syncing | bucket | Bucket replicas that need syncing due to mismatching metadata |
| vds.idealstate.max\_observed\_time\_since\_last\_gc\_sec | second | Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor. |
| vds.idealstate.delete\_bucket.done\_ok | operation | The number of operations successfully performed |
| vds.idealstate.delete\_bucket.done\_failed | operation | The number of operations that failed |
| vds.idealstate.delete\_bucket.pending | operation | The number of operations pending |
| vds.idealstate.delete\_bucket.blocked | operation | The number of operations blocked by blocking operation starter |
| vds.idealstate.delete\_bucket.throttled | operation | The number of operations throttled by throttling operation starter |
| vds.idealstate.merge\_bucket.done\_ok | operation | The number of operations successfully performed |
| vds.idealstate.merge\_bucket.done\_failed | operation | The number of operations that failed |
| vds.idealstate.merge\_bucket.pending | operation | The number of operations pending |
| vds.idealstate.merge\_bucket.blocked | operation | The number of operations blocked by blocking operation starter |
| vds.idealstate.merge\_bucket.throttled | operation | The number of operations throttled by throttling operation starter |
| vds.idealstate.merge\_bucket.source\_only\_copy\_changed | operation | The number of merge operations where source-only copy changed |
| vds.idealstate.merge\_bucket.source\_only\_copy\_delete\_blocked | operation | The number of merge operations where delete of unchanged source-only copies was blocked |
| vds.idealstate.merge\_bucket.source\_only\_copy\_delete\_failed | operation | The number of merge operations where delete of unchanged source-only copies failed |
| vds.idealstate.split\_bucket.done\_ok | operation | The number of operations successfully performed |
| vds.idealstate.split\_bucket.done\_failed | operation | The number of operations that failed |
| vds.idealstate.split\_bucket.pending | operation | The number of operations pending |
| vds.idealstate.split\_bucket.blocked | operation | The number of operations blocked by blocking operation starter |
| vds.idealstate.split\_bucket.throttled | operation | The number of operations throttled by throttling operation starter |
| vds.idealstate.join\_bucket.done\_ok | operation | The number of operations successfully performed |
| vds.idealstate.join\_bucket.done\_failed | operation | The number of operations that failed |
| vds.idealstate.join\_bucket.pending | operation | The number of operations pending |
| vds.idealstate.join\_bucket.blocked | operation | The number of operations blocked by blocking operation starter |
| vds.idealstate.join\_bucket.throttled | operation | The number of operations throttled by throttling operation starter |
| vds.idealstate.garbage\_collection.done\_ok | operation | The number of operations successfully performed |
| vds.idealstate.garbage\_collection.done\_failed | operation | The number of operations that failed |
| vds.idealstate.garbage\_collection.pending | operation | The number of operations pending |
| vds.idealstate.garbage\_collection.documents\_removed | document | Number of documents removed by GC operations |
| vds.idealstate.garbage\_collection.blocked | operation | The number of operations blocked by blocking operation starter |
| vds.idealstate.garbage\_collection.throttled | operation | The number of operations throttled by throttling operation starter |
| vds.distributor.puts.latency | millisecond | The latency of put operations |
| vds.distributor.puts.ok | operation | The number of successful put operations performed |
| vds.distributor.puts.failures.total | operation | Sum of all failures |
| vds.distributor.puts.failures.notfound | operation | The number of operations that failed because the document did not exist |
| vds.distributor.puts.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
| vds.distributor.puts.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
| vds.distributor.puts.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to |
| vds.distributor.puts.failures.notready | operation | The number of operations discarded because distributor was not ready |
| vds.distributor.puts.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor |
| vds.distributor.puts.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
| vds.distributor.puts.failures.storagefailure | operation | The number of operations that failed in storage |
| vds.distributor.puts.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage |
| vds.distributor.puts.failures.busy | operation | The number of messages from storage that failed because the storage node was busy |
| vds.distributor.puts.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found |
| vds.distributor.removes.latency | millisecond | The latency of remove operations |
| vds.distributor.removes.ok | operation | The number of successful remove operations performed |
| vds.distributor.removes.failures.total | operation | Sum of all failures |
| vds.distributor.removes.failures.notfound | operation | The number of operations that failed because the document did not exist |
| vds.distributor.removes.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
| vds.distributor.removes.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
| vds.distributor.removes.failures.busy | operation | The number of messages from storage that failed because the storage node was busy |
| vds.distributor.removes.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found |
| vds.distributor.removes.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to |
| vds.distributor.removes.failures.notready | operation | The number of operations discarded because distributor was not ready |
| vds.distributor.removes.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
| vds.distributor.removes.failures.storagefailure | operation | The number of operations that failed in storage |
| vds.distributor.removes.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage |
| vds.distributor.removes.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor |
| vds.distributor.updates.latency | millisecond | The latency of update operations |
| vds.distributor.updates.ok | operation | The number of successful update operations performed |
|
vds.distributor.updates.failures.total
| operation | Sum of all failures |
|
vds.distributor.updates.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.updates.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.updates.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.updates.diverging\_timestamp\_updates
| operation | Number of updates that report they were performed against divergent version timestamps on different replicas |
|
vds.distributor.updates.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.updates.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.updates.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.updates.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.updates.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.updates.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.updates.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.updates.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.updates.fast\_path\_restarts
| operation | Number of safe path (write repair) updates that were restarted as fast path updates because all replicas returned documents with the same timestamp in the initial read phase |
|
vds.distributor.removelocations.ok
| operation | The number of successful removelocations operations performed |
|
vds.distributor.removelocations.failures.total
| operation | Sum of all failures |
|
vds.distributor.removelocations.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.removelocations.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.removelocations.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.removelocations.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.removelocations.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.removelocations.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.removelocations.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.removelocations.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.removelocations.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.removelocations.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.removelocations.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.removelocations.latency
| millisecond | The average latency of removelocations operations |
|
vds.distributor.gets.latency
| millisecond | The average latency of gets operations |
|
vds.distributor.gets.ok
| operation | The number of successful gets operations performed |
|
vds.distributor.gets.failures.total
| operation | Sum of all failures |
|
vds.distributor.gets.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.gets.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.gets.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.gets.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.gets.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.gets.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.gets.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.gets.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.gets.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.gets.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.gets.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.visitor.latency
| millisecond | The average latency of visitor operations |
|
vds.distributor.visitor.ok
| operation | The number of successful visitor operations performed |
|
vds.distributor.visitor.failures.total
| operation | Sum of all failures |
|
vds.distributor.visitor.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.visitor.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.visitor.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.visitor.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.visitor.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.visitor.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.visitor.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.visitor.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.visitor.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.visitor.bytes\_per\_visitor
| operation | The number of bytes visited on content nodes as part of a single client visitor command |
|
vds.distributor.visitor.docs\_per\_visitor
| operation | The number of documents visited on content nodes as part of a single client visitor command |
|
vds.distributor.visitor.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.visitor.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.docsstored
| document | Number of documents stored in all buckets controlled by this distributor |
|
vds.distributor.bytesstored
| byte | Number of bytes stored in all buckets controlled by this distributor |
|
metricmanager.periodichooklatency
| millisecond | Time in ms used to update a single periodic hook |
|
metricmanager.resetlatency
| millisecond | Time in ms used to reset all metrics. |
|
metricmanager.sleeptime
| millisecond | Time in ms worker thread is sleeping |
|
metricmanager.snapshothooklatency
| millisecond | Time in ms used to update a single snapshot hook |
|
metricmanager.snapshotlatency
| millisecond | Time in ms used to take a snapshot |
|
vds.distributor.activate\_cluster\_state\_processing\_time
| millisecond | Elapsed time where the distributor thread is blocked on merging pending bucket info into its bucket database upon activating a cluster state |
|
vds.distributor.bucket\_db.memory\_usage.allocated\_bytes
| byte | The number of allocated bytes |
|
vds.distributor.bucket\_db.memory\_usage.dead\_bytes
| byte | The number of dead bytes (\<= used\_bytes) |
|
vds.distributor.bucket\_db.memory\_usage.onhold\_bytes
| byte | The number of bytes on hold |
|
vds.distributor.bucket\_db.memory\_usage.used\_bytes
| byte | The number of used bytes (\<= allocated\_bytes) |
|
vds.distributor.getbucketlists.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.getbucketlists.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.getbucketlists.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.getbucketlists.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.getbucketlists.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.getbucketlists.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.getbucketlists.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.getbucketlists.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.getbucketlists.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.getbucketlists.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.getbucketlists.failures.total
| operation | Total number of failures |
|
vds.distributor.getbucketlists.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.getbucketlists.latency
| millisecond | The average latency of getbucketlists operations |
|
vds.distributor.getbucketlists.ok
| operation | The number of successful getbucketlists operations performed |
|
vds.distributor.recoverymodeschedulingtime
| millisecond | Time spent scheduling operations in recovery mode after receiving new cluster state |
|
vds.distributor.set\_cluster\_state\_processing\_time
| millisecond | Elapsed time where the distributor thread is blocked on processing its bucket database upon receiving a new cluster state |
|
vds.distributor.state\_transition\_time
| millisecond | Time it takes to complete a cluster state transition. If a state transition is preempted before completing, its elapsed time is counted as part of the total time spent for the final, completed state transition |
|
vds.distributor.stats.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.stats.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.stats.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.stats.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.stats.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.stats.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.stats.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.stats.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.stats.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.stats.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.stats.failures.total
| operation | The total number of failures |
|
vds.distributor.stats.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.stats.latency
| millisecond | The average latency of stats operations |
|
vds.distributor.stats.ok
| operation | The number of successful stats operations performed |
|
vds.distributor.update\_gets.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.update\_gets.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.update\_gets.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.update\_gets.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.update\_gets.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.update\_gets.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.update\_gets.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.update\_gets.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.update\_gets.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.update\_gets.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.update\_gets.failures.total
| operation | The total number of failures |
|
vds.distributor.update\_gets.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.update\_gets.latency
| millisecond | The average latency of update\_gets operations |
|
vds.distributor.update\_gets.ok
| operation | The number of successful update\_gets operations performed |
|
vds.distributor.update\_metadata\_gets.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.update\_metadata\_gets.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.update\_metadata\_gets.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.update\_metadata\_gets.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.update\_metadata\_gets.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.update\_metadata\_gets.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.update\_metadata\_gets.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.update\_metadata\_gets.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.update\_metadata\_gets.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.update\_metadata\_gets.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.update\_metadata\_gets.failures.total
| operation | The total number of failures |
|
vds.distributor.update\_metadata\_gets.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.update\_metadata\_gets.latency
| millisecond | The average latency of update\_metadata\_gets operations |
|
vds.distributor.update\_metadata\_gets.ok
| operation | The number of successful update\_metadata\_gets operations performed |
|
vds.distributor.update\_puts.failures.busy
| operation | The number of messages from storage that failed because the storage node was busy |
|
vds.distributor.update\_puts.failures.concurrent\_mutations
| operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
|
vds.distributor.update\_puts.failures.inconsistent\_bucket
| operation | The number of operations failed due to buckets being in an inconsistent state or not found |
|
vds.distributor.update\_puts.failures.notconnected
| operation | The number of operations discarded because there were no available storage nodes to send to |
|
vds.distributor.update\_puts.failures.notfound
| operation | The number of operations that failed because the document did not exist |
|
vds.distributor.update\_puts.failures.notready
| operation | The number of operations discarded because distributor was not ready |
|
vds.distributor.update\_puts.failures.safe\_time\_not\_reached
| operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
|
vds.distributor.update\_puts.failures.storagefailure
| operation | The number of operations that failed in storage |
|
vds.distributor.update\_puts.failures.test\_and\_set\_failed
| operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
|
vds.distributor.update\_puts.failures.timeout
| operation | The number of operations that failed because the operation timed out towards storage |
|
vds.distributor.update\_puts.failures.total
| operation | The total number of put failures |
|
vds.distributor.update\_puts.failures.wrongdistributor
| operation | The number of operations discarded because they were sent to the wrong distributor |
|
vds.distributor.update\_puts.latency
| millisecond | The average latency of update\_puts operations |
|
vds.distributor.update\_puts.ok
| operation | The number of successful update\_puts operations performed |
|
vds.distributor.mutating\_op\_memory\_usage
| byte | Estimated amount of memory used by active mutating operations across all distributor stripes, in bytes |
|
vds.idealstate.nodes\_per\_merge
| node | The number of nodes involved in a single merge operation. |
|
vds.idealstate.set\_bucket\_state.blocked
| operation | The number of operations blocked by blocking operation starter |
|
vds.idealstate.set\_bucket\_state.done\_failed
| operation | The number of operations that failed |
|
vds.idealstate.set\_bucket\_state.done\_ok
| operation | The number of operations successfully performed |
|
vds.idealstate.set\_bucket\_state.pending
| operation | The number of operations pending |
|
vds.idealstate.set\_bucket\_state.throttled
| operation | The number of operations throttled by throttling operation starter |
|
vds.bouncer.clock\_skew\_aborts
| operation | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range |
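These metrics are exposed by each node's metrics proxy. A minimal sketch, assuming the default metrics proxy port 19092 on localhost (endpoints and ports may differ in your deployment; metric names are flattened to underscores in Prometheus output):
```
$ curl -s http://localhost:19092/metrics/v1/values | head
$ curl -s http://localhost:19092/prometheus/v1/values | grep vds_distributor_puts
```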
---
# Source: https://docs.vespa.ai/en/operations/self-managed/docker-containers.html.md
# Docker containers
This document describes tuning and adaptations for running Vespa Docker containers, both for developer use on a laptop and in production.
## Mounting persistent volumes
The [quick start](../../basics/deploy-an-application-local.html) and [AWS ECS multinode](multinode-systems.html#aws-ecs) guides show how to run Vespa in Docker containers. In these examples, all the data is stored inside the container - the data is lost if the container is deleted. When running Vespa inside Docker containers in production, volume mappings to the parent host should be added to persist data and logs.
- /opt/vespa/var
- /opt/vespa/logs
```
$ mkdir -p /tmp/vespa/var; export VESPA_VAR_STORAGE=/tmp/vespa/var
$ mkdir -p /tmp/vespa/logs; export VESPA_LOG_STORAGE=/tmp/vespa/logs
$ docker run --detach --name vespa --hostname vespa-container \
--volume $VESPA_VAR_STORAGE:/opt/vespa/var \
--volume $VESPA_LOG_STORAGE:/opt/vespa/logs \
--publish 8080:8080 \
vespaengine/vespa
```
## Start Vespa container with Vespa user
You can start the container directly as the _vespa_ user. The _vespa_ user and group within the container are configured with user id _1000_ and group id _1000_. For Vespa to start, the vespa user and group must own the _/opt/vespa/var_ and _/opt/vespa/logs_ volumes mounted into the container, so that Vespa can create the directories and files it needs within them.
The start script checks that the correct owner uid and gid are set, and fails if the wrong user or group owns the volumes.
When using an isolated user namespace for the Vespa container, you must set the uid and gid of the directories on the host to the subordinate uid and gid, depending on your mapping. See the [Docker documentation](https://docs.docker.com/engine/security/userns-remap/) for more details.
```
$ mkdir -p /tmp/vespa/var; export VESPA_VAR_STORAGE=/tmp/vespa/var
$ mkdir -p /tmp/vespa/logs; export VESPA_LOG_STORAGE=/tmp/vespa/logs
$ sudo chown -R 1000:1000 $VESPA_VAR_STORAGE $VESPA_LOG_STORAGE
$ docker run --detach --name vespa --user vespa:vespa --hostname vespa-container \
--volume $VESPA_VAR_STORAGE:/opt/vespa/var \
--volume $VESPA_LOG_STORAGE:/opt/vespa/logs \
--publish 8080:8080 \
vespaengine/vespa
```
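If user namespace remapping is active, the `chown` must target the remapped ids instead. A sketch, assuming a subordinate uid/gid range starting at 100000 (check _/etc/subuid_ and _/etc/subgid_ on the host), so the container's vespa user (uid 1000) maps to host uid 101000:
```
$ sudo chown -R 101000:101000 $VESPA_VAR_STORAGE $VESPA_LOG_STORAGE
```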
## System limits
When Vespa starts inside Docker containers, the startup scripts will set [system limits](files-processes-and-ports.html#vespa-system-limits). Make sure that the environment starting the Docker engine is set up in such a way that these limits can be set inside the containers.
For a CentOS/RHEL base host, Docker is usually started by [systemd](https://www.freedesktop.org/software/systemd/man/systemd.exec.html). In this case, `LimitNOFILE`, `LimitNPROC` and `LimitCORE` should be set to meet the minimum requirements in [system limits](files-processes-and-ports.html#vespa-system-limits).
In general, when using Docker or Podman to run Vespa, the `--ulimit` option should be used to set limits according to [system limits](files-processes-and-ports.html#vespa-system-limits). The `--pids-limit` should be set to unlimited (`-1` for Docker and `0` for Podman).
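A sketch of such a run command; the exact minimums are defined in [system limits](files-processes-and-ports.html#vespa-system-limits), and the numbers below are illustrative:
```
$ docker run --detach --name vespa --hostname vespa-container \
  --ulimit nofile=262144:262144 \
  --ulimit nproc=409600:409600 \
  --ulimit core=-1 \
  --pids-limit -1 \
  --publish 8080:8080 \
  vespaengine/vespa
```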
## Transparent Huge Pages
Vespa performance improves significantly by enabling [Transparent Huge Pages (THP)](https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html), especially for memory-intensive applications with large dense tensors with concurrent query and write workloads.
One application improved query p99 latency from 950 ms to 150 ms during concurrent query and write by enabling THP. Using THP is even more important when running in virtualized environments like AWS and GCP due to nested page tables.
When running Vespa using the container image, _THP_ settings must be set on the base host OS (Linux). The recommended settings are:
```
$ echo 1 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
$ echo always > /sys/kernel/mm/transparent_hugepage/enabled
$ echo never > /sys/kernel/mm/transparent_hugepage/defrag
```
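To inspect the current THP mode on the base host, read the sysfs setting; the bracketed value is the active one (example output):
```
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
```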
To verify that the setting is active, check that _AnonHugePages_ is non-zero. In this example, about 75 GB has been allocated using AnonHugePages:
```
$ cat /proc/meminfo | grep AnonHuge
AnonHugePages: 75986944 kB
```
Note that the Vespa container needs to be restarted after modifying the base host OS settings to make the changes effective. Vespa uses `MADV_HUGEPAGE` for memory allocations done by the [content node process (proton)](/en/content/proton.html).
## Controlling which services to start
The Docker image _vespaengine/vespa_'s [start script](https://github.com/vespa-engine/docker-image/blob/master/include/start-container.sh) takes a parameter that controls which services are started inside the container.
Starting a _configserver_ container:
```
$ docker run \
--env VESPA_CONFIGSERVERS=<configserver-hostnames> \
  vespaengine/vespa configserver
```
Starting a _services_ container (configserver will not be started):
```
$ docker run \
--env VESPA_CONFIGSERVERS=<configserver-hostnames> \
  vespaengine/vespa services
```
Starting a container with _both configserver and services_:
```
$ docker run \
--env VESPA_CONFIGSERVERS=<configserver-hostnames> \
  vespaengine/vespa configserver,services
```
This is required when the configserver container should also run other services like an adminserver or logserver (see [services.html](/en/reference/applications/services/services.html)).
If the [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) environment variable is not specified, it will be set to the container hostname, also see [node setup](node-setup.html#hostname).
Use the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application as a blueprint for how to set up config servers and services.
## Graceful stop
Stopping a running _vespaengine/vespa_ container triggers a graceful shutdown, which saves time when starting the container again (i.e., data structures are flushed). If the container is shut down forcefully, the content nodes might need to restore the state from the transaction log, which might be time-consuming. There is no chance of data loss or data corruption as the data is always written and synced to persistent storage.
The default timeout for the Docker daemon to wait for the shutdown might be too low for larger numbers of documents per node. The stop command below waits at least 120 seconds before terminating the running container forcefully; if the shutdown completes before the timeout, the command returns sooner:
```
$ docker stop -t 120 name
```
It is also possible to configure the default Docker daemon timeout, see [--shutdown-timeout](https://docs.docker.com/reference/cli/dockerd/).
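For example, a daemon-wide default can be set in _/etc/docker/daemon.json_ (a sketch - validate the key against the dockerd documentation linked above, and restart the Docker daemon afterwards):
```
{
  "shutdown-timeout": 120
}
```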
A clean content node shutdown looks like:
```
[2025-05-02 10:07:52.052] EVENT searchnode proton.node.server stopping/1 name="storagenode" why="Stopped"
[2025-05-02 10:07:52.056] EVENT searchnode proton stopping/1 name="servicelayer" why="clean shutdown"
[2025-05-02 10:07:52.056] INFO searchnode proton.proton.server.rtchooks shutting down monitoring interface
[2025-05-02 10:07:52.058] INFO searchnode proton.searchlib.docstore.logdatastore Flushing. Disk bloat is now at 0 of 8832 at 0.00 percent
[2025-05-02 10:07:52.059] INFO searchnode proton.searchlib.docstore.logdatastore Flushing. Disk bloat is now at 0 of 8832 at 0.00 percent
[2025-05-02 10:07:52.060] INFO searchnode proton.searchlib.docstore.logdatastore Flushing. Disk bloat is now at 0 of 8840 at 0.00 percent
[2025-05-02 10:07:52.066] INFO searchnode proton.transactionlog.server Stopping TLS
[2025-05-02 10:07:52.066] INFO searchnode proton.transactionlog.server TLS Stopped
[2025-05-02 10:07:52.071] EVENT searchnode proton stopping/1 name="proton" why="clean shutdown"
[2025-05-02 10:07:52.078] EVENT config-sentinel sentinel.sentinel.service stopped/1 name="searchnode" pid=354 exitcode=0
```
## Memory
The [sample applications](https://github.com/vespa-engine/sample-apps) and the [local application deployment guide](../../basics/deploy-an-application-local.html) indicate the minimum memory requirements for the Docker containers.
**Note:** Too little memory is a very common problem when testing Vespa in Docker containers. Use the below to troubleshoot before making a support request, and also see the [FAQ](../../learn/faq).
As a rule of thumb, a single-node Vespa application requires a minimum of 4 GB for the Docker container. Using `docker stats` can be useful to track memory usage:
```
$ docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
589bf5801b22 node0 213.25% 697.3MiB / 3.84GiB 17.73% 14.2kB / 11.5kB 617MB / 976MB 253
e108dde84679 node1 213.52% 492.7MiB / 3.84GiB 12.53% 15.7kB / 12.7kB 74.3MB / 924MB 252
be43aacd0bbb node2 191.22% 497.8MiB / 3.84GiB 12.66% 19.6kB / 21.6kB 64MB / 949MB 261
```
It is not necessarily easy to verify that Vespa has started all services successfully. Symptoms of errors due to insufficient memory vary, depending on where the failure occurs. Example: Inspect restart logs in a container named _vespa_, running the [quickstart](../../basics/deploy-an-application-local.html) with only 2 GB:
```
$ docker exec -it vespa sh -c "/opt/vespa/bin/vespa-logfmt -S config-sentinel -c sentinel.sentinel.service"
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 2.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 6.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 14.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 30.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: will delay start by 25.173 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 62.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 126.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: will delay start by 119.515 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 254.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 510.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: will delay start by 501.026 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 1022.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 1800.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: will delay start by 1793.142 seconds
```
Observe that the _container_ service restarts in a loop, with increasing pause.
A common problem is [config servers](configuration-server.html) not starting or running properly due to a lack of memory. This manifests itself as nothing listening on port 19071, or deployment failures.
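A quick check is to query the config server health endpoint on port 19071 (assuming a container named _vespa_):
```
$ docker exec vespa curl -s http://localhost:19071/state/v1/health
```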
Some guides/sample applications have specific configurations to minimize resource usage. Example from [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA):
```
$ docker run --detach --name node0 --hostname node0.vespanet \
-e VESPA_CONFIGSERVERS=node0.vespanet,node1.vespanet,node2.vespanet \
-e VESPA_CONFIGSERVER_JVMARGS="-Xms32M -Xmx128M" \
-e VESPA_CONFIGPROXY_JVMARGS="-Xms32M -Xmx32M" \
--network vespanet \
--publish 19071:19071 --publish 19100:19100 --publish 19050:19050 --publish 20092:19092 \
vespaengine/vespa
```
Here [VESPA\_CONFIGSERVER\_JVMARGS](files-processes-and-ports.html#environment-variables) and [VESPA\_CONFIGPROXY\_JVMARGS](files-processes-and-ports.html#environment-variables) are tweaked to the minimum for a functional test only.
**Important:** For production use, do not reduce memory settings in `VESPA_CONFIGSERVER_JVMARGS` and `VESPA_CONFIGPROXY_JVMARGS` unless you know what you are doing - the Vespa defaults are set for regular production use, and rarely need changing.
Container memory settings are done in _services.xml_. The snippet below is a representative example; see the [multinode-HA services.xml](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml) for the full file:
```
<nodes>
    <jvm options="-Xms32M -Xmx128M"/>
    <node hostalias="node0"/>
</nodes>
```
Make sure that the settings match the Docker container Vespa is running in.
Also see [node memory settings](node-setup.html#memory-settings) for more settings.
## Network
Vespa processes communicate over both fixed and ephemeral ports - in general, all ports must be accessible. See [example ephemeral use](../../writing/visiting.html#handshake-failed).
Find an example application using a Docker network in [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA).
## Resource usage
Note that CPU usage will not be zero even if there are zero documents and zero queries. Starting the _vespaengine/vespa_ container image means starting the [configuration server](configuration-server.html) and the [configuration sentinel](config-sentinel.html). When deploying an application, the sentinel starts the configured service processes, and they all listen to work to do, changes in the config, and so forth.
Therefore, an "idle" container instance consumes CPU and memory.
## Troubleshooting
The Vespa documentation examples use `docker`. The Vespa Team has good experience with `podman` too; in the examples, just change `docker` to `podman`. We recommend using Podman v5, see the [release notes](https://github.com/containers/podman/blob/main/RELEASE_NOTES.md). [emulating-docker-cli-with-podman](https://podman-desktop.io/docs/migrating-from-docker/emulating-docker-cli-with-podman) is a useful resource.
Many startup failures are caused by a failed Vespa container start due to configuration or download errors. Use `docker logs vespa` to show the log (this example assumes a Docker container named `vespa`; use `docker ps` to list containers).
### Docker image
Make sure to use a recent Vespa release (check [releases](https://factory.vespa.ai/releases)) and validate the downloaded image:
```
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/vespaengine/vespa latest 8cfb0da22c01 35 hours ago 1.2 GB
```
### Model download failures
If the application package depends on downloaded models, look for `RuntimeException: Not able to create config builder for payload` - [details](../../applications/components.html#component-load).
---
# Source: https://docs.vespa.ai/en/reference/applications/services/docproc.html.md
# services.xml - document-processing
This is the [document-processing](../../../applications/document-processors.html) reference in [services.xml](services.html):
```
container
    document-processing [numnodesperclient, preferlocalnode, maxmessagesinqueue, maxqueuebytesize,
                         maxqueuewait, maxconcurrentfactor, documentexpansionfactor, containercorememory]
        include
        documentprocessor [class, bundle, id, idref, provides, before, after]
            provides
            before
            after
            map
                field [doctype, in-document, in-processor]
        chain [name, id, idref, inherits, excludes, documentprocessors]
            map
                field [doctype, in-document, in-processor]
            inherits
                chain
                exclude
            documentprocessor [class, bundle, id, idref, provides, before, after]
                provides
                before
                after
                map
                    field [doctype, in-document, in-processor]
            phase [id, idref, before, after]
                before
                after
        threadpool
```
The root element of the _document-processing_ configuration model.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| numnodesperclient | optional | | | **Deprecated:** Ignored and deprecated, will be removed in Vespa 9. Set to some number below the number of nodes in the cluster to limit how many nodes a single client can connect to. If you have many clients, this can reduce the memory usage on both document-processing and client nodes. |
| preferlocalnode | optional | | false | **Deprecated:** Ignored and deprecated, will be removed in Vespa 9. Set to always prefer sending to a document-processing node running on the same host as the client. Use this if you are running a client on each document-processing node. |
| maxmessagesinqueue | | | | |
| maxqueuebytesize | | | | **Deprecated:** Ignored and deprecated, will be removed in Vespa 9. |
| maxqueuewait | optional | | | The maximum number of seconds a message should wait in queue before being processed. Docproc will adapt its queue size to adhere to this. If the queue is full, new messages are replied to with SESSION\_BUSY. |
| maxconcurrentfactor | | | | |
| documentexpansionfactor | optional | | | |
| containercorememory | | | | |
## Document Processor elements
_documentprocessor_ elements are contained in [docproc chain elements](#chain) or in the _document-processing_ root.
A documentprocessor element is either a document processor definition or document processor reference. The rest of this section deals with document processor definitions; document processor references are described in [docproc chain elements](#docproc-chain-elements).
A documentprocessor definition causes the creation of exactly one document processor instance. This instance is set up according to the content of the documentprocessor element.
A documentprocessor definition contained in a docproc chain element defines an _inner document processor_. Otherwise, it defines an _outer document processor_.
For inner documentprocessors, the name must be unique inside the docproc chain. For outer documentprocessors, the component id must be unique. An inner documentprocessor is not permitted to have the same name as an outer documentprocessor.
Optional sub-elements:
- provides, a single name that should be added to the provides list
- before, a single name that should be added to the before list
- after, a single name that should be added to the after list
- config (one or more)
For more information on provides, before and after, see [Chained components](../../../applications/chaining.html).
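A hedged illustration of these sub-elements in use (the class and names are hypothetical):
```
<documentprocessor id="com.example.Normalizer" bundle="my-bundle">
    <provides>Normalization</provides>
    <after>LanguageDetection</after>
</documentprocessor>
```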
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| class | | | | |
| bundle | | | | |
| id | required | | | The component id of the documentprocessor instance. |
| idref | | | | |
| provides | optional | | | A space-separated list of names that represents what this documentprocessor produces. |
| before | optional | | | A space-separated list of phase or provided names. Phases or documentprocessors providing these names will be placed later in the docproc chain than this document processor. |
| after | optional | | | A space-separated list of phase or provided names. Phases or documentprocessors providing these names will be placed earlier in the docproc chain than this document processor. |
### documentprocessor
Defines a documentprocessor instance of a user-specified class. A skeleton of the element (attribute values elided):
```
<documentprocessor id="..." class="..." bundle="...">
    ...
</documentprocessor>
```
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| id | required | | | The component id of the documentprocessor instance. |
| class | optional | | | A component specification containing the name of the class to instantiate to create the document processor instance. If missing, copied from id. |
| bundle | optional | | | The bundle containing the class: The name in `<artifactId>` in pom.xml. If a bundle is not specified, the bundle containing document processors bundled with Vespa is used. |
## Docproc chain elements
Specifies how a docproc chain should be instantiated, and how the contained document processors should be ordered.
### chain
Contained in _document-processing_. Refer to the [chain reference](processing.html#chain). Chains can [inherit](processing.html#inherits) document processors from other chains and use [phases](processing.html#phase) for ordering. Optional sub-elements:
- [documentprocessor element](#documentprocessor) (one or more), either a documentprocessor reference or documentprocessor definition. If the name given for a documentprocessor matches an _outer documentprocessor_, it is a _documentprocessor reference_ - otherwise, it is a _documentprocessor definition_. If it is a documentprocessor definition, it is also an implicit documentprocessor reference saying: use _exactly_ this documentprocessor. All these documentprocessor elements must have different names.
- [phase](processing.html#phase) (one or more).
- [config](../config-files.html#generic-configuration-in-services-xml) (one or more - will apply to all _inner_ documentprocessors in this docproc chain, unless overridden by individual inner documentprocessors).
## Map
Set up a field name mapping from the name(s) of field(s) in the input documents to the names used in a deployed docproc. The purpose is to reuse functionality without changing the field names. The example below is a reconstruction consistent with the explanation that follows (element placement is illustrative):
```
<chain id="mychain">
    <map>
        <field in-document="key" in-processor="id"/>
    </map>
    <documentprocessor id="CityDocProc">
        <map>
            <field doctype="restaurant" in-document="town" in-processor="city"/>
        </map>
    </documentprocessor>
    <documentprocessor id="CarDocProc">
        <map>
            <field in-document="engine.cylinders" in-processor="cyl"/>
        </map>
    </documentprocessor>
</chain>
```
In the example, a chain is deployed with 2 docprocs.
For the chain, a mapping from _key_ to _id_ is set up. Imagine that some or all of the docprocs in the chain read and write a field called _id_, but we want to apply this functionality to the document field _key_.
Furthermore, a similar thing is done for the `CityDocProc`: The docproc accesses the field_city_, whereas it's called _town_ in the feed. The mapping only applies to the document type _restaurant_.
The `CarDocProc` accesses a field called _cyl_. In this example this is mapped to the field _cylinders_ of a struct _engine_, using dotted notation.
If you specify mappings on different levels of the config (say both for a cluster and a docproc), the mapping closest to the actual docproc will take precedence.
## threadpool
Available since Vespa 8.601.12
Specifies configuration for the thread pool used by document processor chains. All values scale with the number of vCPUs - see the [container tuning example](../../../performance/container-tuning.html#container-worker-threads-example). When all workers are busy, new document processing requests are rejected.
### threads
Number of worker threads per vCPU. Default value is `1`. The pool runs with `threads * vCPU` workers.
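A minimal sketch of the resulting configuration (the value is illustrative; verify against your Vespa version):
```
<document-processing>
    <threadpool>
        <threads>2</threads>
    </threadpool>
</document-processing>
```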
---
# Source: https://docs.vespa.ai/en/writing/document-api-guide.html.md
# Document API
This is an introduction to how to build and compile Vespa clients using the Document API. It can be used for feeding, updating and retrieving documents, or removing documents from the repository. See also the [Java reference](https://javadoc.io/doc/com.yahoo.vespa/documentapi).
Use the [VESPA\_CONFIG\_SOURCES](../operations/self-managed/files-processes-and-ports.html#environment-variables) environment variable to set config servers to interface with.
The most common use case is using the async API in a [document processor](../applications/document-processors.html) - from the sample apps:
- Async GET in [LyricsDocumentProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/document-processing/src/main/java/ai/vespa/example/album/LyricsDocumentProcessor.java)
- Async UPDATE in [ReviewProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/src/main/java/ai/vespa/example/shopping/ReviewProcessor.java)
## Documents
All data fed, indexed and searched in Vespa are instances of the `Document` class. A [document](../schemas/documents.html) is a composite object that consists of:
- A `DocumentType` that defines the set of fields that can exist in a document. A document can only have a single _document type_, but document types can inherit the content of another. All fields of an inherited type are available in all its descendants. The document type is defined in the [schema](../reference/schemas/schemas.html), which is converted into a configuration file to be read by the `DocumentManager`.
- A `DocumentId` which is a unique document identifier. The document distribution uses the document identifier, see the [reference](../content/buckets.html#distribution) for details.
- A set of `(Field, FieldValue)` pairs, or "fields" for short. The `Field` class has methods for getting its name, data type and internal identifier. The field object for a given field name can be retrieved using the `getField()` method in the `DocumentType`.
See the [DocumentAccess](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/DocumentAccess.html) javadoc. Maven dependency and a sample client:
```
<dependency>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>documentapi</artifactId>
    <version>8.634.24</version>
</dependency>
```
```
import com.yahoo.document.DataType;
import com.yahoo.document.Document;
import com.yahoo.document.DocumentId;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.DocumentType;
import com.yahoo.document.DocumentUpdate;
import com.yahoo.document.datatypes.StringFieldValue;
import com.yahoo.document.datatypes.WeightedSet;
import com.yahoo.document.update.FieldUpdate;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.SyncParameters;
import com.yahoo.documentapi.SyncSession;
public class DocClient {
public static void main(String[] args) {
// DocumentAccess is injectable in Vespa containers, but not in command line tools, etc.
DocumentAccess access = DocumentAccess.createForNonContainer();
DocumentType type = access.getDocumentTypeManager().getDocumentType("music");
DocumentId id = new DocumentId("id:namespace:music::0");
Document docIn = new Document(type, id);
SyncSession session = access.createSyncSession(new SyncParameters.Builder().build());
// Put document with a1,1
WeightedSet<StringFieldValue> wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING));
wset.put(new StringFieldValue("a1"), 1);
docIn.setFieldValue("aWeightedset", wset);
DocumentPut put = new DocumentPut(docIn);
System.out.println(docIn.toJson());
session.put(put);
// Update document with a1,10 and a2,20
DocumentUpdate upd1 = new DocumentUpdate(type, id);
WeightedSet<StringFieldValue> wset1 = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING));
wset1.put(new StringFieldValue("a1"), 10);
wset1.put(new StringFieldValue("a2"), 20);
upd1.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aWeightedset"), wset1));
System.out.println(upd1.toString());
session.update(upd1);
Document docOut = session.get(id);
System.out.println("document get:" + docOut.toJson());
session.destroy();
access.shutdown();
}
}
```
To test using the [sample apps](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation), enable more ports for the client to connect to the config server and other processes on localhost - change the docker command:
```
$ docker run --detach --name vespa --hostname localhost --privileged \
  --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps --publish 8080:8080 \
  --publish 19070:19070 --publish 19071:19071 --publish 19090:19090 \
  --publish 19099:19099 --publish 19101:19101 --publish 19112:19112 \
  vespaengine/vespa
```
## Fields
Examples:
```
doc.setFieldValue("aByte", (byte)1);
doc.setFieldValue("aInt", (int)1);
doc.setFieldValue("aLong", (long)1);
doc.setFieldValue("aFloat", 1.0);
doc.setFieldValue("aDouble", 1.0);
doc.setFieldValue("aBool", new BoolFieldValue(true));
doc.setFieldValue("aString", "Hello Field!");
doc.setFieldValue("unknownField", "Will not see me!");
Array<IntegerFieldValue> intArray = new Array<>(doc.getField("aArray").getDataType());
intArray.add(new IntegerFieldValue(11));
intArray.add(new IntegerFieldValue(12));
doc.setFieldValue("aArray", intArray);
Struct pos = PositionDataType.valueOf(1,2);
pos = PositionDataType.fromString("N0.000002;E0.000001"); // two ways to set same value
doc.setFieldValue("aPosition", pos);
doc.setFieldValue("aPredicate", new PredicateFieldValue("aLong in [10..20]"));
byte[] rawBytes = new byte[100];
for (int i = 0; i < rawBytes.length; i++) {
rawBytes[i] = (byte)i;
}
doc.setFieldValue("aRaw", new Raw(ByteBuffer.wrap(rawBytes)));
Tensor tensor = Tensor.Builder.of(TensorType.fromSpec("tensor(x[2],y[2])")).
cell().label("x", 0).label("y", 0).value(1.0).
cell().label("x", 0).label("y", 1).value(2.0).
cell().label("x", 1).label("y", 0).value(3.0).
cell().label("x", 1).label("y", 1).value(5.0).build();
doc.setFieldValue("aTensor", new TensorFieldValue(tensor));
MapFieldValue<StringFieldValue, StringFieldValue> map = new MapFieldValue<>(new MapDataType(DataType.STRING, DataType.STRING));
map.put(new StringFieldValue("key1"), new StringFieldValue("foo"));
map.put(new StringFieldValue("key2"), new StringFieldValue("bar"));
doc.setFieldValue("aMap", map);
WeightedSet<StringFieldValue> wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING));
wset.put(new StringFieldValue("strval 1"), 5);
wset.put(new StringFieldValue("strval 2"), 10);
doc.setFieldValue("aWeightedset", wset);
```
## Document updates
A document update is a request to modify a document, see [reads and writes](reads-and-writes.html).
Primitive fields and some multivalue fields (WeightedSet and Array) are updated using a [FieldUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/update/FieldUpdate.html).
Complex multivalue fields like Map and Array of struct are updated using [AddFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/AddFieldPathUpdate.html), [AssignFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/AssignFieldPathUpdate.html) and [RemoveFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/RemoveFieldPathUpdate.html). Field path updates are only supported on non-attribute [fields](../reference/schemas/schemas.html#field), [index](../reference/schemas/schemas.html#index) fields, or fields containing [struct field](../reference/schemas/schemas.html#struct-field) attributes. If a field is both an index field and an attribute, then the document is updated in the document store and the index is updated, but the attribute is not. Thus, you can get old values in document summary requests, and old values being used in ranking and grouping. A [field path](../reference/schemas/document-field-path.html) string identifies fields to update - example:
```
upd.addFieldPathUpdate(new AssignFieldPathUpdate(type, "myMap{key2}", new StringFieldValue("abc")));
```
_FieldUpdate_ examples:
```
// Simple assignment
Field intField = type.getField("aInt");
IntegerFieldValue intFieldValue = new IntegerFieldValue(2);
FieldUpdate assignUpdate = FieldUpdate.createAssign(intField, intFieldValue);
upd.addFieldUpdate(assignUpdate);

// Arithmetic
FieldUpdate addUpdate = FieldUpdate.createIncrement(type.getField("aLong"), 3);
upd.addFieldUpdate(addUpdate);

// Composite - add one array element
upd.addFieldUpdate(FieldUpdate.createAdd(type.getField("aArray"),
        new IntegerFieldValue(13)));

// Composite - add two array elements
upd.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aArray"),
        List.of(new IntegerFieldValue(14), new IntegerFieldValue(15))));

// Composite - add weightedset element
upd.addFieldUpdate(FieldUpdate.createAdd(type.getField("aWeightedset"),
        new StringFieldValue("add_me"), 101));

// Composite - add set to set
WeightedSet<StringFieldValue> wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING));
wset.put(new StringFieldValue("a1"), 3);
wset.put(new StringFieldValue("a2"), 4);
upd.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aWeightedset"), wset));

// Composite - update array element
upd.addFieldUpdate(FieldUpdate.createMap(type.getField("aArray"),
        new IntegerFieldValue(1),  // array index
        new AssignValueUpdate(new IntegerFieldValue(2))));  // value at index

// Composite - increment weight
upd.addFieldUpdate(FieldUpdate.createIncrement(type.getField("aWeightedset"),
        new StringFieldValue("a1"), 1));

// Composite - assign the weight of a set element
upd.addFieldUpdate(FieldUpdate.createMap(type.getField("aWeightedset"),
        new StringFieldValue("element1"),  // value
        new AssignValueUpdate(new IntegerFieldValue(30))));
```
_FieldPathUpdate_ examples:
```
// Add an element to a map
Array<StringFieldValue> stringArray = new Array<>(DataType.getArray(DataType.STRING));
stringArray.add(new StringFieldValue("my-val"));
AddFieldPathUpdate addElement = new AddFieldPathUpdate(type, "aMap{key1}", stringArray);
upd.addFieldPathUpdate(addElement);

// Modify an element in a map
upd.addFieldPathUpdate(new AssignFieldPathUpdate(type, "aMap{key2}", new StringFieldValue("abc")));
```
### Update reply semantics
Sending an update for which the system cannot find a corresponding document to update is _not_ considered an error. Such updates are returned with a successful status code (assuming that no actual error occurred during the update processing). Use [UpdateDocumentReply.wasFound()](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/UpdateResponse.html#wasFound()) to check if the update was known to have been applied.
If the update returns with an error reply, the update _may or may not_ have been applied, depending on where in the platform stack the error occurred.
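A minimal sketch of checking this flag when consuming responses from an `AsyncSession` (as in the example further below); `session` and the pending update operations are assumed to exist:
```
// Distinguish "update applied" from "document did not exist".
Response res = session.getNext(9999); // may throw InterruptedException
if (res instanceof UpdateResponse && res.isSuccess()) {
    UpdateResponse updateResponse = (UpdateResponse) res;
    if (!updateResponse.wasFound()) {
        System.out.println("update " + updateResponse.getRequestId() + ": document not found");
    }
}
```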
## Document Access
The starting point for passing documents and updates to Vespa is the `DocumentAccess` class. This is a singleton (see the `get()` method) session factory (see the `create*Session()` methods) that provides three distinct access types:
- **Synchronous random access**: provided by the class `SyncSession`. Suitable for low-throughput proof-of-concept applications - a minimal sketch follows below this list.
- [**Asynchronous random access**](#asyncsession): provided by the class `AsyncSession`. It allows for document repository writes and random access with **high throughput**.
- [**Visiting**](#visitorsession): provided by the class `VisitorSession`. Allows a set of documents to be accessed in order decided by the document repository, which gives higher read throughput than random access.
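A minimal `SyncSession` sketch (the document type and id are placeholders):
```
// Synchronous single-document get - fine for proof-of-concept tools,
// but use AsyncSession for throughput.
DocumentAccess access = DocumentAccess.createForNonContainer();
SyncSession session = access.createSyncSession(new SyncParameters.Builder().build());
Document doc = session.get(new DocumentId("id:mynamespace:music::123"));
System.out.println(doc == null ? "not found" : doc.toString());
session.destroy();
access.shutdown();
```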
### AsyncSession
This class represents a session for asynchronous access to a document repository. It is created by calling `myDocumentAccess.createAsyncSession(myAsyncSessionParams)`, and provides document repository writes and random access with high throughput. The usage pattern for an asynchronous session is as follows:
1. `put()`, `update()`, `get()` or `remove()` is invoked on the session, and it returns a synchronous `Result` object that indicates whether the request was successful or not. The `Result` object also contains a _request identifier_.
2. The client polls the session for a `Response` through its `getNext()` method. Any operation accepted by an asynchronous session will produce exactly one response within the configured timeout.
3. Once a response is available, it is matched to the request by inspecting the response's request identifier. The response may also contain data, either a retrieved document or a failed document put or update that needs to be handled.
4. Note that the client must consume the response queue; the underlying session keeps track of all responses, and unless they are consumed they are kept alive and cannot be garbage collected, eventually exhausting JVM memory.
Example:
```
import com.yahoo.document.*;
import com.yahoo.documentapi.*;

public class MyClient {

    // DocumentAccess is injectable in Vespa containers, but not in command line tools, etc.
    private final DocumentAccess access = DocumentAccess.createForNonContainer();
    private final AsyncSession session = access.createAsyncSession(new AsyncParameters());
    private boolean abort = false;
    private int numPending = 0;

    /**
     * Implements application entry point.
     *
     * @param args Command line arguments.
     */
    public static void main(String[] args) {
        MyClient app = null;
        try {
            app = new MyClient();
            app.run();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (app != null) {
                app.shutdown();
            }
        }
        if (app == null || app.abort) {
            System.exit(1);
        }
    }

    /**
     * This is the main entry point of the client. This method will not return until all available documents
     * have been fed and their responses have been returned, or something signaled an abort.
     */
    public void run() {
        System.out.println("client started");
        while (!abort) {
            flushResponseQueue();

            Document doc = getNextDocument();
            if (doc == null) {
                System.out.println("no more documents to put");
                break;
            }
            System.out.println("sending doc " + doc);

            while (!abort) {
                Result res = session.put(doc);
                if (res.isSuccess()) {
                    System.out.println("put has request id " + res.getRequestId());
                    ++numPending;
                    break; // step to next doc.
                } else if (res.type() == Result.ResultType.TRANSIENT_ERROR) {
                    System.out.println("send queue full, waiting for some response");
                    processNext(9999);
                } else {
                    res.getError().printStackTrace();
                    abort = true; // this is a fatal error
                }
            }
        }
        if (!abort) {
            waitForPending();
        }
        System.out.println("client stopped");
    }

    /**
     * Shutdown the underlying api objects.
     */
    public void shutdown() {
        System.out.println("shutting down document api");
        session.destroy();
        access.shutdown();
    }

    /**
     * Returns the next document to feed to Vespa. This method should only return null when the end of the
     * document stream has been reached, as returning null terminates the client. This is the point at which
     * your application logic should block if it knows more documents will eventually become available.
     *
     * @return The next document to put, or null to terminate.
     */
    public Document getNextDocument() {
        return null; // TODO: Implement at your discretion.
    }

    /**
     * Processes all immediately available responses.
     */
    void flushResponseQueue() {
        System.out.println("flushing response queue");
        while (processNext(0)) {
            // empty
        }
    }

    /**
     * Wait indefinitely for the responses of all sent operations to return. This method will only return
     * early if the abort flag is set.
     */
    void waitForPending() {
        while (numPending != 0) {
            if (abort) {
                System.out.println("waiting aborted, " + numPending + " still pending");
                break;
            }
            System.out.println("waiting for " + numPending + " responses");
            processNext(9999);
        }
    }

    /**
     * Retrieves and processes the next response available from the underlying asynchronous session. If no
     * response becomes available within the given timeout, this method returns false.
     *
     * @param timeout The maximum number of seconds to wait for a response.
     * @return True if a response was processed, false otherwise.
     */
    boolean processNext(int timeout) {
        Response res;
        try {
            res = session.getNext(timeout);
        } catch (InterruptedException e) {
            e.printStackTrace();
            abort = true;
            return false;
        }
        if (res == null) {
            return false;
        }
        System.out.println("got response for request id " + res.getRequestId());
        --numPending;
        if (!res.isSuccess()) {
            System.err.println(res.getTextMessage());
            abort = true;
            return false;
        }
        return true;
    }
}
```
### VisitorSession
This class represents a session for sequentially visiting documents with high throughput.
A visitor is started when creating the `VisitorSession` through a call to `createVisitorSession`. A visitor target, that is, a receiver of visitor data, can be created through a call to `createVisitorDestinationSession`. The `VisitorSession` is a receiver of visitor data. See the [visiting reference](visiting.html) for details. The `VisitorSession`:
- Controls the operation of the visiting process
- Handles the data resulting from visiting data in the system
These two tasks may be set up to be handled by a `VisitorControlHandler` and a `VisitorDataHandler`, respectively. These handlers may be supplied to the `VisitorSession` in the `VisitorParameters` object, together with a set of other parameters for visiting. To increase performance, let multiple separate visitor destinations handle the visitor data, and specify the addresses of the remote data handlers in the parameters.
The default `VisitorDataHandler` used by the `VisitorSession` returned from `DocumentAccess` is `VisitorDataQueue`, which queues up incoming documents and implements a polling API. The documents can be extracted by calls to the session's `getNext()` methods and can be acked by the `ack()` method. The default `VisitorControlHandler` can be accessed through the session's `getProgress()`, `isDone()`, and `waitUntilDone()` methods.
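A sketch of that polling pattern, assuming `access` and `params` as in the example further below:
```
// Poll documents from the default VisitorDataQueue, acking each response.
VisitorSession session = access.createVisitorSession(params);
while (!session.isDone()) {
    VisitorResponse response = session.getNext(1000); // timeout in milliseconds
    if (response != null) {
        // handle the data in the response here, then ack it
        session.ack(response.getAckToken());
    }
}
session.destroy();
```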
Implement custom `VisitorControlHandler` and `VisitorDataHandler` by subclassing them and supplying these to the `VisitorParameters` object.
The `VisitorParameters` object controls how and what data will be visited - refer to the [javadoc](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/VisitorParameters.html). Configure the [document selection](../reference/writing/document-selector-language.html) string to select what data to visit - the default is all data.
You can specify what fields to return in a result by specifying a [fieldSet](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/VisitorParameters.html) - see [document field sets](../schemas/documents.html#fieldsets). Specifying only the fields you need may improve performance a lot, especially if you can make do with only in-memory fields or if you have large fields you don't need returned.
Example:
```
import com.yahoo.document.Document;
import com.yahoo.document.DocumentId;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.DumpVisitorDataHandler;
import com.yahoo.documentapi.ProgressToken;
import com.yahoo.documentapi.VisitorControlHandler;
import com.yahoo.documentapi.VisitorParameters;
import com.yahoo.documentapi.VisitorSession;

import java.util.concurrent.TimeoutException;

public class MyClient {

    public static void main(String[] args) throws Exception {
        VisitorParameters params = new VisitorParameters("true");
        params.setLocalDataHandler(new DumpVisitorDataHandler() {

            @Override
            public void onDocument(Document doc, long timeStamp) {
                System.out.print(doc.toXML(""));
            }

            @Override
            public void onRemove(DocumentId id) {
                System.out.println("id=" + id);
            }

        });
        params.setControlHandler(new VisitorControlHandler() {

            @Override
            public void onProgress(ProgressToken token) {
                System.err.format("%.1f %% finished.\n", token.percentFinished());
                super.onProgress(token);
            }

            @Override
            public void onDone(CompletionCode code, String message) {
                System.err.println("Completed visitation, code " + code + ": " + message);
                super.onDone(code, message);
            }

        });
        params.setRoute(args.length > 0 ? args[0] : "[Storage:cluster=storage;clusterconfigid=storage]");
        params.setFieldSet(args.length > 1 ? args[1] : "[document]");

        // DocumentAccess is injectable in Vespa containers, but not in command line tools, etc.
        DocumentAccess access = DocumentAccess.createForNonContainer();
        VisitorSession session = access.createVisitorSession(params);
        if (!session.waitUntilDone(0)) {
            throw new TimeoutException();
        }
        session.destroy();
        access.shutdown();
    }

}
```
The first optional argument to this client is the [route](document-routing.html) of the cluster to visit. The second is the [fieldset](../schemas/documents.html#fieldsets) to retrieve.
---
# Source: https://docs.vespa.ai/en/rag/document-enrichment.html.md
# Document enrichment with LLMs
Document enrichment enables automatic generation of document field values using large language models (LLMs) or custom code during feeding. It can be used to transform raw text into a more structured representation or expand it with additional contextual information. Examples of enrichment tasks include:
- Extraction of named entities (e.g., names of people, organizations, locations, and products) for fuzzy matching and customized ranking
- Categorization and tagging (e.g., sentiment and topic analysis) for filtering and faceting
- Generation of relevant keywords, queries, and questions to improve search recall and search suggestions
- Anonymization to remove personally identifiable information (PII) and reduction of bias in search results
- Translation of content for multilingual search
- LLM chunking
These tasks are defined through prompts, which can be customized for a particular application. Generated fields are indexed and stored as normal fields and can be used for searching without additional latency associated with LLM inference.
## Setting up document enrichment components
This section provides guidelines for configuring document enrichment, using the [LLM document enrichment sample app](https://github.com/vespa-engine/sample-apps/tree/master/field-generator) as an example.
### Defining generated fields
Enrichments are defined in a schema using a [generate indexing expression](../reference/writing/indexing-language.html#generate). For example, the following schema defines two [synthetic fields](../operations/reindexing.html) with `generate`:
```
schema passage {

    document passage {

        field id type string {
            indexing: summary | attribute
        }

        field text type string {
            indexing: summary | index
            index: enable-bm25
        }

    }

    # Generate relevant questions to increase recall and search suggestions
    field questions type array<string> {
        indexing: input text | generate questions_generator | summary | index
        index: enable-bm25
    }

    # Extract named entities for fuzzy matching with ngrams
    field names type array<string> {
        indexing: input text | generate names_extractor | summary | index
        match {
            gram
            gram-size: 3
        }
    }

}
```
The indexing statement `input text | generate questions_generator | summary | index` is interpreted as follows:
1. Take the document field named `text` as input
2. Pass the input to a field generator with id `questions_generator`
3. Store the output of the generator as summary
4. Index the output of the generator for lexical search
Example of a document generated with this schema:
```
{
"id": "71",
"text": "Barley (Hordeum vulgare L.), a member of the grass family, is a major cereal grain. It was one of the first cultivated grains and is now grown widely. Barley grain is a staple in Tibetan cuisine and was eaten widely by peasants in Medieval Europe. Barley has also been used as animal fodder, as a source of fermentable material for beer and certain distilled beverages, and as a component of various health foods.",
"questions": [
"What are the major uses of Barley (Hordeum vulgare L.) in different cultures and regions throughout history?",
"How has the cultivation and consumption of Barley (Hordeum vulgare L.) evolved over time, from its initial cultivation to its present-day uses?",
"What role has Barley (Hordeum vulgare L.) played in traditional Tibetan cuisine and Medieval European peasant diets?"
],
"names": [
"Barley",
"Hordeum vulgare L.",
"Tibetan",
"Medieval Europe"
]
}
```
### Configuring field generators
A schema can contain multiple generated fields that use one or more field generators. All field generators used must be configured in `services.xml`, e.g. (see the [sample app](https://github.com/vespa-engine/sample-apps/tree/master/field-generator) for the complete file):
```
<services version="1.0">
    <container id="default" version="1.0">
        ...
        <component id="questions_generator" class="ai.vespa.llm.generation.LanguageModelFieldGenerator">
            <config name="ai.vespa.llm.generation.language-model-field-generator">
                <providerId>local_llm</providerId>
                <promptTemplate>Generate 3 questions relevant for this text: {input}</promptTemplate>
            </config>
        </component>
        <component id="names_extractor" class="ai.vespa.llm.generation.LanguageModelFieldGenerator">
            <config name="ai.vespa.llm.generation.language-model-field-generator">
                <providerId>openai</providerId>
                <promptTemplateFile>files/names_extractor.txt</promptTemplateFile>
            </config>
        </component>
        ...
    </container>
</services>
```
All field generators must specify `<providerId>`, which references a language model client component - either a local LLM, an OpenAI client or a custom component.
In addition to the language model, field generators require a prompt. Prompts are constructed from three parts:
1. Prompt template, specified either inline inside `<promptTemplate>` or in a file within the application package, with the path given in `<promptTemplateFile>`.
2. Input from the indexing statement, e.g. `input text` where `text` is a document field name.
3. Output type of the field being generated.
If neither `<promptTemplate>` nor `<promptTemplateFile>` is provided, the default prompt is set to the input part. When both are provided, `<promptTemplateFile>` has precedence.
A prompt template must contain an `{input}` placeholder, which will be replaced with the input value. It is possible to combine several fields into one input by concatenating them into a single string, e.g.
```
input "title: " . title . " text: " . text | generate names_extractor | summary | index
```
A prompt template might also contain a `{jsonSchema}` placeholder which will be replaced with a JSON schema based on the type of the field being generated, see the [structured output section](#structured-output) for details. Including a JSON schema in your prompt can help language models generate output in a specific format. However, it's important to understand that field generators already provide the JSON schema as a separate inference parameter to the underlying language model client. Both local LLM and OpenAI client utilize [structured output](#structured-output) functionality, which forces LLMs to produce outputs that conform to the schema. For this reason, explicitly including `{jsonSchema}` in your prompt template is unnecessary for most use cases.
Structured output can be disabled by specifying `<responseFormatType>TEXT</responseFormatType>`. In this case, the generated field must have the `string` type. This is useful for very small models (less than a billion parameters) that struggle to generate structured output. For most use cases, it is recommended to use structured output even for `string` fields.
The last parameter in the field generator configuration is `<invalidResponseFormatPolicy>`, which specifies what to do when the output from the underlying language model can't be converted to the generated field type. This shouldn't happen when using structured output, but it can happen with the `TEXT` response format. The default value is `DISCARD`, which leaves the field empty by setting it to `null`. The other values, `WARN` and `FAIL`, log a warning or throw an exception, respectively.
An overview of all field generator parameters is available in the [configuration definition file](https://github.com/vespa-engine/vespa/blob/master/model-integration/src/main/resources/configdefinitions/language-model-field-generator.def).
## Configuring language models
Field generators specify `<providerId>` to reference the language model client to be used for generation, which is either a local LLM, an OpenAI client or a custom component.
Configuration details for the local LLM and OpenAI clients are covered in the [local LLM](local-llms.html) and [OpenAI client](external-llms.html) documentation. This section focuses on configuration parameters that are important for document enrichment.
Both local LLM and OpenAI client can be configured with different models. For efficient scaling of document enrichment, it is recommended to select the smallest model that delivers acceptable performance for the task at hand. In general, larger models produce better results but are more expensive and slower.
Document enrichment tasks such as information extraction, summarization, expansion and classification are often less complex than the problem-solving capabilities targeted by larger models. These tasks can be accomplished by smaller, cost-efficient models, such as [Microsoft Phi-3.5-mini](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) for a local model or [GPT-4o mini](https://platform.openai.com/docs/models/gpt-4o-mini) for the OpenAI API.
Here is an example of an OpenAI client configured with the GPT-4o mini model:
```
<container id="default" version="1.0">
    ...
    <component id="openai" class="ai.vespa.llm.clients.OpenAI">
        <config name="ai.vespa.llm.clients.llm-client">
            <apiKeySecretName>openai-key</apiKeySecretName>
            <model>gpt-4o-mini</model>
        </config>
    </component>
    ...
</container>
```
For the OpenAI client, model selection influences API cost and latency.
In addition to the model, the local LLM client has several other parameters that are important for document enrichment performance. The following configuration is a good starting point:
```
<container id="default" version="1.0">
    ...
    <component id="local_llm" class="ai.vespa.llm.clients.LocalLLM">
        <config name="ai.vespa.llm.clients.llm-local-client">
            <model url="..."/>
            <contextSize>5000</contextSize>
            <parallelRequests>5</parallelRequests>
            <maxPromptTokens>500</maxPromptTokens>
            <maxTokens>500</maxTokens>
            <maxQueueSize>3</maxQueueSize>
            <maxEnqueueWait>60000</maxEnqueueWait>
            <maxQueueWait>60000</maxQueueWait>
            <contextOverflowPolicy>FAIL</contextOverflowPolicy>
        </config>
    </component>
    ...
</container>
```
There are three important aspects of this configuration in addition to the model used.
1. `model`, `contextSize` and `parallelRequests` determine compute resources necessary to run the model.
2. `contextSize`, `parallelRequests`, `maxPromptTokens` and `maxTokens` should be configured to avoid context overflow - a situation when context size is too small to process multiple parallel requests with the given number of prompt and completion tokens.
3. `maxQueueSize`, `maxEnqueueWait` and `maxQueueWait` are related to managing the queue used for storing and feeding parallel requests into LLM runtime (llama.cpp).
The [local LLMs documentation](local-llms.html) explains how to configure `model`, `contextSize` and `parallelRequests` with respect to the model and compute resources used. Memory usage (RAM or GPU VRAM) is especially important to consider when configuring these parameters.
To avoid context overflow, configure the `contextSize`, `parallelRequests`, `maxPromptTokens` and `maxTokens` parameters so that `contextSize / parallelRequests >= maxPromptTokens + maxTokens`. With the configuration above, 5000 / 5 = 1000 = 500 + 500, so the per-request context is exactly large enough. Also consider that a larger `contextSize` takes longer to process.
The queue-related parameters are used to balance latency against throughput. Suitable values depend heavily on the underlying compute resources. The local LLM configuration presented above is optimized for CPU nodes with 16 cores and 32GB RAM, as well as GPU nodes with an NVIDIA T4 GPU with 16GB VRAM.
### Configuring compute resources
Provisioned compute resources only affect local LLM performance, as the OpenAI client merely calls a remote API that leverages the service provider's infrastructure. In practice, a GPU is highly recommended for running local LLMs, as it provides an order-of-magnitude speedup compared to CPU. For Vespa Cloud, a reasonable starting configuration is as follows:
```
<container id="default" version="1.0">
    ...
    <nodes count="1">
        <resources vcpu="16" memory="32Gb" disk="125Gb">
            <gpu count="1" memory="16Gb"/>
        </resources>
    </nodes>
    ...
</container>
```
This configuration provisions a container cluster with a single node that has an NVIDIA T4 GPU with 16GB VRAM. Local model throughput scales linearly with the number of nodes in the container cluster used for feeding. For example, with 8 GPU nodes (`<nodes count="8">`) and a throughput per node of 1.5 generations/second, the combined throughput will be close to 12 generations/second.
### Feeding configuration
Generated fields introduce considerable latency during feeding. A large number of high-latency parallel requests might lead to timeouts in the document processing pipeline. To avoid this, it is recommended to reduce the number of connections during feeding. A reasonable starting point is to use three connections per GPU node and one connection per CPU node. Example for one GPU node:
```
vespa feed data.json --connections 3
```
## Structured output
Document enrichment generates field values based on the data types defined in a document schema. Both local LLMs and the OpenAI client support structured output, forcing LLMs to produce JSON that conforms to a specified schema. This JSON schema is automatically constructed by a field generator according to the data type of the field being created. For example, the JSON schema for `field questions type array<string>` in the `passage` document will be as follows:
```
{
"type": "object",
"properties": {
"passage.questions": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"passage.questions"
],
"additionalProperties": false
}
```
Constructed schemas for different data types correspond to the [document JSON format](../reference/schemas/document-json-format.html) used for feeding. The following field types are supported:
- string
- bool
- int
- long
- byte
- float
- float16
- double
- array of types mentioned above
Types that are not supported:
- map
- struct
- weightedset
- tensors
- references
- predicate
- position
## Custom field generator
As usual with Vespa, existing functionality can be extended by developing [custom application components](../applications/developer-guide.html). A custom generator component can be used to implement application-specific logic to construct prompts, transform and validate LLM inputs and outputs, combine the outputs of several LLMs, or use other sources such as a knowledge graph.
A custom field generator compatible with `generate` should implement the `com.yahoo.language.process.FieldGenerator` interface, with a `generate` method that returns a field value. Here is a toy example of a custom field generator:
```
package ai.vespa.test;

import ai.vespa.llm.completion.Prompt;
import com.yahoo.document.datatypes.FieldValue;
import com.yahoo.document.datatypes.StringFieldValue;
import com.yahoo.language.process.FieldGenerator;

public class MockFieldGenerator implements FieldGenerator {

    private final MockFieldGeneratorConfig config;

    public MockFieldGenerator(MockFieldGeneratorConfig config) {
        this.config = config;
    }

    @Override
    public FieldValue generate(Prompt prompt, Context context) {
        var stringBuilder = new StringBuilder();
        for (int i = 0; i < config.repetitions(); i++) {
            stringBuilder.append(prompt.asString());
            if (i < config.repetitions() - 1) {
                stringBuilder.append(" ");
            }
        }
        return new StringFieldValue(stringBuilder.toString());
    }
}
```
The config definition for this component looks as follows:
```
namespace=ai.vespa.test
package=ai.vespa.test
repetitions int default=1
```
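Assuming this definition file is named _mock-field-generator.def_ (derived from the class name), Vespa's config system generates the `MockFieldGeneratorConfig` class that is injected into the component's constructor above.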
To be used with the `generate` indexing expression, this component must be added to `services.xml`:
```
<container id="default" version="1.0">
    ...
    <component id="mock_generator" class="ai.vespa.test.MockFieldGenerator">
        <config name="ai.vespa.test.mock-field-generator">
            <repetitions>2</repetitions>
        </config>
    </component>
    ...
</container>
```
The last step is to use it in a document schema, e.g.:
```
schema passage {

    document passage {

        field text type string {
            indexing: summary | index
            index: enable-bm25
        }

    }

    field mock_text type string {
        indexing: input text | generate mock_generator | summary
    }

}
```
---
# Source: https://docs.vespa.ai/en/reference/schemas/document-field-path.html.md
# Document field path reference
The field path syntax is used in several places in Vespa to traverse documents through arrays, structs, maps and sets, and to generate a set of values matching the expression. Examples: if the document contains the field `mymap`, and it has a key `mykey`, the expression returns the value of the map for that key:
```
mymap{mykey}
```
Returns the value at index 3 of the `myarray` field, if set:
```
myarray[3]
```
Returns the value of the `value1` field in the struct field `mystruct`, if set:
```
mystruct.value1
```
If `mystructarray` is an array field containing structs, this returns the values of `value1` for each of those structs:
```
mystructarray.value1
```
The following syntax can be used for the different field types, and can be combined recursively as required:
## Maps/weighted sets
| Expression | Description |
| --- | --- |
| `{<key>}` | Retrieve the value of a specific key |
| `{$<variable>}` | Retrieve all values, setting the [variable](#variables) to the key value for each |
| `.key` | Retrieve all key values |
| `.value` | Retrieve all values |
| (no suffix) | Retrieve all keys |
In the case of weighted sets, the value referenced above is the weight of the item.
## Array
| Expression | Description |
| --- | --- |
| `[<index>]` | Retrieve the value at a specific index |
| `[$<variable>]` | Retrieve all values in the array, setting the [variable](#variables) to the index of each |
| (no suffix) | Retrieve all values in the array |
## Struct
| Expression | Description |
| --- | --- |
| `.<subfield>` | Return the value of the given struct field |
| (no suffix) | Return the value of all subfields |
Note that when specifying values of subscripts of maps, weighted sets and arrays, only primitive types (numbers and strings) may be used.
## Variables
It can be useful to reference several field paths using a common variable. For instance, if you have an array of structs, you may want to use document selection on fields within the same array index together. This could be done by an expression like:
```
mydoctype.mystructarray{$x}.field1=="foo" AND mydoctype.mystructarray{$x}.field2=="bar"
```
Variables either have a `key` value (for maps and weighted sets), or an `index` value (for arrays). Variables cannot be used across such contexts (that is, a map key cannot be used to index into an array).
---
# Source: https://docs.vespa.ai/en/reference/schemas/document-json-format.html.md
# Document JSON format reference
This document describes the JSON format used for sending document operations to Vespa. Field types are defined in the [schema reference](schemas.html#field). This is a reference for:
- JSON representation of [document operations](#document-operations) (put, get, remove, update)
- JSON representation of [field types](#field-types) in Vespa documents
- JSON representation of addressing fields for update, and [update operations](#update-operations)
Also refer to [encoding troubleshooting](../../linguistics/troubleshooting-encoding.html).
## Field types

### Cell values as binary data (hex dump format)
Tensors with the `int8` cell type can have their cell values given as a string with a hex dump of the cells, two hex digits per cell. For example, the string `"FF00118022FE"` for a field of type `tensor<int8>(x[6])` gives the cell values `(x[6]):[-1,0,17,-128,34,-2]`.
For other cell types, it's possible to take the bits of the floating-point value, interpreted directly as an unsigned integer of the appropriate width (16, 32, or 64 bits), and use the hex dump (respectively 4, 8, or 16 hex digits per cell) in a string. For `float` cells (32-bit IEEE 754 floating-point), a simple snippet for converting a cell could look like this:
```
import struct

def float_to_hex(f: float):
    return format(struct.unpack('=I', struct.pack('=f', f))[0], '08X')
```
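As a check, `float_to_hex(1.0/9)` returns `'3DE38E39'`, which is the first cell of the `foo` block in the example below.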
As an advanced combination example, if you have a tensor with type `tensor(tag{},x[3])`, this input could be used, shown with the corresponding output:
```
"mixedtensor": {
"foo": "3DE38E393E638E393EAAAAAB",
"bar": "3EE38E393F0E38E43F2AAAAB",
"baz": "3F471C723F638E393F800000"
}
"mixedtensor":{
"type":"tensor(tag{},x[3])",
"blocks":{
"foo":[0.1111111119389534,0.2222222238779068,0.3333333432674408],
"bar":[0.4444444477558136,0.5555555820465088,0.6666666865348816],
"baz":[0.7777777910232544,0.8888888955116272,1.0]
}
}
```
**Verbose:** [Tensor](../../ranking/tensor-user-guide.html) fields may be represented as an array of cells:
```
"tensorfield": [
{ "address": { "x": "a", "y": "0" }, "value": 2.0 },
{ "address": { "x": "a", "y": "1" }, "value": 3.0 },
{ "address": { "x": "b", "y": "0" }, "value": 4.0 },
{ "address": { "x": "b", "y": "1" }, "value": 5.0 }
]
```
This works for any tensor but is verbose, so shorter forms specific to various tensor types are also supported. Use the shortest form applicable to your tensor type for the best possible performance.
The cells array can optionally be nested in an object under the key "cells". This is how tensor values are returned [by default](../api/document-v1.html#format.tensors), along with another key "type" containing the tensor type.
### struct
```
"mystruct": {
"intfield": 123,
"stringfield": "foo"
}
```
### map
The JSON dictionary key must be a string, even if the map key type in the schema is not a string:
```
"int_to_string_map": {
"123": "foo",
"456": "bar",
"789": "foobar"
}
```
Feeding an empty map (`{}`) for a field has the same effect as not feeding a value for that field, and the field will not be rendered in the document API and in document summaries.
### reference
String with document ID referring to a [parent document](../../schemas/parent-child.html):
```
"artist_ref": "id:mynamespace:artists::artist-1"
```
## Empty fields
In general, fields that have not received a value during feeding will be ignored when rendering the document. They are considered empty fields. However, certain field types have values which cause them to be considered empty. For instance, the empty string ("") is considered empty, as is the empty array ([]). See the sections above for details per type.
## Document operations
Refer to [reads and writes](../../writing/reads-and-writes.html) for details - alternatives:
- Use the [Vespa CLI](../../clients/vespa-cli.html#documents).
- [/document/v1/](../api/document-v1.html): This API accepts one operation per request, with the document ID encoded in the URL.
- [Vespa feed client](../../clients/vespa-feed-client.html): Java APIs / command line tool to feed document operations asynchronously to Vespa, over HTTP.
### Put
The "put" payload has a "put" operation and ["fields"](#field-types) containing field values; ([/document/v1/ example](../../writing/document-v1-api-guide.html#post)):
```
{
"put": "id:mynamespace:music::123",
"fields": {
"title": "Best of Bob Dylan"
}
}
```
### Get
"get" does not have a payload - the response has the same "field" object as in "put", and also "id" and "pathId" fields ([/document/v1/ example](../../writing/document-v1-api-guide.html#get)):
```
{
"pathId": "/document/v1/mynamespace/music/docid/123",
"id": "id:mynamespace:music::123",
"fields": {
"title": "Best of Bob Dylan"
}
}
```
### Remove
The "remove" payload only has a "remove" operation ([/document/v1/ example](../../writing/document-v1-api-guide.html#delete)):
```
{
"remove": "id:mynamespace:music::123"
}
```
### Update
The "update" payload has an "update" operation and "fields". Note: Each field must contain an [update operation](#update-operations), not just the field value directly; ([/document/v1/ example](../../writing/document-v1-api-guide.html#put)):
```
{
"update": "id:mynamespace:music::123",
"fields": {
"title": {
"assign": "The best of Bob Dylan"
}
}
}
```
Flags can be added to add a [test and set](#test-and-set) condition, or allow the update to [create](#create) a new document (a so-called "upsert" operation).
#### Test and set
An optional _condition_ can be added to operations to specify a _test and set_ condition - see [conditional writes](../../writing/document-v1-api-guide.html#conditional-writes). The value of the _condition_ is a [document selection](../writing/document-selector-language.html), encoded as a string. Example: Increment the _sales_ field only if it is already equal to 999 ([/document/v1/ example](../../writing/document-v1-api-guide.html#conditional-writes)):
```
{
"update": "id:mynamespace:music::bob/BestOf",
"condition": "music.sales==999",
"fields": {
"sales": {
"increment": 1
}
}
}
```
**Note:** Use _documenttype.fieldname_ in the condition, not only _fieldname_.
If the condition is not met, a 412 response code is returned.
#### Create (create if nonexistent)
**Updates** to nonexistent documents are supported using _create_; ([/document/v1/ example](../../writing/document-v1-api-guide.html#create-if-nonexistent)):
```
{
"update": "id:mynamespace:music::bob/BestOf",
"create": true,
"fields": {
"title": {
"assign": "The best of Bob Dylan"
}
}
}
```
Since Vespa 8.178, _create_ can also be used together with conditional **Put** operations ([/document/v1/ example](../../writing/document-v1-api-guide.html#conditional-updates-and-puts-with-create) - review notes there before using):
```
{
"put": "id:mynamespace:music::123",
"condition": "music.sales==999",
"create": true,
"fields": {
"title": "Best of Bob Dylan"
}
}
```
## Update operations
The update operations are: [`assign`](#assign), [`add`](#add), [`remove`](#composite-remove), [arithmetics](#arithmetic) (`increment` `decrement` `multiply` `divide`), [`match`](#match), [`modify`](#tensor-modify)
## assign
`assign` is used to replace the value of a field (or an element of a collection) with a new value. When assigning, one can generally use the same syntax and structure as when feeding that field's value in a `put` operation.
### Single value field
```
field title type string {
indexing: summary
}
```
```
{
"update": "id:mynamespace:music::example",
"fields": {
"title": {
"assign": "The best of Bob Dylan"
}
}
}
```
### Tensor field
```
field tensorfield type tensor(x{},y{}) {
indexing: attribute | summary
}
```
```
{
"update": "id:mynamespace:tensordoctype::example",
"fields": {
"tensorfield": {
"assign": {
"cells": [
{ "address": { "x": "a", "y": "b" }, "value": 2.0 },
{ "address": { "x": "c", "y": "d" }, "value": 3.0 }
]
}
}
}
}
```
This will fully replace the entire tensor stored in this field.
### Struct field
#### Replacing all fields in a struct
A full struct is replaced by assigning an object of struct key/value pairs.
```
struct person {
field first_name type string {}
field last_name type string {}
}
field contact type person {
indexing: summary
}
```
```
{
"update": "id:mynamespace:workers::example",
"fields": {
"contact": {
"assign": {
"first_name": "Bob",
"last_name": "The Plumber"
}
}
}
}
```
#### Individual struct fields
Individual struct fields are updated using [field path](#fieldpath) syntax. Refer to the [reference](schemas.html#struct-name) for restrictions using structs.
```
```
"update": "id:mynamespace:workers::example",
"fields": {
"contact.first_name": {
"assign": "Bob"
},
"contact.last_name": {
"assign": "The Plumber"
}
}
}
```
### Map field
Individual map entries can be updated using [field path](document-field-path.html) syntax. The following declaration defines a `map` where the key is an `int` and the value is a `person` struct.
```
struct person {
    field first_name type string {}
    field last_name type string {}
}
field contact type map<int, person> {
    indexing: summary
}
```
Example updating part of an entry in the `contact` map:
- `contact` is the name of the map field to be updated
- `{0}` is the key that is going to be updated
- `first_name` is the struct field to be updated inside the `person` struct
```
{
"update": "id:mynamespace:workers::example",
"fields": {
"contact{0}.first_name": {
"assign": "John"
}
}
}
```
Assigning an element to a key in a map will insert the key/value mapping if it does not already exist, or overwrite it with the new value if it does exist. Refer to the [reference](schemas.html#map) for restrictions using maps.
#### Map to primitive value
```
field my_food_scores type map<string, string> {
    indexing: summary
}
```
```
{
"update": "id:mynamespace:food::example",
"fields": {
"my_food_scores{Strawberries}": {
"assign": "Delicious!"
}
}
}
```
#### Map to struct
```
struct contact_info {
    field phone_number type string {}
    field email type string {}
}
field contacts type map<string, contact_info> {
    indexing: summary
}
```
```
{
"update": "id:mynamespace:people::d_duck",
"fields": {
"contacts{\"Uncle Scrooge\"}": {
"assign": {
"phone_number": "555-123-4567",
"email": "number_one_dime_luvr1877@example.com"
}
}
}
}
```
### Array field
#### Array of primitive values
```
field ingredients type array<string> {
    indexing: summary
}
```
Assign full array:
```
{
"update": "id:mynamespace:cakes:tasty_chocolate_cake",
"fields": {
"ingredients": {
"assign": ["sugar", "butter", "vanilla", "flour"]
}
}
}
```
Assign existing elements in array:
```
{
"update": "id:mynamespace:cakes:tasty_chocolate_cake",
"fields": {
"ingredients[3]": {
"assign": "2 cups of flour (editor's update: NOT asbestos!)"
}
}
}
```
Note that the element at index 3 needs to exist. Alternative using `match`:
```
```
"update": "id:mynamespace:cakes:tasty_chocolate_cake",
"fields": {
"ingredients": {
"match": {
"element": 3,
"assign": "2 cups of flour (editor's update: NOT asbestos!)"
}
}
}
}
```
Individual array elements may be updated using [field path](document-field-path.html) or [match](#match) syntax.
#### Array of struct
Refer to the reference for restrictions using [array of structs](schemas.html#array).
```
struct person {
    field first_name type string {}
    field last_name type string {}
}
field people type array<person> {
    indexing: summary
}
```
```
{
"update": "id:mynamespace:students:example",
"fields": {
"people[34]": {
"assign": {
"first_name": "Bobby",
"last_name": "Tables"
}
}
}
}
```
Note that the element index needs to exist. Use [add](#add-array-elements) to add a new element. Alternative syntax using `match`:
```
{
"update": "id:mynamespace:students:example",
"fields": {
"people": {
"match": {
"element": 34,
"assign": {
"first_name": "Bobby",
"last_name": "Tables"
}
}
}
}
}
```
### Weighted set field
Adding new elements to a weighted set can be done using [add](#add-weighted-set), or by assigning with `field{key}` syntax. Example of the latter:
```
field int_weighted_set type weightedset<int> {
    indexing: summary
}
field string_weighted_set type weightedset<string> {
    indexing: summary
}
```
```
{
"update":"id:mynamespace:weightedsetdoctype::example1",
"fields": {
"int_weighted_set{123}": {
"assign": 123
},
"int_weighted_set{456}": {
"assign": 100
},
"string_weighted_set{\"item 1\"}": {
"assign": 144
},
"string_weighted_set{\"item 2\"}": {
"assign": 7
}
}
}
```
Note that using the `field{key}` syntax for weighted sets _may_ be less efficient than using [add](#add-weighted-set).
### Clearing a field
To clear a field, assign a `null` value to it.
```
{
"update": "id:mynamespace:music::example",
"fields": {
"title": {
"assign": null
}
}
}
```
## add
`add` is used to add entries to arrays, weighted sets or to the mapped dimensions of tensors.
### Adding array elements
The added entries are appended to the end of the array in the order specified.
```
field tracks type array<string> {
    indexing: summary
}
```
```
{
"update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf",
"fields": {
"tracks": {
"add": [
"Lay Lady Lay",
"Every Grain of Sand"
]
}
}
}
```
### Add weighted set entries
Add weighted set elements by using a JSON key/value syntax, where the value is the weight of the element.
Adding a key/weight mapping that already exists will overwrite the existing weight with the new one.
```
field int_weighted_set type weightedset<int> {
    indexing: summary
}
field string_weighted_set type weightedset<string> {
    indexing: summary
}
```
```
{
"update":"id:mynamespace:weightedsetdoctype::example1",
"fields": {
"int_weighted_set": {
"add": {
"123": 123,
"456": 100
}
},
"string_weighted_set": {
"add": {
"item 1": 144,
"item 2": 7
}
}
}
}
```
### Add tensor cells
Add cells to mapped or mixed tensors. Invalid for tensors with only indexed dimensions. Adding a cell that already exists will overwrite the cell value with the new value. The address must be fully specified, but cells with bound indexed dimensions not specified will receive the default value of `0.0`. See the system test [tensor add update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_add_remove_update) for more examples.
```
field tensorfield type tensor(x{},y[3]) {
indexing: attribute | summary
}
```
```
{
"update": "id:mynamespace:tensordoctype::example",
"fields": {
"tensorfield": {
"add": {
"cells": [
{ "address": { "x": "b", "y": "0" }, "value": 2.0 },
{ "address": { "x": "b", "y": "1" }, "value": 3.0 }
]
}
}
}
}
```
In this example, cell `{"x":"b","y":"2"}` will implicitly be set to 0.0.
So if you started with the following tensor:
```
{
{"x": "a", "y": "0"}: 0.2,
{"x": "a", "y": "1"}: 0.3,
{"x": "a", "y": "2"}: 0.5,
}
```
You now end up with this tensor after the above add operation was applied:
```
{
{"x": "a", "y": "0"}: 0.2,
{"x": "a", "y": "1"}: 0.3,
{"x": "a", "y": "2"}: 0.5,
{"x": "b", "y": "0"}: 2.0,
{"x": "b", "y": "1"}: 3.0,
{"x": "b", "y": "2"}: 0.0,
}
```
Prefer the _block short form_ for mixed tensors instead. This also avoids the problem of unspecified cells in indexed dimensions:
```
{
"update": "id:mynamespace:tensordoctype::example",
"fields": {
"tensorfield": {
"add": {
"blocks": [
{ "address": { "x": "b" }, "values": [2.0, 3.0, 5.0] }
]
}
}
}
}
```
## remove
Remove elements from weighted sets, maps and tensors with `remove`.
### Weighted set field
```
field string_weighted_set type weightedset<string> {
    indexing: summary
}
```
```
{
"update":"id:mynamespace:weightedsetdoctype::example1",
"fields": {
"string_weighted_set": {
"remove": {
"item 2": 0
}
}
}
}
```
### Map field
```
field string_map type map<string, string> {
    indexing: summary
}
```
```
{
"update":"id:mynamespace:mapdoctype::example1",
"fields": {
"string_map{item 2}": {
"remove": 0
}
}
}
```
### Tensor field
Removes cells from mapped or mixed tensors. Invalid for tensors with only indexed dimensions. Only mapped dimensions should be specified for tensors with both mapped and indexed dimensions, as all indexed cells the mapped dimensions point to will be removed implicitly. See the system test [tensor remove update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_add_remove_update) for more examples.
```
field tensorfield type tensor(x{},y[2]) {
indexing: attribute | summary
}
```
```
{
"update": "id:mynamespace:tensordoctype::example",
"fields": {
"tensorfield": {
"remove": {
"addresses": [
{"x": "b"},
{"x": "c"}
]
}
}
}
}
```
In this example, cells `{x:b,y:0},{x:b,y:1},{x:c,y:0},{x:c,y:1}` will be removed.
It is also supported to specify only a subset of the mapped dimensions in the addresses. In that case, all cells that match the label values of the specified dimensions are removed. In the example below, all cells having label `b` for dimension `x` are removed.
```
field tensorfield type tensor(x{},y{},z[2]) {
indexing: attribute | summary
}
```
```
{
"update": "id:mynamespace:tensordoctype::example",
"fields": {
"tensorfield": {
"remove": {
"addresses": [
{"x": "b"}
]
}
}
}
}
```
## Arithmetic
The four arithmetic operators `increment`, `decrement`, `multiply` and `divide` are used to modify _single value_ numeric fields without having to look up the current value before applying the update. Example:
```
field sales type int {
indexing: summary | attribute
}
```
```
{
"update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf",
"fields": {
"sales": {
"increment": 1
}
}
}
```
## match
If an arithmetic operation is to be done for a specific key in a _weighted set or array_, use the `match` operation:
```
field track_popularity type weightedset<string> {
    indexing: summary | attribute
}
```
```
{
"update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf",
"fields": {
"track_popularity": {
"match": {
"element": "Lay Lady Lay",
"increment": 1
}
}
}
}
```
In other words, for the weighted set "track_popularity", `match` the element "Lay Lady Lay", then `increment` its weight by 1. See the [weightedset properties](schemas.html#weightedset-properties) reference for how to make incrementing a non-existing key trigger auto-create of the key.
If the updated field is an array, the `element` value would be a positive integer.
**Note:** Only one element can be matched per operation.
## Modify tensors
Individual cells in tensors can be modified using the `modify` update. The cells are modified according to the given operation:
- `replace` - replaces a single cell value
- `add` - adds a value to the existing cell value
- `multiply` - multiplies a value with the existing cell value
The addresses of cells must be fully specified. If the cell does not exist, the update for that cell will be ignored. Use `"create": true` (see example below) to create non-existing cells before the modify update is applied. See the system test [tensor modify update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_modify_update) for more examples.
```
field tensorfield type tensor(x[3]) {
indexing: attribute | summary
}
```
```
{
"update": "id:mynamespace:tensordoctype::example",
"fields": {
"tensorfield": {
"modify": {
"operation": "replace",
"addresses": [
{ "address": { "x": "1" }, "value": 7.0 },
{ "address": { "x": "2" }, "value": 8.0 }
]
}
}
}
}
```
In this example, cell `{"x":"1"}` is replaced with value 7.0 and `{"x":"2"}` with value 8.0. If operation `add` or `multiply` was used instead, 7.0 and 8.0 would be added or multiplied to the current values of cells `{"x":"1"}` and `{"x":"2"}`.
For tensors with a single mapped dimension the _cells short form_ can also be used:
```
field tensorfield type tensor(x{}) {
indexing: attribute | summary
}
```
```
{
"update": "id:mynamespace:tensordoctype::example",
"fields": {
"tensorfield": {
"modify": {
"operation": "add",
"create": true,
"cells": {
"b": 5.0,
"c": 6.0
}
}
}
}
}
```
In this example, 5.0 is added to cell `{"x":"b"}` and 6.0 is added to cell `{"x":"c"}`. With `"create": true` non-existing cells in the input tensor are created before applying the modify update. The default cell value is 0.0 for `replace` and `add`, and 1.0 for `multiply`. This means a non-existing cell ends up with the value specified in the operation.
For mixed tensors the _block short form_ can also be used to modify entire dense subspaces:
```
field tensorfield type tensor(x{},y[3]) {
indexing: attribute | summary
}
```
```
{
"update": "id:mynamespace:tensordoctype::example",
"fields": {
"tensorfield": {
"modify": {
"operation": "replace",
"blocks": {
"a": [1,2,3],
"b": [4,5,6]
}
}
}
}
}
```
## Fieldpath
Fieldpath is for accessing fields within composite structures. For structures that are not part of index or attribute, it is possible to access elements directly using field paths. This is done by adding more information to the field name. For map structures, specify the key (see [example](#assign)):
```
mymap{mykey}
```
and then do the operation on the element which is keyed by "mykey". Arrays can be accessed as well (see [details](#assign)):
```
myarray[3]
```
And this is also true for structs (see [details](#assign)). **Note:** Struct updates do not work for [index](../applications/services/content.html#document) mode:
```
mystruct.value1
```
This also works for nested structures, e.g. a `map` of `map` to `array` of `struct`:
```
{
"update": "id:mynamespace:complexdoctype::foo",
"fields": {
"nested_structure{firstMapKey}{secondMapKey}[4].title": {
"assign": "Look at me, mom! I'm hiding deep in a nested type!"
}
}
}
```
---
# Source: https://docs.vespa.ai/en/applications/document-processors.html.md
# Document processors
This document describes how to develop and deploy _Document Processors_, often called _docproc_ in this documentation. Document processing is a framework to create [chains](chaining.html) of configurable [components](components.html), that read and modify document operations.
The input source splits the input data into logical units called [documents](../schemas/documents.html). A [feeder application](../writing/reads-and-writes.html) sends the documents into a document processing chain. This chain is an ordered list of document processors. Document processing examples range from language detection, HTML removal and natural language processing to mail attachment processing, character set transcoding and image thumbnailing. At the end of the processing chain, extracted data will typically be set in some fields in the document.
The motivation for document processing is that code and configuration are deployed atomically, like all Vespa components. It is also easy to build components that access data in Vespa as part of processing.
To get started, see the [sample application](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing). Read [indexing](../writing/indexing.html) to understand deployment and routing. As document processors are chained components just like Searchers, read [Searcher Development](searchers.html). For reference, see the [Javadoc](https://javadoc.io/doc/com.yahoo.vespa/docproc), and [services.xml](../reference/applications/services/docproc.html).
## Deploying a Document Processor
Refer to [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) to get started; [LyricsDocumentProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/document-processing/src/main/java/ai/vespa/example/album/LyricsDocumentProcessor.java) is a document processor example. Add the document processor in [services.xml](../reference/applications/services/docproc.html), and then add it to a [chain](#chains). The type of processing done by the processor dictates which chain it should be part of:
- If it does general data-processing, such as populating some document fields from others, looking up data in external services, etc., it should be added to a general docproc chain.
- If, and only if, it does processing required for _indexing_ - or requires this to have already been run - it should be added to a chain which inherits the _indexing_ chain, and which is used for indexing by a content cluster.
An example that adds a general document processor to the "default" chain, and an indexing related processor to a chain used by a particular content cluster (processor ids below are placeholders):
```
<container id="default" version="1.0">
    <document-processing>
        <chain id="default">
            <documentprocessor id="ai.vespa.example.MyGeneralDocumentProcessor"/>
        </chain>
        <chain id="my-indexing" inherits="indexing">
            <documentprocessor id="ai.vespa.example.MyIndexingDocumentProcessor"/>
        </chain>
    </document-processing>
    ...
</container>

<content id="my-content" version="1.0">
    <documents>
        <document type="mydoctype" mode="index"/>
        <document-processing chain="my-indexing"/>
    </documents>
    ...
</content>
```
The "default" chain, if it exists, is run by default, before the chain used for indexing. The default indexing chain is called "indexing", and _must_ be inherited by any chain that is to replace it.
To run through any chain, specify a [route](../writing/document-routing.html) which includes the chain. For example, the route `default/chain.my-chain indexing` would route feed operations through the chain "my-chain" in the "default" container cluster, and then to the "indexing" hop, which resolves to the specified indexing chain for each content cluster the document should be sent to. More details can be found in [indexing](../writing/document-routing.html#document-processing).
## Document Processors
A document processor is a component extending `com.yahoo.docproc.DocumentProcessor`. All document processors must implement `process()`:
```
public Progress process(Processing processing);
```
When the container receives a document operation, it will create a new `Processing`, and add the `DocumentPut`s, `DocumentUpdate`s or `DocumentRemove`s to the `List` accessible through `Processing.getDocumentOperations()`. This is also how a processing can be stopped: call `Processing.getDocumentOperations().clear()` before returning `Progress.DONE`, e.g. for blocklist use, to drop a `DocumentPut` or `DocumentUpdate`.
Furthermore, the call stack of the document processing chain in question will be _copied_ to `Processing.callStack()`, so that document processors may freely modify the flow of control for this processing without affecting all other processings going on. After creation, the `Processing` is added to an internal queue.
A worker thread will retrieve a `Processing` from the input queue, and run its document operations through its call stack. A minimal, no-op document processor implementation is thus:
```
import com.yahoo.docproc.*;

public class SimpleDocumentProcessor extends DocumentProcessor {

    public Progress process(Processing processing) {
        return Progress.DONE;
    }
}
```
The `process()` method should loop through all document operations in `Processing.getDocumentOperations()`, do whatever it sees fit to them, and return a `Progress`:
```
public Progress process(Processing processing) {
    for (DocumentOperation op : processing.getDocumentOperations()) {
        if (op instanceof DocumentPut) {
            DocumentPut put = (DocumentPut) op;
            // TODO: do something to 'put' here
        } else if (op instanceof DocumentUpdate) {
            DocumentUpdate update = (DocumentUpdate) op;
            // TODO: do something to 'update' here
        } else if (op instanceof DocumentRemove) {
            DocumentRemove remove = (DocumentRemove) op;
            // TODO: do something to 'remove' here
        }
    }
    return Progress.DONE;
}
```
| Return code | Description |
| --- | --- |
| `Progress.DONE` | Returned when a document processor has successfully processed a `Processing`. |
| `Progress.FAILED` | Processing failed, and the input message should return a _fatal_ failure back to the feeding application, meaning that this application will not try to re-feed this document operation. Represented as a `500 Internal Server Error` response in [Document v1](../writing/document-v1-api-guide.html). Return an error message/reason by calling `withReason()` - see the example below. |
| `Progress.INVALID_INPUT` | Available since 8.584. Processing failed due to invalid input, like a malformed document operation. Represented as a `400 Bad Request` response in [Document v1](../writing/document-v1-api-guide.html). |
| `Progress.LATER` | See [execution model](#execution-model). The document processor wants to release the calling thread and be called again later. This is useful if e.g. calling an external service with high latency. The document processor may then save its state in the `Processing` and resume when called again later. There are no guarantees as to _when_ the processor is called again with this `Processing`; it is simply appended to the back of the input queue. `Progress.LATER` enables an asynchronous model, where the processing of a document operation does not need to consume one thread for its entire lifespan. Note, however, that the document processors themselves are shared between all processing operations in a chain, and must thus be implemented [thread-safe](#state). |
Example of returning `Progress.FAILED` with a reason:
```
if (op instanceof DocumentPut) {
    return Progress.FAILED.withReason("PUT is not supported");
}
```
| Exception | Description |
| --- | --- |
| `com.yahoo.docproc.TransientFailureException` | Processing failed and the input message should return a _transient_ failure back to the feeding application, meaning that this application _may_ try to re-feed this document operation. |
| `RuntimeException` | Throwing any other `RuntimeException` gives the same behavior as `Progress.FAILED`. |
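For illustration, a hedged sketch of the two failure modes - the `isOverloaded()` check is hypothetical, and the `TransientFailureException(String)` constructor is assumed:
```
import com.yahoo.docproc.*;

public class FailureModesProcessor extends DocumentProcessor {

    @Override
    public Progress process(Processing processing) {
        if (isOverloaded()) {
            // Transient: the feeding application may re-feed this operation
            throw new TransientFailureException("Backend temporarily unavailable");
        }
        if (processing.getDocumentOperations().isEmpty()) {
            // Fatal: the feeding application will not re-feed this operation
            return Progress.FAILED.withReason("Nothing to process");
        }
        return Progress.DONE;
    }

    private boolean isOverloaded() {
        return false; // hypothetical health check
    }
}
```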
## Chains
The call stack mentioned above is another name for a _document processor chain_. Document processor chains are a special case of the general [component chains](chaining.html) - to avoid confusion some concepts are explained here as well. A document processor chain is nothing more than a list of document processor instances, having an id, and represented as a stack. The document processor chains are typically not created for every processing, but are part of the configuration. Multiple chains may exist at the same time; the chain to execute is specified by the message bus destination of the incoming message. The same document processor instance may exist in multiple document processor chains, which is why the `CallStack` of the `Processing` is responsible for knowing the next document processor to invoke for a particular message.
The execution order of the document processors in a chain is not specified explicitly, but determined by [ordering constraints](chaining.html#ordering-components) declared in the document processors or their configuration.
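For illustration, ordering constraints can be declared with the chain dependency annotations - a minimal sketch, where the `Provides`/`After`/`Before` names are hypothetical:
```
import com.yahoo.component.chain.dependencies.After;
import com.yahoo.component.chain.dependencies.Before;
import com.yahoo.component.chain.dependencies.Provides;
import com.yahoo.docproc.*;

// Runs after any component providing "LanguageDetection",
// and before any component providing "Tokenization"
@After("LanguageDetection")
@Before("Tokenization")
@Provides("HtmlRemoval")
public class HtmlRemovingProcessor extends DocumentProcessor {

    @Override
    public Progress process(Processing processing) {
        return Progress.DONE;
    }
}
```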
## Execution model
The Document Processing Framework works like this:
1. A thread from the message bus layer appends an incoming message to an internal priority queue, shared between all document processing chains configured on a node. The priority is set based on the message bus priority of the message. Messages of the same priority are ordered FIFO.
2. One worker thread from the docproc thread pool picks one message from the head of the queue, deserializes it, copies the call stack (chain) in question, and runs it through the document processors.
3. Processing finishes if **(a)** the document(s) has passed successfully through the whole chain, or **(b)** a document processor in the chain has returned `Progress.FAILED` or thrown an exception.
4. The same thread passes the message on to the message bus layer for further transport on to its destination.
There is a single instance of each document processor chain. In every chain, there is a single instance of each document processor - unless a chain is configured with multiple, identical document processors, which is a rare case.
As is evident from the model above, multiple worker threads execute the document processors in a chain concurrently. Thus, many threads of execution can be going through `process()` in a document processor, at the same time.
This model places an important constraint on document processor classes: _instance variables are not safe._ They must be eliminated, or made thread-safe somehow.
Also see [Resource management](components.html#resource-management); use `deconstruct()` in order to not leak resources.
### Asynchronous execution
The execution model outlined above also shows one important restriction: If a document processor performs any high-latency operation in its process() method, a docproc worker thread will be occupied. With all _n_ worker threads blocking on an external resource, throughput will be limited. This can be fixed by saving the state in the Processing object, and returning `Progress.LATER`. A document processor doing a high-latency operation should use a pattern like this:
1. Check a self-defined context variable in Processing for status. Basically, _have we seen this Processing before?_
2. If no:
1. We have been given a Processing object fresh off the network, we have not seen this before. Process it up until the high-latency operation.
2. Start the high-latency operation (possibly in a separate thread).
3. Save the state of the operation in a self-defined context variable in the Processing.
4. Return `Progress.LATER`. This Processing is then appended to the back of the input queue, and we will be called again later.
3. If yes:
1. Retrieve the reference that we set in our self-defined context variable in Processing.
2. Is the high-latency operation done? If so, return `Progress.DONE`.
3. Is it not yet done? Return `Progress.LATER` again.
As is evident, this lets the finite set of document processing threads do more work at the same time; a sketch of the pattern follows below.
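A minimal sketch of this pattern, assuming a hypothetical high-latency call `callExternalService()`; the context variable name is arbitrary:
```
import com.yahoo.docproc.*;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SlowLookupProcessor extends DocumentProcessor {

    private static final String VAR = "slow-lookup-future"; // self-defined context variable name

    // Shared by all worker threads - must be thread-safe (see the State section)
    private final ExecutorService executor = Executors.newFixedThreadPool(8);

    @SuppressWarnings("unchecked")
    @Override
    public Progress process(Processing processing) {
        Future<String> future = (Future<String>) processing.getVariable(VAR);
        if (future == null) { // first time we see this Processing
            future = executor.submit(this::callExternalService);
            processing.setVariable(VAR, future);
            return Progress.LATER; // release the worker thread
        }
        if (!future.isDone()) {
            return Progress.LATER; // still waiting - Processing goes to the back of the queue
        }
        try {
            String result = future.get();
            // ... apply 'result' to the document operations here ...
            return Progress.DONE;
        } catch (InterruptedException | ExecutionException e) {
            return Progress.FAILED.withReason("External lookup failed: " + e.getMessage());
        }
    }

    private String callExternalService() {
        return "result"; // hypothetical high-latency call
    }

    @Override
    public void deconstruct() {
        executor.shutdown(); // do not leak the thread pool
    }
}
```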
## State
Any state in the document processor for the particular Processing should be kept as local variables in the process method, while state which should be shared by all Processings should be kept as member variables. As the latter kind will be accessed by multiple threads at any one time, such member variables must be _thread-safe_. This critical restriction is similar to those of e.g. the Servlet API. Options for implementing a multithread-safe document processor with instance variables (see the sketch after this list):
1. Use immutable (and preferably final) objects: they never change after they are constructed; no modifications to their state occurs after the DocumentProcessor constructor returns.
2. Use a single instance of a thread-safe class.
3. Create a single instance and synchronize access to it across all threads (but this will severely limit scalability).
4. Arrange for each thread to have its own instance, e.g. with a `ThreadLocal`.
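A sketch illustrating options 1 and 2 (names are illustrative):
```
import com.yahoo.docproc.*;
import java.util.concurrent.atomic.AtomicLong;

public class CountingProcessor extends DocumentProcessor {

    // Option 1: immutable, final state - safe because it never changes after construction
    private final String targetField = "title";

    // Option 2: a single instance of a thread-safe class
    private final AtomicLong operationCount = new AtomicLong();

    @Override
    public Progress process(Processing processing) {
        operationCount.addAndGet(processing.getDocumentOperations().size());
        // Per-Processing state belongs in local variables or Processing context variables
        return Progress.DONE;
    }
}
```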
### Processing Context Variables
`Processing` has a map `String -> Object` that can be used to pass information between document processors. It is also useful when using `Progress.LATER` to save the state of a processing - see [Processing.java](https://github.com/vespa-engine/vespa/blob/master/docproc/src/main/java/com/yahoo/docproc/Processing.java) for `get/setVariable` and more.
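For example (variable name and value are illustrative):
```
import com.yahoo.docproc.*;

// Illustrative pair: an early processor in the chain stores a value, a later one reads it
public class LanguageDetector extends DocumentProcessor {
    @Override
    public Progress process(Processing processing) {
        processing.setVariable("detected-language", "en"); // variable name is arbitrary
        return Progress.DONE;
    }
}

class LanguageConsumer extends DocumentProcessor {
    @Override
    public Progress process(Processing processing) {
        String language = (String) processing.getVariable("detected-language");
        // ... use 'language' when processing the operations ...
        return Progress.DONE;
    }
}
```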
The [sample application](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) uses such context variables, too.
## Operation ordering
### Feed ordering
Ordering of feed operations is not guaranteed. Operations on different documents are done concurrently and are therefore not ordered. However, Vespa guarantees that operations on the same document are processed in the order they were fed, if they enter Vespa at the _same_ feed endpoint.
### Document processing ordering
Document operations that are produced inside a document processor obey the same rules as at feed time. Whether you split the input into other documents or into multiple operations on the same document, Vespa ensures that operations on the same document id are sequenced and delivered in the order they enter. A sketch of producing operations inside a processor follows below.
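A hedged sketch of producing additional operations inside a processor - the derived id suffix and the omitted field handling are illustrative only:
```
import com.yahoo.docproc.*;
import com.yahoo.document.*;
import java.util.ArrayList;
import java.util.List;

public class SplittingProcessor extends DocumentProcessor {

    @Override
    public Progress process(Processing processing) {
        List<DocumentOperation> ops = processing.getDocumentOperations();
        for (DocumentOperation op : new ArrayList<>(ops)) { // copy: we mutate ops below
            if (op instanceof DocumentPut put) {
                Document original = put.getDocument();
                // Derive an extra document; a real processor would set meaningful fields
                Document derived = new Document(original.getDataType(),
                        new DocumentId(original.getId().toString() + "-derived"));
                ops.add(new DocumentPut(derived)); // per-id ordering is preserved downstream
            }
        }
        return Progress.DONE;
    }
}
```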
## (Re)configuring Document Processing
Consider the following configuration:
```
<!-- Reconstructed, illustrative example - ids and config values are placeholders -->
<container id="default" version="1.0">
    <document-processing>
        <chain id="default">
            <documentprocessor id="MyDocumentProcessor">
                <config name="my.documentprocessor.config">
                    <myConfigValue>value</myConfigValue>
                </config>
            </documentprocessor>
        </chain>
    </document-processing>
    ...
</container>
```
Changing chain ids, components in a chain, component configuration and schema mapping all take effect after deployment - no restart required. Changing a _cluster name_ (i.e. the container id) requires a restart of docproc services after _vespa activate_.
Note when adding or modifying a processing chain in a running cluster: if at the same time deploying a _new_ document processor (i.e. a document processor that was unknown to Vespa at the time the cluster was started), the container must be restarted:
```
$ [vespa-sentinel-cmd](../reference/operations/self-managed/tools.html#vespa-sentinel-cmd) restart container
```
## Class diagram

The framework core supports asynchronous processing, processing one or multiple documents or document updates at the same time, document processors that make dynamic decisions about the processing flow, and passing of information between processors outside the document or document update:
- One or more named `Docproc Services` may be created. One of the services is the _default_.
- A service accepts subclasses of `DocumentOperation` for processing, meaning `DocumentPuts`, `DocumentUpdates` and `DocumentRemoves`. It has a `Call Stack` which lists the calls to make to various `DocumentProcessors` to process each DocumentOperation handed to the service.
- Call Stacks consist of `Calls`, which refer to the Document Processor instance to call.
- Document puts and document updates are processed asynchronously; the state is kept in a `Processing` for its duration (instead of in a thread or process). A Document Processor may make some asynchronous calls (typically to remote services) and return to the framework that it should be called again later for the same Processing to handle the outcome of the calls.
- A processing contains its own copy of the Call Stack of the Docproc Service to keep track of what to call next. Document Processors may modify this Call Stack to dynamically decide the processing steps required to process a DocumentOperation.
- A Processing may contain one or more DocumentOperations to be processed as a unit.
- A Processing has a `context`, which is a Map of named values which can be used to pass arguments between processors.
- Processings are prepared to be stored to disk, to allow a high number of ongoing long-term processings per node.
---
# Source: https://docs.vespa.ai/en/writing/document-routing.html.md
# Routing
_Routing_ is used to configure the paths that documents and updates written to Vespa take through the system. Vespa will automatically set up a routing configuration which is appropriate for most cases, so no explicit routing configuration is necessary. However, explicit routing can be used in advanced use cases such as sending different document streams to different document processing clusters, or through multiple consecutive clusters etc.
There are other, more in-depth, articles on routing:
- Use [vespa-route](../reference/operations/self-managed/tools.html#vespa-route) to inspect routes and services of a Vespa application, like in the [example](#example-reconfigure-the-default-route)
- [Routing policies reference](#routing-policies-reference). See the [routing policies](#routing-policies) note for complex routes and default routing
In Vespa, there is a transport layer and a programming interface available to clients that wish to communicate with a Vespa application. The transport layer is _Message Bus_. [Document API](document-api-guide.html) is implemented on top of Message Bus. Configuring the interface therefore exposes some features available in Message Bus. Refer to the [Vespa APIs and interfaces](../reference/api/api.html) for clients using the _Document API_. The atoms in Vespa routing are _routes_ and _hops_.
[document-processing](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) is an example of custom document processing, and useful for testing routing.
## A route is a sequence of hops
The sequence of hosts, routers, bridges, gateways, and other devices that network traffic takes, or could take, from its source to its destination is what is classically termed a _route_. As a verb, _to route_ means to determine the link down which to send a packet, that will minimize its total journey time according to some routing algorithm.
In Vespa, a route is simply a sequence of named hops. Instead of leaving selection logic to a route, the responsibility of resolving recipients is given to the [hops](#a-hop-is-a-point-to-point-transmission)' [selectors](#selection-logic). A hop can do more or less whatever it wants to change a message's journey through your application; it can slightly alter itself by choosing among some predefined recipients, it can change itself completely by either rewriting or looking up another hop, or it can even modify the entire route from that branch onwards. In effect, a route can end up branching at several points along its path, resulting in complex routes. As the figure suggests, Message Bus supports both [unicasting](https://en.wikipedia.org/wiki/Unicast) and [multicasting](https://en.wikipedia.org/wiki/Multicast) - Message Bus allows for arbitrarily complex routes. Each node in the above graph represents a Vespa service:

## A hop is a point-to-point transmission
In telecommunication, a _hop_ is one step, from one router to the next, on the path of a packet on an Internet Protocol network. It is a direct host-to-host connection forming part of the route between two hosts in a routed network such as the Internet. In more general terms, a hop is a point-to-point transmission in a series required to get a message from point A to point B.
With Message Bus the concept of hops was introduced as the smallest steps of the transmission of a message. A hop consists of a _name_ that is used by the messaging clients to select it, a list of _recipient_ services that it may transmit to, and a _selector_ that is used to select among those recipients. Unlike traditional hops, in Vespa a hop is a transmission from one sender to many recipients.
Well, the above is only partially true; it is the easiest way to understand the hop concept. In fact, a hop's recipient list is nothing more than a configured list of strings that is made available to all [routing policies](#routing-policies)that are named in the selector string. See [selection logic](#selection-logic) below for details.
A hop's recipient is the service name of a Message Bus client that has been registered in Vespa's service location broker (vespa-slobrok). These names are well-defined once their derivation logic is understood; they are "/"-separated sets of address-components whose values are given by a service's role in the application. An example of a recipient is:
```
search/cluster.foo/*/feed-destination
```
The marked components of the above recipient, `search/cluster.foo/*`, resolve to a host's symbolic name. This is the name with which a Message Bus instance was configured. The unmarked component, `feed-destination`, is the local name of the running service that the hop transmits to, i.e. the name of the _session_ created on the running Message Bus instance.
The Active Configuration page in Vespa's administration interface gives an insight into what symbolic names exist for any given application by looking at its current configuration subscriptions. All available Message Bus services use their `ConfigId` as their host's symbolic name. See [vespa-route](../reference/operations/self-managed/tools.html#vespa-route) for how to inspect this, or use the [config API](../reference/api/config-v2.html).
A hop can be prefixed using the special character "?" to force it to behave as if its[ignore-result](#hop) attribute was configured to "true".
### Asterisk
A service identifier may include the special character "\*" as an address component. A recipient that contains this character is a request for the network to choose _any one_ service that matches it.
## Routing policies
A routing policy is a protocol-specific algorithm that chooses among a list of candidate recipients for a single address component - see [hop description](#a-hop-is-a-point-to-point-transmission) above. These policies are designed and implemented as key parts of a Message Bus protocol. E.g. for the "Document" protocol these are what make up the routing behavior for document transmission. Without policies, a hop would only be able to match verbatim to a recipient, and thus the only advanced selection logic would be that of the [asterisk](#asterisk).
In addition to implementing a selection algorithm, a routing policy must also implement a merging algorithm that combines the replies returned from each selected recipient into a single sensible reply. This is needed because a client does not necessarily know whether a message has been sent to one or multiple recipients, and **Message Bus guarantees a single reply for every message**.
More formally, a routing policy is an arbitrarily large (or small), named, stand-alone piece of code registered with a Message Bus protocol. As discussed [below](#selection-logic), an instance of a policy is run both when resolving a route to recipients, and when merging replies. The policy is passed a `RoutingContext` object that pretty much allows it to do whatever it pleases to the route and replies. The same policy object and the same context object is used for both selection and merging.
Refer to the [routing policy reference](#routing-policies-reference).
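For illustration only, a minimal custom policy could look like the sketch below. Class and method names follow the `com.yahoo.messagebus.routing` interfaces, but treat the details as assumptions and consult the Javadoc for the authoritative API:
```
import com.yahoo.messagebus.routing.*;

// Illustrative sketch of a policy that always selects one fixed route
public class EverythingToMusicPolicy implements RoutingPolicy {

    @Override
    public void select(RoutingContext context) {
        // Selection: choose one or more recipient routes for the message
        context.addChild(Route.parse("search/cluster.music"));
    }

    @Override
    public void merge(RoutingContext context) {
        // Merging: combine the replies from all selected recipients into a single reply
        context.setReply(context.getChildIterator().removeReply());
    }

    @Override
    public void destroy() {
    }
}
```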
## Selection logic
When Message Bus is about to route a message, at the last possible time, it inspects the **first** hop of the message's route to resolve a set of recipients. First, all of its [policies are resolved](#1-resolve-policy-directives). Second, the output service name is matched to the routing table to see if it maps to another hop or route. Finally, the message is [sent](#3-send-to-services) to all chosen recipient services. Because each policy can select multiple recipients, this can give rise to an arbitrarily complex routing tree. There are, of course, safeguards within Message Bus to prevent infinite recursions due to circular dependencies or misconfiguration.
**Note:** It **is** possible to develop a different protocol with other policies to run in the application, but since all of Vespa's components only support the "Document" protocol, it makes little sense to do so.
### 1. Resolve Policy Directives
The logic run at this step is actually simple; as long as the hop string contains a policy directive, i.e. some arbitrary string enclosed in square brackets, Message Bus will create and run an instance of that policy for the protocol of the message being routed.
```
Name: storage/cluster.backup
Selector: storage/cluster.backup/distributor/[Distributor]/default
Recipients: -
```
The above hop is probably the simplest hop you will encounter in Vespa; it has a single policy directive contained in a string that closely resembles service names discussed above, and it has no recipients. When resolving this hop, Message Bus creates an instance of the "Distributor" policy and invokes its `select()` method. The policy replaces its own directive with a proper distributor identifier, yielding a hop string that is now an unambiguous service identifier.
```
Name: indexing
Selector: [DocumentRouteSelector]
Recipients: search/cluster.music
search/cluster.books
```
This hop has a selector which is nothing more than a single policy directive, "[DocumentRouteSelector]", and it has two configured recipients, "search/cluster.music" and "search/cluster.books". This policy expands the hop to zero, one or two **new** routes by replacing its own directive with the content of the recipient routes. Each of these routes may have one or more hops themselves. In turn, these will be processed independently. When replies are available from all chosen recipients, the policy's `merge()` method is invoked, and the resulting reply is passed upwards.
```
Name: default
Selector: [AND:indexing storage/cluster.backup]
Recipients: -
```
This hop has a selector but no recipients. The reason for this is best explained in the [routing policies reference](#routing-policies-reference), but it serves as an example of a hop that has no configured recipients. Notice how the policy directive contains a colon (":") which denotes that the remainder of the directive is a parameter to the policy constructor. This policy replaces the whole route of the message with the set of routes named in the parameter string.
What routing policies are available depends on what protocol is currently running. As of this version the only supported protocol is "Document". This offers a set of routing policies discussed [below](#routing-policies-reference).
### 2. Resolve Hop- and Route names
As soon as all policy directives have been resolved, Message Bus makes sure that the resulting string is, in fact, a service name and not the name of another hop or route (in that order) configured for the running protocol. The outcome is either:
1. The string is recognized as a hop name - The current hop is replaced by the named one, and processing returns to [step 1](#1-resolve-policy-directives).
2. The string is recognized as a route name - The current route, including all the hops following this, is replaced by the named one. Processing returns to [step 1](#1-resolve-policy-directives).
3. The string is accepted as a service name - This terminates the current branch of the routing tree. If all branches are terminated, processing proceeds to [step 3](#3-send-to-services).
Because hop names are checked before route names, Message Bus also supports a "route:" prefix that forces the remainder of the string to resolve to a configured route or fail.
### 3. Send to Services
When the route resolver reaches this point, the first hop of the message being sent has been resolved to an arbitrarily complex routing tree. Each leaf of this tree represents a service that is to receive the message, unless some policy has already generated a reply for it. No matter how many recipients are chosen, the message is serialized only once, and the network transmission is able to share the same chunk of memory between all recipients.
As replies to the message arrive at the sender they are handed over to the corresponding leaf nodes of the routing tree, but merging will not commence until all leaf nodes are ready.
Route resolving happens just before network transmission, after all resending logic. This means that if the route configuration changes while there are messages scheduled for resending, these will adhere to the new routes.
If the resolution of a recipient passed through a hop that was configured to [ignore results](#hop), the network layer will reply immediately with a synthetic "OK".
## Example: Reconfigure the default route
Assume that the application requires both search and storage capabilities, but that the default feed should only pass through to search. An imaginary scenario for this would be a system where there is a continuous feed being passed into Vespa with no filtering on spam. You would like a minimal storage-only cluster that stores a URL blocklist that can be used by a custom document processor to block incoming documents from offending sites.
Apart from the blocklist and the document processor, add the following:
```
<!-- Reconstructed, illustrative example - cluster and chain names are placeholders -->
<routing version="1.0">
    <routingtable protocol="document">
        <route name="default" hops="docproc/cluster.default/chain.blocklist indexing"/>
    </routingtable>
</routing>
```
This overrides the default route to pass through any available blocklisting document processor before being indexed. If the document processor decides to block a message, it must respond with an appropriate _ok_ reply, or your client software needs to accept whatever error reply you decide to return when blocking.
When feeding blocklist information to storage, your application need only use the already available `storage` hop.
See [#13193](https://github.com/vespa-engine/vespa/issues/13193) for a discussion on using _default_ as a name.
### The Document API
With the current implementation of the Document API running on Message Bus, configuring the API implies configuring the latter. Most clients will only ever route through this API. To use the Document API, instantiate a class that implements the `DocumentAccess` interface. At the time of writing only `MessageBusDocumentAccess` exists, and it requires a parameter set for creation. These parameters are contained in an instance of `MessageBusDocumentAccessParams` that looks somewhat like the following:
```
class MessageBusDocumentAccessParams {
    String documentManagerConfigId; // The id to resolve to document manager config.
    String oosServerPattern;        // The service pattern to resolve to fleet controller services.
    String appConfigId;             // The id to resolve to application config.
    String slobrokConfigId;         // The id to resolve to slobrok config.
    String routingConfigId;         // The id to resolve to messagebus routing config.
    String routeName;               // The name of the route to send to.
    int traceLevel;                 // The trace level to use when sending.

    class SourceSessionParams {
        int maxPending;             // Maximum number of pending messages.
        int maxPendingSize;         // Maximum size of pending messages.
        double timeout;             // Default timeout in seconds for messages that have no timeout set.
        double requestTimeoutA;     // Default request timeout in seconds, using
        double requestTimeoutB;     // the equation 'requestTimeout = a * retry + b'.
        double retryDelay;          // Number of seconds to wait before resending.
    }
}
```
The most obvious configuration parameter is `routeName`, which tells the `MessageBusDocumentAccess` object the name of the route to use when sending documents and updates. The second parameter is `traceLevel`, which allows a client to see exactly how the data was transmitted.
**Note:** Tracing can be enabled on a level from 1-9, where a higher number means more tracing. Because the concept of tracing is not exposed by the Document API itself, its data will simply be printed to standard output when a reply arrives for the sender. This should therefore not be used in production, but can be helpful when debugging.
Refer to the [Document API JavaDoc](https://javadoc.io/doc/com.yahoo.vespa/documentapi).
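A hedged usage sketch, assuming the `DocumentAccess.createForNonContainer()` factory and sync session API of recent Vespa versions - see the Javadoc for the authoritative signatures:
```
import com.yahoo.document.Document;
import com.yahoo.document.DocumentId;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.DocumentType;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.SyncParameters;
import com.yahoo.documentapi.SyncSession;

public class FeedExample {

    public static void main(String[] args) {
        // Factory method creating a MessageBusDocumentAccess under the hood
        DocumentAccess access = DocumentAccess.createForNonContainer();
        DocumentType type = access.getDocumentTypeManager().getDocumentType("music");

        SyncSession session = access.createSyncSession(new SyncParameters.Builder().build());
        Document document = new Document(type, new DocumentId("id:mynamespace:music::123"));
        session.put(new DocumentPut(document)); // sent over the configured route

        session.destroy();
        access.shutdown();
    }
}
```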
## Routing services
This is the reference documentation for all elements in the _routing_ section of [services.xml](../reference/applications/services/services.html).
```
[routing [version]](#routing)
  [routingtable [protocol, verify]](#routingtable)
    [route [name, hops]](#route)
    [hop [name, selector, ignore-result]](#hop)
      [recipient [session]](#recipient)
  [services [protocol]](#services)
    [service [name]](#service)
```
## routing
Contained in [services](../reference/applications/services/services.html#services). The container element for all configuration related to routing.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| version | required | number | | Must be set to "1.0" in this Vespa-version |
Optional subelements:
- [routingtable](#routingtable)
- [services](#services)
Example:
```
<routing version="1.0">
    <routingtable protocol="document">
        ...
    </routingtable>
    <services protocol="document">
        ...
    </services>
</routing>
```
## routingtable
Contained in [routing](#routing). Specifies a routing table for a specific protocol.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| protocol | required | | | Configure which protocol to use. Only the protocol _document_ is defined, so if you define a routing table for an unsupported protocol, the application will just log an INFO entry that contains the name of that protocol. |
| verify | optional | boolean | | ToDo: document this |
Optional subelements:
- [route](#route)
- [hop](#hop)
Example:
```
<routing version="1.0">
    <routingtable protocol="document">
        <route name="default" hops="indexing"/>
    </routingtable>
</routing>
```
## route
Contained in [routingtable](#routingtable). Specifies a route for a message to its destination through a set of intermediate hops. If at least one hop in a route does not exist, the application will fail to start and issue an error that contains the name of that hop.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| name | required | | | Route name. |
| hops | required | | | A whitespace-separated list of hop names, where each name must be a valid hop. |
Subelements: none
Example:
```
<route name="default" hops="default/chain.default indexing"/>
```
## hop
Contained in [routingtable](#routingtable). Specifies a single hop that can be used to construct one or more routes. A hop must have a name that is unique within the routing table to which it belongs. A hop contains a selector string and a list of recipient sessions.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| name | required | | | Hop name. |
| selector | required | | | Selector string. |
| ignore-result | optional | | | If set to _true_, specifies that the result of routing through that hop should be ignored. |
Optional subelements:
- [recipient](#recipient)
Example:
```
<hop name="indexing" selector="[DocumentRouteSelector]">
    <recipient session="search/cluster.music"/>
</hop>
```
## recipient
Contained in [hop](#hop). Specifies a recipient session of a hop.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| session | required | | | This attribute must correspond to a running instance of a service that can be routed to. All session identifiers consist of a location part and a name. A search node is always given a session name of the form _search/cluster.name/g#/r#/c#/feed-destination_, whereas a document processor service is always named _docproc/cluster.name/docproc/#/feed-processor_. |
Subelements: none
Example:
```
<recipient session="search/cluster.music/g0/r0/c0/feed-destination"/>
```
## services
Contained in [routing](#routing). Specifies a set of services available for a specific protocol. At the moment the only supported protocol is _document_. The services specified are used by the route verification step to allow hops and routes to reference services known to exist, but that can not be derived from _services.xml_.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| protocol | required | | | Configure which protocol to use. Only the protocol _document_ is defined. |
Optional subelements:
- [service](#service)
Example:
```
<services protocol="document">
    <service name="search/cluster.music/g0/r0/c0/feed-destination"/>
</services>
```
## service
Contained in [services](#services). Specifies a single known service that can not be derived from the _services.xml_.
| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| name | required | | | The name of the service. |
Subelements: none
Example:
```
<service name="search/cluster.music/g0/r0/c0/feed-destination"/>
```
## Routing policies reference
This article contains detailed descriptions of the behaviour of all routing policies available in Vespa.
The _Document protocol_ is currently the only Message Bus protocol supported by Vespa. Furthermore, all routing policies that are part of this protocol share a common code path for [merging replies](#merge). The policies offered by the protocol are:
- [AND](#and) - Selects all configured recipient hops.
- [DocumentRouteSelector](#documentrouteselector) - Uses a [document selection string](../reference/writing/document-selector-language.html) to select compatible routes
- [Content](#content) - Selects a content cluster distributor based on system state
- [MessageType](#messagetype) - Selects a next hop based on message type
- [Extern](#extern) - Selects a recipient by querying a remote Vespa application
- [LocalService](#localservice) - Selects a recipient based on ip address
- [RoundRobin](#roundrobin) - Selects one from the configured recipients in round-robin order
- [SubsetService](#subsetservice) - Selects only among a subset of all matching services
- [LoadBalancer](#loadbalancer) - A round-robin policy that chooses between the recipients by generating a weight according to their performance
### Common Document `merge()` logic
The shared merge logic of most Document routing policies is an attempt to do the "right" thing when merging multiple replies into one. It works by first stepping through all replies, storing their content as either:
1. OK replies,
2. IGNORE replies, or
3. ERROR replies
If at least one ERROR reply is found, return a new reply that contains all the errors of the others. If there is at least one OK reply, return the first OK reply, but transfer all feed information from the others to this (this is specific data for start- and end-of-feed messages). Otherwise, return a new reply that contains all the IGNORE errors. Pseudocode:
```
for each reply, do
if reply has no errors, do
store reply in OK list
else, do
if reply has only IGNORE errors
copy all errors from reply to IGNORE list
else, do
copy all errors from reply to ERROR list
if ERROR list is not empty, do
return new reply with all errors
else, do
if OK list is not empty, do
return first reply with all feed answers
else, do
return new reply with all IGNORE errors
```
## Routing policies reference
### AND
This is mostly a convenience policy that allows the user to fork a message's route to all configured recipients. It is not message-type aware, and will simply always select all recipients. Replies are merged according to the [shared logic](#merge).
The optional string parameter is parsed as a space-separated list of hops. Configured recipients have precedence over parameter-given recipients, although this is likely to be changed in the future.
### DocumentRouteSelector
This policy is responsible for selecting among the policy's recipients according to the subscription rules defined by a content cluster's _documents_ element in [services.xml](../reference/applications/services/services.html). If the "selection" attribute is set in the "documents" element, its value is processed as a [document select](../reference/writing/document-selector-language.html) string, and run on documents and document updates to determine routes. If the "feedname" attribute is set, all feed commands are filtered through it.
The recipient list of this policy is required to map directly to route names. E.g. if a recipient is "search/cluster.music", and a message is appropriate according to the selection criteria, the message is routed to the "search/cluster.music" route. If the route does not exist, this policy will reply with an error. In short, this policy selects one or more recipient routes based on document content and configured criteria.
If more than one route is chosen, the replies are merged according to the [shared logic](#merge).
This policy does not support any parameters.
The configuration for this is "documentrouteselectorpolicy", available from config id "routing/documentapi".
**Important:** Because GET messages do not contain any document on which to run the selection criteria, this policy returns an IGNORED reply that the merging logic processes. You can see this by attempting to retrieve a document from an application that does not have a content cluster.
### Content
This policy allows you to send a message to a content cluster. The policy uses a system state retrieved from the cluster in question in conjunction with slobrok information to pick the correct distributor for your message.
In short: use this policy when communicating with document storage.
This policy supports multiple parameters, up to one each of:
| Parameter | Description |
| --- | --- |
| cluster | The name of the cluster you want to reach. Example: cluster=mycluster |
| config | A comma-separated list of config servers or proxies you want to use to fetch configuration for the policy. This can be used to communicate with other clusters than the one you're currently in. Example: config=tcp/myadmin1:19070,tcp/myadmin2:19070 |
Separate each parameter with a semicolon.
### MessageType
This policy selects the next hop based on the type of the message. You configure where all messages should go (defaultroute), then configure which message types should be overridden and sent to alternative routes. It is currently only used internally by Vespa when using the [content](../reference/applications/services/content.html#content) element.
### Extern
This policy implements the necessary logic to communicate with an external Vespa application and resolve a single service pattern using that other application's slobrok servers. Keep in mind that there might be some delay between the moment this policy is initially created and when it receives the response to its service query, so using this policy might cause a message to be resent a few times until it is resolved. If you disable retries, this policy might cause all messages to fail for the first seconds.
This policy uses its parameter for both the address of the extern slobrok server to connect to and the pattern to use for querying. The parameter is required to be of the form `spec;service`, where `spec` is a comma-separated list of slobrok connection specs of the form "tcp/hostname:port", and `service` is a service running on the remote Vespa application - e.g. `[Extern:tcp/slobrok1:4099,tcp/slobrok2:4099;search/cluster.music/*/feed-destination]` (illustrative).
**Important:** The remote application needs to have a version of both message bus and the document api that is binary compatible with the application sending from. This can be a problem even between patch releases, so keep the application versions in sync when using this policy.
### LocalService
This policy is used to select among all matching services, preferring those running on the same host as the current one. The pattern used when querying for available services is the current one, but replacing the policy directive with an asterisk. E.g. the hop "docproc/cluster.default/[LocalService]/chain.default" would prefer local services among all those that match the pattern "docproc/cluster.default/\*/chain.default". If there are multiple matching services that run locally, this policy will do simple round-robin load balancing between them. If no matching services run locally, this policy simply returns the asterisk as a match to allow the underlying network logic to do load balancing among all available.
This policy accepts an optional parameter which overrides the local hostname. Use this if you wish the hop to prefer some specific host.
**Important:** There is no additional logic to replace other policy directives with an asterisk, meaning that if other policy directives are present in the hop string after "[LocalService]", no services can possibly be matched.
### RoundRobin
This policy is used to select among a configured set of recipients. For each configured recipient, this policy determines what online services are matched, and then selects one among all of those in round-robin order. If none of the configured recipients match any available service, this policy returns an error that indicates to the sender that it should retry later.
Because this policy only selects a single recipient, it contains no merging logic.
### SubsetService
This policy is used to select among a subset of all matching services, and is used to minimize the number of connections in the system. The pattern used when querying for available services is the current one, but replacing the policy directive with an asterisk. E.g. the hop "docproc/cluster.default/[SubsetService:3]/chain.default" would select among a subset of all those that match the pattern "docproc/cluster.default/\*/chain.default". Given that the pattern returns a set of matches, this policy stores a subset of these based on the hash-value of the running message bus' connection string (this is unique for each instance). If there are no matching services, this policy returns the asterisk as a match to allow the underlying network logic to fail gracefully.
This policy parses its optional parameter as the size of the subset. If none is given, the subset defaults to size 5.
**Important:** There is no additional logic to replace other policy directives with an asterisk, meaning that if other policy directives are present in the hop string after "[SubsetService]", no services can possibly be matched.
### LoadBalancer
This policy is used to send to a stateless cluster such as docproc, where any node can be chosen to process any message. Messages are sent between the nodes in a round-robin fashion, but each node is assigned a weight based on its performance. The weights are calculated by measuring the number of times the node had a full input-queue and returned a busy response. Use this policy to send to docproc clusters that have nodes with different performance characteristics.
This policy supports multiple parameters, up to one each of:
| Parameter | Description |
| --- | --- |
| cluster | The name of the cluster you want to reach. Example: cluster=docproc/cluster.default (mandatory) |
| session | The destination session you want to reach. In the case of docproc, the name of the docproc chain. Example: session=chain.mychain (mandatory) |
| config | A comma-separated list of config servers or proxies you want to use to fetch configuration for the policy. This can be used to communicate with other clusters than the one you're currently in. Example: config=tcp/myadmin1:19070,tcp/myadmin2:19070 |
Separate each parameter with a semicolon. By default, this policy will use the current Vespa cluster for configuration.
## Routing for indexing
A normal Vespa configuration has container and content cluster(s), with one or more document types defined in _schemas_. Routing document writes means routing documents to the _indexing_ container cluster, then the right _content_ cluster.
The indexing cluster is a container cluster - see [multiple container clusters](#multiple-container-clusters) for variants. Add the [document-api](../reference/applications/services/container.html#document-api) feed endpoint to this cluster. The mapping from document type to content cluster is in [document](../reference/applications/services/content.html#document) in the content cluster. From [album-recommendation](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation/app/services.xml):
```
<!-- Reconstructed, illustrative excerpt -->
<container id="default" version="1.0">
    <document-api/>
    ...
</container>
<content id="music" version="1.0">
    <redundancy>1</redundancy>
    <documents>
        <document type="music" mode="index"/>
    </documents>
    ...
</content>
```
Given this configuration, Vespa knows which is the container cluster used for indexing, and which content cluster that stores the _music_ document type. Use [vespa-route](../reference/operations/self-managed/tools.html#vespa-route) to display routing generated from this configuration:
```
$ vespa-route
There are 6 route(s):
1. default
2. default-get
3. music
4. music-direct
5. music-index
6. storage/cluster.music
There are 2 hop(s):
1. container/chain.indexing
2. indexing
```
Note the _default_ route. This route is auto-generated by Vespa, and is used when no other route is specified, e.g. when using [/document/v1](../reference/api/document-v1.html). _default_ points to _indexing_:
```
$ vespa-route --route default
The route 'default' has 1 hop(s):
1. indexing
```
```
$ vespa-route --hop indexing
The hop 'indexing' has selector:
[DocumentRouteSelector]
And 1 recipient(s):
1. music
```
```
$ vespa-route --route music
The route 'music' has 1 hop(s):
1. [MessageType:music]
```
In short, the _default_ route handles documents of type _music_. Vespa will route to the container cluster with _document-api_ - note the _chain.indexing_ hop above. This is a set of built-in _document processors_ that do the indexing (see below).
Refer to the [trace appendix](#appendix-trace) for routing details.
## chain.indexing
This indexing chain is set up on the container once a content cluster has a document type with `mode="index"`.
The [IndexingProcessor](https://github.com/vespa-engine/vespa/blob/master/docprocs/src/main/java/com/yahoo/docprocs/indexing/IndexingProcessor.java) annotates the document based on the [indexing script](../reference/writing/indexing-language.html) generated from the schema. Example:
```
$ vespa-get-config -n vespa.configdefinition.ilscripts \
-i container/docprocchains/chain/indexing/component/com.yahoo.docprocs.indexing.IndexingProcessor
maxtermoccurrences 100
fieldmatchmaxlength 1000000
ilscript[0].doctype "music"
ilscript[0].docfield[0] "artist"
ilscript[0].docfield[1] "artistId"
ilscript[0].docfield[2] "title"
ilscript[0].docfield[3] "album"
ilscript[0].docfield[4] "duration"
ilscript[0].docfield[5] "year"
ilscript[0].docfield[6] "popularity"
ilscript[0].content[0] "clear_state | guard { input artist | tokenize normalize stem:"BEST" | summary artist | index artist; }"
ilscript[0].content[1] "clear_state | guard { input artistId | summary artistId | attribute artistId; }"
ilscript[0].content[2] "clear_state | guard { input title | tokenize normalize stem:"BEST" | summary title | index title; }"
ilscript[0].content[3] "clear_state | guard { input album | tokenize normalize stem:"BEST" | index album; }"
ilscript[0].content[4] "clear_state | guard { input duration | summary duration; }"
ilscript[0].content[5] "clear_state | guard { input year | summary year | attribute year; }"
ilscript[0].content[6] "clear_state | guard { input popularity | summary popularity | attribute popularity; }"
```
Refer to [linguistics](../linguistics/linguistics.html) for more details.
By default, the indexing chain is set up on the _first_ container cluster in _services.xml_. When having multiple container clusters, it is recommended to configure this explicitly, see [multiple container clusters](#multiple-container-clusters).
## Document selection
The [document](../reference/applications/services/content.html#document) element can have a [selection](../reference/writing/document-selector-language.html) string, normally used to expire documents. This is also evaluated during feeding, so documents that would immediately expire are dropped. This is not an error - the document API will report 200 - but it can be confusing.
The evaluation is done in the [DocumentRouteSelector](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/messagebus/protocol/DocumentRouteSelectorPolicy.java) at the feeding endpoint - _before_ any processing/indexing. I.e. the document is first evaluated against the selection string (dropped or kept), then routed based on document type.
Example: the selection is configured to not match the document being fed:
```
<!-- Reconstructed, illustrative excerpt - the selection string is a placeholder -->
<content id="music" version="1.0">
    <redundancy>1</redundancy>
    <documents>
        <document type="music" mode="index" selection="music.year &gt; 2000"/>
    </documents>
</content>
```
```
$ vespa-feeder --trace 6 doc.json
[1564576570.693] Source session accepted a 4096 byte message. 1 message(s) now pending.
[1564576570.713] Sequencer sending message with sequence id '-1163801147'.
[1564576570.721] Recognized 'default' as route 'indexing'.
[1564576570.727] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false).
[1564576570.811] Running routing policy 'DocumentRouteSelector'.
[1564576570.822] Policy 'DocumentRouteSelector' assigned a reply to this branch.
[1564576570.828] Sequencer received reply with sequence id '-1163801147'.
[1564576570.828] Source session received reply. 0 message(s) now pending.
Messages sent to vespa (route default) :
----------------------------------------
PutDocument: ok: 0 msgs/sec: 0.00 failed: 0 ignored: 1 latency(min, max, avg): 9223372036854775807, -9223372036854775808, 0
```
Without the selection (i.e. everything matches):
```
$ vespa-feeder --trace 6 doc.json
[1564576637.147] Source session accepted a 4096 byte message. 1 message(s) now pending.
[1564576637.168] Sequencer sending message with sequence id '-1163801147'.
[1564576637.176] Recognized 'default' as route 'indexing'.
[1564576637.180] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false).
[1564576637.256] Running routing policy 'DocumentRouteSelector'.
[1564576637.268] Component '[MessageType:music]' selected by policy 'DocumentRouteSelector'.
...
Messages sent to vespa (route default) :
----------------------------------------
PutDocument: ok: 1 msgs/sec: 1.05 failed: 0 ignored: 0 latency(min, max, avg): 845, 845, 845
```
In the last case, in the [DocumentRouteSelector](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/messagebus/protocol/DocumentRouteSelectorPolicy.java) routing policy, the document matched the selection string (or there was no selection string), and the document was forwarded to the next hop in the route.
## Document processing
Add custom processing of documents using [document processing](../applications/document-processors.html). The normal use case is to add document processors in the default route, before indexing. Example:
```
<!-- Reconstructed, illustrative excerpt - the processor id is a placeholder -->
<container id="default" version="1.0">
    <document-api/>
    <document-processing>
        <chain id="default">
            <documentprocessor id="MyDocumentProcessor"/>
        </chain>
    </document-processing>
    ...
</container>
<content id="music" version="1.0">
    <redundancy>1</redundancy>
    <documents>
        <document type="music" mode="index"/>
    </documents>
    ...
</content>
```
Note that a new hop _default/chain.default_ is added, and the default route is changed to include this:
```
$ vespa-route
There are 6 route(s):
1. default
2. default-get
3. music
4. music-direct
5. music-index
6. storage/cluster.music
There are 3 hop(s):
1. default/chain.default
2. default/chain.indexing
3. indexing
```
```
$ vespa-route --route default
The route 'default' has 2 hop(s):
1. default/chain.default
2. indexing
```
Note that the document processing chain must be called _default_ to automatically be included in the default route.
### Inherit indexing chain
An alternative to the above is inheriting the indexing chain - use this when getting this error:
```
Indexing cluster 'XX' specifies the chain 'default' as indexing chain.
As the 'default' chain is run by default, using it as the indexing chain will run it twice.
Use a different name for the indexing chain.
```
Call the chain something other than _default_, and let it inherit _indexing_:
```
<!-- Reconstructed, illustrative excerpt - ids are placeholders -->
<container id="default" version="1.0">
    <document-api/>
    <document-processing>
        <chain id="my-indexing-chain" inherits="indexing">
            <documentprocessor id="MyDocumentProcessor"/>
        </chain>
    </document-processing>
    ...
</container>
<content id="music" version="1.0">
    <redundancy>1</redundancy>
    <documents>
        <document type="music" mode="index"/>
        <document-processing cluster="default" chain="my-indexing-chain"/>
    </documents>
    ...
</content>
```
See [#13193](https://github.com/vespa-engine/vespa/issues/13193) for details.
## Multiple container clusters
Vespa can be configured to use more than one container cluster. Use cases can be to separate search and document processing or having different document processing clusters due to capacity constraints or dependencies. Example with separate search and feeding/indexing container clusters:
```
<!-- Reconstructed, illustrative excerpt -->
<container id="container-search" version="1.0">
    <search/>
    ...
</container>
<container id="container-indexing" version="1.0">
    <document-api/>
    <document-processing/>
    ...
</container>
<content id="music" version="1.0">
    <redundancy>1</redundancy>
    <documents>
        <document type="music" mode="index"/>
        <document-processing cluster="container-indexing"/>
    </documents>
    ...
</content>
```
Notes:
- The indexing route is made explicit using [document-processing](../reference/applications/services/content.html#document-processing) elements, pointing from the content cluster to the container cluster
- Set up _document-api_ on the same cluster as indexing, to avoid a network hop from the feed endpoint to the indexing processors
- If no _document-processing_ is configured, it defaults to a container cluster named _default_. When using multiple container clusters, it is best practice to explicitly configure _document-processing_.
Observe the _container-indexing/chain.indexing_ hop, and the indexing chain is set up on the _container-indexing_ cluster:
```
$ vespa-route
There are 6 route(s):
1. default
2. default-get
3. music
4. music-direct
5. music-index
6. storage/cluster.music
There are 2 hop(s):
1. container-indexing/chain.indexing
2. indexing
```
```
$ curl -s http://localhost:8081 | python -m json.tool | grep -C 3 chain.indexing
{
"bundle": "container-disc:7.0.0",
"class": "com.yahoo.messagebus.jdisc.MbusClient",
"id": "chain.indexing@MbusClient",
"serverBindings": []
},
{
--
"class": "com.yahoo.docproc.jdisc.DocumentProcessingHandler",
"id": "com.yahoo.docproc.jdisc.DocumentProcessingHandler",
"serverBindings": [
"mbus://*/chain.indexing"
]
},
{
```
## Appendix: trace
Below is a trace example, no selection string:
```
$ cat doc.json
[
{
"put": "id:mynamespace:music::123",
"fields": {
"album": "Bad",
"artist": "Michael Jackson",
"title": "Bad",
"year": 1987,
"duration": 247
}
}
]
$ vespa-feeder --trace 6 doc.json
[1564571762.403] Source session accepted a 4096 byte message. 1 message(s) now pending.
[1564571762.420] Sequencer sending message with sequence id '-1163801147'.
[1564571762.426] Recognized 'default' as route 'indexing'.
[1564571762.429] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false).
[1564571762.489] Running routing policy 'DocumentRouteSelector'.
[1564571762.493] Component '[MessageType:music]' selected by policy 'DocumentRouteSelector'.
[1564571762.493] Resolving '[MessageType:music]'.
[1564571762.520] Running routing policy 'MessageType'.
[1564571762.520] Component 'music-index' selected by policy 'MessageType'.
[1564571762.520] Resolving 'music-index'.
[1564571762.520] Recognized 'music-index' as route 'container/chain.indexing [Content:cluster=music]'.
[1564571762.520] Recognized 'container/chain.indexing' as HopBlueprint(selector = { '[LoadBalancer:cluster=container;session=chain.indexing]' }, recipients = { }, ignoreResult = false).
[1564571762.526] Running routing policy 'LoadBalancer'.
[1564571762.538] Component 'tcp/vespa-container:19101/chain.indexing' selected by policy 'LoadBalancer'.
[1564571762.538] Resolving 'tcp/vespa-container:19101/chain.indexing [Content:cluster=music]'.
[1564571762.580] Sending message (version 7.83.27) from client to 'tcp/vespa-container:19101/chain.indexing' with 179.853 seconds timeout.
[1564571762.581] Message (type 100004) received at 'container/container.0' for session 'chain.indexing'.
[1564571762.581] Message received by MbusServer.
[1564571762.582] Request received by MbusClient.
[1564571762.582] Running routing policy 'Content'.
[1564571762.582] Selecting route
[1564571762.582] No cluster state cached. Sending to random distributor.
[1564571762.582] Too few nodes seen up in state. Sending totally random.
[1564571762.582] Component 'tcp/vespa-container:19114/default' selected by policy 'Content'.
[1564571762.582] Resolving 'tcp/vespa-container:19114/default'.
[1564571762.586] Sending message (version 7.83.27) from 'container/container.0' to 'tcp/vespa-container:19114/default' with 179.995 seconds timeout.
[1564571762.587181] Message (type 100004) received at 'storage/cluster.music/distributor/0' for session 'default'.
[1564571762.587245] music/distributor/0 CommunicationManager: Received message from message bus
[1564571762.587510] Communication manager: Sending Put(BucketId(0x2000000000000020), id:mynamespace:music::123, timestamp 1564571762000000, size 275)
[1564571762.587529] Communication manager: Passing message to source session
[1564571762.587547] Source session accepted a 1 byte message. 1 message(s) now pending.
[1564571762.587681] Sending message (version 7.83.27) from 'storage/cluster.music/distributor/0' to 'storage/cluster.music/storage/0/default' with 180.00 seconds timeout.
[1564571762.587960] Message (type 10) received at 'storage/cluster.music/storage/0' for session 'default'.
[1564571762.588052] music/storage/0 CommunicationManager: Received message from message bus
[1564571762.588263] PersistenceThread: Processing message in persistence layer
[1564571762.588953] Communication manager: Sending PutReply(id:mynamespace:music::123, BucketId(0x2000000000000020), timestamp 1564571762000000)
[1564571762.589023] Sending reply (version 7.83.27) from 'storage/cluster.music/storage/0'.
[1564571762.589332] Reply (type 11) received at 'storage/cluster.music/distributor/0'.
[1564571762.589448] Source session received reply. 0 message(s) now pending.
[1564571762.589459] music/distributor/0 Communication manager: Received reply from message bus
[1564571762.589679] Communication manager: Sending PutReply(id:music:music::123, BucketId(0x0000000000000000), timestamp 1564571762000000)
[1564571762.589807] Sending reply (version 7.83.27) from 'storage/cluster.music/distributor/0'.
[1564571762.590] Reply (type 200004) received at 'container/container.0'.
[1564571762.590] Routing policy 'Content' merging replies.
[1564571762.590] Reply received by MbusClient.
[1564571762.590] Sending reply from MbusServer.
[1564571762.590] Sending reply (version 7.83.27) from 'container/container.0'.
[1564571762.612] Reply (type 200004) received at client.
[1564571762.613] Routing policy 'LoadBalancer' merging replies.
[1564571762.613] Routing policy 'MessageType' merging replies.
[1564571762.615] Routing policy 'DocumentRouteSelector' merging replies.
[1564571762.622] Sequencer received reply with sequence id '-1163801147'.
[1564571762.622] Source session received reply. 0 message(s) now pending.
Messages sent to vespa (route default) :
----------------------------------------
PutDocument: ok: 1 msgs/sec: 3.30 failed: 0 ignored: 0 latency(min, max, avg): 225, 225, 225
```
---
# Source: https://docs.vespa.ai/en/reference/writing/document-selector-language.html.md
# Document selector language reference
This document describes the _document selector language_, used to select a subset of documents when feeding, dumping and garbage collecting data. It defines a text string format that can be parsed to build a parse tree, which in turn can answer whether a given document is contained within the subset or not.
## Examples
Match all documents in the `music` schema:
`music`
As applications can have multiple schemas, match document type (schema) and then a specific value in the `artistname` field:
`music and music.artistname == "Coldplay"`
Below, the first condition states that the documents should be of type music, and the author field must exist. The second states that the field length must be set, and be less than or equal to 1000:
`music.author and music.length <= 1000`
The next expression selects all documents where either of the subexpressions is true. The first one states that the author field should include the name John Doe, with anything in between or in front. The `\n` escape is converted to a newline before the field comparison is done, thus requiring the field to end with "Doe" followed by a newline for a match. The second expression selects all books where no author is defined:
`book.author = "*John*Doe\n" or not book.author`
Here is an example of how parentheses are used to group expressions, and where the constant value false is used. Note that the `(false or music.test)` sub-expression could be exchanged with just `music.test` without altering the result of the selection. The `not` clause inverts the `music.length > 1000` comparison, so the expression selects all documents where the length field is less than or equal to 1000 and the test field is defined:
`not (music.length > 1000) and (false or music.test)`
Other examples:
- `music.version() == 3 and (music.givenname + " " + music.surname).lowercase() = "bruce spring*"`
- `id.user.hash().abs() % 300 % 7 = 1`
- `music.wavstream.hash() == music.checksum`
- `music.size / music.length > 10`
- `music.expire > now() - 7200`
## Case sensitiveness
The identifiers used in this language (`and`, `or`, `not`, `true`, `false`, `null`, `id`, `scheme`, `namespace`, `specific`, `user`, `group`) are not case-sensitive. It is recommended to use lowercase identifiers for consistency with the documentation.
## Branch operators / precedence
The branch operators are used to combine other nodes in the parse tree generated from the text format. The branch nodes are listed below in order of precedence:
| Operator | Description |
| --- | --- |
| NOT | Unary prefix operator inverting the selection of the child node |
| AND | Binary infix operator, which is true if all its children are |
| OR | Binary infix operator, which is true if any of its children are |
Use parentheses to override precedence. `a and b or c and d` is equivalent to `(a and b) or (c and d)` since `and` has higher precedence than `or`. The expression `a and (b or c) and d` is not equivalent to the previous two, since parentheses have been used to force the or-expression to be evaluated first.
Parentheses can also be used in value calculations. Modulo `%` has the highest precedence, multiplication `*` and division `/` come next, and addition `+` and subtraction `-` have the lowest precedence.
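To illustrate, the two selections below differ only in grouping, assuming a hypothetical `music` schema with numeric fields `a` and `b`: the first evaluates `music.b * 2` before the addition, the second adds the fields first and then doubles the sum.
```
music.a + music.b * 2 > 10
(music.a + music.b) * 2 > 10
```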
## Primitives
| Primitive | Description |
| --- | --- |
| Boolean constant | The boolean constants `true` and `false` can be used to match all/nothing |
| Null constant | Referencing a field that is not present in a document returns a special `null` value. The expression `music.title` is shorthand for `music.title != null`. There are potentially subtle interactions with null values when used with comparisons, see [comparisons with missing fields (null values)](#comparisons-with-missing-fields-null-values). |
| Document type | A document type can be used as a primitive to select a given type of documents - [example](/en/writing/visiting.html#analyzing-field-values). |
| Document field specification | A document field specification (`doctype.field`) can be used as a primitive to select all documents that have field set - a shorter form of `doctype.field != null` |
| Comparison | The comparison is a primitive used to compare two values |
## Comparison
Comparison operators compare two values. All operators are infix and take two arguments.
| Operator | Description |
| --- | --- |
| \> | This is true if the left argument is greater than the right one. Operators using greater-than or less-than notation only make sense when both arguments are numbers or strings. In the case of strings, they are ordered by their binary (byte-wise) representation, with the first character being the most significant and the last character the least significant. If the arguments are of mixed type, or one of them is not a number or a string, the comparison is invalid and will not match. |
| \< | Matches if left argument is less than the right one |
| \<= | Matches if the left argument is less than or equal to the right one |
| \>= | Matches if the left argument is greater than or equal to the right one |
| == | Matches if both arguments are exactly the same. Both arguments must be of the same type for a match |
| != | Matches if both arguments are not the same |
| = | String matching using a glob pattern. Matches only if the pattern given as the right argument matches the whole string given by the left argument. Asterisk `*` can be used to match zero or more of any character. Question mark `?` can be used to match any one character. The pattern matching operators, regex `=~` and glob `=`, only make sense if both arguments are strings. The regex operator will never match anything else. The glob operator reverts to the behaviour of `==` if both arguments are not strings. See the examples after this table. |
| =~ | String matching using a regular expression. Matches if the regular expression given as the right argument matches the string given as the left argument. Regex notation is like Perl. Use `^` to indicate start of value, `$` to indicate end of value. |
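Two illustrative pattern matches, reusing the `music` schema from the examples above; the glob matches any album title containing "Head", while the regex matches titles starting with "A Head" and ending with "s":
```
music.album = "*Head*"
music.album =~ "^A Head.*s$"
```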
### Comparisons with missing fields (null values)
The only comparison operators that are well-defined when one or both operands may be `null`(i.e. field is not present) are `==` and `!=`. Using any other comparison operators on a `null` value will yield a special _invalid_ value.
Invalid values may "poison" any logical expression they are part of:
- `AND` returns invalid if none of its operands are false and at least one is invalid
- `OR` returns invalid if none of its operands are true and at least one is invalid
- `NOT` returns invalid if the operand is invalid
If an invalid value is propagated as the root result of a selection expression, the document is not considered a match. This is usually the behavior you want; if a field does not exist, any selection requiring it should not match either. However, in garbage collection, documents for which the selection evaluates to invalid are _not_ removed, as that could be dangerous.
One example where this may have _unexpected_ behavior:
1. You have many documents of type `foo` already fed into a cluster.
2. You add a new field `expires_at_time` to the document type and update a subset of the documents that you wish to keep.
3. You add a garbage collection selection to the `foo` document declaration to only keep non-expired documents: `foo.expires_at_time > now()`
At this point, the old documents that _do not_ contain an `expires_at_time` field will _not_ be removed, as the expression will evaluate to invalid instead of `false`.
To work around this issue, "short-circuiting" using a field presence check may be used: `(foo.expires_at_time != null) and (foo.expires_at_time > now())`.
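As a sketch of where such a GC selection lives, the expression goes in the `selection` attribute of the `<document>` tag in services.xml (surrounding cluster configuration elided; see the services reference for the exact elements):
```
<documents garbage-collection="true">
    <document type="foo"
              selection="(foo.expires_at_time != null) and (foo.expires_at_time > now())" />
</documents>
```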
## Null behavior with imported fields
If your selection references imported fields, `null` will be returned for any imported field when the selection is evaluated in a context where the referenced document can't be retrieved. For GC expressions this will happen in the client as part of the feed routing logic, and it may also happen on backend nodes whose parent document set is incomplete (in case of node failures etc.). It is therefore important that you have this in mind when writing GC selections using imported fields.
When you specify selection criteria in a `<document>` tag in services.xml, you're stating what a document must satisfy in order to be fed into the content cluster and to be kept there.
As an example, imagine a document type `music_recording` with an imported field `artist_is_cool` that points to a boolean field `is_cool` in a parent `artist` document. If you only want your cluster to retain recordings from artists that are certifiably cool, you might be tempted to write a selection like the following:
```
music_recording.artist_is_cool == true
```
**This won't work as expected**, because this expression is evaluated as part of the feeding pipeline to figure out if a cluster should accept a given document. At that point in time, there is no access to the parent document. Consequently, the field will return `null` and the document won't be routed to the cluster.
Instead, write your expressions to handle the case where the parent document _may not exist_:
```
(music_recording.artist_is_cool == null) or (music_recording.artist_is_cool == true)
```
With this selection, we explicitly let a document be accepted into the cluster if its imported field is _not_ available. However, if it _is_ available, we allow it to be used for GC.
## Locale / Character sets
The language currently does not support character sets other than ASCII. Glob and regex matching of single characters are not guaranteed to match exactly one character, but might match a part of a character represented by multiple byte values.
## Values
The comparison operator compares two values. A value can be any of the following:
| Document field specification |
Syntax: `<doctype>.<fieldpath>`
Documents have a set of fields defined, depending on the document type. The field name is the identifier used for the field. This expression returns the value of the field, which can be an integer, a floating point number, a string, an array, or a map of these types.
For multivalues, only the _equals_ operator is supported for comparison. The semantics is that the array returned by the field value must _contain_ at least one element that matches the other side of the comparison. For maps, there must exist a key matching the comparison.
The simplest use of the fieldpath is to specify a field, but for complex types please refer to [the field path syntax documentation](../schemas/document-field-path.html).
|
| Id |
Syntax: `id.[scheme|namespace|type|specific|user|group]`
Each document has a Document Id, uniquely identifying that document within a Vespa installation. The id operator returns the string identifier, or if an optional argument is given, a part of the id.
- scheme (id)
- namespace (to separate different users' data)
- type (specified in the id scheme)
- specific (User specified part to distinguish documents within a namespace)
- user (The number specified in document ids using the n= modifier)
- group (The string group specified in document ids using the g= modifier)
|
| null |
The value null can be given to specify nothingness. For instance, a field specification for a document not containing the field will evaluate to null, so the comparison `music.artist == null` will select all documents that don't have the artist field set. `id.user == null` will match all documents that don't use the `n=` [document id scheme](../../schemas/documents.html#id-scheme).
Tensor fields can _only_ be compared against null. It's not possible to write a document selection that uses the _contents_ of tensor fields—only their presence can be checked.
|
| Number |
A value can be a number, either an integer or a floating point number. The type of number is insignificant; you don't have to use the same type of number on both sides of a comparison. For instance, `3.0 < 4` will match, and `3.0 == 3` will probably match (the `==` operator is generally not advised for floating point numbers due to rounding issues). Numbers can be written in multiple ways - examples:
```
1234 -234 +53 +534.34 543.34e4 -534E-3 0.2343e-8
```
|
| Strings |
A string value is given quoted with double quotes (i.e. `"mystring"`). The string is interpreted as an ASCII string; that is, only ASCII values 32 to 126 can be used unescaped, apart from the characters `\` and `"`, which must also be escaped. Escape common special characters like:
| Character | Escaped character |
| --- | --- |
| Newline | \n |
| Carriage return | \r |
| Tab | \t |
| Form feed | \f |
| " | \" |
| Any other character | \x## (where ## is a two digit hexadecimal number specifying the ASCII value) |
|
### Value arithmetics
You can do arithmetic on values. The common arithmetic operators addition `+`, subtraction `-`, multiplication `*`, division `/` and modulo `%` are supported.
### Functions
Functions are called on a value and return a new value, which can be used in comparison expressions:
| Value functions | A value function takes a value, does something with it and returns a value which can be of any type.
- _abs()_ Called on a numeric type, returns the absolute value of that numeric type. That is, -3 returns 3 and -4.3 returns 4.3.
- _hash()_ Calculates an MD5 hash of whatever value it is called on. The result is a signed 64-bit integer. Use abs() afterwards to get only positive hash values - see the sketch after this table.
- _lowercase()_ Called on a string value to turn upper case characters into lower case ones. **NOTE:** This only works for the characters 'a' through 'z', no locale support.
|
| Document type functions | Some functions can take a document type instead of a value, and return a value based on the type.
- _version()_ The `version()` function returns the version number of a document type.
|
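As an illustration, hashing can be used to select a stable, pseudo-random subset of the corpus - a sketch selecting roughly 5% of all documents:
```
id.hash().abs() % 100 < 5
```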
#### Now function
Document selection provides a _now()_ function, which returns the current time as seconds since Unix epoch. Use this to filter documents by age, typically for [garbage collection](../applications/services/content.html#documents).
**Example**: If you have a long field _inserttimestamp_ in your `music` schema, this expression will only match documents from the last two hours:
`music.inserttimestamp > now() - 7200`
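Such a selection can also be used when visiting - a sketch using curl against a local endpoint, with the selection `music.inserttimestamp > now() - 7200` URL-encoded:
```
$ curl 'http://localhost:8080/document/v1/mynamespace/music/docid?selection=music.inserttimestamp%20%3E%20now()%20-%207200'
```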
## Using imported fields in selections
When using [parent-child](../../schemas/parent-child.html) you can refer to simple imported fields (i.e. top-level primitive fields) in selections as if they were regular fields in the child document type. Complex fields (collections, structures etc.) are not supported.
**Important:** Special care needs to be taken when using document selections referencing imported fields, especially if these are used as part of garbage collection expressions. If an imported field references a document that cannot be accessed at evaluation time, the imported field behaves as if it had been a regular, non-present field in the child document. In other words, it will return the special `null` value.
See [comparisons with missing fields (null values)](#comparisons-with-missing-fields-null-values)for a more detailed discussion of null-semantics and how to write selections that handle these in a well-defined manner. In particular, read [null behavior with imported fields](#null-behavior-with-imported-fields) if you're writing GC selections.
### Example
The following is an example of a 3-level parent-child hierarchy.
Grandparent schema:
```
schema grandparent {
document grandparent {
field a1 type int {
indexing: attribute | summary
}
}
}
```
Parent schema, with reference to grandparent:
```
schema parent {
document parent {
field a2 type int {
indexing: attribute | summary
}
field ref type reference<grandparent> {
indexing: attribute | summary
}
}
import field ref.a1 as a1 {}
}
```
Child schema, with reference to parent and (transitively) grandparent:
```
schema child {
document child {
field a3 type int {
indexing: attribute | summary
}
field ref type reference<parent> {
indexing: attribute | summary
}
}
import field ref.a1 as a1 {}
import field ref.a2 as a2 {}
}
```
Using these in document selection expressions is easy:
Find all child docs whose grandparents have an `a1` greater than 5:
`child.a1 > 5`
Find all child docs whose parents have an `a2` of 10 and grandparents have `a1` of 4:
`child.a1 == 10 and child.a2 == 4`
Find all child docs where the parent document cannot be found (or where the referenced field is not set in the parent):
`child.a2 == null`
Note that when visiting `child` documents we only ever access imported fields via the **child** document type itself.
A much more complete list of usage examples for the above document schemas and reference relations can be found in the [imported fields in selections](https://github.com/vespa-engine/system-test/blob/master/tests/search/parent_child/imported_fields_in_selections.rb) system test. This test covers both the visiting and GC cases.
## Constraints
Language identifiers restrict what can be used as document type names. The following values are not valid document type names: _true, false, and, or, not, id, null_.
## Grammar - EBNF of the language
To simplify, the case-insensitivity of keywords is not encoded in the grammar. The identifiers "null", "true", "false" etc. can be written in any case, including mixed case.
```
nil = "null" ;
bool = "true" | "false" ;
posdigit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
digit = '0' | posdigit ;
hexdigit = digit | 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
| 'A' | 'B' | 'C' | 'D' | 'E' | 'F' ;
integer = ['-' | '+'], posdigit, { digit } ;
float = ['-' | '+'], digit, { digit },
['.' , { digit }, [ ('e' | 'E'), posdigit, { digit }] ] ;
number = float | integer ;
stdchars = ? All ASCII chars except '\\', '"', 0 - 31 and 127 - 255 ? ;
alpha = ? ASCII characters in the range a-z and A-Z ? ;
alphanum = alpha | digit ;
space = ( ' ' | '\t' | '\f' | '\r' | '\n' ) ;
string = '"', { stdchars | ( '\\', ( 't' | 'n' | 'f' | 'r' | '"' ) )
| ( "\\x", hexdigit, hexdigit ) }, '"' ;
doctype = alpha, { alphanum } ;
fieldname = { alphanum | '{' | '}' | '[' | ']' | '.' } ;
function = alpha, { alphanum } ;
idarg = "scheme" | "namespace" | "type" | "specific" | "user" | "group" ;
searchcolumnarg = integer ;
operator = ">=" | ">" | "==" | "=~" | "=" | "<=" | "<" | "!=" ;
idspec = "id", ['.', idarg] ;
searchcolumnspec = "searchcolumn", ['.', searchcolumnarg] ;
fieldspec = doctype, ( function | ('.', fieldname) ) ;
value = ( valuegroup | nil | number | string | idspec | searchcolumnspec | fieldspec ),
{ function } ;
valuefuncmod = ( valuegroup | value ), '%',
( valuefuncmod | valuegroup | value ) ;
valuefuncmul = ( valuefuncmod | valuegroup | value ), ( '*' | '/' ),
( valuefuncmul | valuefuncmod | valuegroup | value ) ;
valuefuncadd = ( valuefuncmul | valuefuncmod | valuegroup | value ),
( '+' | '-' ),
( valuefuncadd | valuefuncmul | valuefuncmod | valuegroup
| value ) ;
valuegroup = '(', arithmvalue, ')' ;
arithmvalue = ( valuefuncadd | valuefuncmul | valuefuncmod | valuegroup
| value ) ;
comparison = arithmvalue, { space }, operator, { space },
arithmvalue ;
leaf = bool | comparison | fieldspec | doctype ;
not = "not", { space }, ( group | leaf ) ;
and = ( not | group | leaf ), { space }, "and", { space },
( and | not | group | leaf ) ;
or = ( and | not | group | leaf ), { space }, "or", { space },
( or | and | not | group | leaf ) ;
group = '(', { space }, ( or | and | not | group | leaf ),
{ space }, ')' ;
expression = ( or | and | not | group | leaf ) ;
```
---
# Source: https://docs.vespa.ai/en/querying/document-summaries.html.md
# Document Summaries
A _document summary_ is the information that is shown for each document in a query result. What information to include is determined by a _document summary class_: a named set of fields with configuration for which information they should contain.
A special document summary named `default` is always present and used by default. This contains:
- all fields which specify in their indexing statements that they may be included in summaries
- all fields specified in any document summary
- [sddocname](../reference/querying/default-result-format.html#sddocname)
- [documentid](../reference/querying/default-result-format.html#documentid).
Summary classes are defined in the schema:
```
schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }
        field album type string {
            indexing: summary | index
            index: enable-bm25
        }
        field year type int {
            indexing: summary | attribute
        }
        field category_scores type tensor(cat{}) {
            indexing: summary | attribute
        }
    }
    document-summary my-short-summary {
        summary artist {}
        summary album {}
    }
}
```
See the [schema reference](../reference/schemas/schemas.html#summary) for details.
The summary class to use for a query is determined by the parameter [presentation.summary](../reference/api/query.html#presentation.summary):
```
$ vespa query "select\*from music where album contains 'head'" \"presentation.summary=my-short-summary"
```
A common reason to define a document summary class is [performance](#performance): by configuring a document summary which only contains attributes, the result can be generated without disk accesses. Note that a dedicated summary class is needed to ensure only memory is accessed even if all fields are attributes, because the [document id](../schemas/documents.html#document-ids) is not stored as an attribute.
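A sketch of such an attribute-only summary class for the schema above, where `year` and `category_scores` are attribute fields:
```
document-summary attributes-only {
    summary year {}
    summary category_scores {}
}
```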
Document summaries may also contain [dynamic snippets and highlighted terms](#dynamic-snippets).
The document summary class to use can also be passed programmatically to the `fill()` method from a Searcher, and multiple fill operations interleaved with programmatic filtering can be used to optimize data access and transfer.
## Selecting summary fields in YQL
A [YQL](query-language.html) statement can also be used to filter which fields from a document summary to include in results. Note that this is just a field filter in the container - a summary containing all fields of a summary class is always fetched from content nodes, so to optimize performance it is necessary to create custom summary classes.
```
$ vespa query "selectartist,album,documentid,sddocnamefrom music where album contains 'head'"
```
```
{
    "root": {
        "children": [
            {
                "id": "id:mynamespace:music::a-head-full-of-dreams",
                "relevance": 0.16343879032006284,
                "source": "mycontentcluster",
                "fields": {
                    "sddocname": "music",
                    "documentid": "id:mynamespace:music::a-head-full-of-dreams",
                    "artist": "Coldplay",
                    "album": "A Head Full of Dreams"
                }
            }
        ]
    }
}
```
Use `*` to select all fields of the chosen document summary class (which is `default` by default).
```
$ vespa query "select\*from music where album contains 'head'"
```
```
{
    "root": {
        "children": [
            {
                "id": "id:mynamespace:music::a-head-full-of-dreams",
                "relevance": 0.16343879032006284,
                "source": "mycontentcluster",
                "fields": {
                    "sddocname": "music",
                    "documentid": "id:mynamespace:music::a-head-full-of-dreams",
                    "artist": "Coldplay",
                    "album": "A Head Full of Dreams",
                    "year": 2015,
                    "category_scores": {
                        "type": "tensor(cat{})",
                        "cells": {
                            "pop": 1.0,
                            "rock": 0.20000000298023224,
                            "jazz": 0.0
                        }
                    }
                }
            }
        ]
    }
}
```
## Summary field rename
Summary classes may define fields by names not used in the document type:
```
document-summary rename-summary {
summary artist_name {
source: artist
}
}
```
Refer to the [schema reference](../reference/schemas/schemas.html#source) for adding [attribute](../reference/schemas/schemas.html#add-or-remove-an-existing-document-field-from-document-summary) and[non-attribute](../reference/schemas/schemas.html#add-or-remove-a-new-non-attribute-document-field-from-document-summary) fields - some changes require re-indexing.
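A sketch of using the summary class above in a query - the result fields will then contain `artist_name` instead of `artist`:
```
$ vespa query "select * from music where album contains 'head'" \
    "presentation.summary=rename-summary"
```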
## Dynamic snippets
Use [dynamic](../reference/schemas/schemas.html#summary) to generate dynamic snippets from fields, based on the query keywords. Example from Vespa Documentation Search - see the [schema](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/application/schemas/doc.sd):
```
document doc {
    field content type string {
        indexing: summary | index
        summary: dynamic
    }
}
```
A query for _document summary_ returns:
> Use **document summaries** to configure which fields ... indexing: **summary** | index } } **document-summary** titleyear { **summary** title ...
The example above creates a dynamic summary with the matched terms highlighted. The highlighting is called [bolding](../reference/schemas/schemas.html#bolding) and can be enabled independently of dynamic summaries.
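A sketch of enabling bolding by itself on a field, without dynamic snippets:
```
field title type string {
    indexing: summary | index
    bolding: on
}
```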
Refer to the [reference](../reference/schemas/schemas.html#summary) for the response format.
### Dynamic snippet configuration
You can configure generation of dynamic snippets by adding an instance of the [vespa.config.search.summary.juniperrc config](https://github.com/vespa-engine/vespa/blob/master/searchsummary/src/vespa/searchsummary/config/juniperrc.def) in services.xml, inside the `<content>` cluster tag for the content cluster in question. E.g.:
```
<content id="music" version="1.0">
    ...
    <config name="vespa.config.search.summary.juniperrc">
        <max_matches>2</max_matches>
        <length>1000</length>
        <surround_max>500</surround_max>
        <min_length>300</min_length>
    </config>
    ...
</content>
```
Numbers here are in bytes.
## Performance
[Attribute](../content/attributes.html) fields are held in memory. This means summaries are memory-only operations if all fields requested are attributes, and is the optimal way to get high query throughput. The other document fields are stored as blobs in the [document store](../content/proton.html#document-store). Requesting these fields may therefore require a disk access, increasing latency.
**Important:** The default summary class will access the document store as it includes the [documentid](../reference/querying/default-result-format.html#documentid) field which is stored there. For maximum query throughput using memory-only access, use a dedicated summary class with attributes only.
When using additional summary classes to increase performance, only the network data size is changed - the data read from storage is unchanged. Having "debug" fields with summary enabled will hence also affect the amount of information that needs to be read from disk.
See [query execution](query-api.html#query-execution) - breakdown of the summary (a.k.a. result processing, rendering) phase:
- The document summary latency on the content node, tracked by [content\_proton\_search\_protocol\_docsum\_latency\_average](../operations/metrics.html).
- Getting data across from content nodes to containers.
- Deserialization from internal binary formats to Java objects (if touched in a [Searcher](../applications/searchers.html)), and finally serialization to JSON (default rendering), plus rendering and network transfer.
The work, and thus latency, increases with more [hits](../reference/api/query.html#hits). Use [query tracing](query-api.html#query-tracing) to analyze performance.
Refer to [content node summary cache](../performance/caches-in-vespa.html#content-node-summary-cache).
---
# Source: https://docs.vespa.ai/en/writing/document-v1-api-guide.html.md
# /document/v1 API guide
Use the _/document/v1/_ API to read, write, update and delete documents.
Refer to the [document/v1 API reference](../reference/api/document-v1.html) for API details. [Reads and writes](reads-and-writes.html) has an overview of alternative tools and APIs as well as the flow through the Vespa components when accessing documents. See [getting started](#getting-started) for how to work with the _/document/v1/ API_.
Examples:
| GET |
| Get |
```
$ curl http://localhost:8080/document/v1/my_namespace/music/docid/love-id-here-to-stay
```
|
| Visit | [Visit](visiting.html) all documents with given namespace and document type:
```
$ curl http://localhost:8080/document/v1/namespace/music/docid
```
Visit all documents using continuation:
```
$ curl http://localhost:8080/document/v1/namespace/music/docid?continuation=AAAAEAAAAAAAAAM3AAAAAAAAAzYAAAAAAAEAAAAAAAFAAAAAAABswAAAAAAAAAAA
```
Visit using a _selection_:
```
$ curl http://localhost:8080/document/v1/namespace/music/docid?selection=music.genre=='blues'
```
Visit documents across all _non-global_ document types and namespaces stored in content cluster `mycluster`:
```
$ curl http://localhost:8080/document/v1/?cluster=mycluster
```
Visit documents across all _[global](../reference/applications/services/content.html#document)_ document types and namespaces stored in content cluster `mycluster`:
```
$ curl http://localhost:8080/document/v1/?cluster=mycluster&bucketSpace=global
```
Read about [visiting throughput](#visiting-throughput) below. |
|
| POST |
Post data in the [document JSON format](../reference/schemas/document-json-format.html).
```
$ curl -X POST -H "Content-Type:application/json" --data '
{
"fields": {
"artist": "Coldplay",
"album": "A Head Full of Dreams",
"year": 2015
}
}' \
http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
```
|
| PUT |
Do a [partial update](partial-updates.html) for a document.
```
$ curl -X PUT -H "Content-Type:application/json" --data '
{
"fields": {
"artist": {
"assign": "Warmplay"
}
}
}' \
http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
```
|
| DELETE |
Delete a document by ID:
```
$ curl -X DELETE http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
```
Delete all documents in the `music` schema:
```
$ curl -X DELETE \
"http://localhost:8080/document/v1/mynamespace/music/docid?selection=true&cluster=my_cluster"
```
|
## Conditional writes
A _test-and-set_ [condition](../reference/writing/document-selector-language.html) can be added to Put, Remove and Update operations. Example:
```
$ curl -X PUT -H "Content-Type:application/json" --data '
{
"condition": "music.artist==\"Warmplay\"",
"fields": {
"artist": {
"assign": "Coldplay"
}
}
}' \
http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
```
**Important:** Use _documenttype.fieldname_ (e.g. music.artist) in the condition, not only _fieldname_.
If the condition is not met, a _412 Precondition Failed_ is returned:
```
{
    "pathId": "/document/v1/mynamespace/music/docid/a-head-full-of-dreams",
    "id": "id:mynamespace:music::a-head-full-of-dreams",
    "message": "[UNKNOWN(251013) @ tcp/vespa-container:19112/default]: ReturnCode(TEST_AND_SET_CONDITION_FAILED, Condition did not match document nodeIndex=0 bucket=20000000000000c4 ) "
}
```
Also see the [condition reference](../reference/schemas/document-json-format.html#test-and-set).
## Create if nonexistent
### Upserts
Updates to nonexistent documents are supported using [create](../reference/schemas/document-json-format.html#create). This is often called an _upsert_ — insert a document if it does not already exist, or update it if it exists. An empty document is created on the content nodes, before the update is applied. This simplifies client code in the case of multiple writers. Example:
```
$ curl -X PUT -H "Content-Type:application/json" --data '
{
"fields": {
"artist": {
"assign": "Coldplay"
}
}
}' \
http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts?create=true
```
### Conditional updates and puts with create
Conditional updates and puts can be combined with [create](../reference/schemas/document-json-format.html#create). This has the following semantics:
- If the document already exists, the condition is evaluated against the most recent document version available. The operation is applied if (and only if) the condition matches.
- Otherwise (i.e. the document does not exist or the newest document version is a tombstone), the condition is _ignored_ and the operation is applied as if no condition was provided.
Support for conditional puts with create was added in Vespa 8.178.
```
$ curl -X POST -H "Content-Type:application/json" --data '
{
"fields": {
"artist": {
"assign": "Coldplay"
}
}
}' \
"http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts?create=true&condition=music.title%3D%3D%27best+of%27"
```
**Warning:** If all existing replicas of a document are missing when an operation with `"create": true` is executed, a new document will always be created. This happens even if a condition has been given. If the existing replicas become available later, their version of the document will be overwritten by the newest update since it has a higher timestamp.
**Note:** See [document expiry](../schemas/documents.html#document-expiry) for auto-created documents — it is possible to create documents that do not match the selection criterion.
**Note:** Specifying _create_ for a Put operation _without_ a condition has no observable effect, as unconditional Put operations will always write a new version of a document regardless of whether it existed already.
## Data dump
To iterate over documents, use [visiting](visiting.html) — sample output:
```
{
    "pathId": "/document/v1/namespace/doc/docid",
    "documents": [
        {
            "id": "id:namespace:doc::id-1",
            "fields": {
                "title": "Document title 1"
            }
        }
    ],
    "continuation": "AAAAEAAAAAAAAAM3AAAAAAAAAzYAAAAAAAEAAAAAAAFAAAAAAABswAAAAAAAAAAA"
}
```
Note the _continuation_ token — use this in the next request for more data. Below is a sample script dumping all data, using [jq](https://stedolan.github.io/jq/) for JSON parsing. It splits the corpus into 8 slices by default; using a number of slices at least four times the number of container nodes is recommended for high throughput. Timeout can be set lower for benchmarking. (Each request has a maximum timeout of 60s to ensure progress is saved at regular intervals.)
```
#!/bin/bash
set -eo pipefail

if [ $# -gt 2 ]
then
  echo "Usage: $0 [number of slices, default 8] [timeout in seconds, default 31536000 (1 year)]"
  exit 1
fi

endpoint="https://my.vespa.endpoint"
cluster="db"
selection="true"
slices="${1:-8}"
timeout="${2:-31536000}"
curlTimeout="$((timeout > 60 ? 60 : timeout))"
url="$endpoint/document/v1/?cluster=$cluster&selection=$selection&stream=true&timeout=$curlTimeout&concurrency=8&slices=$slices"
# auth can be something like auth='--key data-plane-private-key.pem --cert data-plane-public-cert.pem'
auth="--key my-key --cert my-cert -H 'Authorization: my-auth'"
curl="curl -sS $auth"
start=$(date '+%s')
doom=$((start + timeout))

function visit {
  sliceId="$1"
  documents=0
  continuation=""
  while
    printf -v filename "data-%03d-%012d.json.gz" $sliceId $documents
    json="$(eval "$curl '$url&sliceId=$sliceId$continuation'" | tee >( gzip > $filename ) | jq '{ documentCount, continuation, message }')"
    message="$(jq -re .message <<< $json)" && echo "Failed visit for sliceId $sliceId: $message" >&2 && exit 1
    documentCount="$(jq -re .documentCount <<< $json)" && ((documents += documentCount))
    [ "$(date '+%s')" -lt "$doom" ] && token="$(jq -re .continuation <<< $json)"
  do
    echo "$documentCount documents retrieved from slice $sliceId; continuing at $token"
    continuation="&continuation=$token"
  done
  time=$(($(date '+%s') - start))
  echo "$documents documents total retrieved in $time seconds ($((documents / time)) docs/s) from slice $sliceId" >&2
}

for ((sliceId = 0; sliceId < slices; sliceId++))
do
  visit $sliceId &
done
wait
```
### Visiting throughput
Note that the visits with selections in the request examples at the start of this guide are linear scans over all the music documents. Each complete visit thus requires the selection expression to be evaluated for all documents. Running concurrent visits with selections that match disjoint subsets of the document corpus is therefore a poor way of increasing throughput, as work is duplicated across each such visit. Fortunately, the API offers other options for increasing throughput (combined in the sketch after this list):
- Split the corpus into any number of smaller [slices](../reference/api/document-v1.html#slices), each to be visited by a separate, independent series of HTTP requests. This is by far the most effective setting to change, as it allows visiting through all HTTP containers simultaneously, and from any number of clients—either of which is typically the bottleneck for visits through _/document/v1_. A good value for this setting is at least a handful per container.
- Increase backend [concurrency](../reference/api/document-v1.html#concurrency) so each visit HTTP response is promptly filled with documents. When using this together with slicing (above), take care to also stream the HTTP responses (below), to avoid buffering too much data in the container layer. When a high number of slices is specified, this setting may have no effect.
- [Stream](../reference/api/document-v1.html#stream) the HTTP responses. This lets you receive data earlier, and more of it per request, reducing HTTP overhead. It also minimizes memory usage due to buffering in the container, allowing higher concurrency per container. It is recommended to always use this, but the default is not to, due to backwards compatibility.
## Getting started
Pro-tip: It is easy to generate a `/document/v1` request by using the [Vespa CLI](../clients/vespa-cli.html) with the `-v` option, which outputs the equivalent curl request - example:
```
$ vespa document -v ext/A-Head-Full-of-Dreams.json
curl -X POST -H 'Content-Type: application/json' --data-binary @ext/A-Head-Full-of-Dreams.json http://127.0.0.1:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
Success: put id:mynamespace:music::a-head-full-of-dreams
```
See the [document JSON format](../reference/schemas/document-json-format.html) for creating JSON payloads.
This is a quick guide into dumping random documents from a cluster to get started:
1. To get documents from a cluster, look up the content cluster name from the configuration, like in the [album-recommendation](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation/app/services.xml) example: `<content id="music" version="1.0">`.
2. Use the cluster name to start dumping document IDs (skip `jq` for full json):
```
$ curl -s 'http://localhost:8080/document/v1/?cluster=music&wantedDocumentCount=10&timeout=60s' | \
jq -r .documents[].id
```
```
id:mynamespace:music::love-is-here-to-stay
id:mynamespace:music::a-head-full-of-dreams
id:mynamespace:music::hardwired-to-self-destruct
```
`wantedDocumentCount` is useful to let the operation run longer to find documents, to avoid an empty result. This operation is a scan through the corpus, and it is normal to get an empty result with a [continuation token](#data-dump).
3. Look up the document with id `id:mynamespace:music::love-is-here-to-stay`:
```
$ curl -s 'http://localhost:8080/document/v1/mynamespace/music/docid/love-is-here-to-stay' | jq .
```
```
{
"pathId": "/document/v1/mynamespace/music/docid/love-is-here-to-stay",
"id": "id:mynamespace:music::love-is-here-to-stay",
"fields": {
"artist": "Diana Krall",
"year": 2018,
"category_scores": {
"type": "tensor(cat{})",
"cells": {
"pop": 0.4000000059604645,
"rock": 0,
"jazz": 0.800000011920929
}
},
"album": "Love Is Here To Stay"
}
}
```
4. Read more about [document IDs](../schemas/documents.html).
## Troubleshooting
- When troubleshooting documents not found using the query API, use [vespa visit](../clients/vespa-cli.html#documents) to export the documents. Then compare the `id` field with other user-defined `id` fields in the query.
- A get response for a document that does not exist still includes the request's `pathId` and the document `id`, but no `fields` object; compare such ids with the ids referenced in query results.
- To delete _all_ documents in the _music_ schema with security credentials, use the DELETE-with-selection request shown earlier in this guide, adding the credentials (e.g. `--cert` and `--key`) to the curl command.
## Request size limit
Starting with version 8.577.16, Vespa returns 413 (Content Too Large) as a response to POST and PUT requests that are above the request size limit. To avoid this, check document sizes in the feed client, and truncate or split documents that are too large before feeding. For optimal performance, it is recommended to keep the document size below 10 MB.
## Backpressure
Vespa returns response code 429 (Too Many Requests) as a backpressure signal whenever client feed throughput exceeds system capacity. Clients should implement retry strategies as described in the [HTTP best practices](../cloud/http-best-practices.html) document.
Instead of implementing your own retry logic, consider using Vespa's feed clients which automatically handle retries and backpressure. See the [feed command](../clients/vespa-cli.html#documents) of the Vespa CLI and the [vespa-feed-client](../clients/vespa-feed-client.html).
The `/document/v1` API includes a configurable operation queue that by default is tuned to balance latency, throughput and memory. Applications can adjust this balance by overriding the parameters defined in the [document-operation-executor](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/document-operation-executor.def) config definition.
To optimize for higher throughput at the cost of increased latency and higher memory usage on the container, increase any of the `maxThrottled` (maximum queue capacity in number of operations), `maxThrottledAge` (maximum time in queue in seconds), and `maxThrottledBytes` (maximum memory usage in bytes) parameters. This allows the container to buffer more operations during temporary spikes in load, reducing the number of 429 responses while increasing request latency. Make sure to increase operation and client timeouts to accommodate the increased latency.
See the [config definition](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/document-operation-executor.def) for a detailed explanation of each parameter.
Set the values to `0` for the opposite effect, i.e. to optimize for latency. Operations will be dispatched directly, and failed out immediately if the number of pending operations exceeds the dynamic window size of the document processing pipeline.
_Example: overriding the default value of all 3 parameters to `0`. The parameter names are from the linked config definition; check it for the exact config name to use:_
```
<container id="default" version="1.0">
    <config name="document.restapi.document-operation-executor">
        <maxThrottled>0</maxThrottled>
        <maxThrottledAge>0</maxThrottledAge>
        <maxThrottledBytes>0</maxThrottledBytes>
    </config>
    ...
</container>
```
The effective operation queue configuration is logged when the container starts up, see below example.
```
INFO container Container.com.yahoo.document.restapi.resource.DocumentV1ApiHandler Operation queue: max-items=256, max-age=3000 ms, max-bytes=100 MB
```
You can observe the state of the operation queue through the metrics `httpapi_queued_operations`, `httpapi_queued_bytes` and `httpapi_queued_age`.
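For example, on a self-managed node these metrics can be inspected through the metrics proxy (host and port are assumptions; adjust to the local setup):
```
$ curl -s http://localhost:19092/metrics/v1/values | jq . | grep httpapi_queued
```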
## Using number and group id modifiers
Do not use group or number modifiers with regular indexed mode document types. These are special cases that only work as expected for document types with [mode=streaming or mode=store-only](../reference/applications/services/content.html#document). Examples:
| Get | Get a document in a group:
```
$ curl http://localhost:8080/document/v1/mynamespace/music/number/23/some_key
```
```
$ curl http://localhost:8080/document/v1/mynamespace/music/group/mygroupname/some_key
```
|
| Visit | Visit all documents for a group:
```
$ curl http://localhost:8080/document/v1/namespace/music/number/23/
```
```
$ curl http://localhost:8080/document/v1/namespace/music/group/mygroupname/
```
|
---
# Source: https://docs.vespa.ai/en/reference/api/document-v1.html.md
# /document/v1 API reference
This is the /document/v1 API reference documentation. Use this API for synchronous [Document](../../schemas/documents.html) operations to a Vespa endpoint - refer to [reads and writes](../../writing/reads-and-writes.html) for other options.
The [document/v1 API guide](../../writing/document-v1-api-guide.html) has examples and use cases.
**Note:** Mapping from document IDs to /document/v1/ URLs is found in [document IDs](../../schemas/documents.html#id-scheme) - also see [troubleshooting](../../writing/document-v1-api-guide.html#troubleshooting).
Some examples use _number_ and _group_[document id](../../schemas/documents.html#document-ids) modifiers. These are special cases that only work as expected for document types with [mode=streaming or mode=store-only](../applications/services/content.html#document). Do not use group or number modifiers with regular indexed mode document types.
## Configuration
To enable the API, add `document-api` in the serving container cluster - [services.xml](../applications/services/container.html):
```
<container id="default" version="1.0">
    <document-api/>
</container>
```
## HTTP requests
| HTTP request | document/v1 operation | Description |
| --- | --- | --- |
| GET |
_Get_ a document by ID or _Visit_ a set of documents by selection.
|
| | Get | Get a document:
```
/document/v1/<namespace>/<document-type>/docid/<user-specified>
/document/v1/<namespace>/<document-type>/number/<number>/<user-specified>
/document/v1/<namespace>/<document-type>/group/<group>/<user-specified>
```
Optional parameters:
- [cluster](#cluster)
- [fieldSet](#fieldset)
- [timeout](#timeout)
- [tracelevel](#tracelevel)
|
| | Visit |
Iterate over and get all documents, or a [selection](#selection) of documents, in chunks, using [continuation](#continuation) tokens to track progress. Visits are a linear scan over the documents in the cluster.
```
/document/v1/
```
It is possible to specify namespace and document type with the visit path:
```
/document/v1/<namespace>/<document-type>/docid
```
Documents can be grouped to limit accesses to a subset. A group is defined by a numeric ID or string — see [id scheme](../../schemas/documents.html#id-scheme).
```
/document/v1/<namespace>/<document-type>/group/<group>
/document/v1/<namespace>/<document-type>/number/<number>
```
Mandatory parameters:
- [cluster](#cluster) - Visits can only retrieve data from _one_ content cluster, so `cluster` **must** be specified for requests at the root `/document/v1/` level, or when there is ambiguity. This is required even if the application has only one content cluster.
Optional parameters:
- [bucketSpace](#bucketspace) - Parent documents are [global](../applications/services/content.html#document) and in the `global` [bucket space](#bucketspace). By default, a visit covers non-global documents in the `default` bucket space, unless the path indicates a document type that is global.
- [concurrency](#concurrency) - Use to configure backend parallelism for each visit HTTP request.
- [continuation](#continuation)
- [fieldSet](#fieldset)
- [selection](#selection)
- [sliceId](#sliceid)
- [slices](#slices) - Split visiting of the document corpus across more than one HTTP request, allowing concurrent use of more HTTP containers, by using the `slices` and `sliceId` parameters together.
- [stream](#stream) - Enabling streamed HTTP responses is recommended, as this reduces memory consumption and HTTP overhead.
- [timeout](#timeout)
- [tracelevel](#tracelevel)
- [wantedDocumentCount](#wanteddocumentcount)
- [fromTimestamp](#fromtimestamp)
- [toTimestamp](#totimestamp)
- [includeRemoves](#includeRemoves)
Optional request headers:
- [Accept](#accept) - specify the desired response format.
|
| POST |
_Put_ a given document, by ID, or _Copy_ a set of documents by selection from one content cluster to another.
|
| | Put | Write the document contained in the request body in JSON format.
```
/document/v1/<namespace>/<document-type>/docid/<user-specified>
/document/v1/<namespace>/<document-type>/group/<group>/<user-specified>
/document/v1/<namespace>/<document-type>/number/<number>/<user-specified>
```
Optional parameters:
- [condition](#condition) - Use for conditional writes.
- [route](#route)
- [timeout](#timeout)
- [tracelevel](#tracelevel)
|
| | Copy |
Write documents visited in source [cluster](#cluster) to the [destinationCluster](#destinationcluster) in the same application. A [selection](#selection) is mandatory — typically the document type. Supported paths (see [visit](#visit) above for semantics):
```
/document/v1/
/document/v1/<namespace>/<document-type>/docid/
/document/v1/<namespace>/<document-type>/group/<group>/
/document/v1/<namespace>/<document-type>/number/<number>/
```
Mandatory parameters:
- [cluster](#cluster)
- [destinationCluster](#destinationcluster)
- [selection](#selection)
Optional parameters:
- [bucketSpace](#bucketspace)
- [continuation](#continuation)
- [timeChunk](#timechunk)
- [timeout](#timeout)
- [tracelevel](#tracelevel)
|
| PUT |
_Update_ a document with the given partial update, by ID, or _Update where_ the given selection is true.
|
| | Update | Update a document with the partial update contained in the request body in the [document update JSON format](../schemas/document-json-format.html#update).
```
/document/v1/<namespace>/<document-type>/docid/<user-specified>
```
Optional parameters:
- [condition](#condition) - use for conditional writes
- [create](#create) - use to create empty documents when updating non-existent ones.
- [route](#route)
- [timeout](#timeout)
- [tracelevel](#tracelevel)
|
| | Update where |
Update visited documents in [cluster](#cluster) with the partial update contained in the request body in the [document update JSON format](../schemas/document-json-format.html#update). Supported paths (see [visit](#visit) above for semantics):
```
/document/v1/<namespace>/<document-type>/docid/
/document/v1/<namespace>/<document-type>/group/<group>/
/document/v1/<namespace>/<document-type>/number/<number>/
```
Mandatory parameters:
- [cluster](#cluster)
- [selection](#selection)
Optional parameters:
- [bucketSpace](#bucketspace) - See [visit](#visit), `default` or `global` bucket space
- [continuation](#continuation)
- [stream](#stream)
- [timeChunk](#timechunk)
- [timeout](#timeout)
- [tracelevel](#tracelevel)
|
| DELETE |
_Remove_ a document, by ID, or _Remove where_ the given selection is true.
|
| | Remove | Remove a document.
```
/document/v1/<namespace>/<document-type>/docid/<user-specified>
```
Optional parameters:
- [condition](#condition)
- [route](#route)
- [timeout](#timeout)
- [tracelevel](#tracelevel)
|
| | Delete where |
Delete visited documents from [cluster](#cluster). Supported paths (see [visit](#visit) above for semantics):
```
/document/v1/
/document/v1/<namespace>/<document-type>/docid/
/document/v1/<namespace>/<document-type>/group/<group>/
/document/v1/<namespace>/<document-type>/number/<number>/
```
Mandatory parameters:
- [cluster](#cluster)
- [selection](#selection)
Optional parameters:
- [bucketSpace](#bucketspace) - See [visit](#visit), `default` or `global` bucket space
- [continuation](#continuation)
- [stream](#stream)
- [timeChunk](#timechunk)
- [timeout](#timeout)
- [tracelevel](#tracelevel)
|
## Request parameters
| Parameter | Type | Description |
| --- | --- | --- |
| bucketSpace | String |
Specify the bucket space to visit. Document types marked as `global` exist in a separate _bucket space_ from non-global document types. When visiting a particular document type, the bucket space is automatically deduced based on the provided type name. When visiting at a root `/document/v1/` level this information is not available, and the non-global ("default") bucket space is visited by default. Specify `global` to visit global documents instead. Supported values: `default` (for non-global documents) and `global`.
|
| cluster | String |
Name of [content cluster](../../content/content-nodes.html) to GET from, or visit.
|
| concurrency | Integer |
Sends the given number of visitors in parallel to the backend, improving throughput at the cost of resource usage. Default is 1. When `stream=true`, this parameter sets an upper bound on the concurrency, which is otherwise unbounded but controlled by a dynamic throttle policy.
**Important:** Given a concurrency parameter of _N_, the worst case for memory used while processing the request grows linearly with _N_, unless [stream](#stream) mode is turned on. This is because the container currently buffers all response data in memory before sending them to the client, and all sent visitors must complete before the response can be sent.
|
| condition | String |
For test-and-set. Run a document operation conditionally — if the condition fails, a _412 Precondition Failed_ is returned. See [example](../../writing/document-v1-api-guide.html#conditional-writes).
|
| continuation | String |
When visiting, a continuation token is returned as the `"continuation"` field in the JSON response, as long as more documents remain. Use this token as the `continuation` parameter to visit the next chunk of documents. See [example](../../writing/document-v1-api-guide.html#data-dump).
|
| create | Boolean |
If `true`, updates to non-existent documents will create an empty document to update. See [create if nonexistent](../../writing/document-v1-api-guide.html#create-if-nonexistent).
|
| destinationCluster | String |
Name of [content cluster](../../content/content-nodes.html) to copy to, during a copy visit.
|
| dryRun | Boolean |
Used by the [vespa-feed-client](../../clients/vespa-feed-client.html) using `--speed-test` for bandwidth testing, by setting to `true`.
|
| fieldSet | String |
A [field set string](../../schemas/documents.html#fieldsets) with the set of document fields to fetch from the backend. Default is the special `[document]` fieldset, returning all _document_ fields. To fetch specific fields, use the name of the document type, followed by a comma-separated list of fields (for example `music:artist,song` to fetch two fields declared in `music.sd`).
|
| route | String |
The route for single document operations, and for operations generated by [copy](#copy), [update](#update-where) or [deletion](#delete-where) visits. Default value is `default`. See [routes](../../writing/document-routing.html).
|
| selection | String |
Select only a subset of documents when [visiting](../../writing/visiting.html) — details in [document selector language](../writing/document-selector-language.html).
|
| sliceId | Integer |
The slice number of the visit represented by this HTTP request. This number must be non-negative and less than the number of [slices](#slices) specified for the visit - e.g., if the number of slices is 10, `sliceId` is in the range [0-9].
**Note:** If the number of distribution bits changes during a sliced visit, the results are undefined. Thankfully, this is a very rare occurrence, only triggered when adding content nodes.
|
| slices | Integer |
Split the document corpus into this number of independent slices. This lets multiple, concurrent series of HTTP requests advance the same logical visit independently, by specifying a different [sliceId](#sliceid) for each.
|
| stream | Boolean |
Whether to stream the HTTP response, allowing data to flow as soon as documents arrive from the backend. This obsoletes the [wantedDocumentCount](#wanteddocumentcount) parameter. The HTTP status code will always be 200 if the visit is successfully initiated. Default value is false.
|
| format.tensors | String |
Controls how tensors are rendered in the result.
| Value | Description |
| --- | --- |
| `short` | **Default**. Render the tensor value as an object with two keys: "type" containing the tensor type, and "cells"/"blocks"/"values" ([depending on the type](../schemas/document-json-format.html#tensor)) containing the tensor content in the [type-appropriate short form](../schemas/document-json-format.html#tensor). |
| `long` | Render the tensor value as an object with two keys: "type" containing the tensor type, and "cells" containing the tensor content in the [general verbose form](../schemas/document-json-format.html#tensor). |
| `short-value` | Render the tensor content directly, in the [type-appropriate short form](../schemas/document-json-format.html#tensor). |
| `long-value` | Render the tensor content directly, in the [general verbose form](../schemas/document-json-format.html#tensor). |
|
| timeChunk | String |
Target time to spend on one chunk of a copy, update or remove visit; with optional ks, s, ms or µs unit. Default value is 60 (seconds).
|
| timeout | String |
Request timeout in seconds, or with optional ks, s, ms or µs unit. Default value is 180s.
|
| tracelevel | Integer |
Number in the range [0,9], where higher gives more details. The trace dumps which nodes and chains the document operation has touched. See [routes](../../writing/document-routing.html).
|
| wantedDocumentCount | Integer |
Best effort attempt to not respond to the client before `wantedDocumentCount` number of documents have been visited. Response may still contain fewer documents if there are not enough matching documents left to visit in the cluster, or if the visiting times out. This parameter is intended for the case when you have relatively few documents in your cluster and where each visit request would otherwise process only a handful of documents.
The maximum value of `wantedDocumentCount` is bounded by an implementation-specific limit to prevent excessive resource usage. If the cluster has many documents (on the order of tens of millions), there is no need to set this value.
|
| fromTimestamp | Integer |
Filters the returned document set to only include documents that were last modified at a time point equal to or higher than the specified value, in microseconds from UTC epoch. Default value is 0 (include all documents).
|
| toTimestamp | Integer |
Filters the returned document set to only include documents that were last modified at a time point lower than the specified value, in microseconds from UTC epoch. Default value is 0 (sentinel value; include all documents). If non-zero, must be greater than, or equal to, `fromTimestamp`.
|
| includeRemoves | Boolean |
Include recently removed document IDs, along with the set of returned documents. By default, only documents currently present in the corpus are returned in the `"documents"` array of the response; when this parameter is set to `"true"`, documents that were recently removed, and whose tombstones still exist, are also included in that array, as entries of the form `{ "remove": "id:ns:type::foobar" }`. See [here](/en/operations/self-managed/admin-procedures.html#data-retention-vs-size) for specifics on tombstones, including their lifetime.
|
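For illustration, a visit request combining some of these parameters may look like the following sketch - the hostname, namespace, document type and cluster name are placeholders:

```
$ curl 'http://localhost:8080/document/v1/mynamespace/music/docid?cluster=mycluster&wantedDocumentCount=100&fieldSet=music:title,artist'

# Pass the continuation token from the response to fetch the next chunk:
$ curl 'http://localhost:8080/document/v1/mynamespace/music/docid?cluster=mycluster&wantedDocumentCount=100&fieldSet=music:title,artist&continuation=...'
```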
## HTTP request headers
| Header | Values | Description |
| --- | --- | --- |
| Accept | `application/json` or `application/jsonl` |
The [Accept](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Accept) header lets the client specify to the server what [media (MIME) types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types) it accepts as the response format.
All Document V1 API calls support `application/json` for returning [JSON](#json) responses. [Streaming visiting](#stream) additionally supports `application/jsonl` for returning [JSON Lines](#json-lines) (JSONL) since Vespa 8.593.
To ensure compatibility with older versions, make sure to check the `Content-Type` [HTTP response header](#http-response-headers). A JSONL response will always have a `Content-Type` media type of `application/jsonl`, and JSON will always have a media type of `application/json`.
Multiple acceptable types can be specified. JSONL will be returned if (and only if) `application/jsonl` is part of the list _and_ no other media types have a higher [quality value](https://httpwg.org/specs/rfc9110.html#quality.values).
Example:
```
Accept: application/jsonl
```
If the client accepts both JSON and JSONL, the server will respond with JSONL:
```
Accept: application/json, application/jsonl
```
For backwards compatibility, if no `Accept` header is provided (or if no provided media types are acceptable) `application/json` is assumed.
|
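For example, a streaming visit requesting the JSONL response format can be issued like this sketch, where hostname, names and cluster are placeholders:

```
$ curl -H 'Accept: application/jsonl' \
    'http://localhost:8080/document/v1/mynamespace/music/docid?cluster=mycluster&stream=true'
```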
## Request body
POST and PUT requests must include a body for single document operations; PUT must also include a body for [update where](#update-where) visits. A field has a _value_ for a POST and an _update operation object_ for PUT. Documents and operations use the [document JSON format](../schemas/document-json-format.html). The document fields must match the [schema](../../basics/schemas.html):
```
{
    "fields": {
        "<fieldname>": "<value>"
    }
}
```
```
{
    "fields": {
        "<fieldname>": {
            "<update-operation>": "<value>"
        }
    }
}
```
The _update-operation_ is most often `assign` - see [update operations](../schemas/document-json-format.html#update-operations) for the full list. Values for `id` / `put` / `update` in the request body are silently dropped. The ID is generated from the request path, regardless of request body data - example:
```
{
    "put": "id:mynamespace:music::123",
    "fields": {
        "title": "Best of"
    }
}
```
This makes it easier to generate a feed file that can be used for both the [vespa-feed-client](../../clients/vespa-feed-client.html) and this API.
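For example, a single-document Put with a request body can be sent like this sketch (hostname, names and the document ID are placeholders):

```
$ curl -X POST -H 'Content-Type: application/json' \
    --data '{"fields": {"title": "Best of"}}' \
    'http://localhost:8080/document/v1/mynamespace/music/docid/123'
```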
## HTTP status codes
| Code | Description |
| --- | --- |
| 200 | OK. Attempts to remove or update a non-existent document also yield this status code (see 412 below). |
| 204 | No Content. Successful response to OPTIONS request. |
| 400 | Bad request. Returned for undefined document types and other request errors. See issue [13465](https://github.com/vespa-engine/vespa/issues/13465) for PUTs to document types that are defined, but not assigned to a content cluster. Inspect `message` for details. |
| 404 | Not found; the document was not found. This is only used when getting documents. |
| 405 | Method Not Allowed. HTTP method is not supported by the endpoint. Valid combinations are listed [above](#http-requests) |
| 412 | [condition](#condition) is not met. Inspect `message` for details. This is also the result when a condition is specified, but the document does not exist. |
| 413 | Content too large; used for POST and PUT requests that are above the [request size limit](../../writing/document-v1-api-guide.html#request-size-limit). |
| 429 | Too many requests; the document API has too many inflight feed operations, retry later. |
| 500 | Server error; an unspecified error occurred when processing the request/response. |
| 503 | Service unavailable; the document API was unable to produce a response at this time. |
| 504 | Gateway timeout; the document API failed to respond within the given (or default 180s) timeout. |
| 507 | Insufficient storage; the content cluster is out of memory or disk space. |
## HTTP response headers
| Header | Values | Description |
| --- | --- | --- |
| X-Vespa-Ignored-Fields | true |
Will be present and set to 'true' only when a put or update contains one or more fields which were [ignored since they are not present in the document type](../applications/services/container.html#ignore-undefined-fields). Such operations will be applied exactly as if they did not contain the field operations referencing non-existing fields. References to non-existing fields in field _paths_ are not detected.
|
| Content-Type | `application/json` or `application/jsonl` |
The [media type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types) (MIME type) of the response body.
Either `application/json` for [JSON](#json) responses or `application/jsonl` for [JSON Lines](#json-lines) (JSONL) responses.
The content type may include additional parameters such as `charset`.
Example header:
```
Content-Type: application/json; charset=UTF-8
```
|
## Response formats
Responses are by default in JSON format. [Streaming visiting](#stream) supports an optional [JSON Lines](#json-lines) (JSONL) response format since Vespa 8.593.
### JSON
JSON responses have the following fields:
| Field | Description |
| --- | --- |
| pathId | Request URL path — always included. |
| message | An error message — included for all failed requests. |
| id | Document ID — always included for single document operations, including _Get_. |
| fields | The requested document fields — included for successful _Get_ operations. |
| documents[] | Array of documents in a visit result — each document has the _id_ and _fields_. |
| documentCount | Number of visited and selected documents. If [includeRemoves](#includeRemoves) is `true`, this also includes the number of returned removes (tombstones). |
| continuation | Token to be used to get the next chunk of the corpus - see [continuation](#continuation). |
A response to a _Get_ request includes a `fields` object if the document was found:
```
{
    "pathId": "",
    "id": "",
    "fields": {
    }
}
```
A GET _visit_ result can include an array of `documents` plus a [continuation](#continuation):
```
{
    "pathId": "",
    "documents": [
        {
            "id": "",
            "fields": {
            }
        }
    ],
    "continuation": "",
    "documentCount": 123
}
```
A continuation indicates that the client should make further requests to get more data. The absence of a continuation means either that an error occurred and visiting should cease, or that there are no more documents.
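As an illustration, a complete dump loop in the JSON format can be sketched as below, assuming `jq` is available; hostname and names are placeholders:

```
$ token=""
$ while true; do
    response=$(curl -s "http://localhost:8080/document/v1/mynamespace/music/docid?cluster=mycluster${token:+&continuation=$token}")
    echo "$response" | jq -c '.documents[]'
    token=$(echo "$response" | jq -r '.continuation // empty')
    [ -z "$token" ] && break
  done
```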
A `message` can be returned for failed operations:
```
{
    "pathId": "",
    "message": ""
}
```
### JSON Lines
A JSON Lines (JSONL) response is a stream of newline-separated JSON objects. Each line contains exactly one JSON object, and each JSON object takes up exactly one line. No line breaks are allowed within an object.
JSONL is an optional response format for [streaming visiting](#stream), enabling efficient client-side parsing and fine-grained, continuous tracking of visitor progress. The JSONL response format is currently not supported for operations other than streaming visiting.
The JSONL response format is enabled by providing an HTTP [Accept](#accept) request header that specifies `application/jsonl` as the preferred response type, and the response will have a [Content-Type](#content-type) of `application/jsonl` if the server is on a version that supports JSONL visiting. Clients must check the `Content-Type` header to ensure they are getting the format they expect.
JSONL support requires Vespa 8.593 or newer.
Example response body:
```
{"put":"id:ns:music::one","fields":{"foo":"bar"}}
{"put":"id:ns:music::two","fields":{"foo":"baz"}}
{"continuation":{"token":"...","percentFinished":40.0}}
{"put":"id:ns:music::three","fields":{"foo":"zoid"}}
{"remove":"id:ns:music::four"}
{"continuation":{"token":"...","percentFinished":50.0}}
{"continuation":{"token":"...","percentFinished":60.0}}
{"put":"id:ns:music::five","fields":{"foo":"berg"}}
{"continuation":{"token":"...","percentFinished":70.0}}
{"sessionStats":{"documentCount":5}}
{"continuation":{"percentFinished":100.0}}
```
Note that the `"..."` values are placeholders for (from a client's perspective) opaque string values.
#### JSONL response objects
**Note:** To be forwards compatible with future extensions to the response format, ignore unknown objects and fields.
| Object | Description |
| --- | --- |
| put | A document [Put](../schemas/document-json-format.html#put) operation in the same format as that accepted by Vespa's JSONL feed API. |
| remove | A document [Remove](../schemas/document-json-format.html#remove) operation in the same format as that accepted by Vespa's JSONL feed API. Only present if [includeRemoves](#includeRemoves) is `true`. |
| continuation |
A visitor [continuation](#continuation).
Possible sub-object fields:
| Field name | Description |
| --- | --- |
| `token` |
An opaque string value representing the current visitor progress through the data space. This value can be provided as part of a subsequent visitor request to continue visiting from where the last request left off. Clients should not attempt to parse the contents of this string, as it's considered an internal implementation detail and may be changed (in a backwards compatible way) without any prior announcement.
|
| `percentFinished` | A floating point number between 0 and 100 (inclusive) that gives an approximation of how far the visitor has progressed through the data space. |
The last line of a successful request should always be a `continuation` object.
If (and only if) visiting has completed, the last `continuation` object will have a `percentFinished` value of `100` and will _not_ have a `token` field.
|
| message |
A message received from the backend visitor session. Can be used by clients to report problems encountered during visiting.
Possible sub-object fields:
| Field name | Description |
| --- | --- |
| `text` | The actual message, in unstructured text |
| `severity` | The severity of the message. One of `info`, `warning` or `error`. |
|
| sessionStats |
Statistics from the backend visitor session.
Possible sub-object fields:
| Field name | Description |
| --- | --- |
| `documentCount` | The number of visited and selected documents. If [includeRemoves](#includeRemoves) is `true`, this also includes the number of returned removes (tombstones). |
|
Note that it's possible for a successful response to contain zero `put` or `remove` objects if the [selection](#selection) did not match any documents.
#### Differences from the JSON format
The biggest difference in semantics between the JSON and JSONL response formats is when, and how, [continuation](#continuation) objects are returned.
In the JSON format a continuation is included _once_ at the very end of the response object and covers the progress made by the entire request. If the request somehow fails after receiving 99% of all documents but prior to receiving the continuation field, the client must retry the entire request from the previously known continuation value. This can result in getting many requested documents twice; once from the incomplete first request and once more from the second request that covers the same part of the data space.
In the JSON Lines format, a continuation object is emitted to the stream _every time_ a backend data [bucket](../../content/buckets.html) has been fully visited, as well as at the end of the response stream. This may happen many times in a response. Each continuation object _subsumes_ the progress of previously emitted continuations, meaning that a client only needs to remember the _most recent_ continuation value it observed in the response. If the request fails prior to completion, the client can specify the most recent continuation in the next request; it will then only receive duplicates for the data buckets that were actively being processed when the request failed.
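As an illustration, a client can capture the most recent continuation token of a streamed JSONL response like this sketch, assuming `jq` is available (hostname and names are placeholders):

```
$ curl -sN -H 'Accept: application/jsonl' \
    'http://localhost:8080/document/v1/mynamespace/music/docid?cluster=mycluster&stream=true' \
  | tee response.jsonl \
  | jq -r 'select(.continuation.token) | .continuation.token' \
  | tail -n 1
```

The printed token is the value to pass as the [continuation](#continuation) parameter if the visit must be resumed.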
---
# Source: https://docs.vespa.ai/en/schemas/documents.html.md
# Documents
Vespa models data as _documents_. A document has a string identifier, set by the application, unique across all documents. A document is a set of [key-value pairs](../writing/document-api-guide.html). A document has a schema (i.e. type), defined in the [schema](../basics/schemas.html).
When configuring clusters, a [documents](../reference/applications/services/content.html#documents) element sets what document types a cluster is to store. This configuration is used to configure the garbage collector if it is enabled. Additionally, it is used to define default routes for documents sent into the application. By default, a document will be sent to all clusters having the document type defined. Refer to [routing](../writing/document-routing.html) for details.
Vespa uses the document ID to distribute documents to nodes. From the document identifier, the content layer calculates a numeric location. A bucket contains all the documents, where a given amount of least-significant bits of the location are all equal. This property is used to enable co-localized storage of documents - read more in [buckets](../content/buckets.html) and [content cluster elasticity](../content/elasticity.html).
Documents can be [global](../reference/applications/services/content.html#document), see [parent/child](parent-child.html).
## Document IDs
The document identifiers are URIs, represented by a string, which must conform to a defined URI scheme for document identifiers. The document identifier string may only contain _text characters_, as defined by `isTextCharacter` in [com.yahoo.text.Text](https://github.com/vespa-engine/vespa/blob/master/vespajlib/src/main/java/com/yahoo/text/Text.java).
### id scheme
Vespa currently has only one defined scheme, the _id scheme_: `id:<namespace>:<document-type>:<key/value-pairs>:<user-specified>`
**Note:** An example mapping from ID to the URL used in [/document/v1/](../writing/document-v1-api-guide.html) is from `id:mynamespace:mydoctype::user-defined-id` to `/document/v1/mynamespace/mydoctype/docid/user-defined-id`. Find examples and tools in [troubleshooting](../writing/document-v1-api-guide.html#document-not-found).
Find examples in the [/document/v1/](../writing/document-v1-api-guide.html) guide.
| Part | Required | Description |
| --- | --- | --- |
| namespace | Yes | Not used by Vespa, see [below](#namespace). |
| document-type | Yes | Document type as defined in [services.xml](../reference/applications/services/content.html#document) and the [schema](../reference/schemas/schemas.html). |
| key/value-pair | Optional | Modifiers to the id scheme, used to configure document distribution to [buckets](../content/buckets.html#document-to-bucket-distribution). With no modifiers, the id scheme distributes all documents uniformly. The key/value-pair field contains one of two possible key/value pairs; **n** and **g** are mutually exclusive:
| n=_\<number\>_ | Number in the range [0,2^63-1] - only for testing of abnormal bucket distributions |
| g=_\<groupname\>_ | The _groupname_ string is hashed and used to select the storage location |
**Important:** This is only useful for document types with [mode=streaming or mode=store-only](../reference/applications/services/content.html#document). Do not use modifiers for regular indexed document types.
See [streaming search](../performance/streaming-search.html). Using modifiers for regular indexed documents will cause unpredictable feeding performance; in addition, search dispatch does not support limiting the search to modifiers/buckets. |
| user-specified | Yes | A unique ID string. |
### Document IDs in search results
The full Document ID (as a string) will often contain redundant information and be quite long; a typical value may look like "id:mynamespace:mydoctype::user-specified-identifier" where only the last part is useful outside Vespa. The Document ID is therefore not stored in memory, and is **not always present** in [search results](../reference/querying/default-result-format.html#id). It is therefore recommended to put your own unique identifier (usually the "user-specified-identifier" above) in a document field, typically named "myid" or "shortid" or similar:
```
field shortid type string {
indexing: attribute | summary
}
```
This enables using a [document-summary](../querying/document-summaries.html) with only in-memory fields while still getting the identifier you actually care about. If the "user-specified-identifier" is just a simple number you could even use "type int" for this field for minimal memory overhead.
The Document ID is stored on disk in the document summary. To return this value in search results, configure the schema like this:
```
schema music {
document music {
field ...
}
document-summary empty-summary {
summary documentid {
source: documentid
}
from-disk
}
...
```
... and use `presentation.summary=empty-summary` in the query API. The `from-disk` setting mutes a warning for document summary disk access; use a higher query timeout when requesting many IDs like this.
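For example, using the [Vespa CLI](../clients/vespa-cli.html) - the query and field values below are illustrative:

```
$ vespa query 'select * from music where artist contains "bach"' \
    presentation.summary=empty-summary \
    timeout=5s
```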
### Namespace
The namespace in document ids is useful when you have multiple document collections that must never end up with the same document id. It has no function in Vespa beyond this, and can be set to any short constant value, for example "doc". Consider letting synthetic documents used for testing use the namespace "test", so they are easy to detect and remove if they show up outside the test by mistake.
Example - if feeding
- document A by `curl -X POST https:.../document/v1/first_namespace/my_doc_type/docid/shakespeare`
- document B by `curl -X POST https:.../document/v1/second_namespace/my_doc_type/docid/shakespeare`
then those will be separate documents, both searchable, with different document IDs. The document ID differs not in the user specified part (this is `shakespeare` for both documents), but in the namespace part (`first_namespace` vs `second_namespace`). The full document ID for document A is `id:first_namespace:my_doc_type::shakespeare`.
The namespace has no relation to other configuration elsewhere, like in _services.xml_ or in schemas. It is just like the user specified part of each document ID in that sense. Namespace cannot be used in queries, other than as part of the full document ID. However, it can be used for [document selection](../reference/writing/document-selector-language.html), where `id.namespace` can be accessed and compared to a given string, for instance. An example use case is [visiting](../writing/visiting.html) a subset of documents.
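For example, to visit only documents in the "test" namespace, a sketch using the [Vespa CLI](../clients/vespa-cli.html):

```
$ vespa visit --selection 'id.namespace == "test"'
```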
## Fields
Documents can have fields, see the [schema reference](../reference/schemas/schemas.html#field).
A field cannot be defined with a default value. Use a [choice ('||') indexing statement](../writing/indexing.html#choice-example) or a [document processor](../applications/document-processors.html) to assign a default to document put/update operations.
## Fieldsets
Use _fieldset_ to limit the fields that are returned from a read operation, like _get_ or _visit_ - see [examples](../clients/vespa-cli.html#documents). Vespa may return more fields than specified if this does not impact performance.
**Note:** Document field sets are different from [searchable fieldsets](../reference/schemas/schemas.html#fieldset).
There are two options for specifying a fieldset:
- Built-in fieldset
- Name of a document type, then a colon ":", followed by a comma-separated list of fields (for example `music:artist,song` to fetch two fields declared in `music.sd`)
Built-in fieldsets:
| Fieldset | Description |
| --- | --- |
| [all] | Returns all fields in the schema (generated fields included) and the document ID. |
| [document] | Returns original fields in the document, including the document ID. |
| [none] | Returns no fields at all, not even the document ID. _Internal, do not use_ |
| [id] | Returns only the document ID |
| \<document-type\>:[document] |
**Deprecated:** Use `[document]`
Same as the `[document]` fieldset above: Returns only the original document fields (generated fields not included) together with the document ID. |
If a built-in field set is not used, a list of fields can be specified. Syntax:
```
<document-type>:field1,field2,…
```
Example:
```
music:title,artist
```
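For example, fetching only these two fields when getting a single document over [/document/v1/](../writing/document-v1-api-guide.html) - the hostname, names and document ID are placeholders:

```
$ curl 'http://localhost:8080/document/v1/mynamespace/music/docid/123?fieldSet=music:title,artist'
```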
## Document expiry
To auto-expire documents, use a [selection](../reference/applications/services/content.html#documents.selection) with [now](../reference/writing/indexing-language.html#now). Example, set time-to-live (TTL) for _music_-documents to one day, using a field called _timestamp_ - a sketch of the _services.xml_ configuration:
```
<documents garbage-collection="true">
    <document type="music" selection="music.timestamp &gt; now() - 86400" />
</documents>
```
**Note:** The `selection` expression says which documents to _keep_, not which ones to delete. The _timestamp_ field must have a value in seconds since EPOCH:
```
field timestamp type long {
indexing: attribute
attribute {
fast-access
}
}
```
When `garbage-collection="true"`, Vespa iterates over the document space to purge expired documents. Vespa will invoke the configured GC selection for each stored document once every [garbage-collection-interval](../reference/applications/services/content.html#documents.selection) seconds. It is unspecified when a particular document will be processed within the configured interval.
**Important:** This is a best-effort garbage collection feature to conserve CPU and space. Use query filters if it is important to exclude documents based on a criterion.
- Using a _selection_ with _now_ can have side effects when re-feeding or re-processing documents, as timestamps can be stale. A common problem is feeding with too old timestamps, resulting in no documents being indexed.
- Normally, documents that are already expired at write time are not persisted. When using [create](../writing/document-v1-api-guide.html#create-if-nonexistent) (create if nonexistent), it is possible to create documents that are already expired and will be removed in the next cycle.
- Deploying a configuration where the selection string selects no documents will cause all documents to be garbage collected. Use [visit](../writing/visiting.html) to test the selection string. Garbage collected documents cannot be expected to be recoverable.
- The fields that are referenced in the selection expression should be attributes. Also, either the fields should be set with _"fast-access"_ or the number of [searchable copies](../reference/applications/services/content.html#searchable-copies) in the content cluster should be the same as the [redundancy](../reference/applications/services/content.html#redundancy). Otherwise, the document selection maintenance will be slow and have a major performance impact on the system.
- [Imported fields](../reference/schemas/schemas.html#import-field) can be used in the selection string to expire documents, but special care needs to be taken when using these. See [using imported fields in selections](../reference/writing/document-selector-language.html#using-imported-fields-in-selections) for more information and restrictions.
- Document garbage collection is a low priority background operation that runs continuously unless preempted by higher priority operations. If the cluster is too heavily loaded by client feed operations, there's a risk of starving GC from running. To verify that garbage collection is not starved, check the [vds.idealstate.max\_observed\_time\_since\_last\_gc\_sec.average](../operations/metrics.html) distributor metric. If it significantly exceeds `garbage-collection-interval` it is an indication that GC is starved.
To batch remove, set a selection that matches no documents, like _"not music"_.
Use [vespa visit](../writing/visiting.html) to test the selection. Dump the IDs of all documents that would be _preserved_:
```
$ vespa visit --selection 'music.timestamp > now() - 86400' --field-set "music.timestamp"
```
Negate the expression by wrapping it in a `not` to dump the IDs of all the documents that would be _removed_ during GC:
```
$ vespa visit --selection 'not (music.timestamp > now() - 86400)' --field-set "music.timestamp"
```
## Processing documents
To process documents, use [Document processing](../applications/document-processors.html). Examples are enriching documents (looking up data from other sources), transforming content (like linguistic transformations and tokenization), filtering data, and triggering other events based on the input data.
See the sample app [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) for use of Vespa APIs like:
- [Document API](../writing/document-api-guide.html) - work on documents and fields in documents, and create unit tests using the Application framework
- [Document Processing](../applications/document-processors.html) - chain independent processors with ordering constraints
The sample app [vespa-documentation-search](https://github.com/vespa-cloud/vespa-documentation-search) has examples of processing PUTs or UPDATEs (using [create-if-nonexistent](../writing/document-v1-api-guide.html#create-if-nonexistent)) of documents in [OutLinksDocumentProcessor](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/java/ai/vespa/cloud/docsearch/OutLinksDocumentProcessor.java). It is also an introduction to using [multivalued fields](../searching-multi-valued-fields) like arrays, maps and tensors. Use the [VespaDocSystemTest](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/test/java/ai/vespa/cloud/docsearch/VespaDocSystemTest.java) to build code that feeds and tests an instance in the Vespa Developer Cloud / local Docker instance.
Both sample apps also use the Document API to GET/PUT/UPDATE other documents as part of processing, using asynchronous [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java). Use this as a starting point for applications that enrich data when writing.
---
# Source: https://docs.vespa.ai/en/learn/tutorials/e-commerce.html.md
# Use Case - shopping
The [e-commerce, or shopping, use case](https://github.com/vespa-engine/sample-apps/tree/master/use-case-shopping) is an example of an e-commerce site complete with sample data and a web front end to browse product data and reviews. To quick-start the application, follow the instructions in the [README](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/README.md) in the sample app.

To browse the application, navigate to [localhost:8080/site](http://localhost:8080/site). This site is implemented through a custom [request handler](../../applications/request-handlers.html) and is meant to be a simple example of creating a front end / middleware that sits in front of the Vespa back end. As such it is fairly independent of Vespa features, and the code is designed to be fairly easy to follow and as non-magical as possible. All the queries against Vespa are sent as HTTP requests, and the JSON results from Vespa are parsed and rendered.
This sample application is built around the Amazon product data set found at [https://cseweb.ucsd.edu/~jmcauley/datasets.html](https://cseweb.ucsd.edu/~jmcauley/datasets.html). A small sample of this data is included in the sample application, and full data sets are available from the above site. This sample application contains scripts to convert from the data set format to Vespa format: [convert\_meta.py](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/convert_meta.py) and [convert\_reviews.py](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/convert_reviews.py). See [README](https://github.com/vespa-engine/sample-apps/tree/master/use-case-shopping#readme) for example use.
When feeding reviews, there is a custom [document processor](../../applications/document-processors.html)that intercepts document writes and updates the parent item with the review rating, so the aggregated review rating is kept stored with the item - see [ReviewProcessor](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/src/main/java/ai/vespa/example/shopping/ReviewProcessor.java). This is more an example of a custom document processor than a recommended way to do this, as feeding the reviews more than once will result in inflated values. To do this correctly, one should probably calculate this offline so a re-feed does not cause unexpected results.
### Highlighted features
- [Multiple document types](../../basics/schemas.html)
- [Custom document processor](../../applications/document-processors.html)
- [Custom searcher processor](../../applications/searchers.html)
- [Custom handlers](../../applications/request-handlers.html)
- [Custom configuration](../../applications/configuring-components.html)
- [Partial update](../../reference/schemas/document-json-format.html#update)
- [Search using YQL](../../querying/query-language.html)
- [Grouping](../../querying/grouping.html)
- [Rank profiles](../../basics/ranking.html)
- [Native embedders](../../rag/embedding.html)
- [Vector search](../../querying/nearest-neighbor-search)
- [Ranking functions](../../reference/schemas/schemas.html#function-rank)
---
# Source: https://docs.vespa.ai/en/content/elasticity.html.md
# Content cluster elasticity
Vespa clusters can be grown and shrunk while serving queries and writes. Documents in content clusters are automatically redistributed on changes to maintain an even distribution with minimal data movement. To resize, just change the [nodes](../reference/applications/services/services.html#nodes) and redeploy the application - no restarts needed.

Documents are managed by Vespa in chunks called [buckets](#buckets). The size and number of buckets are completely managed by Vespa and there is never any need to manually control sharding.
The elasticity mechanism is also used to recover from a node loss: New replicas of documents are created automatically on other nodes to maintain the configured redundancy. Node failures are therefore not a problem that requires immediate attention - clusters self-heal from node failures as long as there are sufficient resources.

When you want to remove nodes from a content cluster, you can have the system migrate data off them in an orderly fashion prior to removal. This is done by marking nodes as _retired_. This is useful for removing nodes that are to be taken out of service, but also for migrating a cluster to entirely new nodes while online: Add the new nodes, mark the old nodes retired, wait for the data to be redistributed, and remove the old nodes.
The auto-elasticity is configured for a normal fail-safe operation, but there are tradeoffs like recovery speed and resource usage. Learn more in [procedures](../operations/self-managed/admin-procedures.html#content-cluster-configuration).
## Adding nodes
To add or remove nodes from a content cluster, just change the `nodes` tag of the [content](../reference/applications/services/content.html) cluster in [services.xml](../reference/applications/services/services.html) and [redeploy](../basics/applications.html#deploying-applications). Read more in [procedures](../operations/self-managed/admin-procedures.html).
When adding a new node, a new _ideal state_ is calculated for all buckets. The buckets mapped to the new node are moved, and the superfluous replicas are removed. See the redistribution example below - a new node is added to the system, with redundancy n=2:

The distribution algorithm generates a random node sequence for each bucket. In this example with n=2, replicas map to the two nodes sorted first. The illustration shows how placement onto two nodes changes as a third node is added. The new node takes over as primary for the buckets where it got sorted first, and as secondary for the buckets where it got sorted second. This ensures minimal data movement when nodes come and go, and allows capacity to be changed easily.
No buckets are moved between the existing nodes when a new node is added. Based on the pseudo-random sequences, some buckets change from primary to secondary, or are removed. Multiple nodes can be added in the same deployment.
## Removing nodes
Whether a node fails or is _retired_, the same redistribution happens. If the node is retired, replicas are generated on the other nodes and the node stays up, but with no active replicas. Example of redistribution after node failure, n=2:

Here, node 2 fails. This node held the active replicas of buckets 2 and 6. Once the node fails, the secondary replicas are set active. If they were already in a _ready_ state, they start serving queries immediately; otherwise they must index their replicas first, see [searchable-copies](../reference/applications/services/content.html#searchable-copies). All buckets that no longer have secondary replicas are merged to the remaining nodes according to the ideal state.
## Grouped distribution
Nodes in content clusters can be placed in [groups](../reference/applications/services/content.html#group). A group of nodes in a content cluster will have one or more complete replicas of the entire document corpus.

This is useful in the cases listed below:
| Cluster upgrade | With multiple groups it becomes safe to take out a full group for upgrade instead of just one node at a time. [Read more](../operations/self-managed/live-upgrade.html). |
| Query throughput | Applications with high query rates and/or high static query cost can use groups to scale to higher query rates since Vespa will automatically send a query to just a single group. [Read more](../performance/sizing-search.html). |
| Topology | By using groups you can control replica placement over network switches or racks to ensure there is redundancy at the switch and rack level. |
Tuning group sizes and node resources enables applications to easily find the latency/cost sweet spot, the elasticity operations are automatic and queries and writes work as usual with no downtime.
## Changing topology
A Vespa elasticity feature is the ability to change topology (i.e. grouped distribution) without service disruption. This is a live change, and will auto-redistribute documents to the new topology.
Also read [topology change](../operations/self-managed/admin-procedures.html#topology-change) if running Vespa self-hosted - the below steps are general for all hosting options.
### Replicas
When changing topology, pay attention to the [min-redundancy](../reference/applications/services/content.html#min-redundancy) setting - this setting configures a _minimum_ number of replicas in a cluster, the _actual_ number is topology dependent - example:
A flat cluster with min-redundancy n=2 and 15 nodes is changed into a grouped cluster with 3 groups with 5 nodes each (total node count and n is kept unchanged). In this case, the actual redundancy will be 3 after the change, as each of the 3 groups will have at least 1 replica for full query coverage. The practical consequence is that disk and memory requirements per node _increases_ due to the change to topology. It is therefore important to calculate the actual replica count before reconfiguring topology.
### Query coverage
Changing topology might cause query coverage loss in the transition, unless steps are taken in the right order. If full coverage is not important, just make the change and wait for document redistribution to complete.
To keep full query coverage, make sure not to change both group size and number of groups at the same time:
1. To add nodes for more data, or to have less data per node, increase group size. E.g., in a 2-group cluster with 8 nodes per group, add 4 nodes for a 25% capacity increase with 10 nodes per group.
2. If the goal is to add query capacity, add one or more groups with the same node count as the existing group(s). A flat cluster is the same as one group - if the flat cluster has 8 nodes, change to a grouped cluster with 2 groups of 8 nodes per group. This will add an empty group, which is put into query serving once populated.
In short, if the end state requires changing both the number of groups and the node count per group, do this in separate steps, as a combination of the above. Between each step, wait for document redistribution to complete using the `merge_bucket.pending` metric - see [example](../writing/initial-batch-feed.html).
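A sketch of checking this metric through the node metrics API; the port and the exact metric naming depend on the deployment:

```
$ curl -s http://localhost:19092/metrics/v1/values | grep merge_bucket.pending
```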
## Buckets
To manage documents, Vespa groups them in _buckets_, using hashing or hints in the [document id](../schemas/documents.html).
A document Put or Update is sent to all replicas of the bucket with the document. If bucket replicas are out of sync, a bucket merge operation is run to re-sync the bucket. A bucket contains [tombstones](../operations/self-managed/admin-procedures.html#data-retention-vs-size) of recently removed documents.
Buckets are split when they grow too large, and joined when they shrink. This is a key feature for high performance in small to large instances, and eliminates the need for downtime or manual operations when scaling. Buckets are purely a content management concept: data is not stored or indexed in separate buckets, nor do queries relate to buckets in any way. Read more in [buckets](buckets.html).
## Ideal state distribution algorithm
The [ideal state distribution algorithm](idealstate.html) uses a variant of the [CRUSH algorithm](https://ceph.io/assets/pdfs/weil-crush-sc06.pdf) to decide bucket placement. It makes a minimal number of documents move when nodes are added or removed. Central to the algorithm is the assignment of a node sequence to each bucket:

Steps to assign a bucket to a set of nodes:
1. Seed a random generator with the bucket ID to generate a pseudo-random sequence of numbers. Using the bucket ID as seed will then always generate the same sequence for the bucket.
2. Nodes are ordered by [distribution-key](../reference/applications/services/content.html#node); the random numbers are assigned in that order. E.g. a node with distribution-key 0 will get the first random number, node 1 the second.
3. Sort the node list by the random number.
4. Select nodes in descending random number order - above, node 1, 3 and 0 will store bucket 0x3c000000000000a0 with n=3 (redundancy). For n=2, node 1 and 3 will store the bucket. This specification of where to place a bucket is called the bucket's _ideal state_.
Repeat this for all buckets in the system.
## Consistency
Consistency is maintained at bucket level. Content nodes calculate local checksums based on the bucket contents, and the distributors compare checksums across the bucket replicas. A _bucket merge_ is issued to resolve inconsistency, when detected. While there are inconsistent bucket replicas, operations are routed to the "best" replica.
As buckets are split and joined, it is possible for replicas of a bucket to be split at different levels. A node may have been down while its buckets have been split or joined. This is called _inconsistent bucket splitting_. Bucket checksums can not be compared across buckets with different split levels. Consequently, content nodes do not know whether all documents exist in enough replicas in this state. Due to this, inconsistent splitting is one of the highest maintenance priorities. After all buckets are split or joined back to the same level, the content nodes can verify that all the replicas are consistent and fix any detected issues with a merge. [Read more](consistency).
## Further reading
- [content nodes](content-nodes.html)
- [proton](proton.html) - see _ready_ state
---
# Source: https://docs.vespa.ai/en/reference/rag/embedding.html.md
# Embedding Reference
Reference configuration for [embedders](../../rag/embedding.html).
## Model config reference
Embedder models use the [model](../applications/config-files.html#model) type configuration which accepts the attributes `model-id`, `url` or `path`. Multiple of these can be specified as a single config value, where one is used depending on the deployment environment:
- If a `model-id` is specified and the application is deployed on Vespa Cloud, the `model-id` is used.
- Otherwise, if a `url` is specified, it is used.
- Otherwise, `path` is used.
When using `path`, the model files must be supplied in the application package.
## Huggingface Embedder
An embedder using any [Huggingface tokenizer](https://huggingface.co/docs/tokenizers/index), including multilingual tokenizers, to produce tokens which are then input to a supplied transformer model in ONNX format.
The Huggingface embedder is configured in [services.xml](../applications/services/services.html), within the `container` tag - a sketch with illustrative component id and model paths:
```
<container id="default" version="1.0">
    <component id="hf-embedder" type="hugging-face-embedder">
        <transformer-model path="models/model.onnx"/>
        <tokenizer-model path="models/tokenizer.json"/>
        <prepend>
            <query>query:</query>
            <document>passage:</document>
        </prepend>
    </component>
    ...
</container>
```
### Private Model Hub
You may also use models hosted in a[private Huggingface model hub](https://huggingface.co/docs/hub/en/repositories-settings#private-repositories).
Retrieve an API key from Huggingface with the appropriate permissions, and add it to the [Vespa secret store](../../security/secret-store). Add the secret to the container `<secrets>` element and refer to it in your Huggingface model configuration - a sketch where the vault, secret and model names are placeholders:
```
<container id="default" version="1.0">
    <secrets>
        <hfToken vault="my-vault" name="my-huggingface-read-api-key"/>
    </secrets>
    <component id="hf-embedder" type="hugging-face-embedder">
        <transformer-model url="https://huggingface.co/my-org/my-model/resolve/main/model.onnx" secret-ref="hfToken"/>
        <tokenizer-model url="https://huggingface.co/my-org/my-model/resolve/main/tokenizer.json" secret-ref="hfToken"/>
    </component>
</container>
```
### Huggingface embedder reference config
In addition to [embedder ONNX parameters](#embedder-onnx-reference-config):
| Name | Occurrence | Description | Type | Default |
| --- | --- | --- | --- | --- |
| transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A |
| tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A |
| max-tokens | One | The maximum number of tokens accepted by the transformer model | numeric | 512 |
| transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids |
| transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask |
| transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids`, use an empty value: `<transformer-token-type-ids/>` | string | token\_type\_ids |
| transformer-output | One | The name or identifier for the transformer output | string | last\_hidden\_state |
| pooling-strategy | One | How the output vectors of the ONNX model are pooled to obtain a single vector representation. Valid values are `mean`, `cls` and `none` | string | mean |
| normalize | One | A boolean indicating whether to normalize the output embedding vector to unit length (length 1). Useful for the `prenormalized-angular` [distance-metric](../schemas/schemas.html#distance-metric) | boolean | false |
| prepend | Optional | Instructions that are prepended to the text input before tokenization and inference. Useful for models that have been trained with specific prompt instructions.
- Element `<query>` - Optional query prepend instruction.
- Element `<document>` - Optional document prepend instruction.
```
<prepend>
    <query>query:</query>
    <document>passage:</document>
</prepend>
```
| Optional `<query>` and `<document>` elements | |
## Bert embedder
The Bert embedder is configured in [services.xml](../applications/services/services.html), within the `container` tag - a sketch with illustrative model paths:
```
<container id="default" version="1.0">
    <component id="bert-embedder" type="bert-embedder">
        <transformer-model path="models/model.onnx"/>
        <tokenizer-vocab path="models/vocab.txt"/>
    </component>
    ...
</container>
```
### Bert embedder reference config
In addition to [embedder ONNX parameters](#embedder-onnx-reference-config):
| Name | Occurrence | Description | Type | Default |
| --- | --- | --- | --- | --- |
| transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A |
| tokenizer-vocab | One | Use to point to the Huggingface `vocab.txt` tokenizer file with valid wordpiece tokens. Does not support `tokenizer.json` format. | [model-type](#model-config-reference) | N/A |
| max-tokens | One | The maximum number of tokens allowed in the input | integer | 384 |
| transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids |
| transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask |
| transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids`, use an empty value: `<transformer-token-type-ids/>` | string | token\_type\_ids |
| transformer-output | One | The name or identifier for the transformer output | string | output\_0 |
| transformer-start-sequence-token | One | The start of sequence token | numeric | 101 |
| transformer-end-sequence-token | One | The end of sequence token | numeric | 102 |
| pooling-strategy | One | How the output vectors of the ONNX model are pooled to obtain a single vector representation. Valid values are `mean` and `cls` | string | mean |
## colbert embedder
The colbert embedder is configured in [services.xml](../applications/services/services.html), within the `container` tag - a sketch with illustrative model paths (the 32 and 256 values correspond to the max-query-tokens and max-document-tokens parameters below):
```
<container id="default" version="1.0">
    <component id="colbert" type="colbert-embedder">
        <transformer-model path="models/model.onnx"/>
        <tokenizer-model path="models/tokenizer.json"/>
        <max-query-tokens>32</max-query-tokens>
        <max-document-tokens>256</max-document-tokens>
    </component>
    ...
</container>
```
The Vespa colbert implementation works with default configurations for transformer models that use WordPiece tokenization.
### colbert embedder reference config
In addition to [embedder ONNX parameters](#embedder-onnx-reference-config):
| Name | Occurrence | Description | Type | Default |
| --- | --- | --- | --- | --- |
| transformer-model | One | Use to point to the transformer ColBERT ONNX model file | [model-type](#model-config-reference) | N/A |
| tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A |
| max-tokens | One | Max length of token sequence the transformer-model can handle | numeric | 512 |
| max-query-tokens | One | The maximum number of ColBERT query token embeddings. Queries are padded to this length. Must be lower than max-tokens | numeric | 32 |
| max-document-tokens | One | The maximum number of ColBERT document token embeddings. Documents are not padded. Must be lower than max-tokens | numeric | 512 |
| transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids |
| transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask |
| transformer-mask-token | One | The mask token id used for ColBERT query padding | numeric | 103 |
| transformer-start-sequence-token | One | The start of sequence token id | numeric | 101 |
| transformer-end-sequence-token | One | The end of sequence token id | numeric | 102 |
| transformer-pad-token | One | The pad sequence token id | numeric | 0 |
| query-token-id | One | The colbert query token marker id | numeric | 1 |
| document-token-id | One | The colbert document token marker id | numeric | 2 |
| transformer-output | One | The name or identifier for the transformer output | string | contextual |
The Vespa colbert-embedder uses `[unused0]` (token id 1) for `query-token-id`, and `[unused1]` (token id 2) for `document-token-id`. Document punctuation chars are filtered (not configurable). The following characters are removed: `!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~`.
### splade embedder reference config
In addition to [embedder ONNX parameters](#embedder-onnx-reference-config):
| Name | Occurrence | Description | Type | Default |
| --- | --- | --- | --- | --- |
| transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A |
| tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A |
| term-score-threshold | One | An optional threshold to increase sparseness; tokens/terms with a score lower than this are not retained. | numeric | N/A |
| max-tokens | One | The maximum number of tokens accepted by the transformer model | numeric | 512 |
| transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids |
| transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask |
| transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids`, use an empty value: `<transformer-token-type-ids/>` | string | token\_type\_ids |
| transformer-output | One | The name or identifier for the transformer output | string | logits |
## Huggingface tokenizer embedder
The Huggingface tokenizer embedder is configured in [services.xml](../applications/services/services.html), within the `container` tag - a sketch with an illustrative model path:
```
<container id="default" version="1.0">
    <component id="tokenizer" type="hugging-face-tokenizer">
        <model path="models/tokenizer.json"/>
    </component>
    ...
</container>
```
### Huggingface tokenizer reference config
| Name | Occurrence | Description | Type | Default |
| --- | --- | --- | --- | --- |
| model | One To Many | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file. Also supports `language`, which is only relevant if one wants to tokenize differently based on the document language. Use "unknown" for a model to be used for any language (i.e. by default). | [model-type](#model-config-reference) | N/A |
## Embedder ONNX reference config
Vespa uses [ONNX Runtime](https://onnxruntime.ai/) to accelerate inference of embedding models. These parameters are valid for both [Bert embedder](#bert-embedder) and [Huggingface embedder](#huggingface-embedder).
| Name | Occurrence | Description | Type | Default |
| --- | --- | --- | --- | --- |
| onnx-execution-mode | One | Low level ONNX execution mode. Valid values are `parallel` or `sequential`. Only relevant for inference on CPU. See [ONNX runtime documentation](https://onnxruntime.ai/docs/performance/tune-performance/threading.html) on threading. | string | sequential |
| onnx-interop-threads | One | Low level ONNX setting. Only relevant for inference on CPU. | numeric | 1 |
| onnx-intraop-threads | One | Low level ONNX setting. Only relevant for inference on CPU. | numeric | 4 |
| onnx-gpu-device | One | The GPU device to run the model on. See [configuring GPU for Vespa container image](/en/operations/self-managed/vespa-gpu-container.html). Use `-1` to not use GPU for the model, even if the instance has available GPUs. | numeric | 0 |
## SentencePiece embedder
A native Java implementation of [SentencePiece](https://github.com/google/sentencepiece). SentencePiece breaks text into chunks independent of spaces, which is robust to misspellings and works with CJK languages. Prefer the [Huggingface tokenizer embedder](#huggingface-tokenizer-embedder) over this for better compatibility with Huggingface models.
This is suitable to use in conjunction with [custom components](../../applications/components.html), or the resulting tensor can be used in [ranking](../../basics/ranking.html).
To use the [SentencePiece embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/sentencepiece/SentencePieceEmbedder.java), add it to [services.xml](../applications/services/services.html) - a sketch, where the component id is a placeholder:
```
<component id="mySentencePiece"
           class="com.yahoo.language.sentencepiece.SentencePieceEmbedder"
           bundle="linguistics-components">
    <config name="language.sentencepiece.sentence-piece">
        <model>
            <item>
                <language>unknown</language>
                <path>model/en.wiki.bpe.vs10000.model</path>
            </item>
        </model>
    </config>
</component>
```
See the options available for configuring SentencePiece in [the full configuration definition](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/resources/configdefinitions/language.sentencepiece.sentence-piece.def).
## WordPiece embedder
A native Java implementation of [WordPiece](https://github.com/google-research/bert#tokenization), which is commonly used with BERT models. Prefer the [Huggingface tokenizer embedder](#huggingface-tokenizer-embedder) over this for better compatibility with Huggingface models.
This is suitable to use in conjunction with [custom components](../../applications/components.html), or the resulting tensor can be used in [ranking](../../basics/ranking.html).
To use the [WordPiece embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/wordpiece/WordPieceEmbedder.java), add it to [services.xml](../applications/services/services.html) within the `container` tag:
```
<component id="myWordPiece"
           class="com.yahoo.language.wordpiece.WordPieceEmbedder"
           bundle="linguistics-components">
    <config name="language.wordpiece.word-piece">
        <model>
            <item>
                <language>unknown</language>
                <path>models/bert-base-uncased-vocab.txt</path>
            </item>
        </model>
    </config>
</component>
```
See the options available for configuring WordPiece in [the full configuration definition](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/resources/configdefinitions/language.wordpiece.word-piece.def).
## Using an embedder from Java
When writing custom Java components (such as [Searchers](../../applications/searchers.html) or [Document processors](../../applications/document-processors.html#document-processors)), use embedders you have configured by [having them injected in the constructor](../../applications/dependency-injection.html), just as any other component:
```
import com.yahoo.component.annotation.Inject;
import com.yahoo.component.provider.ComponentRegistry;
import com.yahoo.language.process.Embedder;

public class MyComponent {

    @Inject
    public MyComponent(ComponentRegistry<Embedder> embedders) {
        // embedders contains all the embedders configured in your services.xml
    }
}
```
See a concrete example of using an embedder in a custom searcher in [LLMSearcher](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/java/ai/vespa/cloud/docsearch/LLMSearcher.java).
## Custom Embedders
Vespa provides a Java interface for defining components which can provide embeddings of text: [com.yahoo.language.process.Embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/process/Embedder.java).
To define a custom embedder in an application and make it usable by Vespa (see [embedding a query text](../../rag/embedding.html#embedding-a-query-text)), implement this interface and add it as a [component](../../applications/developer-guide.html#developing-components) to [services.xml](../applications/services/container.html).
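A sketch - the component id `foo` is from the original example, while the class and bundle names are illustrative:
```
<container version="1.0">
    <component id="foo"
               class="com.example.MyEmbedder"
               bundle="the name in <artifactId> in your pom.xml"/>
</container>
```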
---
# Source: https://docs.vespa.ai/en/operations/enclave/enclave.html.md
# Vespa Cloud Enclave

Vespa Cloud Enclave allows Vespa Cloud applications to run inside the tenant's own cloud account while everything is still fully managed by Vespa Cloud's automation, giving the tenant full access to Vespa Cloud features inside their own cloud account. Tenant data always remains within the bounds of services controlled by the tenant, and closer integrations can be built with Vespa applications inside the tenant's cloud services.
Vespa Cloud Enclave is available in AWS, Azure, and GCP.
**Note:** As the Vespa Cloud Enclave resources run in _your_ account, this incurs resource costs from your cloud provider in _addition_ to the Vespa Cloud costs.
## AWS
- [Getting started](aws-getting-started.html)
- [Architecture and security](aws-architecture)
## Azure
- [Getting started](azure-getting-started.html)
- [Architecture and security](azure-architecture)
## GCP
- [Getting started](gcp-getting-started.html)
- [Architecture and security](gcp-architecture)
## Guides
- [Log archive](archive)
- [Operations and Support](operations)
## FAQ
**Which kind of permission is needed for the Vespa control plane to access my AWS accounts / Azure subscriptions / GCP projects?**
The permissions required are coded into the Terraform modules found at:
- [terraform-aws](https://github.com/vespa-cloud/terraform-aws-enclave/tree/main)
- [terraform-azure](https://github.com/vespa-cloud/terraform-azure-enclave/tree/main)
- [terraform-google](https://github.com/vespa-cloud/terraform-google-enclave/tree/main)
Navigate to the _modules_ directory for details.
**How can I configure agents/daemons on Vespa hosts securely?**
Use Terraform to grant Vespa hosts access to the necessary secrets, and create an RPM that retrieves them and configures your application. See [enclave-examples](https://github.com/vespa-cloud/enclave-examples/tree/main/systemd-secrets) for a complete example.
**Deployment failure: Could not provision …**
This happens if you deploy to new zones _before_ running the Terraform/CloudFormation templates:
```
Deployment failed: Invalid application: In container cluster 'mycluster': Could not provision load balancer mytenant:myapp:myinstance:mycluster: Expected to find exactly 1 resource, but got 0 for subnet with service 'tenantelb'
```
**Do we need to take any actions when AWS sends us Amazon EC2 Instance Retirement, Amazon EC2 Instance Availability Issue, or Amazon EC2 Maintenance notifications?**
Vespa Cloud takes proactive action on maintenance operations, replacing instances that are scheduled for maintenance tasks ahead of time to reduce any impact the maintenance may incur.
All EC2 instance failures are detected by our control plane, and the problematic instances are automatically replaced. The system will, as part of the replacement process, also ensure that the document distribution is kept in line with your application configuration.
---
# Source: https://docs.vespa.ai/en/operations/endpoint-routing.html.md
# Routing and endpoints
Vespa Cloud supports multiple methods of routing requests to an application. This guide describes how these routing methods work, failover, and how to configure them.
By default, each deployment of a Vespa Cloud application will have a zone endpoint. In addition to the default zone endpoint, one can configure global endpoints.
All endpoints for an application are available under the _endpoints_ tab of each deployment in the console.
## Endpoint format
Vespa Cloud endpoints have the format: `{random}.{random}.{scope}.vespa-app.cloud`.
## Endpoint scopes
### Zone endpoint
This is the default endpoint for a deployment. Requests through a zone endpoint are sent directly to the zone.
Zone endpoints are created implicitly, one per container cluster declared in [services.xml](/en/reference/applications/services/container.html). Zone endpoints are not configurable.
Zone endpoints have the suffix `z.vespa-app.cloud`.
### Global endpoint
A global endpoint is an endpoint that can route requests to multiple zones. It can be configured in [deployment.xml](/en/reference/applications/deployment.html#endpoints-global). Similar to how a [CDN](https://en.wikipedia.org/wiki/Content_delivery_network) works, requests through this endpoint will be routed to the nearest zone based on geo proximity, i.e. the zone that is nearest to the client.
Global endpoints have the suffix `g.vespa-app.cloud`.
**Important:** Global endpoints do not support feeding. Feeding must be done through zone endpoints.
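A sketch of a global endpoint definition in [deployment.xml](/en/reference/applications/deployment.html#endpoints-global) - the endpoint id, container cluster id and regions are illustrative:
```
<deployment version="1.0">
    <prod>
        <region>aws-us-east-1c</region>
        <region>aws-eu-west-1a</region>
    </prod>
    <endpoints>
        <endpoint id="global" container-id="mycontainer">
            <region>aws-us-east-1c</region>
            <region>aws-eu-west-1a</region>
        </endpoint>
    </endpoints>
</deployment>
```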
## Routing control
Vespa Cloud has two mechanisms for manually controlling routing of requests to a zone:
- Removing the `<region>` element from the relevant `<endpoint>` elements in [deployment.xml](../reference/applications/deployment.html) and deploying a new version of your application.
- Changing the status through the console.
This section describes the latter mechanism. Navigate to the relevant deployment of your application in the console. Hovering over the _GLOBAL ROUTING_ badge will display the current status and when it was last changed.
### Change status
In case of a production emergency, a zone can be manually set out to prevent it from receiving requests:
1. Hover over the _GLOBAL ROUTING_ badge for the problematic deployment and click _Deactivate_.
2. Inspection of the status will now show the status set to _OUT_. To set the zone back in and have it continue receiving requests: Hover over the _GLOBAL ROUTING_ badge again and click _Activate_.
### Behaviour
Changing the routing status is independent of the endpoint scope used: you are technically overriding the routing status the deployment reports to the Vespa Cloud routing infrastructure. This means that a change to routing status affects both _zone endpoints_ and _global endpoints_.
Deactivating a deployment disables routing of requests to that deployment through global endpoints until the deployment is activated again. As routing through these endpoints is DNS-based, it may take between 5 and 15 minutes for all traffic to shift to other deployments.
If all deployments of an endpoint are deactivated, requests are distributed as if all deployments were active. This is because attempting to route traffic according to the original configuration is preferable to discarding all requests.
## AWS clients
While Vespa Cloud is hosted in AWS, clients that talk to Vespa Cloud from AWS nodes will be treated as any other client from the Internet. This means clients in AWS will generate regular Internet egress traffic even though they are talking to a service in AWS in the same zone.
---
# Source: https://docs.vespa.ai/en/operations/environments.html.md
# Environments
Vespa Cloud has two kinds of environments:
- Manual environment for rapid development and test: `dev`
- Automated environment with integrated CD pipeline: `prod`
An application is deployed to one or more _zones_ (see [zone list](zones.html)), which is a combination of an _environment_ and a _region_, like `vespa deploy -z dev.aws-us-east-1c`.
## Dev
The dev environment is built for rapid development cycles, with auto-downscaling and auto-expiry for ease of use and cost control. The dev environment is the default; to deploy to it, use `vespa deploy`.
### Auto downscaling
One use case for the dev environment is to take an application package from a prod environment and deploy to the dev environment to debug. To minimize cost and make this speedy, Vespa Cloud will by default ignore [nodes](../reference/applications/services/services.html#nodes) and [resources](../reference/applications/services/services.html#resources) settings.
With this, you can safely download an application package from prod (which is normally large) and deploy it to dev with no changes.
To override this behavior and control the resources, specify them explicitly for the dev environment as described in [deployment variants](deployment-variants.html#services.xml-variants).
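A sketch using the deployment-variants syntax (the node count and resources are illustrative):
```
<nodes deploy:environment="dev" count="1">
    <resources vcpu="2" memory="8Gb" disk="50Gb"/>
</nodes>
```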
**Important:** The `dev` environment has redundancy 1 by default, and there are no availability or data persistence guarantees. Do not use applications deployed to these zones for production serving use cases.
### Auto expiry
Deployments to `dev` expire after 14 days of inactivity, that is, 14 days after the last [deployment](../basics/applications.html#deploying-applications). **This applies to all plans**. To add 7 more days to the expiry period, redeploy the application or use the Vespa Cloud Console.
### Vespa version
The latest active Vespa version is used when deploying to the dev environment. Deployments are upgraded at a time that is most likely nighttime for the developer (based on when past deployments were made), in order to minimize downtime. An upgrade is skipped if metrics indicate ongoing feed or query load, but will still be done if the current version is more than a week old.
## Prod
Applications are deployed to the `prod` environment for production serving. Deployments are passed through an integrated CD pipeline for system tests and staging tests. Read more in [automated deployments](automated-deployments.html).
## Test
The `test` environment is used by the integrated CD pipeline for prod deployments, to run [system tests](automated-deployments.html#system-tests). The test capacity is ephemeral and only used during test. Nodes in test and staging environments do not have access to data in prod environments.
Note that one cannot deploy directly to the test and staging environments. For long-lived test applications (e.g., a QA system that is integrated with other services), use the prod environment.
System tests are always invoked, even if there are no tests defined. In this case, an instance is just started and then stopped. This has value in itself, as it ensures that the application is able to start.
Test runs can be [aborted](automated-deployments.html#disabling-tests).
## Staging
See system tests above; the same applies to staging. [Staging tests](automated-deployments.html#staging-tests) use a fraction of the configured prod capacity. This can be overridden to use 1 node regardless of prod cluster size.
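A sketch, assuming the same deployment-variants syntax as for dev above:
```
<nodes deploy:environment="staging" count="1"/>
```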
## Reference
Environment settings:
| Name | Description | Expiry | Cluster sizes |
| --- | --- | --- | --- |
| `dev` | Used for manual development testing. | 14 days | `1` |
| `test` | Used for [automated system tests](../applications/testing.html#system-tests). | - | `1` |
| `staging` | Used for [automated staging tests](../applications/testing.html#staging-tests). | - | `min(max(2, 0.05 * spec), spec)` |
| `prod` | Hosts all production deployments. | No expiry | `max(2, spec)` |
---
# Source: https://docs.vespa.ai/en/schemas/exposing-schema-information.html.md
# Exposing schema information
Some applications need to expose information about schemas to data plane clients. This document explains how to add an API for that to your application.
You need to know two things:
- Your application can expose any custom API by implementing a [handler](../applications/request-handlers.html).
- Information about the deployed schemas is available in the component _com.yahoo.search.schema.SchemaInfo_.
With this information, we can add an API exposing schemas information through the following steps.
## 1. Make sure your application package can contain Java components
Application packages containing Java components must follow the Maven layout. If your application package root contains a `pom.xml` and `src/main`, you're good; otherwise, convert it to this layout by copying the `pom.xml` from the [album-recommendation.java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java) sample app and moving the files to follow this layout before moving on.
## 2. Add a handler exposing schema info
Add the following handler (to a package of your choosing):
```
package ai.vespa.example;

import com.yahoo.container.jdisc.HttpRequest;
import com.yahoo.container.jdisc.HttpResponse;
import com.yahoo.container.jdisc.ThreadedHttpRequestHandler;
import com.yahoo.jdisc.Metric;
import com.yahoo.search.schema.SchemaInfo;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.concurrent.Executor;

public class SchemaInfoHandler extends ThreadedHttpRequestHandler {

    private final SchemaInfo schemaInfo;

    public SchemaInfoHandler(Executor executor, Metric metric, SchemaInfo schemaInfo) {
        super(executor, metric);
        this.schemaInfo = schemaInfo;
    }

    @Override
    public HttpResponse handle(HttpRequest httpRequest) {
        // Creating JSON, handling different paths etc. left as an exercise for the reader
        StringBuilder response = new StringBuilder();
        for (var schema : schemaInfo.schemas().values()) {
            response.append("schema: " + schema.name() + "\n");
            for (var field : schema.fields().values())
                response.append("  field: " + field.name() + "\n");
        }
        return new Response(200, response.toString());
    }

    private static class Response extends HttpResponse {

        private final byte[] data;

        Response(int code, byte[] data) {
            super(code);
            this.data = data;
        }

        Response(int code, String data) {
            this(code, data.getBytes(Charset.forName(DEFAULT_CHARACTER_ENCODING)));
        }

        @Override
        public String getContentType() {
            return "application/json";
        }

        @Override
        public void render(OutputStream outputStream) throws IOException {
            outputStream.write(data);
        }
    }

    private static class ErrorResponse extends Response {
        ErrorResponse(int code, String message) {
            super(code, "{\"error\":\"" + message + "\"}");
        }
    }
}
```
## 3. Add the new API handler to your container cluster
In your `services.xml` file, under `<container>`, add:
```
<handler id="ai.vespa.example.SchemaInfoHandler"
         bundle="the name in <artifactId> in your pom.xml">
    <binding>http://*/schema/v1/*</binding>
</handler>
```
## 4. Deploy the modified application
```
$ mvn install
$ vespa deploy
```
## 5. Verify that it works
```
$ vespa curl "schema/v1/"
```
---
# Source: https://docs.vespa.ai/en/rag/external-llms.html.md
# External LLMs in Vespa
Please refer to [Large Language Models in Vespa](llms-in-vespa.html) for an introduction to using LLMs in Vespa.
Vespa provides a client for integration with OpenAI compatible APIs. This includes, but is not limited to [OpenAI](https://platform.openai.com/docs/overview), [Google Gemini](https://ai.google.dev/), [Anthropic](https://www.anthropic.com/api), [Cohere](https://docs.cohere.com/docs/compatibility-api) and [Together.ai](https://docs.together.ai/docs/openai-api-compatibility). You can also host your own OpenAI-compatible server using for example [VLLM](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#quickstart-online) or [llama-cpp-server](https://llama-cpp-python.readthedocs.io/en/latest/server/).
**Note:** This is currently a Beta feature; changes can be expected.
### Configuring the OpenAI client
To set up a connection to an LLM service such as OpenAI's ChatGPT, you need to define a component in your application's [services.xml](../reference/applications/services/services.html):
```
<services version="1.0">
    <container id="default" version="1.0">
        ...
        <component id="openai" class="ai.vespa.llm.clients.OpenAI">
            <!-- Configuration is optional; the secret name below is illustrative -->
            <config name="ai.vespa.llm.clients.llm-client">
                <apiKeySecretRef>openai-api-key</apiKeySecretRef>
            </config>
        </component>
        ...
    </container>
</services>
```
To see the full list of available configuration parameters, refer to the [llm-client config definition file](https://github.com/vespa-engine/vespa/blob/master/model-integration/src/main/resources/configdefinitions/llm-client.def).
This sets up a client component that can be used in a [searcher](../learn/glossary.html#searcher) or a [document processor](../learn/glossary.html#document-processor).
### API key configuration
Vespa provides several options to configure the API key used by the client:
1. Using the [Vespa Cloud secret store](../security/secret-store) to store the API key. This is done by setting the `apiKeySecretRef` configuration parameter to the name of the secret in the secret store. This is the recommended way for Vespa Cloud users.
2. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query.
3. Configuring the API key in a custom component. For example, [this](https://github.com/vespa-engine/system-test/tree/master/tests/docproc/generate_field_openai) system test shows how to retrieve the API key from a local file deployed with your Vespa application. Please note that this is NOT recommended for production use, as it is less secure than using the secret store, but it can be modified to suit your needs.
You can set up multiple connections with different settings. For instance, you might want to run different LLMs for different tasks. To distinguish between the connections, modify the `id` attribute in the component specification. We will see below how this is used to control which LLM is used for which task.
As a reminder, Vespa also has the option of running custom LLMs locally. Please refer to [running LLMs in your application](local-llms.html) for more information.
### Inference parameters
Please refer to the general discussion in [LLM parameters](llms-in-vespa.html#llm-parameters) for setting inference parameters.
The OpenAI-client also has the following inference parameters that can be sent along with the query:
| Parameter (Vespa) | Parameter (OpenAI) | Description |
| --- | --- | --- |
| `maxTokens` | `max_completion_tokens` | Maximum number of tokens that can be generated in the chat completion. |
| `temperature` | `temperature` | Number between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. |
| `topP` | `top_p` | An alternative to temperature sampling. Model considers tokens with top\_p probability mass (0-1). Value of 0.1 means only tokens comprising top 10% probability are considered. |
| `seed` | `seed` | If specified, the system will attempt to sample deterministically, so repeated requests with the same seed should return similar results. Determinism is not guaranteed. |
| `npredict` | `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all choices. |
| `frequencypenalty` | `frequency_penalty` | Number between -2.0 and 2.0. Positive values penalize new tokens based on their frequency in the text so far, decreasing the likelihood of repetition. Negative values encourage repetition. |
| `presencepenalty` | `presence_penalty` | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Negative values encourage repeating content from the prompt. |
Any parameter sent with the query will override configuration specified for the client component in `services.xml`.
Note that if you are not using OpenAI's API, the parameters may be handled differently than described above.
### Connecting to other OpenAI-compatible providers
By default, this client connects to the OpenAI service, but it can be used against any [OpenAI chat completion compatible API](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) by changing the `endpoint` configuration parameter.
### FAQ
- **Q: How do I know if my LLM is compatible with the OpenAI client?**
- A: The OpenAI client is compatible with any LLM that implements the OpenAI chat completion API. You can check the documentation of your LLM provider to see if they support this API.
- **Q: Can I use the [Responses](https://platform.openai.com/docs/api-reference/responses/create) API provided by OpenAI?**
- A: No, currently only the [Chat Completion API](https://platform.openai.com/docs/api-reference/chat) is supported.
- **Q: Can I use the OpenAI client for reranking?**
- A: Yes, but currently, you need to implement a [custom searcher](../applications/searchers.html) that uses the OpenAI client to rerank the results.
- **Q: Can I use the OpenAI client for retrieving embeddings?**
- A: No, currently, only the [Chat Completion API](https://platform.openai.com/docs/api-reference/chat) is supported.
---
# Source: https://docs.vespa.ai/en/learn/faq.html.md
# FAQ - frequently asked questions
Refer to [Vespa Support](https://vespa.ai/support/) for more support options.
## Ranking
### Does Vespa support a flexible ranking score?
[Ranking](../basics/ranking.html) is maybe the primary Vespa feature - we like to think of it as scalable, online computation. A rank profile is where the application's logic is implemented, supporting simple types like `double` and complex types like `tensor`. Supply ranking data in queries in query features (e.g. different weights per customer), or look up in a [Searcher](../applications/searchers.html). Typically, a document (e.g. product) "feature vector"/"weights" will be compared to a user-specific vector (tensor).
### Where would customer specific weightings be stored?
Vespa doesn't have specific support for storing customer data as such. You can store this data as a separate document type in Vespa and look it up before passing the query, or store this customer meta-data as part of the other meta-data for the customer (i.e. login information) and pass it along the query when you send it to the backend. Find an example of how to look up data in [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing).
### How to create a tensor on the fly in the ranking expression?
Create a tensor in the ranking function from arrays or weighted sets using `tensorFrom...` functions - see [document features](../reference/ranking/rank-features.html#document-features).
### How to set a dynamic (query time) ranking drop threshold?
Pass a ranking feature like `query(threshold)` and use an `if` statement in the ranking expression - see [retrieval and ranking](../ranking/ranking-intro#retrieval-and-ranking). Example:
```
rank-profile drop-low-score {
    function my_score() {
        expression: ...  # custom first phase score
    }
    rank-score-drop-limit: 0.0
    first-phase {
        expression: if(my_score() < query(threshold), -1, my_score())
    }
}
```
### Are ranking expressions or functions evaluated lazily?
No - ranking expressions are not evaluated lazily, as that would require lambda arguments; only doubles and tensors are passed between functions. In this example, the second argument to `foo` is computed eagerly, even when the `if` in `foo` does not need it:
```
function inline foo(tensor1, defaultVal) {
    expression: if (count(tensor1) == 0, defaultVal, sum(tensor1))
}

function bar() {
    expression: foo(tensor1, sum(tensor1 * tensor2))
}
```
### Does Vespa support early termination of matching and ranking?
Yes, this can be accomplished by configuring [match-phase](../reference/schemas/schemas.html#match-phase) in the rank profile, or by adding a range query item using _hitLimit_ to the query tree, see [capped numeric range search](../reference/querying/yql.html#numeric). Both methods require an _attribute_ field with _fast-search_. The capped range query is faster, but beware that if there are other restrictive filters in the query, one might end up with 0 hits. The additional filters are applied as a post filtering step over the hits from the capped range query. _match-phase_ on the other hand, is safe to use with filters or other query terms, and also supports diversification which the capped range query term does not support.
### What could cause the relevance field to be -Infinity
The returned [relevance](../reference/querying/default-result-format.html#relevance) for a hit can become "-Infinity" instead of a double. This can happen in two cases:
- The [ranking](../basics/ranking.html) expression used a feature which became `NaN` (Not a Number). For example, `log(0)` would produce -Infinity. One can use [isNan](../reference/ranking/ranking-expressions.html#isnan-x) to guard against this.
- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside of what Vespa computed and cached on the heap. This is controlled by [keep-rank-count](../reference/schemas/schemas.html#keep-rank-count).
### How to pin query results?
To hard-code documents to positions in the result set, see the [pin results example](../ranking/multivalue-query-operators.html#pin-results-example).
## Documents
### What limits apply to document size?
There is a [maximum document size](../reference/applications/services/content.html#max-document-size) of 128 MiB, which is configurable per content cluster in services.xml.
### Is there any size limitation for multivalued fields?
No enforced limit, except resource usage (memory).
### Can a document have lists (key value pairs)?
E.g. a product is offered in a list of stores with a quantity per store. Use [multivalue fields](../searching-multi-valued-fields.html) (array of struct) or [parent child](../schemas/parent-child.html). Which one to choose depends on the use case; see the discussion in the latter link.
### Does a whole document need to be updated and re-indexed?
E.g. price and quantity available per store may often change vs the actual product attributes. Vespa supports [partial updates](../writing/reads-and-writes.html) of documents. Also, the parent/child feature is implemented to support use-cases where child elements are updated frequently, while a more limited set of parent elements are updated less frequently.
### What ACID guarantees if any does Vespa provide for single writes / updates / deletes vs batch operations etc?
See the [Vespa Consistency Model](../content/consistency). Vespa is not transactional in the traditional sense; it doesn't have strict ACID guarantees. Vespa is designed for high performance use-cases with eventual consistency as an acceptable (and to some extent configurable) trade-off.
### Does Vespa support wildcard fields?
Wildcard fields are not supported in Vespa. A workaround is to use maps to store the wildcard fields. The map needs to be defined with `indexing: attribute` and will hence be stored in memory. Refer to [map](../reference/schemas/schemas.html#map).
### Can we set a limit for the number of elements that can be stored in an array?
Implement a [document processor](../applications/document-processors.html) for this.
### How to auto-expire documents / set up garbage collection?
Set a selection criterion on the `document` element in `services.xml`. The criterion selects documents _to keep_, so to purge documents older than two weeks, the expression should match documents newer than two weeks. Read more about [document expiry](../schemas/documents.html#document-expiry).
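A sketch in services.xml, assuming a `music` document type with a `timestamp` field holding epoch seconds (two weeks is 1209600 seconds):
```
<documents garbage-collection="true">
    <document type="music" mode="index" selection="music.timestamp &gt; now() - 1209600"/>
</documents>
```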
### How to increase redundancy and track data migration progress?
Changing redundancy is a live and safe change (assuming there is headroom on disk / memory - e.g. from 2 to 3 is 50% more). The time to migrate will be quite similar to what it took to feed initially - a bit hard to say generally, and depends on IO and index settings, like if building an HNSW index. To monitor progress, take a look at the [multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample application for the _clustercontroller_ status page - this shows buckets pending, live. Finally, use the `.idealstate.merge_bucket.pending` metric to track progress - when 0, there are no more data syncing operations - see [monitor distance to ideal state](../operations/self-managed/admin-procedures.html#monitor-distance-to-ideal-state). Nodes will work as normal during data sync, and query coverage will be the same.
### How does namespace relate to schema?
It does not. _Namespace_ is a mechanism to split the document space into parts that can be used for document selection - see the [documentation](../schemas/documents.html#namespace). The namespace is not indexed and cannot be searched using the query API, but can be used by [visiting](../writing/visiting.html).
### Visiting does not dump all documents, and/or hangs.
There are multiple things that can cause this, see [visiting troubleshooting](../writing/visiting.html#troubleshooting).
### How to find number of documents in the index?
Run a query like `vespa query "select * from sources * where true"` and see the `totalCount` field. Alternatively, use metrics or `vespa visit` - see [examples](../writing/batch-delete.html#example).
### Can I define a default value for a field?
Not in the field definition, but it's possible to do this with the [choice](../writing/indexing.html#choice-example) expression in an indexing statement.
## Query
### Are hierarchical facets supported?
Faceting is called [grouping](../grouping.html) in Vespa. Groups can be multi-level.
### Are filters supported?
Add filters to the query using [YQL](../querying/query-language.html) with boolean, numeric and [text matching](../querying/text-matching.html) conditions. Query terms can be annotated as filters, which means that they are not highlighted when bolding results.
### How to query for similar items?
One way is to describe items using tensors and query for the[nearest neighbor](../reference/querying/yql.html#nearestneighbor) - using full precision or approximate (ANN) - the latter is used when the set is too large for an exact calculation. Apply filters to the query to limit the neighbor candidate set. Using [dot products](../ranking/multivalue-query-operators.html) or [weak and](../ranking/wand.html) are alternatives.
### Does Vespa support stop-word removal?
Vespa does not have an inherent stop-word concept. See the [sample app](https://github.com/vespa-engine/sample-apps/pull/335/files) for how to use [filter terms](../reference/querying/yql.html#annotations). [Tripling the query performance of lexical search](https://blog.vespa.ai/tripling-the-query-performance-of-lexical-search/) is a good blog post on this subject.
### How to extract more than 400 hits / query and get ALL documents?
Requesting more than 400 hits in a query gives this error: `{'code': 3, 'summary': 'Illegal query', 'message': '401 hits requested, configured limit: 400.'}`.
- To increase the max result set size (i.e. allow a higher [hits](../reference/api/query.html#hits) value), configure `maxHits` in a [query profile](../reference/api/query.html#queryprofile), e.g. `<field name="maxHits">500</field>` in `search/query-profiles/default.xml` (create as needed) - see the sketch after this list. The [query timeout](../reference/api/query.html#timeout) can be increased, but a large result set will still be costly and likely impact other queries - a large limit more so than a large offset. It can be made cheaper by using a smaller [document summary](../querying/document-summaries.html), and avoiding fields on disk if possible.
- Using _visit_ in the [document/v1/ API](../writing/document-v1-api-guide.html) is usually a better option for dumping all the data.
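A sketch of `search/query-profiles/default.xml` with the higher limit:
```
<query-profile id="default">
    <field name="maxHits">500</field>
</query-profile>
```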
### How to make a sub-query to get data to enrich the query, like get a user profile?
See the [UserProfileSearcher](https://github.com/vespa-engine/sample-apps/blob/master/news/app-6-recommendation-with-searchers/src/main/java/ai/vespa/example/UserProfileSearcher.java)for how to create a new query to fetch data - this creates a new Query, sets a new root and parameters - then `fill`s the Hits.
### How to create a cache that refreshes itself regularly
See the sub-query question above; in addition, add something like:
```
// Sketch: executionFactory (an ExecutionFactory) is assumed injected in the
// constructor; createQuery is a helper left to the reader.
public class ConfigCacheRefresher extends AbstractComponent {

    private final ScheduledExecutorService configFetchService = Executors.newSingleThreadScheduledExecutor();
    private Chain<Searcher> searcherChain;

    void initialize() {
        Runnable task = () -> refreshCache();
        configFetchService.scheduleWithFixedDelay(task, 1, 1, TimeUnit.MINUTES);
        searcherChain = executionFactory.searchChainRegistry().getChain(new ComponentId("configDefaultProvider"));
    }

    public void refreshCache() {
        Execution execution = executionFactory.newExecution(searcherChain);
        Query query = createQuery(execution);
        // ... execute the query and update the cached data
    }

    @Override
    public void deconstruct() {
        super.deconstruct();
        try {
            configFetchService.shutdown();
            configFetchService.awaitTermination(1, TimeUnit.MINUTES);
        } catch (Exception e) {
            // ignore shutdown errors
        }
    }
}
```
### Is it possible to query Vespa using a list of document ids?
Yes, using the [in query operator](../reference/querying/yql.html#in). Example:
```
select * from data where user_id in (10, 20, 30)
```
The best article on the subject is [multi-lookup set filtering](../performance/feature-tuning.html#multi-lookup-set-filtering). Refer to the [in operator example](../ranking/multivalue-query-operators.html#in-example) for how to use it programmatically in a [Java Searcher](../applications/searchers.html).
### How to query documents where one field matches any values in a list? Similar to using SQL IN operator
Use the [in query operator](../reference/querying/yql.html#in). Example:
```
select * from data where category in ('cat1', 'cat2', 'cat3')
```
See [multi-lookup set filtering](#is-it-possible-to-query-vespa-using-a-list-of-document-ids) above for more details.
### How to count hits / all documents without returning results?
Count all documents using a query like [select \* from doc where true](../querying/query-language.html) - this counts all documents from the "doc" source. Using `select * from doc where true limit 0` will return the count and no hits; alternatively, add [hits=0](../reference/api/query.html#hits). Pass [ranking.profile=unranked](../reference/api/query.html#ranking.profile) to make the query less expensive to run. If an _estimate_ is good enough, use [hitcountestimate=true](../reference/api/query.html#hitcountestimate).
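For example, with the Vespa CLI (the `doc` source name is illustrative):
```
$ vespa query 'select * from doc where true limit 0' \
    'ranking.profile=unranked'
```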
### Must all fields in a fieldset have compatible type and matching settings?
Yes - a deployment warning with _This may lead to recall and ranking issues_ is emitted when fields with conflicting tokenization are put in the same [fieldset](../reference/schemas/schemas.html#fieldset). This is because a given query item searching one fieldset is tokenized just once, so there's no right choice of tokenization in this case. If you have user input that you want to apply to multiple fields with different tokenization, include the userInput multiple times in the query:
```
select * from sources * where ({defaultIndex: 'fieldsetOrField1'}userInput(@query)) or ({defaultIndex: 'fieldsetOrField2'}userInput(@query))
```
More details on [stack overflow](https://stackoverflow.com/questions/72784136/why-vepsa-easily-warning-me-this-may-lead-to-recall-and-ranking-issues).
### How is the query timeout computed?
Find query timeout details in the [Query API Guide](../querying/query-api.html#timeout)and the [Query API Reference](../reference/api/query.html#timeout).
### How does backslash escapes work?
Backslash is used to escape special characters in YQL. For example, to query with a literal backslash, which is useful in regexps, you need to escape it with another backslash: `\\`. Unescaped backslashes in YQL will lead to "token recognition error at: ''".
In addition, the Vespa CLI unescapes double backslashes to single ones (while single backslashes are left alone), so if you query with the Vespa CLI you need to escape with yet another pair: `\\\\`. The same applies to strings in Java.
Also note that both log messages and JSON results escape backslashes, so any `\` is rendered as `\\`.
### Is it possible to have multiple SELECT statements in a single call (subqueries)?
E.g. two SELECT queries with slightly different filtering conditions and a limit operator for each subquery. This cannot be expressed via OR conditions selecting both collections of documents - something equivalent to:
```
SELECT 1 AS x
UNION ALL
SELECT 2 AS y;
```
This isn't possible; run two queries instead. Alternatively, split a single incoming query into two running in parallel in a [Searcher](../applications/searchers.html) - example:
```
FutureResult futureResult = new AsyncExecution(settings).search(query);
FutureResult otherFutureResult = new AsyncExecution(settings).search(otherQuery);
```
### Is it possible to query for the number of elements in an array
There is no index or attribute data structure that allows efficient _searching_ for documents where an array field has a certain number of elements or items. The _grouping language_ has a [size()](../reference/querying/grouping-language.html#list-expressions) operator that can be used in queries.
### Is it possible to query for fields with NaN/no value set/null/none
The [visiting](../writing/visiting.html#analyzing-field-values) API using document selections supports it, with a linear scan over all documents. If the field is an _attribute_, one can query using grouping to identify NaN values, see count and list [fields with NaN](../querying/grouping.html#count-fields-with-nan).
### How to retrieve random documents using YQL? Functionality similar to MySQL "ORDER BY rand()"
See the [random.match](../reference/ranking/rank-features.html#random.match) rank feature - example:
```
rank-profile random {
first-phase {
expression: random.match
}
}
```
Run queries, seeding the random generator:
```
$ vespa query 'select * from music where true' \
ranking=random \
rankproperty.random.match.seed=2
```
### Some of the query results have too many hits from the same source, how to create a diverse result set?
See [result diversity](../querying/result-diversity) for strategies on how to create result sets from different sources.
### How to find the most distant neighbor in an embedding field called clip\_query\_embedding?
To search for the most dissimilar items with angular distance, multiply your `clip_query_embedding` by the scalar -1. You are then searching for the points that are closest to the point which is farthest away from your original `clip_query_embedding`.
Also see a [pyvespa example](https://vespa-engine.github.io/pyvespa/examples/pyvespa-examples.html#Neighbors).
## Feeding
### How to debug a feeding 400 response?
The best option is to use the `--verbose` option, like `vespa feed --verbose myfile.jsonl` - see the [documentation](../clients/vespa-cli.html#documents). A common problem is a mismatch between schema names and [document IDs](../schemas/documents.html#document-ids) - a schema like:
```
schema article {
document article {
...
}
}
```
will have a document feed like:
```
{"put": "id:mynamespace:article::1234", "fields": { ... }}
```
Note that the [namespace](glossary.html#namespace) is not mentioned in the schema, and the schema name is the same as the document type name.
### How to debug document processing chain configuration?
This configuration is a combination of content and container cluster configuration, see [indexing](../writing/indexing.html) and [feed troubleshooting](../operations/self-managed/admin-procedures.html#troubleshooting).
### I feed documents with no error, but they are not in the index
This is often a problem when using [document expiry](../schemas/documents.html#document-expiry): documents that have already expired will not be persisted - they are silently dropped. Feeding stale test data with old timestamps in combination with document expiry can cause this behavior.
### How to feed many files, avoiding 429 error?
Using too many HTTP clients can generate a 429 response code. The Vespa sample apps use [vespa feed](../clients/vespa-cli.html#documents), which uses HTTP/2 for high throughput - it is better to stream the feed files through this client.
### Can I use Kafka to feed to Vespa?
Vespa does not have a Kafka connector. Refer to third-party connectors like [kafka-connect-vespa](https://github.com/vinted/kafka-connect-vespa).
## Text Search
### Does Vespa support addition of flexible NLP processing for documents and search queries?
E.g. integrating NER, word sense disambiguation, specific intent detection. Vespa supports these things well:
- [Query (and result) processing](../applications/searchers.html)
- [Document processing](../applications/document-processors.html) and document processors working on semantic annotations of text
### Does Vespa support customization of the inverted index?
E.g. instead of using terms or n-grams as the unit, we might use terms with specific word senses - e.g. bark (dog bark) vs. bark (tree bark), or BCG (company) vs. BCG (vaccine name). Creating a new index _format_ means changing the core. However, for the examples above, one just needs control over the tokens which are indexed (and queried), and that is easily done in some Java code. The simplest way to do this is to plug in a [custom tokenizer](../linguistics/linguistics.html). That gets called from the query parser and bundled linguistics processing [Searchers](../applications/searchers.html), as well as from the [Document Processor](../applications/document-processors.html) creating the annotations that are consumed by the indexing operation. Since all of this is Searchers and Docprocs, which you can replace and/or surround with custom components, you can take full control over these things without modifying the platform itself.
### Does vespa provide any support for named entity extraction?
It provides the building blocks but not an out-of-the-box solution. We can write a [Searcher](../applications/searchers.html) to detect query-side entities and rewrite the query, and a [DocProc](../applications/document-processors.html) if we want to handle them in some special way on the indexing side.
### Does vespa provide support for text extraction?
You can write a document processor for text extraction, Vespa does not provide it out of the box.
### How to do Text Search in an imported field?
[Imported fields](../schemas/parent-child.html) from parent documents are defined as [attributes](../content/attributes.html), and have limited text match modes (i.e. `indexing: index` cannot be used). [Details](https://stackoverflow.com/questions/71936330/parent-child-mode-cannot-be-searched-by-parent-column).
## Semantic search
### Why is closeness 1 for all my vectors?
If you have added vectors to your documents and queries, and see that the rank feature `closeness(field, yourEmbeddingField)` produces 1.0 for all documents, you are likely using [distance-metric](../reference/schemas/schemas.html#distance-metric) `innerproduct`/`prenormalized-angular` with vectors that are not normalized. The solution is normally to switch to [distance-metric: angular](../reference/schemas/schemas.html#angular) or [distance-metric: dotproduct](../reference/schemas/schemas.html#dotproduct) (available from Vespa 8.170.18).
With non-normalized vectors, you often get negative distances, and those are capped to 0, leading to closeness 1.0. Some embedding models, such as models from sbert.net, claim to output normalized vectors but might not.
## Programming Vespa
### Is Python plugins supported / is there a scripting language?
Plugins have to run in the JVM - [jython](https://www.jython.org/) might be an alternative, however the Vespa Team has no experience with it. Vespa does not have a language like [painless](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-painless.html) - it is more flexible to write application logic in a JVM-supported language, using [Searchers](../applications/searchers.html) and [Document Processors](../applications/document-processors.html).
### How can I batch-get documents by ids in a Searcher
A [Searcher](../applications/searchers.html) intercepts a query and/or result. To get a number of documents by id in a Searcher or other component like a [Document processor](../applications/document-processors.html), you can have an instance of [com.yahoo.documentapi.DocumentAccess](../reference/applications/components.html#injectable-components) injected and use that to get documents by id instead of the HTTP API.
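A minimal sketch, assuming an injected `DocumentAccess` and a synchronous session (the class name and document ids are illustrative):
```
import com.yahoo.document.Document;
import com.yahoo.document.DocumentId;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.SyncParameters;
import com.yahoo.documentapi.SyncSession;
import java.util.ArrayList;
import java.util.List;

public class DocumentFetcher {

    private final SyncSession session;

    public DocumentFetcher(DocumentAccess access) {
        // DocumentAccess is an injectable component; a sync session blocks per operation
        this.session = access.createSyncSession(new SyncParameters.Builder().build());
    }

    // Fetch documents by id, e.g. "id:mynamespace:article::1234"
    public List<Document> fetch(List<String> ids) {
        List<Document> documents = new ArrayList<>();
        for (String id : ids)
            documents.add(session.get(new DocumentId(id)));
        return documents;
    }
}
```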
### Does Vespa work with Java 20?
Vespa uses Java 17 - it will support 20 some time in the future.
### How to write debug output from a custom component?
Use `System.out.println` to write text to the [vespa.log](../reference/operations/log-files.html).
## Performance
### What is the latency of documents being ingested vs indexed and available for search?
Vespa has a near real-time indexing core with typically sub-second latencies from document ingestion to being indexed. This depends on the use-case, available resources and how the system is tuned. Some more examples and thoughts can be found in the [scaling guide](../performance/sizing-search.html).
### Is there a batch ingestion mode, what limits apply?
Vespa does not have a concept of "batch ingestion" as it contradicts many of the core features that are the strengths of Vespa, including [serving elasticity](../content/elasticity.html) and sub-second indexing latency. That said, we have numerous use-cases in production that do high throughput updates to large parts of the (sometimes entire) document set. In cases where feed throughput is more important than indexing latency, you can tune this to meet your requirements. Some of this is detailed in the [feed sizing guide](../performance/sizing-feeding.html).
### Can the index support up to 512 GB index size in memory?
Yes. The [content node](../content/proton.html) is implemented in C++ and not memory constrained other than what the operating system does.
### Get request for a document when document is not in sync in all the replica nodes?
If the replicas are in sync the request is only sent to the primary content node. Otherwise, it's sent to several nodes, depending on replica metadata. Example: if a bucket has 3 replicas A, B, C and A & B both have metadata state X and C has metadata state Y, a request will be sent to A and C (but not B since it has the same state as A and would therefore not return a potentially different document).
### How to keep indexes in memory?
[Attribute](../content/attributes.html) (with or without `fast-search`) is always in memory, but does not support tokenized matching. It is for structured data.[Index](../basics/schemas.html#document-fields) (where there’s no such thing as fast-search since it is always fast) is in memory to the extent there is available memory and supports tokenized matching. It is for unstructured text.
It is possible to guarantee that fields that are defined with `index` have both the dictionary and the postings in memory by changing from `mmap` to `populate`, see [index \> io \> search](../reference/applications/services/content.html#index-io-search). Make sure that the content nodes run on nodes with plenty of memory available; during an index switch, the memory footprint will double. Familiarity with Linux tools like `pmap` can help diagnose what is mapped and whether it is resident.
Fields that are defined with `attribute` are in memory. Fields that have both `index` and `attribute` have separate data structures: queries use the default disk-mapped data structures that support text matching, while grouping, summary and ranking can access the field from the `attribute` store.
A Vespa query is executed in two phases as described in [sizing search](../performance/sizing-search.html), and summary requests can touch disk (and also use `mmap` by default). Due to their potential size there is no populate option here, but one can define a [dedicated document summary](../querying/document-summaries.html#performance) containing only fields that are defined with `attribute`.
The [practical performance guide](../performance/practical-search-performance-guide) can be a good starting point as well to understand Vespa query execution, the difference between `index` and `attribute`, and summary fetching performance.
### Is memory freed when deleting documents?
Deleting documents, by using the [document API](../writing/reads-and-writes.html) or [garbage collection](../schemas/documents.html#document-expiry), will increase the capacity on the content nodes. However, this is not necessarily observable in system metrics - it depends on many factors, like what kind of memory is released, when [flush](../content/proton.html#proton-maintenance-jobs) jobs are run, and the document [schema](../basics/schemas.html).
In short, Vespa is not designed to release memory once used. It is designed for sustained high throughput, low latency, keeping maximum memory used under control using features like [feed block](../writing/feed-block.html).
When deleting documents, one can observe a slight increase in memory. A deleted document is represented using a [tombstone](../operations/self-managed/admin-procedures.html#content-cluster-configuration), that will later be removed, see [removed-db-prune-age](../reference/applications/services/content.html#removed-db-prune-age). When running garbage collection, the summary store is scanned using mmap and both VIRT and page cache memory usage increases.
Read up on [attributes](../content/attributes.html) to understand more of how such fields are stored and managed.[Paged attributes](../content/attributes.html#paged-attributes) trades off memory usage vs. query latency for a lower max memory usage.
### Do empty fields consume memory?
A field is of type _index_ or _attribute_ - [details](../querying/text-matching.html#index-and-attribute).
Fields with _index_ use no incremental memory at deployment if the field has no value.
Fields with _attribute_ use memory even if the field value is not set.
Attributes are optimized for random access: To be able to jump to the value of any document in O(1) time. That requires allocating a constant amount of memory (the value, or a pointer) per document, regardless of whether there is a value. In short, knowing that a value is unset is a value in itself for attributes, so deploying new fields or new schemas with attributes will cause an incremental increase in memory. Applications with many unused schemas and fields can factor this in when sizing for memory. Refer to [attributes](../content/attributes.html#attribute-memory-usage) for details.
### What is the best practice for scaling Vespa for day vs night?
[Autoscaling](../operations/autoscaling.html) is the best guide to understand how to size and autoscale the system. Container clusters are stateless and can be autoscaled more quickly than content clusters.
### We can spike 8x in 5 minutes in terms of throughput requirements.
It is not possible to autoscale content clusters for 8x load increase in 5 minutes, as this requires both provisioning and data migration. Such use cases are best discussed with the Vespa Team to understand the resource bottlenecks, tradeoffs and mitigations. Also read [Graceful Degradation](../performance/graceful-degradation.html).
### How much lower-level configuration do we need to do? For example, do we need to alter the number of threads per container?
It depends. Vespa aims to adapt to resources (like auto thread config based on virtual node thread count) and actual use (when to run maintenance jobs like compaction), but there are tradeoffs that applications owners can/should make. Start off by reading the [Vespa Serving Scaling Guide](../performance/sizing-search.html), then run [benchmarks](../performance/benchmarking-cloud.html) and use the [dashboards](../operations/monitoring.html).
## Administration
### Self-managed: Can one do a partial deploy to the config server / update the schema without deploying all the node configs?
Yes, deployment uses this web service API, which allows you to create an edit session from the currently deployed package, make modifications, and deploy (prepare+activate) it: [deploy-rest-api-v2.html](../reference/api/deploy-v2.html). However, this is only useful in cases where you want to avoid transferring data to the config server unnecessarily. When you resend everything, the config server will notice that you did not actually change e.g. the node configs and avoid unnecessary no-op changes.
### How fast can nodes be added and removed from a running cluster?
[Elasticity](../content/elasticity.html) is a core Vespa strength - easily add and remove nodes with minimal (if any) serving impact. The exact time needed depends on how much data will need to be migrated in the background for the system to converge to [ideal data distribution](../content/idealstate.html).
### Should Vespa API search calls be load balanced or does Vespa do this automatically?
You will need to load balance incoming requests between the nodes running the [stateless Java container cluster(s)](overview.html). This can typically be done using a simple network load balancer available in most cloud services. This is included when using [Vespa Cloud](https://cloud.vespa.ai/), with an HTTPS endpoint that is already load balanced - both locally within the region and globally across regions.
### Supporting index partitions
[Search sizing](../performance/sizing-search.html) is the intro to this. Topology matters, and this is much used in the high-volume Vespa applications to optimise latency vs. cost.
### Can a running cluster be upgraded with zero downtime?
With [Vespa Cloud](https://cloud.vespa.ai/), we do automated background upgrades daily without noticeable serving impact. If you host Vespa yourself, you can do this, but need to implement the orchestration logic necessary to handle this. The high level procedure is found in [live-upgrade](../operations/self-managed/live-upgrade.html).
### Can Vespa be deployed multi-region?
[Vespa Cloud](https://cloud.vespa.ai/en/reference/zones) has integrated support - query a global endpoint. Writes will have to go to each zone. There is no auto-sync between zones.
### Can Vespa serve an Offline index?
Building indexes offline requires the partition layout to be known in the offline system, which is in conflict with elasticity and auto-recovery (where nodes can come and go without service impact). It is also at odds with realtime writes. For these reasons, it is not recommended, and not supported.
### Does vespa give us any tool to browse the index and attribute data?
Use [visiting](../writing/visiting.html) to dump all or a subset of the documents. See [data-management-and-backup](https://cloud.vespa.ai/en/data-management-and-backup) for more information.
### What is the response when data is written only on some nodes and not on all replica nodes (Based on the redundancy count of the content cluster)?
A failure response is returned if the document could not be written to some of the replica nodes.
### When the doc is not written to some nodes, will the document become available due to replica reconciliation?
Yes, it will be available, eventually. Also try [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode).
### Does vespa provide soft delete functionality?
Yes: add a `deleted` attribute with [fast-search](../content/attributes.html#fast-search), and create a searcher which adds an `andnot deleted` item to queries.
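For illustration, a sketch of the query such a searcher would produce, assuming a `bool` attribute named `deleted` (names are examples):
```
select * from sources * where userQuery() and !(deleted = true)
```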
### Can we configure a grace period for bucket distribution so that buckets are not redistributed as soon as a node goes down?
You can set a [transition-time](../reference/applications/services/content.html#transition-time) in services.xml to configure how long the cluster controller keeps a node in maintenance mode before automatically marking it down.
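A minimal services.xml sketch (cluster id and value are illustrative - see the transition-time reference for units and defaults):
```
<content id="my-content" version="1.0">
    <tuning>
        <cluster-controller>
            <transition-time>60000</transition-time>
        </cluster-controller>
    </tuning>
    <!-- documents, redundancy, nodes ... -->
</content>
```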
### What is the recommended redundant/searchable-copies config when using grouping distribution?
Grouped distribution is used to reduce search latency. Content is distributed to a configured set of groups, such that the entire document collection is contained in each group. Setting the redundancy and searchable-copies equal to the number of groups ensures that data can be queried from all groups.
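A sketch of a two-group content cluster in services.xml following this recommendation (ids, group names and hosts are illustrative):
```
<content id="search" version="1.0">
    <redundancy>2</redundancy>
    <engine>
        <proton>
            <searchable-copies>2</searchable-copies>
        </proton>
    </engine>
    <group>
        <distribution partitions="1|*"/>
        <group name="group0" distribution-key="0">
            <node hostalias="node0" distribution-key="0"/>
        </group>
        <group name="group1" distribution-key="1">
            <node hostalias="node1" distribution-key="1"/>
        </group>
    </group>
    <!-- documents ... -->
</content>
```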
### How to set up for disaster recovery / backup?
Refer to [#17898](https://github.com/vespa-engine/vespa/issues/17898) for a discussion of options.
### Self-managed: How to check Vespa version for a running instance?
Use [/state/v1/version](../reference/api/state-v1.html#state-v1-version) to find Vespa version.
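For example, against a container node (host and port are illustrative) - the response is a small JSON object containing the version:
```
$ curl http://localhost:8080/state/v1/version
```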
### Deploy rollback
See [rollback](../applications/deployment.html#rollback) for options.
## Troubleshooting
### Deployment fails with response code 413
If deployment fails with the error message "Deployment failed, code: 413 ("Payload Too Large.")", you might need to increase the config server's JVM heap size. The config server has a default JVM heap size of 2 GB. When deploying an application with e.g. large models, this might not be enough - try increasing the heap to e.g. 4 GB by adding an environment variable to the `docker run` command line:
```
docker run --env VESPA_CONFIGSERVER_JVMARGS=-Xmx4g
```
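For context, a complete command could look like this (container name, ports and image follow the quick-start conventions - adjust as needed):
```
$ docker run --detach --name vespa --hostname vespa-container \
  --env VESPA_CONFIGSERVER_JVMARGS=-Xmx4g \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa
```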
### The endpoint does not come up after deployment
When deploying an application package with some kind of error, the endpoints might fail to come up, like:
```
$ vespa deploy --wait 300
Uploading application package ... done
Success: Deployed target/application.zip
Waiting up to 5m0s for query service to become available ...
Error: service 'query' is unavailable: services have not converged
```
Another example:
```
[INFO] [03:33:48] Failed to get 100 consecutive OKs from endpoint ...
```
There are many ways this can fail; the first step is to check the Vespa container:
```
$ docker exec vespa vespa-logfmt -l error
[2022-10-21 10:55:09.744] ERROR container
Container.com.yahoo.container.jdisc.ConfiguredApplication
Reconfiguration failed, your application package must be fixed, unless this is a JNI reload issue:
Could not create a component with id 'ai.vespa.example.album.MetalSearcher'.
Tried to load class directly, since no bundle was found for spec: album-recommendation-java.
If a bundle with the same name is installed,
there is a either a version mismatch or the installed bundle's version contains a qualifier string.
...
```
[Bundle plugin troubleshooting](../applications/bundles.html#bundle-plugin-troubleshooting) is a good resource to analyze Vespa container startup / bundle load problems.
### Starting Vespa using Docker on M1 fails
On an M1 MacBook Pro / AArch64, running the Docker image can fail with:
```
WARNING: The requested image’s platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)
and no specific platform was requested
```
Make sure you are running a recent version of the Docker image, do `docker pull vespaengine/vespa`.
### Deployment fails / nothing is listening on 19071
Make sure all [Config servers](../operations/self-managed/configuration-server.html#troubleshooting) are started, and are able to establish ZooKeeper quorum (if more than one) - see the [multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample application. Validate that the container has [enough memory](../operations/self-managed/docker-containers.html).
### Startup problems in multinode Kubernetes cluster - readinessProbe using 19071 fails
The Config Server cluster with 3 nodes fails to start. The ZooKeeper cluster the Config Servers use waits for hosts on the network, while the hosts wait for ZooKeeper - a catch-22. See [sampleapp troubleshooting](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations#troubleshooting).
### How to display vespa.log?
Use [vespa-logfmt](../reference/operations/self-managed/tools.html#vespa-logfmt) to dump logs. If Vespa is running in a local container (named "vespa"), run `docker exec vespa vespa-logfmt`.
### How to fix encoding problems in document text?
See [encoding troubleshooting](../linguistics/troubleshooting-encoding.html) for how to handle and remove control characters from the document feed.
## Login, Tenants and Plans
### How to get started?
[Deploy an application](../basics/deploy-an-application.html) to create a tenant and start your [free trial](https://vespa.ai/free-trial/). This tenant can be your personal tenant, or shared with others. It cannot be renamed.
### How to create a company tenant?
Once the tenant is created, add more users to it: click the "account" button in the Vespa Cloud Console (top right in the tenant view), then "users". From this view you can manage the users in the tenant and their roles - this is also where you add/set tenant admins.
### How to accept Terms of Service?
When starting the free trial, you are asked to accept Terms of Service. For paid plans, this is covered by the contract.
### How do I switch from free trial to a paid plan?
Click "account", then "billing" in the console to enter information required for billing. Use [Vespa Support](https://vespa.ai/support/) if you need to provide this information without console login.
### Does Vespa Cloud support Single Sign-On (SSO)?
Yes, contact [Vespa Support](https://vespa.ai/support/) to set it up.
## Vespa Cloud Operations
### How can I change the cost of my Vespa Cloud usage?
See [node resources](../performance/node-resources) to assess current and auto-suggested resources and [autoscaling](../operations/autoscaling.html) for how to automate.
### How can I manually modify resources used?
Managing resources is easy, as most changes are automated. Adding / removing / changing nodes starts automated data migration, see [elasticity](../content/elasticity.html).
### How to modify a schema?
Schema changes might require data reindexing, which is automated but takes some time. Other schema changes require a data refeed - see [details](../reference/schemas/schemas.html#modifying-schemas).
### How to evaluate how much memory a field is using?
Use the [Memory Visualizer](../performance/memory-visualizer.html) to evaluate how memory is allocated to the fields. Fields can be `index`, `attribute` and `summary`, and combinations of these, with settings like `fast-search` that affect memory usage. [Attributes](../content/attributes.html) is a great read for understanding Vespa memory usage.
### Archive access failed with Permission 'serviceusage.services.use' denied
Listing archived objects can fail; e.g. `gsutil -u my_project ls gs://vespa-cloud-data-prod-gcp-us-central1-f-12345f/my_tenant` can fail with `AccessDeniedException: 403 me@mymail.com does not have serviceusage.services.use access to the Google Cloud project. Permission 'serviceusage.services.use' denied on resource (or it may not exist).` This can be due to missing rights on your Google project (my\_project in the example above) - from the Google documentation: _"The user account accessing the Cloud Storage Bucket must be granted the Service Usage Consumer role (see [https://cloud.google.com/service-usage/docs/access-control](https://cloud.google.com/service-usage/docs/access-control)) in order to charge the specified user project for the bucket usage cost"_
### How do I integrate with my current monitoring infrastructure?
Vespa Cloud applications have a Prometheus endpoint. Find guides for how to integrate with Grafana and AWS Cloudwatch at [monitoring](../operations/monitoring.html).
### What is the best way to monitor instantaneously what is happening in Vespa? CPU usage? Memory usage? htop? Cloudwatch metrics?
Vespa Cloud has detailed dashboards linked from the _monitoring_ tab in the Console, one for each zone the instance is deployed to.
### How are Vespa version upgrades handled - only for new deploys?
Vespa is normally upgraded daily. There are exceptions, like holidays and weekends. During upgrades, nodes are stopped one-by-one per cluster. As all clusters have one redundant node, serving and write traffic is not impacted by upgrades. Before the upgrade, the application's [system and staging tests](../operations/automated-deployments.html) are run, halting the upgrade if they fail. Documents are re-migrated to the upgraded node before doing the next node, see [Elastic Vespa](../content/elasticity.html) for details.
### How do we get alerted to issues like Feed Block? Searchable copy going offline?
Issues like Feed Blocked, Deployment and Deprecation warnings show up in the console. There are no warnings on redundancy level / searchable copies, as redundant document buckets are activated for queries automatically, and auto data-migration kicks in for node failures / replacements.
### What actions are needed when deploying schema changes?
- Schema changes that [require service restart](../reference/schemas/schemas.html#changes-that-require-restart-but-not-re-feed) are handled automatically by Vespa Cloud. A deployment job involves waiting for these to complete.
- Schema changes that [require reindexing](../reference/schemas/schemas.html#changes-that-require-reindexing) of data require a validation override and will trigger automatic reindexing. Status can be tracked in the console application view. Vespa Cloud also periodically re-indexes all data, with minimal resource usage, to account for changes in linguistics libraries.
- Schema changes that [require refeeding](../reference/schemas/schemas.html#changes-that-require-re-feed) of data require a validation override, and the user must refeed the data after deployment.
### What are the Vespa Cloud data retention policies?
Data stored in an application running on Vespa Cloud is managed by the application owner; Vespa Cloud does not apply any retention policy to this data for as long as the application stores it.
The following data retention policies apply to Vespa Cloud:
- After a node previously allocated to an application has been deallocated (e.g. because the application owner deleted the application), all application data is deleted within _four hours_.
- All application log data is deleted from Vespa servers after no more than _30 days_ (most often sooner), depending on log volume, allocated disk resources, etc. _PLEASE NOTE:_ This is the theoretical maximum retention time - see the [archive guide](../cloud/archive-guide.html) for how to ensure access to your application logs.
### Is Vespa Cloud certified for ISO 27001 or SOC 2?
Yes, Vespa.ai has a SOC 2 attestation: [Trust Center](https://trust.vespa.ai).
### Is Vespa Cloud GDPR compliant?
Read more in [GDPR](https://cloud.vespa.ai/en/gdpr).
### Does Vespa store information from the information sources with which it is integrated?
Vespa is most often used to query data written to it from the information sources, although it can also be used without data, e.g. for model serving. It is the application owner who builds the integration that writes data to Vespa Cloud.
### What is the encryption algorithm used at rest?
Vespa Cloud uses the following Cloud providers:
- AWS EC2 instances, with local or remote storage
- GCP Compute instances, with local or remote storage
- Azure Compute instances, with local or remote storage
The storage devices are encrypted at rest by each cloud provider.
### Does the Vespa console have audit trails/logs module and can it be accessed by an Admin user?
See the [security guide](../security/guide.html) for roles and permissions. The Vespa Cloud Console has a log view tool, and logs / access logs can easily be exported to the customer's AWS account. Deployment operations are tracked in the deployment view, with a history. Vespa Cloud operators do not have node access unless specifically granted by the customer, in which case access is audit-logged.
### Once the service purchased with Vespa is terminated, is there a secure deletion procedure for the information collected from the customer?
At termination, all application instances and their data are removed before the tenant can be deactivated.
### Why is the CPU usage for my application above 100%?
In `dev` zones, resources are shared, so there can be more than one node on each host/instance. To provide the best possible overall responsiveness, CPU resources are not restricted for the individual application nodes.
---
# Source: https://docs.vespa.ai/en/performance/feature-tuning.html.md
# Vespa Serving Tuning
This document describes how to tune certain features of an application for high query serving performance, where the main focus is on content cluster search features; see [Container tuning](container-tuning.html) for tuning of container clusters. The [search sizing guide](sizing-search.html) is about _scaling_ an application deployment.
## Attribute vs index
The [attribute](../content/attributes.html) documentation summarizes when to use [attribute](../reference/schemas/schemas.html#attribute) in the [indexing](../reference/schemas/schemas.html#indexing) statement. Also see the [procedure](/en/reference/schemas/schemas.html#modifying-schemas) for changing from attribute to index and vice-versa.
```
field timestamp type long {
indexing: summary | attribute
}
```
If both index and attribute are configured for string-type fields, Vespa will search and match against the index with default match `text`. All numeric type fields and tensor fields are attribute (in-memory) fields in Vespa.
## When to use fast-search for attribute fields
By default, Vespa does not build any posting list index structures over _attribute_ fields. Adding _fast-search_ to the attribute definition as shown below will add an in-memory B-tree posting list structure which enables faster search for some cases (but not all, see next paragraph):
```
field timestamp type long {
indexing: summary | attribute
attribute: fast-search
rank: filter
}
```
When Vespa runs a query with multiple query items, it builds a query execution plan. It tries to optimize the plan so that the temporary result set is as small as possible. To do this, restrictive query tree items (matching few documents) are evaluated early. The query execution plan looks at hit count estimates for each part of the query tree using the index and B-tree dictionaries, which track the number of documents in which a given term occurs.
However, for attribute fields without [fast-search](../content/attributes.html#fast-search) there is no hit count estimate, so the estimate becomes the total number of documents (matches all) and the query tree item is moved to the end of the query evaluation. A query with only one query term searching an attribute field without `fast-search` would be a linear scan over all documents and thus expensive:
```
select * from sources * where range(timestamp, 0, 100)
```
But if this query term is _and_-ed with another term that matches fewer documents, that term will determine the cost instead, and fast-search won't be necessary, e.g.:
```
select * from sources * where range(timestamp, 0, 100) and uuid contains "123e4567-e89b-12d3-a456-426655440000"
```
The general rules of thumb for when to use fast-search for an attribute field are:
- Use _fast-search_ if the attribute field is searched without any other query terms
- Use _fast-search_ if the attribute field could limit the total number of hits efficiently
Changing the fast-search aspect of an attribute is a [live change](/en/reference/schemas/schemas.html#modifying-schemas) which does not require any re-feeding, so testing performance with and without it is low effort. Note that adding or removing _fast-search_ requires a restart.
Note that _attribute_ fields with _fast-search_ that are not used in term based [ranking](../basics/ranking.html) should use _rank: filter_ for optimal performance. See reference [rank: filter](../reference/schemas/schemas.html#rank).
See [sorting.degrading](../reference/api/query.html#sorting.degrading) for an optimization for sorting on a _single-value numeric attribute with fast-search_.
## Tuning query performance for lexical search
Lexical search (or keyword-based search) is a method that matches query terms as they appear in indexed documents. It relies on the lexical representation of words rather than their meaning, and is one of the two retrieval methods used in [hybrid search](../learn/tutorials/hybrid-search.html). Lexical search in Vespa is done by querying string (text) [index](../basics/schemas.html#document-fields) fields, typically using the [weakAnd](../ranking/wand.html#weakand) query operator with [BM25](../ranking/bm25.html) ranking.
The following schema represents a simple article document with _title_ and _content_ fields, that can represent Wikipedia articles as an example. A _default_ fieldset is specified such that user queries are matched against both the _title_ and _content_ fields. BM25 ranking combines the scores of both fields in the _default_ rank profile. In addition, the _optimized_ rank profile specifies tuning parameters to improve query performance:
```
schema article {
document article {
field title type string {
indexing: index | summary
index: enable-bm25
}
field content type string {
indexing: index | summary
index: enable-bm25
}
}
fieldset default {
fields: title, content
}
rank-profile default {
first-phase {
expression: bm25(title) + bm25(content)
}
}
rank-profile optimized inherits default {
filter-threshold: 0.05
weakand {
stopword-limit: 0.6
adjust-target: 0.01
}
}
}
```
The following shows an example question-answer query against a collection of articles, using the _weakAnd_ query operator and the _optimized_ rank profile. Question-answer queries are often written in full sentences and, as a consequence, tend to contain many stopwords that are present in many documents and of less relevance when it comes to ranking. E.g., terms such as "the", "in", and "are" are typically present in more than 60% of the documents:
```
{
"yql": "select * from article where userQuery()",
"ranking.profile": "optimized",
"query": "what are the three highest mountains in the world"
}
```
The cost of evaluating such a query is primarily linear with the number of matched documents. The _AND_ operator is most effective, but often ends up being too restrictive by not returning enough matches. The _OR_ operator is less restrictive, but has the problem of returning too many matches, which is very costly. The _weakAnd_ operator is somewhere in between the two in cost.
### Posting Lists
To find matching documents, the query operator uses the _posting lists_ associated with each query term. A posting list is part of the inverted index and contains all occurrences of a term within a collection of documents. It consists of document IDs for documents that contain the term, and additional information such as the positions of the term within those documents (used for ranking purposes). For common terms (e.g., stopwords), the posting lists are very large and can be expensive to use during evaluation and ranking. CPU work is required to iterate them, and I/O work is required to load portions of them from disk to memory with MMAP. The last part is especially problematic when all posting lists of a disk index cannot fit into physical memory, and the system must constantly swap parts of them in and out of memory, leading to high I/O wait times.
To improve query performance, the following tuning parameters are available, as seen used in the _optimized_ rank profile. These are used to make tradeoffs between performance and quality.
- **Use more compact posting lists for common terms**: Setting [filter-threshold](../reference/schemas/schemas.html#filter-threshold) to 0.05 ensures that all terms that are estimated to occur in more than 5% of the documents are handled with [compact posting lists (bitvectors)](../content/proton.html#index) instead of the full posting lists. This makes matching faster at the cost of producing less information for BM25 ranking (only a boolean signal is available).
- **Avoid using large posting lists altogether**: Setting [stopword-limit](../reference/schemas/schemas.html#weakand-stopword-limit) to 0.6 ensures that all terms that are estimated to occur in more than 60% of the documents are considered stopwords and dropped entirely from the query, and thus also from ranking.
- **Reduce the number of hits produced by _weakAnd_**: Setting [adjust-target](../reference/schemas/schemas.html#weakand-adjust-target) ensures that documents that only match terms that occur very frequently in the documents are not considered hits. This also removes the need to calculate _first-phase_ ranking for these documents, which is beneficial if _first-phase_ ranking is more complex and expensive.
### Performance
The tuning parameters used in the _optimized_ rank profile have been shown to provide a good tradeoff between performance and quality in testing. A Wikipedia dataset with [SQuAD](https://nlp.stanford.edu/pubs/rajpurkar2016squad.pdf) (Stanford Question Answering Dataset) queries was used to analyze performance, and [trec-covid](https://ir.nist.gov/trec-covid/), [MS MARCO](https://microsoft.github.io/msmarco/) and [nfcorpus](https://huggingface.co/datasets/BeIR/nfcorpus) from the BEIR dataset to analyze quality implications.
For instance, the query performance was tripled without any measurable drop in quality with the Wikipedia dataset, using the tuning parameters in the _optimized_ rank profile. See the blog post [Tripling the query performance of lexical search](https://blog.vespa.ai/tripling-the-query-performance-of-lexical-search/) for more details. Note that testing should be conducted on your particular dataset to find the right tradeoff between performance and quality.
## Hybrid TAAT and DAAT query evaluation
Vespa supports **hybrid** query evaluation over inverted indexes, combining _TAAT_ and _DAAT_ evaluation to get the best of both query evaluation techniques. Hybrid evaluation is not enabled by default and is triggered by a run-time query parameter.
- **TAAT:** _Term At A Time_ scores documents one query term at a time. The entire posting iterator can be read per query term, and the score of a document is accumulated. It is CPU cache friendly as posting data is read sequentially without randomly seeking the posting list iterator. The downside is that _TAAT_ limits the term-based ranking function to be a linear sum of term scores. This downside is one reason why most search engines use _DAAT_.
- **DAAT:** _Document At A Time_ scores documents completely one at a time. This requires multiple seeks in the term posting lists, which is CPU cache unfriendly but allows non-linear ranking functions.
Generally, Vespa does _DAAT_ (document-at-a-time) query evaluation and not _TAAT_ (term-at-a-time), for the reason listed above.
Ranking (score calculation) and matching (does the document match the query logic?) are not two fully separate, disjoint phases where matches are found first and ranking scores are calculated later: with _DAAT_, matching and _first-phase_ score calculation are interleaved.
The _first-phase_ ranking score is assigned to the hit when it satisfies the query constraints. At that point, the term iterators are positioned at the document id and one can unpack additional data from the term posting lists - e.g., for term proximity scoring used by the [nativeRank](../ranking/nativerank.html) ranking feature, which also requires unpacking of positions of the term within the document.
In hybrid query evaluation, _TAAT_ is used for the sub-branches of the overall query tree that are not used for term-based ranking.
Using _TAAT_ can speed up query matching significantly (up to 30-50%) in cases where the query tree is large and complex, and where only parts of the query tree are used for term-based ranking. Examples of query tree branches that require _DAAT_ are those using text ranking features like [bm25 or nativeRank](../reference/ranking/rank-features.html). The list of ranking features which can handle _TAAT_ is long, but a query using only [attribute or tensor](../ranking/tensor-user-guide.html) features can have its entire tree evaluated using _TAAT_.
For example, for a query where there is a user text query from an end user, one can use _userQuery()_ YQL syntax and combine it with application-level constraints. The application level filter constraints in the query could benefit from using _TAAT_. Given the following document schema:
```
search news {
document news {
field title type string {}
field body type string {}
field popularity type float {}
field market type string {
rank: filter
indexing: attribute
attribute: fast-search
}
field language type string {
rank: filter
indexing: attribute
attribute: fast-search
}
}
fieldset default {
fields: title, body
}
rank-profile text-and-popularity {
first-phase {
expression: attribute(popularity) + log10(bm25(title)) + log10(bm25(body))
}
}
}
```
In this case, the rank profile only uses two ranking features, the popularity attribute and the [bm25](../ranking/bm25.html) score of the userQuery(). These are used in the default fieldset containing the title and body. Notice how neither _market_ nor _language_ is used in the ranking expression.
In this query example, there is a language constraint and a market constraint, where both language and market are queried with a long list of valid values using OR, meaning that the document should match any of the market constraints and any of the language constraints:
```
{
"hits": 10,
"ranking.profile": "text-and-popularity",
"yql": "select * from sources * where userQuery() and
(language contains \"en\" or language contains \"br\") and
(market contains \"us\" or market contains \"eu\" or market contains \"apac\" or market contains \"..\" )",
"query": "cat video",
"ranking.matching.termwiselimit": 0.1
}
```
The language and the market constraints in the query tree are not used in the ranking score, and that part of the query tree could be evaluated using _TAAT_. See also [multi lookup set filter](#multi-lookup-set-filtering) for how to most efficiently search with large set filters. The subtree result is then passed as a bit vector into the _DAAT_ query evaluation, which could significantly speed up the overall evaluation.
Enabling hybrid _TAAT_ is done by passing `ranking.matching.termwiselimit=0.1` as a request parameter. It's possible to evaluate the performance impact by changing this limit. Setting the limit to 0 will force termwise evaluation, which might hurt performance.
One can evaluate if using the hybrid evaluation improves search performance by adding the above parameter. The limit is compared to the hit fraction estimate of the entire query tree. If the hit fraction estimate is higher than the limit, the termwise evaluation is used to evaluate the sub-branch of the query.
## Indexing uuids
When configuring [string](../reference/schemas/schemas.html#string) type fields with `index`, the default [match](../reference/schemas/schemas.html#match) mode is `text`. This means Vespa will [tokenize](../linguistics/linguistics-opennlp.html#tokenization) the content and index the tokens.
The string representation of an [Universally unique identifier](https://en.wikipedia.org/wiki/Universally_unique_identifier) (UUID) is 32 hexadecimal (base 16) digits, in five groups, separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 alphanumeric characters and four hyphens).
Example: Indexing `123e4567-e89b-12d3-a456-426655440000` with the above document definition, Vespa will tokenize this into 5 tokens: `[123e4567,e89b,12d3,a456,426655440000]`, each of which could be matched independently, leading to possible incorrect matches.
To avoid this, change the mode to [match: word](../reference/schemas/schemas.html#word) to treat the entire uuid as _one_ token/word:
```
field uuid type string {
indexing: summary | index
match: word
rank: filter
}
```
In addition, configure the `uuid` as a [rank: filter](../reference/schemas/schemas.html#rank) field - the field will then be represented as efficiently as possible during search and ranking. The `rank: filter` behavior can also be triggered at query time on a per-query-item basis by calling `com.yahoo.prelude.query.Item.setRanked(false)` in a [custom searcher](../applications/searchers.html).
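A minimal sketch of such a searcher (class and index name are hypothetical; assumes uuid terms appear as word items in the query tree):
```
import com.yahoo.prelude.query.CompositeItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.WordItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class UnrankedUuidSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        markUnranked(query.getModel().getQueryTree().getRoot());
        return execution.search(query);
    }

    // Recursively mark terms searching the uuid field as unranked (filter-only)
    private void markUnranked(Item item) {
        if (item instanceof WordItem word && "uuid".equals(word.getIndexName()))
            word.setRanked(false);
        if (item instanceof CompositeItem composite)
            for (int i = 0; i < composite.getItemCount(); i++)
                markUnranked(composite.getItem(i));
    }
}
```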
## Parent child and search performance
When searching imported attribute fields (with `fast-search`) from parent document types, there is an additional indirection that can be reduced significantly if the imported field is defined with `rank: filter` and [visibility-delay](../reference/applications/services/content.html#visibility-delay) is configured to be > 0. The [rank:filter](../reference/schemas/schemas.html#rank) setting impacts posting list granularity, and `visibility-delay` enables a cache for the indirection between the child and parent document.
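A services.xml sketch (cluster id and value are illustrative - see the visibility-delay reference for details):
```
<content id="my-content" version="1.0">
    <engine>
        <proton>
            <visibility-delay>1.0</visibility-delay>
        </proton>
    </engine>
    <!-- documents, nodes ... -->
</content>
```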
## Ranking and ML Model inferences
Vespa [scales](sizing-search.html) with the number of hits the query retrieves per node/search thread, which must be evaluated by the first-phase ranking function. Read more on [phased ranking](../ranking/phased-ranking.html). Phased ranking enables using more resources in the second-phase ranking step than in the first phase. The first phase should focus on getting decent recall (retrieving the relevant documents in the top k), while the second phase should tune precision.
For [text search](../ranking/nativerank.html) applications, consider using the [WAND](../ranking/wand.html) query operator - WAND can efficiently (sublinear) find the top-k documents using an inner scoring function.
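A sketch of a WAND query (field name and weights are examples) - only the estimated best 100 candidates by the inner scoring function are fully ranked:
```
select * from sources * where {targetHits: 100}wand(description, {"cat": 3, "video": 1})
```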
## Multi Lookup - Set filtering
Several real-world search use cases are built around limiting or filtering based on a set filter. If the contents of a field in the document match any of the values in the query set, it should be retrieved. E.g., searching data for a set of users:
```
select * from sources * where user_id = 1 or user_id = 2 or user_id = 3 or user_id = 4 or user_id = 5 ...
```
For OR filters over the same field, it is strongly recommended to use the [in query operator](../reference/querying/yql.html#in) instead. It has considerably better performance than plain OR for set filtering:
```
select * from sources * where user_id in (1, 2, 3, 4, 5)
```
**Note:** Large sets can slow down YQL parsing of the query - see [parameter substitution](../reference/querying/yql.html#parameter-substitution) for how to send the set in a compact, performance-effective way.
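A sketch of a query request using parameter substitution (the parameter name `ids` is an example):
```
{
    "yql": "select * from sources * where user_id in (@ids)",
    "ids": "1,2,3,4,5"
}
```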
Attribute fields used like the above, without other stronger query terms, should have `fast-search` and `rank: filter`. If there is a large number of unique values in the field, it is also faster to use the `hash` dictionary instead of `btree`, the default dictionary data structure for attribute fields with `fast-search`:
```
field user_id type long {
indexing: summary | attribute
attribute: fast-search
dictionary: hash
rank: filter
}
```
For `string` fields, we also need to include [match](/en/reference/schemas/schemas.html#match) settings if using the `hash` dictionary:
```
field user_id_str type string {
indexing: summary | attribute
attribute: fast-search
match: cased
rank: filter
dictionary {
hash
cased
}
}
```
With 10M unique user\_ids in the dictionary and 1000 users searched per query, the _btree_ dictionary needs 1000 lookups of O(log(10M)) each, while the _hash_ dictionary needs 1000 lookups of O(1) each. Still, the _btree_ dictionary offers more flexibility in terms of [match](/en/reference/schemas/schemas.html#match) settings.
The `in` query set filtering approach can be used in combination with hybrid _TAAT_ evaluation to further improve performance. See the [hybrid TAAT/DAAT](#hybrid-taat-daat) section.
Also see the [dictionary schema reference](../reference/schemas/schemas.html#dictionary).
**Note:** For most use cases, the time spent on dictionary traversal is negligible compared to the time spent on query evaluation (matching and ranking). But if the query is very selective, for example when using Vespa as a key-value lookup store with ranking support, the dictionary traversal time can be significant.
## Document summaries - hits
If queries request many (thousands of) hits from a content cluster with few content nodes, increasing the [summary cache](caches-in-vespa.html) might reduce latency and cost.
Using [explicit document summaries](../querying/document-summaries.html), Vespa can support memory-only summary fetching if all fields referenced in the document summary are defined with `attribute`. Dedicated in-memory summaries avoid (potential) disk reads and summary chunk decompression. Vespa document summaries are stored using compressed [chunks](../reference/applications/services/content.html#summary-store-logstore-chunk). See also the [practical search performance guide on hits fetching](practical-search-performance-guide.html#hits-and-summaries).
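A sketch of an attribute-only summary class (field names are examples; both fields must be defined with `attribute` in the schema), selected at query time with `presentation.summary=short`:
```
document-summary short {
    summary user_id {}
    summary timestamp {}
}
```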
## Boolean, numeric, text attribute
When choosing attribute field types with performance in mind, these are the rules of thumb:
1. Use boolean if a field is a boolean (max two values)
2. Use a string attribute if there is a set of values - only unique strings are stored
3. Use a numeric attribute for range searches
4. Use a numeric attribute if the data really is numeric; don't replace numeric with string numeric
Refer to [attributes](../content/attributes.html) for details.
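For example, a boolean flag per rule 1 (the field name is an example):
```
field is_active type bool {
    indexing: summary | attribute
}
```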
## Tensor ranking
The ranking workload can be significant for large tensors - it is important to understand both the potential memory and computational cost for each query.
### Memory
Assume the dot product of two tensors with 1000 values of 8 bytes each, as in `tensor(x[1000])`. With one query tensor and one document tensor, the dot product is `sum(query(tensor1) * attribute(tensor2))`. Each evaluation reads one 8 KB document tensor from memory. On a Haswell CPU architecture, where the theoretical upper memory bandwidth is 68 GB/sec, this gives 68 GB/sec / 8 KB ≈ 9M ranking evaluations/sec. In other words, for an index of 1M documents, about 9 queries per second before becoming memory bound.
See below for using smaller [cell value types](#cell-value-types), and read more about [quantization](https://blog.vespa.ai/from-research-to-production-scaling-a-state-of-the-art-machine-learning-system/#model-quantization).
### Compute
When using tensor types with at least one mapped dimension (sparse or mixed tensor), [attribute: fast-rank](../reference/schemas/schemas.html#attribute) can be used to optimize the tensor attribute for ranking expression evaluation at the cost of using more memory. This is a good tradeoff if benchmarking indicates significant latency improvements with `fast-rank`.
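A sketch of a mixed tensor attribute with `fast-rank` (field name and type are examples):
```
field doc_embeddings type tensor<float>(p{}, x[128]) {
    indexing: attribute
    attribute: fast-rank
}
```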
When optimizing ranking functions with tensors, try to avoid temporary objects. Use the [Tensor Playground](https://docs.vespa.ai/playground/) to evaluate what the expressions map to, using the execution details to list the detailed steps - find examples below.
### Multiphase ranking
To save both memory and compute resources, use [multiphase ranking](../ranking/phased-ranking.html). In short, use less expensive ranking evaluations to find the most promising candidates, then a high-precision evaluation for the top-k candidates.
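A sketch of the pattern (expressions are examples): a cheap first phase over all matches, with only the best 100 hits per node re-ranked by a more expensive second phase:
```
rank-profile multiphase {
    first-phase {
        expression: bm25(title)
    }
    second-phase {
        rerank-count: 100
        expression: sum(query(user_embedding) * attribute(doc_embedding))
    }
}
```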
The blog post series on [Building Billion-Scale Vector Search](https://blog.vespa.ai/building-billion-scale-vector-search/) is a good read.
### Cell value types
| Type | Description |
| --- | --- |
| double | The default tensor cell type is the 64-bit floating-point `double` format. It gives the best precision at the cost of high memory usage and somewhat slower calculations. Using a smaller value type increases performance, trading off precision, so consider changing to one of the cell types below before scaling the application. |
| float | The 32-bit floating-point format `float` should usually be used for all tensors when scaling for production. Note that some frameworks like TensorFlow prefer 32-bit floats. A vector with 1000 dimensions, `tensor<float>(x[1000])`, uses approximately 4 KB of memory per tensor value. |
| bfloat16 | This type has the same range as a normal 32-bit float but only 8 bits of precision, and can be thought of as a "float with lossy compression" - see [Wikipedia](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format). If memory (or memory bandwidth) is a concern, change the most space-consuming tensors to the `bfloat16` cell type. Some careful analysis of the data is required before using this type. When doing calculations, `bfloat16` acts as if it were a 32-bit float, but the smaller size comes with a potential computational overhead: in most cases, a `bfloat16` value needs conversion to a 32-bit float before the actual calculation, adding an extra conversion step. In some cases, tensors with `bfloat16` cells might bypass some built-in optimizations (like matrix multiplication) that are hardware-accelerated only if the cells are of the same type. To avoid this, use the [cell\_cast](../reference/ranking/ranking-expressions.html#cell_cast) tensor operation to make sure the cells are of the right type before doing the more expensive operations. |
| int8 | If using machine learning to generate a model with data quantization, one can target the `int8` cell value type, a signed integer with range -128 to +127 only. This is also treated like a "float with limited range and lossy compression" by the Vespa tensor framework, and gives results as if it were a 32-bit float when any calculation is done. This type is also suitable for representing boolean values (0 or 1). **Note:** If the input for an `int8` cell is not directly representable, the resulting cell value is undefined, so take care to only input numbers in the `[-128,127]` range. It is also possible to use `int8` to represent binary data for [hamming distance](../reference/schemas/schemas.html#distance-metric) nearest-neighbor search - refer to [billion-scale-knn](https://blog.vespa.ai/billion-scale-knn/) for example use. |
### Inner/outer products
The following is a primer into inner/outer products and execution details:
| tensor a | tensor b | product | sum | comment |
| --- | --- | --- | --- | --- |
| tensor(x[3]):[1.0, 2.0, 3.0] | tensor(x[3]):[4.0, 5.0, 6.0] | tensor(x[3]):[4.0, 10.0, 18.0] | 32 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMmz5AFmUBWZQA2KU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). The dimension name and size are the same in both tensors - this is an inner product with a scalar result. |
| tensor(x[3]):[1.0, 2.0, 3.0] | tensor(y[3]):[4.0, 5.0, 6.0] | tensor(x[3],y[3]):[[4.0, 5.0, 6.0], [8.0, 10.0, 12.0], [12.0, 15.0, 18.0]] | 90 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMmz5AFmUBWZQA2KU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). The dimension size is the same in both tensors, but the dimension names differ - this is an outer product; the result is a two-dimensional tensor. |
| tensor(x[3]):[1.0, 2.0, 3.0] | tensor(x[2]):[4.0, 5.0] | undefined | | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMQrZ8gCzKArFKajroY+qxGmCacZtYU+JHk9Jyw5o7sDIwAVPbWoajhhnHRhA4WCdYQkMlQ0ACutADGZenOkHx1ALbC2bYywTT5OihSINpAA). Two tensors in the same dimension, but with different lengths - the result is undefined. |
| tensor(x[3]):[1.0, 2.0, 3.0] | tensor(y[2]):[4.0, 5.0] | tensor(x[3],y[2]):[[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]] | 54 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IcICeiFbPkAWZQBWKU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). Two tensors with different dimension names and sizes - this is an outer product; the result is a two-dimensional tensor. |
Inner product - observe optimized into `DenseDotProductFunction` with no temporary objects:
```
[ {
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::DenseDotProductFunction",
"symbol": "vespalib::eval::(anonymous namespace)::my_cblas_double_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)"
} ]
```
Outer product, parsed into a tensor multiplication (`DenseSimpleExpandFunction`), followed by a `Reduce` operation:
```
[ {
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::DenseSimpleExpandFunction",
"symbol": "void vespalib::eval::(anonymous namespace)::my_simple_expand_op, true>(vespalib::eval::InterpretedFunction::State&, unsigned long)"
},
{
"class": "vespalib::eval::tensor_function::Reduce",
"symbol": "void vespalib::eval::instruction::(anonymous namespace)::my_full_reduce_op >(vespalib::eval::InterpretedFunction::State&, unsigned long)"
} ]
```
Note that an inner product can also be run on mapped tensors ([Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPYAF8AlHGCiAjHHnEwogExw1K0QGY4OiZFYQJrCaQzVyuBkQttKaY3QYAjFjUwcoPfkLGSMnKKACwAdAAM2hoArJHaegBskYbOphjmrFaYNpx2zhT42eT0nACWtLQEgjiCWAAmAK4AxlwenoQMjABU7mlmKAC6IBJAA)):
```
[ {
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::SparseFullOverlapJoinFunction",
"symbol": "void vespalib::eval::(anonymous namespace)::my_sparse_full_overlap_join_op, true>(vespalib::eval::InterpretedFunction::State&, unsigned long)"
} ]
```
### Mapped lookups
`sum(model_id * models, m_id)`
| tensor name | tensor type |
| --- | --- |
| model\_id | `tensor(m_id{})` |
| models | `tensor(m_id{}, x[3])` |
Using a mapped dimension to select an indexed tensor can be considered a [mapped lookup](../ranking/tensor-examples.html#using-a-tensor-as-a-lookup-structure). This is similar to creating a slice but optimized into a single `MappedLookup` - see [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gFssATAgGwH0BLFksmpCIJIAFwK0AzlgBOACkY8WwAL4BKOMEYAmOAEYAdAAYVkARBUCVpDNXK4GRG4MppzdBszbtJ-GpmEocSlZBSVVYgAPRABmAF0NLT04RAAWY2IwAFYMsAA2YzjMnRSAdlyADlyATkLTd0sMawE7TAcRJ3cKfFbyehFJAFdGBVYOJTAAKjAvDklipTU-f0IGIZHZrl4pmbGfBd4lhqtnCF6odtXTzFdziEh+qEl2bgBjTpXVkVHvSUnNxZaJRwHT1fyNVDNWxdS5CZbkW7ue6PJgAQxwOAILE47CwWAA1oMcJwZFjBu94YJApBSSxyQQfnN-nslMR1sRFIczOCrCg4iAVEA) example.
```
[ {
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::MappedLookup",
"symbol": "void vespalib::eval::(anonymous namespace)::my_mapped_lookup_op(vespalib::eval::InterpretedFunction::State&, unsigned long)"
} ]
```
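A sketch of how such a lookup could look in a rank profile, assuming `models` is a tensor attribute in the schema and `model_id` is sent as a query input (names follow the table above):
```
rank-profile mapped-lookup {
    inputs {
        query(model_id) tensor(m_id{})
    }
    first-phase {
        # Reducing over m_id selects one x[3] row from the models attribute
        expression: sum(query(model_id) * attribute(models), m_id)
    }
}
```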
### Three-way dot product - mapped
`sum(query(model_id) * model_weights * model_features)`
| tensor name | tensor type |
| --- | --- |
| query(model\_id) | `tensor(model{})` |
| model\_weights | `tensor(model{}, feature{})` |
| model\_features | `tensor(feature{})` |
Three-way mapped (sparse) dot product: [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEcBXAgJwE8AKAWywBMCAGwD6AS34BKEmRqQiCSABcCtAM5Y2AHmhCsAQyUA+XgOHAAvpLjAeABjgBGC5FkQLsi6QzVyuBkTecpRobnQMfIKiAO4EYgDmABZKajI0mApQKuqaOnqGJpHmXmDQBIbMbASW1sBgADq0tmZCcPbEpeVKlQRw0HYWTsSNzVFtdh1lFVV9znAATMNNRa08jpNdPX0DcADMS6PCbeud073QcwAsYC5hHhhesr6Y-oqBYRT4z+T0iisiU26VVSQXS8gY2Q02l0BmMXEBPRqNn6cAArJNHHAAGy3dL3VCPHwfV6ENLBL5hCCQX5QFjsbj-CSSMAAKjA-1iCWSalZ7JaAM2wLJYMyTFYnFMUXEUl5HLiSRSsv5CKFd08oNCYJJ4I1VJC33CijUzB4XDpEsZMrZcq5iutysFBDU0l1GQYxtN5oZ-KZSqlnIVPPtUpVTukaoeKAAuiALEA)
```
[ {
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::Sparse112DotProduct",
"symbol": "void vespalib::eval::(anonymous namespace)::my_sparse_112_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)"
} ]
```
### Three-way dot product - mixed
`sum(query(model_id) * model_weights * model_features)`
| tensor name | tensor type |
| --- | --- |
| query(model\_id) | `tensor(model{})` |
| model\_weights | `tensor(model{}, feature[2])` |
| model\_features | `tensor(feature[2])` |
Three-way mapped (mixed) dot product: [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEcBXAgJwE8AKAWywBMCAGwD6AS34BKEmRqQiCSABcCtAM5Y2AHmhCsAQyUA+XgOHAAvpLjAeABjgBGC5FkQLsi6QzVyuBkTecpRobnQMfIKiAO4EYgDmABZKajI0mApQKuqaOnqGJpHmXmDQBIbMbASIAEwAutbAYAA6tLZmQnD2xKXlSpUEcHYWTsSt7VFddj1lFVVOIzVjbUWdPI4zfQNDIwDMyxPCXRu9c4POcAAsYC5hHhhesr6Y-oqBYRT4z+T0iqsis36VVSQXS8gY2Q02l0BmMXEBA1qDTgiAArMQAGx1Vzpe6oR4+D6vQhpYJfMIQSC-KAsdjcf4SSRgABUYH+sQSyTULLZHQBW2BpLBmSYrE4pii4ikPPZcSSKRlfIRgrunlBoTBxPB6spIW+4UUamYPC4tPFDOlrNlnIVVqVAoIamkOoyDCNJrN9L5jMVko58u5dslysd0lVDxQdRAFiAA)
```
[ {
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::tensor_function::Inject",
"symbol": ""
},
{
"class": "vespalib::eval::Mixed112DotProduct",
"symbol": "void vespalib::eval::(anonymous namespace)::my_mixed_112_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)"
} ]
```
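A sketch of the mixed variant in a rank profile, assuming `model_weights` and `model_features` are tensor attributes in the schema; the mapped variant differs only in declaring `feature` as a mapped dimension instead of `feature[2]`:
```
rank-profile three-way-dot-product {
    inputs {
        query(model_id) tensor(model{})
    }
    first-phase {
        # sum over all dimensions reduces the three-way product to a scalar
        expression: sum(query(model_id) * attribute(model_weights) * attribute(model_features))
    }
}
```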
---
# Source: https://docs.vespa.ai/en/learn/features.html.md
# Features
## What is Vespa?
Vespa is a platform for applications which need low-latency computation over large data sets. It allows you to write and persist any amount of data, and execute high volumes of queries over the data which typically complete in tens of milliseconds.
Queries can use structured filter conditions, text search, and nearest neighbor vector search to select data. All the matching data is then ranked according to ranking functions - typically machine-learned - to implement use cases such as search relevance, recommendation, targeting and personalization.
All the matching data can also be grouped into groups and subgroups where data is aggregated for each group to implement features like graphs, tag clouds, navigational tools, result diversity and so on.
Application specific behavior can be included by adding Java components for processing queries, results and writes to the application package.
Vespa is real time. It is architected to maintain constant response times with any data volume by executing queries in parallel over many data shards and cores, and with added query volume by executing queries in parallel over many copies of the same data (groups). It is optimized to return responses in tens of milliseconds. Writes become visible in a few milliseconds and can be handled at a rate of thousands to tens of thousands per node per second.
A lot of work has gone into making Vespa easy to set up and operate. Any Vespa application - from single-node systems to systems running on hundreds of nodes in data centers - is fully configured by a single artifact called an _application package_. Low-level configuration of nodes, processes and components is done by the system itself based on the desired traits specified in the application package.
Vespa is scalable. System sizes up to hundreds of nodes handling tens of billions of documents and tens of thousands of queries per second are not uncommon, and no harder to set up and modify than single-node systems. Since all system components, as well as stored data, are redundant and self-correcting, hardware failures are not operational emergencies and can be handled by re-adding capacity when convenient.
Vespa is self-repairing and dynamic. When machines are lost or new ones added, data is automatically redistributed over the machines, while continuing serving and accepting writes to the data. Changes to configuration and Java components can be made while serving by deploying a changed application package - no downtime or restarts required.
## Features
This section provides an overview of the main features of Vespa. The remainder of the documentation goes into full detail.
### Data and writes
- Documents in Vespa may be added, replaced, modified (single fields or any subset) and removed.
- Writes are acknowledged back to the client issuing them when they are durable and visible in queries, in a few milliseconds.
- Writes can be issued at a sustained volume of thousands to tens of thousands per node per second while serving queries.
- Data is replicated with a configurable redundancy.
- An even data distribution, with the desired redundancy, is automatically maintained when nodes are added, removed or lost unexpectedly.
- Data corruption is automatically repaired from an uncorrupted replica of the data.
- Data is written over a simple HTTP/2 API, or (for high volume) using a small, standalone client.
- Document data schemas allow fields of any of the usual primitive types as well as collections, structs and tensors.
- Any number of data schemas can be used at the same time.
- Documents may reference each other, and fields from referenced documents may be used in queries without performance penalty.
- Write operations can be processed by adding custom Java components.
- Data can be streamed out of the system for batch reprocessing.
### Queries
- Queries may contain any combination of structured filters, free text and vector search operators.
- Queries may contain large tensors and vectors (to represent e.g. a user).
- Queries choose how results should be ranked and specify how they should be organized (see sections below).
- Queries and results may be processed by adding custom Java components - or any HTTP request may be turned into a query by custom request handlers.
- Query response times are typically in tens of milliseconds and can be maintained given any load and data size by adding more hardware.
- A _streaming search_ mode is available where search/selection is only supported on predefined groups of documents (e.g. a user's documents). In this mode each node can store and serve billions of documents while maintaining low response times.
### Ranking and inference
- All results are ranked using a configured ranking function, selected in the query.
- A ranking function may be any mathematical function over scalars or tensors (multidimensional arrays).
- Scalar functions include an "if" function to express business logic and decision trees.
- Tensor functions include a powerful set of primitives and composite functions which allows expression of advanced machine-learned ranking functions such as e.g. deep neural nets.
- Functions can also refer to ONNX models invoked locally on the content nodes.
- Multiple ranking phases are supported to allocate more CPU to ranking promising candidates.
- A powerful set of text ranking features using positional information from the documents is provided out of the box.
- Other ranking features include 2D distance and freshness.
### Organizing data and presenting results
- Matches to a query can be grouped and aggregated according to a specification in the query.
- All the matches are included, even though they reside on multiple machines executing in parallel.
- Matches can be grouped by a unique value or by a numerical bucket.
- Any number of levels of groups and subgroups is supported, and multiple parallel groupings can be specified in one query.
- Data can be aggregated (counted, averaged etc.) and selected within each group and subgroup.
- Any selection of data from documents can be included with the final result returned to the client.
- Search engine style keyword highlighting in matching fields is supported.
## Configuration and operations
- Vespa can be installed using rpm files or a Docker image - on personal laptops, in owned data centers, or in AWS.
- A Vespa application is fully specified as a separate buildable artifact: an _application package_ - individual machines or processes never need to be configured individually.
- Systems may contain multiple clusters of each type (stateless and stateful), each containing any number of nodes.
- Systems of any size may be specified by two short configuration files in the application package.
- Document schemas, Java components and ranking functions/models are also configured in the application package.
- An application package is deployed as a single unit to Vespa to realize the system desired by the application.
- Most application changes (including Java component changes) can be performed by deploying a changed application package - the system will manage its own change process while serving and handling writes.
- Most document schema changes (excluding field type changes) can be made while the system is live.
- Application package changes are validated on deployment to prevent destructive changes to live systems.
- Vespa has no single points of failure and automatically routes around failing nodes.
- System logs are collected to a central server in real time.
- Selected metrics may be emitted to a third-party metrics/alerting system from all the nodes.
---
# Source: https://docs.vespa.ai/en/querying/federation.html.md
# Federation

The Vespa Container allows multiple sources of data to be _federated_ to a common search service. The sources of data may be search clusters belonging to the same application, or external services backed by Vespa or any other kind of service. The container may be used as a pure _federation platform_ by setting up a system consisting solely of container nodes federating to external services.
This document gives a short introduction to federation, explains how to create an application package doing federation, and shows what support is available for choosing the sources given a query, and for creating the final result from the query and the source-specific results.
_Federation_ allows users to access data from multiple sources of various kinds through one interface. This is useful to:
- enrich the results returned from an application with auxiliary data, like finding appropriate images to accompany news articles.
- provide more comprehensive results by finding data from alternative sources in the cases where the application has none, like back-filling web results.
- create applications whose main purpose is not to provide access to some data set but to provide users or frontend applications a single starting point to access many kinds of data from various sources. Examples are browse pages created dynamically for any topic by pulling together data from external sources.
The main tasks in creating a federation solution are:
1. creating connectors to the various sources
2. selecting the data sources which will receive a given query
3. rewriting the received request to an executable query returning the desired data from each source
4. creating the final result by selecting from, organizing and combining the returned data from each selected source
The container aids with these tasks by providing a way to organize a federated execution as a set of search chains which can be configured through the application package. Read the [Container intro](../applications/containers.html) and[Chained components](../applications/chaining.html) before proceeding. Refer to the `com.yahoo.search.federation`[Javadoc](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/federation/package-summary.html).
## Configuring Providers
A _provider_ is a search chain that produces data (in the form of a Result) from a data source. The provider must contain a Searcher which connects to the data source and produces a Result from the returned data. Configure a provider as follows:
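The original snippet is not preserved in this page; below is a minimal sketch of a provider in services.xml. The searcher id, class and bundle names are hypothetical:
```
<search>
  <provider id="my-provider">
    <!-- A searcher connecting to the external data source,
         producing a Result from the returned data -->
    <searcher id="MyProviderSearcher"
              class="com.example.MyProviderSearcher"
              bundle="my-bundle" />
  </provider>
</search>
```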
You can add multiple searchers in the provider just like in other chains.
Search chains that provide data from some content cluster in the same application are also _providers_. To explicitly configure a provider talking to internal content clusters, set the attribute `type="local"` on the provider. That will automatically add the searchers necessary to talk to internal content clusters to the search chain. Example - querying this provider will not lowercase or stem terms:
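A sketch, assuming a content cluster named `my-content` (the provider and cluster names are hypothetical):
```
<!-- type="local" adds only the searchers needed to talk to the cluster -->
<provider id="my-local-provider" type="local" cluster="my-content" />
```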
## Configuring Sources
A single provider may be used to produce multiple kinds of results. To implement and present each kind of result, we can use _sources_. A _source_ is a search chain that provides a specific kind of result by extending or modifying the behavior of one or more providers.
Suppose that we want to retrieve two kinds of results from my-provider: Web results and java API documentation:
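A sketch of the two sources inside the provider; the JavaApiSearcher class and bundle names are hypothetical:
```
<provider id="my-provider">
  <source id="web" />
  <source id="java-api">
    <!-- Restricts this source to java API documentation hits -->
    <searcher id="JavaApiSearcher"
              class="com.example.JavaApiSearcher"
              bundle="my-bundle" />
  </source>
</provider>
```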
This results in two _source search chains_ being created, `web@my-provider` and `java-api@my-provider`. Each of them constitutes a source, namely `web` and `java-api` respectively. As the example suggests, these search chains are named after the source and the enclosing provider. The @-sign in the name should be read as _in_, so `web@my-provider` should for example be read as _web in my-provider_.
The JavaApiSearcher is responsible for modifying the query so that we only get hits from the java API documentation. We added this searcher directly inside the source element; source search chains and providers are both instances of search chains. All the options for configuring regular search chains are therefore also available for them.
How do the `web@my-provider` and `java-api@my-provider` source search chains use the `my-provider` provider to send queries to the external service? Internally, the source search chains _inherit_ from the enclosing provider. Since the provider contains searchers that know how to talk to the external service, the sources will also contain the same searchers. As an example, consider the "web" search chain; it will contain exactly the same searcher instances as the `my-provider` search chain. By organizing chains for talking to data providers, we can reuse the same connections and logic for talking to remote services ("providers") for multiple purposes ("sources").
The provider search chain `my-provider` is _not modified_ by adding sources. To verify this, try to send queries to the three search chains`my-provider`, `web@my-provider` and `java-api@my-provider`.
### Multiple Providers per Source
You can create a source that consists of source search chains from several providers. Effectively, this lets you vary which provider should be used to satisfy each request to the source:
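A sketch: the leader declares the source with `id`, while the participant references it with `idref` (provider names follow the text below):
```
<provider id="news-search">
  <!-- leader -->
  <source id="common-search" />
</provider>
<provider id="my-provider">
  <!-- participant -->
  <source idref="common-search" />
</provider>
```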
Here, the two source search chains `common-search@news-search` and `common-search@my-provider` constitute a single source, `common-search`. The source search chains using the `idref` attribute are called participants, while the ones using the `id` attribute are called leaders. Each source must consist of a single leader and zero or more participants.
By default, only the leader search chain is used when _federating_ to a source. To use one of the participants instead, use the [sources](../reference/api/query.html#model.sources) and _source_ query parameters:
```
http://[host]:[port]/?sources=common-search&source.common-search.provider=news-search
```
## Federation
Now we can search both the web and the java API documentation at the same time, and get a combined result set back. We achieve this by setting up a _federation_ searcher:
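A sketch of a federation searcher listing the two sources (the chain and searcher ids are hypothetical):
```
<chain id="combined">
  <federation id="combinator">
    <source idref="web" />
    <source idref="java-api" />
  </federation>
</chain>
```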
Inside the federation element, we list the sources we want to use. Do not let the name _source_ fool you; if it behaves like a source, then you can use it as a source (i.e. all types of search chains, including providers, are accepted). As an example, try replacing the _web_ reference with _my-provider_.
When searching, select a subset of the sources specified in the federation element by specifying the [sources](../reference/api/query.html#model.sources) query parameter.
## Built-in Federation
The built-in search chains _native_ and _vespa_ contain a federation searcher named _federation_. This searcher has been configured to federate to:
- All sources
- All providers that do not contain a source
If configuring your own federation searcher, you are not limited to a subset of these sources - you can use any provider, source or search chain.
## Inheriting default Sources
To get the same sources as the built-in federation searcher, inherit the default source set:
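A sketch, assuming the `source-set` element from the services.xml search reference:
```
<federation id="combinator">
  <!-- federate to the same sources as the built-in federation searcher -->
  <source-set inherits="default" />
</federation>
```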
## Changing content cluster chains
With the information above, we can create a configuration where we modify the search chain sending queries to and receiving results from a single content cluster (here, removing a searcher and adding another):
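A sketch under these assumptions: a local provider for a content cluster named `my-content`, excluding one inherited searcher and adding another (all ids, class and bundle names are hypothetical):
```
<provider id="my-content-chain" type="local" cluster="my-content"
          excludes="SearcherToRemove">
  <!-- an added searcher processing queries to / results from this cluster -->
  <searcher id="MyAddedSearcher"
            class="com.example.MyAddedSearcher"
            bundle="my-bundle" />
</provider>
```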
## Timeout behavior
What if we want to limit how much time a provider is allowed to use to answer a query?
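A sketch using the `federationoptions` element described in the reference linked below:
```
<provider id="my-provider">
  <federationoptions timeout="100 ms" />
  ...
</provider>
```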
The provider search chain will then be limited to 100 ms per query. The federation layer allows all providers to continue until the non-optional provider with the longest timeout is finished or canceled.
In some cases it is useful to keep executing the request to a provider for longer than we are willing to wait for it in that particular query. This makes it possible to populate caches inside sources which can only meet the timeout once their caches are populated. To use this option, specify a [request timeout](../reference/applications/services/search.html#federationoptions) for the provider:
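A sketch; `requestTimeout` is the attribute described in the linked reference (the values are illustrative):
```
<provider id="my-provider">
  <!-- wait at most 100 ms in the query, but let the request run for up to 10 s -->
  <federationoptions timeout="100 ms" requestTimeout="10000 ms" />
  ...
</provider>
```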
Also see [Searcher timeouts](../applications/searchers.html#timeouts).
## Non-essential Providers
Now let us add a provider that retrieves ads:
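A sketch of an ads provider (the searcher class and bundle names are hypothetical):
```
<provider id="ads">
  <searcher id="AdsSearcher" class="com.example.AdsSearcher" bundle="my-bundle" />
</provider>
```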
Suppose that it is more important to return the result to the user as fast as possible, than to retrieve ads. To signal this, we mark the ads provider as _optional_:
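A sketch, marking the provider optional via `federationoptions`:
```
<provider id="ads">
  <!-- only waited for as long as the mandatory providers -->
  <federationoptions optional="true" />
  <searcher id="AdsSearcher" class="com.example.AdsSearcher" bundle="my-bundle" />
</provider>
```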
The Federation searcher will then only wait for ads as long as it waits for mandatory providers. If the ads are available in time, they are used, otherwise they are dropped.
If only optional providers are selected for Federation, they will all be treated as mandatory. Otherwise, they would not get a chance to return any results.
## Federation options inheritance
The sources automatically use the same federation options as the enclosing provider. _Override_ one or more of the federation options in the sources:
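A sketch (timeout values are illustrative):
```
<provider id="my-provider">
  <federationoptions timeout="100 ms" />
  <source id="web">
    <!-- this source gets a longer timeout than the provider default -->
    <federationoptions timeout="200 ms" />
  </source>
</provider>
```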
You can use a single source in different Federation searchers. If you send queries with different cost to the same source from different federation searchers, you might also want to _override_ the federation options for when they are used:
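A sketch, overriding the options where the source is referenced from a particular federation searcher:
```
<federation id="combinator">
  <source idref="web">
    <!-- override for queries sent via this federation searcher only -->
    <federationoptions timeout="300 ms" />
  </source>
</federation>
```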
## Selecting Search Chains programmatically
If we have complicated rules for when a search chain should be used, we can select search chains programmatically instead of setting up sources under federation in services.xml. The selection code is implemented as a[TargetSelector](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/federation/selection/TargetSelector.html). This TargetSelector is used by registering it on a federation searcher.
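A sketch of the registration, assuming the `target-selector` element (the chain id and bundle name are hypothetical; the selector class matches the Java code below):
```
<chain id="my-chain" inherits="vespa">
  <federation id="my-federation">
    <target-selector id="com.yahoo.example.MyTargetSelector" bundle="my-bundle" />
  </federation>
</chain>
```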
package com.yahoo.example;
import com.google.common.base.Preconditions;
import com.yahoo.component.chain.Chain;
import com.yahoo.processing.execution.chain.ChainRegistry;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.result.Hit;
import com.yahoo.search.Searcher;
import com.yahoo.search.federation.selection.FederationTarget;
import com.yahoo.search.federation.selection.TargetSelector;
import com.yahoo.search.searchchain.model.federation.FederationOptions;
import java.util.Arrays;
import java.util.Collection;
class MyTargetSelector implements TargetSelector