# Jina Serve
> This section includes the API documentation from the `jina` codebase, as extracted from the docstrings in the code.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/api-rst.rst
========================
:fab:`python` Python API
========================

This section includes the API documentation from the `jina` codebase, as extracted from the docstrings in the code.

For further details, please refer to the full :ref:`user guide `.
:mod:`jina.orchestrate.deployments` - Deployment
-------------------------------------------------

.. currentmodule:: jina.orchestrate.deployments

.. autosummary::
   :nosignatures:
   :template: class.rst

   __init__.Deployment

:mod:`jina.orchestrate.flow` - Flow
-----------------------------------

.. currentmodule:: jina.orchestrate.flow

.. autosummary::
   :nosignatures:
   :template: class.rst

   base.Flow
   asyncio.AsyncFlow

:mod:`jina.serve.executors` - Executor
--------------------------------------

.. currentmodule:: jina.serve.executors

.. autosummary::
   :nosignatures:
   :template: class.rst

   Executor
   BaseExecutor
   decorators.requests
   decorators.monitor

:mod:`jina.clients` - Clients
-----------------------------

.. currentmodule:: jina.clients

.. autosummary::
   :nosignatures:
   :template: class.rst

   Client
   grpc.GRPCClient
   grpc.AsyncGRPCClient
   http.HTTPClient
   http.AsyncHTTPClient
   websocket.WebSocketClient
   websocket.AsyncWebSocketClient

:mod:`jina.types.request` - Networking messages
------------------------------------------------

.. currentmodule:: jina.types.request

.. autosummary::
   :nosignatures:
   :template: class.rst

   Request
   data.DataRequest
   data.Response
   status.StatusMessage

:mod:`jina.serve.runtimes` - Flow internals
--------------------------------------------

.. currentmodule:: jina.serve.runtimes

.. autosummary::
   :nosignatures:
   :template: class.rst

   asyncio.AsyncNewLoopRuntime
   gateway.GatewayRuntime
   gateway.grpc.GRPCGatewayRuntime
   gateway.http.HTTPGatewayRuntime
   gateway.websocket.WebSocketGatewayRuntime
   worker.WorkerRuntime
   head.HeadRuntime
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cli/index.rst
:octicon:`terminal` Command-Line Interface
==========================================
.. argparse::
   :noepilog:
   :nodescription:
   :ref: jina.parsers.get_main_parser
   :prog: jina
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/docker-compose.md
(docker-compose)=
# {fab}`docker` Docker Compose Support
One of the simplest ways to prototype or serve in
production is to run your {class}`~jina.Flow` with `docker-compose`.
A {class}`~jina.Flow` is composed of {class}`~jina.Executor`s which run Python code
that operates on `Documents`. These `Executors` live in different runtimes depending on how you want to deploy
your Flow.
By default, if you serve your Flow locally, these Executors live in separate processes. Nevertheless,
because Jina-serve is cloud-native, your Flow can easily manage Executors that live in containers and are
orchestrated by your favorite tools. One of the simplest of these tools is Docker Compose, which is supported out of the box.
You can deploy a Flow with Docker Compose in one line:
```{code-block} python
---
emphasize-lines: 3
---
from jina import Flow
flow = Flow(...).add(...).add(...)
flow.to_docker_compose_yaml('docker-compose.yml')
```
Jina-serve generates a `docker-compose.yml` configuration file corresponding to your Flow. You can use this directly with
Docker Compose, avoiding the overhead of manually defining all of your Flow's services.
````{admonition} Use Docker-based Executors
:class: caution
All Executors in the Flow should be used with `jinaai+docker://...` or `docker://...`.
````
````{admonition} Health check available from 3.1.3
:class: caution
If you use Executors that rely on Docker images built with a version of Jina-serve prior to 3.1.3, remove the
health check from the dumped YAML file, otherwise your Docker Compose services will
always be "unhealthy."
````
````{admonition} Matching Jina-serve versions
:class: caution
If you change the Docker images in your generated Docker Compose file, ensure that all services, including
the Gateway, are built with the same Jina-serve version to guarantee compatibility.
````
## Example: Index and search text using your own built Encoder and Indexer
Install [`Docker Compose`](https://docs.docker.com/compose/install/) locally before starting this tutorial.
For this example, we recommend that you read {ref}`how to build and containerize the Executors to be run in Kubernetes `.
### Deploy the Flow
First define the Flow and generate the Docker Compose YAML configuration:
````{tab} YAML
In a `flow.yml` file:
```yaml
jtype: Flow
with:
  port: 8080
  protocol: http
executors:
  - name: encoder
    uses: jinaai+docker:///EncoderPrivate
    replicas: 2
  - name: indexer
    uses: jinaai+docker:///IndexerPrivate
    shards: 2
```
Then export it to a Docker Compose configuration:
```shell
jina export docker-compose flow.yml docker-compose.yml
```
````
````{tab} Python
In Python, run:
```python
from jina import Flow

flow = (
    Flow(port=8080, protocol='http')
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate', replicas=2)
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        shards=2,
    )
)
flow.to_docker_compose_yaml('docker-compose.yml')
```
````
````{admonition} Hint
:class: hint
You can use a custom Jina-serve Docker image for the Gateway service by setting the environment variable `JINA_GATEWAY_IMAGE` to the desired image before generating the configuration (see the sketch below).
````
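For example, a minimal sketch of setting this variable before dumping the configuration for the Flow defined above (the image name is a placeholder for your own build):

```python
import os

from jina import Flow

# Hypothetical custom Gateway image; replace with an image you have built and pushed.
os.environ['JINA_GATEWAY_IMAGE'] = 'my-registry/my-jina-gateway:latest'

flow = (
    Flow(port=8080, protocol='http')
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate', replicas=2)
    .add(name='indexer', uses='jinaai+docker:///IndexerPrivate', shards=2)
)
flow.to_docker_compose_yaml('docker-compose.yml')
```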
Let's take a look at the generated Compose file:
```yaml
version: '3.3'
...
services:
  encoder-rep-0:   # Encoder replica 0
  encoder-rep-1:   # Encoder replica 1
  indexer-head:    # Indexer head
  indexer-0:       # Indexer shard 0
  indexer-1:       # Indexer shard 1
  gateway:
    ...
    ports:
      - 8080:8080
```
```{tip}
:class: caution
The default compose file generated by the Flow contains no special configuration or settings. You may want to
adapt it to your own needs.
```
You can see that six services are created:
* 1 for the **Gateway** which is the entrypoint of the **Flow**.
* 2 associated with the encoder for the two Replicas.
* 3 associated with the indexer, one for the Head and two for the Shards.
Now, you can deploy this Flow:
```shell
docker-compose -f docker-compose.yml up
```
### Query the Flow
Once we see that all the services in the Flow are ready, we can send index and search requests.
First define a client:
```python
from jina.clients import Client
client = Client(host='http://localhost:8080')
```
```python
from typing import List, Optional

from docarray import DocList, BaseDoc
from docarray.typing import NdArray


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


docs = client.post(
    '/index',
    inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
    return_type=DocList[MyDoc],
    request_size=10,
)
print(f'Indexed documents: {len(docs)}')

docs = client.post(
    '/search',
    inputs=DocList[MyDoc]([MyDoc(text=f'This is document query number {i}') for i in range(10)]),
    return_type=DocList[MyDocWithMatches],
    request_size=10,
)
for doc in docs:
    print(f'Query {doc.text} has {len(doc.matches)} matches')
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/k8s.md
(kubernetes-docs)=
# {fas}`dharmachakra` Kubernetes Support
```{toctree}
:hidden:
kubernetes
```
Jina-serve is a cloud-native framework and therefore runs natively and easily on Kubernetes.
Deploying a Jina-serve Deployment or Flow on Kubernetes is actually the recommended way to use Jina-serve in production.
A {class}`~jina.Deployment` and a {class}`~jina.Flow` are services composed of one or more microservices, namely {class}`~jina.Executor`s and a {class}`~jina.Gateway`, which natively run in containers. This means that Kubernetes can natively take over the lifetime management of Executors.
Deploying a {class}`~jina.Deployment` or {class}`~jina.Flow` on Kubernetes means wrapping these services' containers in the appropriate K8s abstractions (Deployment, StatefulSet, and so on), exposing them internally via K8s Services and connecting them together by passing the right set of parameters.
```{hint}
This documentation is designed for users who want to **manually** deploy a Jina-serve project on Kubernetes.
Check out {ref}`jcloud` if you want a **one-click** solution to deploy and host Jina, leveraging a cloud-native stack of Kubernetes, Prometheus and Grafana, **without worrying about provisioning**.
```
## Automatically translate a Deployment or Flow to Kubernetes concepts
```{hint}
Manually building these Kubernetes YAML objects is long and cumbersome. Therefore we provide a helper function {meth}`~jina.Flow.to_kubernetes_yaml` that does most of this
translation work automatically.
```
This helper function can be called from:
* Jina-serve's Python interface to translate a Flow defined in Python to K8s YAML files
* Jina-serve's CLI interface to export a YAML Flow to K8s YAML files
```{seealso}
More detail in the {ref}`Deployment export documentation` and {ref}`Flow export documentation `
```
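For example, a minimal sketch of the Python route (the Flow here is a trivial placeholder; the output folder and namespace names are arbitrary examples):

```python
from jina import Flow

# A trivial placeholder Flow; in practice it would contain your containerized Executors.
f = Flow().add(name='encoder')

# Writes one folder of Kubernetes YAML files per Executor, plus one for the Gateway.
f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```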
## Extra Kubernetes options
In general, Jina-serve follows a single principle when it comes to deploying in Kubernetes:
You, the user, know your use case and requirements the best.
This means that, while Jina-serve generates configurations for you that run out of the box, as a professional user you should always see them as just a starting point to get you off the ground.
```{hint}
The export functions {meth}`~jina.Deployment.to_kubernetes_yaml` and {meth}`~jina.Flow.to_kubernetes_yaml` are helpers to get you started off the ground. **They are meant to be updated and adapted to each use case.**
```
````{admonition} Matching Jina versions
:class: caution
If you change the Docker images for {class}`~jina.Executor` and {class}`~jina.Gateway` in your Kubernetes-generated file, ensure that all of them are built with the same Jina-serve version to guarantee compatibility.
````
You can't add basic Kubernetes features like `Secrets`, `ConfigMap` or `Labels` via the Pythonic or YAML interface. This is intentional and doesn't mean that we don't support these features. On the contrary, we let you fully express your Kubernetes configuration by using the Kubernetes API directly, so you can add your own Kubernetes standards on top of what Jina-serve generates.
````{admonition} Hint
:class: hint
We recommend you dump the Kubernetes configuration files and then edit them to suit your needs.
````
Here are possible configuration options you may need to add or change:
* Add label `selector`s to the Deployments to suit your case
* Add `requests` and `limits` for the resources of the different Pods
* Set up persistent volume storage to save your data on disk
* Pass custom configuration to your Executor with `ConfigMap`
* Manage credentials of your Executor with Kubernetes secrets; for instance, use `f.add(..., env_from_secret={'SECRET_PASSWORD': {'name': 'mysecret', 'key': 'password'}})` to map them to Pod environment variables
* Edit the default rolling update configuration
(service-mesh-k8s)=
## Required service mesh
```{caution}
A service mesh must be installed and correctly configured in the K8s cluster in which you deployed your Flow.
```
Service meshes work by attaching a tiny proxy to each of your Kubernetes Pods, allowing for smart rerouting, load balancing, request retrying, and a host of [other features](https://linkerd.io/2.11/features/).
Jina relies on a service mesh to load balance requests between replicas of the same Executor.
You can use your favourite Kubernetes service mesh in combination with your Jina services, but the configuration files
generated by `to_kubernetes_yaml()` already include all necessary annotations for the [Linkerd service mesh](https://linkerd.io).
````{admonition} Hint
:class: hint
You can use any service mesh with Jina-serve, but Jina-serve Kubernetes configurations come with Linkerd annotations out of the box.
````
To use Linkerd, follow the guide to [install the Linkerd CLI](https://linkerd.io/2.11/getting-started/).
````{admonition} Caution
:class: caution
Many service meshes can perform retries themselves.
Be careful about setting up service mesh level retries in combination with Jina, as it may lead to unwanted behaviour in combination with
Jina's own {ref}`retry policy `.
Instead, you can disable Jina level retries by setting `Flow(retries=0)` in Python, or `retries: 0` in the Flow
YAML's `with` block.
````
(kubernetes-replicas)=
## Scaling Executors: Replicas and shards
Jina supports two types of scaling:
* **Replicas** can be used with any Executor type and are typically used for performance and availability.
* **Shards** are used for partitioning data and should only be used with indexers since they store state.
Check {ref}`here ` for more information about these scaling mechanisms.
For shards, Jina creates one separate Deployment in Kubernetes per Shard.
Setting `Deployment(..., shards=num_shards)` is sufficient to create a corresponding Kubernetes configuration.
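For illustration, a minimal sketch of exporting a sharded Deployment (the Executor reference reuses the placeholder from these docs; the folder and namespace names are arbitrary examples):

```python
from jina import Deployment

# Placeholder containerized Executor reference; replace with your own.
d = Deployment(uses='jinaai+docker:///IndexerPrivate', shards=2)

# One Kubernetes Deployment per shard, plus a head, is generated under ./k8s_indexer.
d.to_kubernetes_yaml('./k8s_indexer', k8s_namespace='custom-namespace')
```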
For replicas, Jina-serve uses [Kubernetes native replica scaling](https://kubernetes.io/docs/tutorials/kubernetes-basics/scale/scale-intro/) and **relies on a service mesh** to load-balance requests between replicas of the same Executor.
Without a service mesh installed in your Kubernetes cluster, all traffic will be routed to the same replica.
````{admonition} See Also
:class: seealso
The impossibility of load balancing between different replicas is a limitation of Kubernetes in combination with gRPC.
If you want to learn more about this limitation, see [this](https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/) Kubernetes Blog post.
````
## Scaling the Gateway
The {ref}`Gateway ` is responsible for providing the API of the {ref}`Flow `.
If you have a large Flow with many Clients and many replicated Executors, the Gateway can become the bottleneck.
In this case you can also scale up the Gateway deployment to be backed by multiple Kubernetes Pods. To do this, add the `replicas` parameter to your Gateway before converting the Flow to Kubernetes.
This can be done in a Pythonic way or in YAML:
````{tab} Using Python
You can use {meth}`~jina.Flow.config_gateway` to add the `replicas` parameter:
```python
from jina import Flow
f = Flow().config_gateway(replicas=3).add()
f.to_kubernetes_yaml('./k8s_yaml_path')
```
````
````{tab} Using YAML
You can add `replicas` in the `gateway` section of your Flow YAML:
```yaml
jtype: Flow
gateway:
  replicas: 3
executors:
  - name: encoder
```
````
Alternatively, this can be done by the regular means of Kubernetes: Either increase the number of replicas in the {ref}`generated yaml configuration files ` or [add replicas while running](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#scaling-a-deployment).
To expose your Gateway replicas outside Kubernetes, you can add a load balancer as described {ref}`here `.
````{admonition} Hint
:class: hint
You can use a custom Docker image for the Gateway deployment by setting the environment variable `JINA_GATEWAY_IMAGE` to the desired image before generating the configuration.
````
## See also
* {ref}`Step by step deployment of a Jina-serve Flow on Kubernetes `
* {ref}`Export a Flow to Kubernetes `
* {meth}`~jina.Flow.to_kubernetes_yaml`
* {ref}`Deploy a standalone Executor on Kubernetes `
* [Kubernetes Documentation](https://kubernetes.io/docs/home/)
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/kubernetes.md
(kubernetes)=
# Deploy on Kubernetes
This how-to will go through deploying a Deployment and a simple Flow using Kubernetes, customizing the Kubernetes configuration
to your needs, and scaling Executors using replicas and shards.
Deploying Jina-serve services in Kubernetes is the recommended way to use Jina-serve in production because Kubernetes can easily take over the lifetime management of Executors and Gateways.
```{seealso}
This page is a step-by-step guide; refer to the {ref}`Kubernetes support documentation ` for more details.
```
```{hint}
This guide is designed for users who want to **manually** deploy a Jina-serve project on Kubernetes.
Check out {ref}`jcloud` if you want a **one-click** solution to deploy and host Jina, leveraging a cloud-native stack of Kubernetes, Prometheus and Grafana, **without worrying about provisioning**.
```
## Preliminaries
To follow this how-to, you need access to a Kubernetes cluster.
You can either set up [`minikube`](https://minikube.sigs.k8s.io/docs/start/), or use one of many managed Kubernetes
solutions in the cloud:
* [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine)
* [Amazon EKS](https://aws.amazon.com/eks)
* [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service)
* [Digital Ocean](https://www.digitalocean.com/products/kubernetes/)
You need to install Linkerd in your K8s cluster. To use Linkerd, [install the Linkerd CLI](https://linkerd.io/2.11/getting-started/) and [its control plane](https://linkerd.io/2.11/getting-started/) in your cluster.
This automatically sets up and manages the service mesh proxies when you deploy the Flow.
To understand why you need to install a service mesh like Linkerd, refer to this {ref}`section `.
(build-containerize-for-k8s)=
## Build and containerize your Executors
First, we need to build the Executors that we are going to use and containerize them {ref}`manually ` or by leveraging {ref}`Executor Hub `. In this example,
we are going to use the Hub.
We are going to build two Executors, the first is going to use `CLIP` to encode textual Documents, and the second is going to use an in-memory vector index. This way
we can build a simple neural search system.
First, we build the encoder Executor.
````{tab} executor.py
```{code-block} python
import torch
from typing import Optional

from transformers import CLIPModel, CLIPTokenizer
from docarray import DocList, BaseDoc
from docarray.typing import NdArray

from jina import Executor, requests


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class Encoder(Executor):
    def __init__(
        self,
        pretrained_model_name_or_path: str = 'openai/clip-vit-base-patch32',
        device: str = 'cpu',
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.device = device
        self.tokenizer = CLIPTokenizer.from_pretrained(pretrained_model_name_or_path)
        self.model = CLIPModel.from_pretrained(pretrained_model_name_or_path)
        self.model.eval().to(device)

    def _tokenize_texts(self, texts):
        x = self.tokenizer(
            texts,
            max_length=77,
            padding='longest',
            truncation=True,
            return_tensors='pt',
        )
        return {k: v.to(self.device) for k, v in x.items()}

    @requests
    def encode(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        with torch.inference_mode():
            input_tokens = self._tokenize_texts(docs.text)
            docs.embedding = self.model.get_text_features(**input_tokens).cpu().numpy()
        return docs
```
````
````{tab} requirements.txt
```
torch==1.12.0
transformers==4.16.2
```
````
````{tab} config.yml
```
jtype: Encoder
metas:
  name: EncoderPrivate
py_modules:
  - executor.py
```
````
Putting all these files into a folder named CLIPEncoder and calling `jina hub push CLIPEncoder --private` should give:
```shell
╭────────────────────────── Published ───────────────────────────╮
│ │
│ 📛 Name EncoderPrivate │
│ 🔗 Jina Hub URL https://cloud.jina.ai/executor// │
│ 👀 Visibility private │
│ │
╰────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────── Usage ─────────────────────────────────────────────────────╮
│ │
│ Container YAML uses: jinaai+docker:///EncoderPrivate:latest │
│ Python .add(uses='jinaai+docker:///EncoderPrivate:latest') │
│ │
│ Source YAML uses: jinaai:///EncoderPrivate:latest │
│ Python .add(uses='jinaai:///EncoderPrivate:latest') │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
Then we can build an indexer to provide `index` and `search` endpoints:
````{tab} executor.py
```{code-block} python
from typing import Optional, List

from docarray import DocList, BaseDoc
from docarray.index import InMemoryExactNNIndex
from docarray.typing import NdArray

from jina import Executor, requests


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


class Indexer(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._indexer = InMemoryExactNNIndex[MyDoc]()

    @requests(on='/index')
    def index(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        self._indexer.index(docs)
        return docs

    @requests(on='/search')
    def search(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDocWithMatches]:
        res = DocList[MyDocWithMatches]()
        ret = self._indexer.find_batched(docs, search_field='embedding')
        matched_documents = ret.documents
        matched_scores = ret.scores
        for query, matches, scores in zip(docs, matched_documents, matched_scores):
            output_doc = MyDocWithMatches(**query.dict())
            output_doc.matches = matches
            output_doc.scores = scores.tolist()
            res.append(output_doc)
        return res
```
````
````{tab} config.yml
```
jtype: Indexer
metas:
  name: IndexerPrivate
py_modules:
  - executor.py
```
````
Putting all these files into a folder named Indexer and calling `jina hub push Indexer --private` should give:
```shell
╭────────────────────────── Published ───────────────────────────╮
│ │
│ 📛 Name IndexerPrivate │
│ 🔗 Jina Hub URL https://cloud.jina.ai/executor// │
│ 👀 Visibility private │
│ │
╰────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────── Usage ─────────────────────────────────────────────────────╮
│ │
│ Container YAML uses: jinaai+docker:///IndexerPrivate:latest │
│ Python .add(uses='jinaai+docker:///IndexerPrivate:latest') │
│ │
│ Source YAML uses: jinaai:///IndexerPrivate:latest │
│ Python .add(uses='jinaai:///IndexerPrivate:latest') │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
Now, since we have created private Executors, we need to make sure that K8s has the right credentials to download
from the private registry:
First, we need to create the namespace where our Flow will run:
```shell
kubectl create namespace custom-namespace
```
Second, we execute this Python script:
```python
import json
import os
import base64

JINA_CONFIG_JSON_PATH = os.path.join(os.path.expanduser('~'), os.path.join('.jina', 'config.json'))
CONFIG_JSON = 'config.json'

with open(JINA_CONFIG_JSON_PATH) as fp:
    auth_token = json.load(fp)['auth_token']

config_dict = dict()
config_dict['auths'] = dict()
config_dict['auths']['registry.hubble.jina.ai'] = {
    'auth': base64.b64encode(f':{auth_token}'.encode()).decode()
}

with open(CONFIG_JSON, mode='w') as fp:
    json.dump(config_dict, fp)
```
Finally, we add a secret to be used as [imagePullSecrets](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/) in the namespace from our config.json:
```shell script
kubectl -n custom-namespace create secret generic regcred --from-file=.dockerconfigjson=config.json --type=kubernetes.io/dockerconfigjson
```
## Deploy an embedding model inside a Deployment
Now we are ready to deploy our embedding model as a service in Kubernetes.
First, define a Deployment,
either in {ref}`YAML ` or directly in Python, as we do here:
```python
from jina import Deployment
d = Deployment(port=8080, name='encoder', uses='jinaai+docker:///EncoderPrivate', image_pull_secrets=['regcred'])
```
You can serve any Deployment you want.
Just ensure that the Executor is containerized, either by using *'jinaai+docker'*, or by {ref}`containerizing your local
Executors `.
Next, generate Kubernetes YAML configs from the Deployment. Note that this step may be a little slow, because [Executor Hub](https://cloud.jina.ai/) may
adapt the image to your Jina-serve and docarray versions.
```python
d.to_kubernetes_yaml('./k8s_deployment', k8s_namespace='custom-namespace')
```
The following file structure will be generated - don't worry if it's slightly different -- there can be
changes from one Jina-serve version to another:
```text
└── k8s_deployment
└── encoder.yml
```
You can inspect these files to see how Deployment and Executor concepts are mapped to Kubernetes entities.
And as always, feel free to modify these files as you see fit for your use case.
````{admonition} Caution: Executor YAML configurations
:class: caution
As a general rule, the configuration files produced by `to_kubernetes_yaml()` should run out of the box, and if you strictly
follow this how-to they will.
However, there is an exception to this: if you use a local dockerized Executor, and this Executor's configuration is stored
in a file other than `config.yaml`, you will have to adapt this Executor's Kubernetes YAML.
To do this, open the file and replace `config.yaml` with the actual path to the Executor configuration.
This is because when a Flow contains a Docker image, it can't see what Executor
configuration was used to create that image.
Since all of our tutorials use `config.yaml` for that purpose, the Flow uses this as a best guess.
Please adapt this if you named your Executor configuration file differently.
````
Next you can actually apply these configuration files to your cluster, using `kubectl`.
This launches the Deployment service.
Now, deploy this Deployment to your cluster:
```shell
kubectl apply -R -f ./k8s_deployment
```
Check that the Pods were created:
```shell
kubectl get pods -n custom-namespace
```
```text
NAME READY STATUS RESTARTS AGE
encoder-81a5b3cf9-ls2m3 1/1 Running 0 60m
```
Once you see that the Deployment is ready, you can start embedding documents:
```python
from typing import Optional

import portforward

from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


with portforward.forward('custom-namespace', 'encoder-81a5b3cf9-ls2m3', 8080, 8080):
    client = Client(host='localhost', port=8080)
    client.show_progress = True
    docs = client.post(
        '/encode',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
        return_type=DocList[MyDoc],
        request_size=10,
    )

    for doc in docs:
        print(f'{doc.text}: {doc.embedding}')
```
## Deploy a simple Flow
Now we are ready to build a Flow composed of multiple Executors.
By *simple* in this context we mean a Flow without replicated or sharded Executors - you can see how to use those in
Kubernetes {ref}`later on `.
For now, define a Flow,
either in {ref}`YAML ` or directly in Python, as we do here:
```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate')
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
    )
)
```
You can essentially define any Flow of your liking.
Just ensure that all Executors are containerized, either by using *'jinaai+docker'*, or by {ref}`containerizing your local
Executors `.
The example Flow here simply encodes and indexes text data using two Executors pushed to the [Executor Hub](https://cloud.jina.ai/).
Next, generate Kubernetes YAML configs from the Flow. Note that this step may be a little slow, because [Executor Hub](https://cloud.jina.ai/) may
adapt the image to your Jina-serve and docarray versions.
```python
f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```
The following file structure will be generated - don't worry if it's slightly different -- there can be
changes from one Jina-serve version to another:
```text
└── k8s_flow
    ├── gateway
    │   └── gateway.yml
    ├── encoder
    │   └── encoder.yml
    └── indexer
        └── indexer.yml
```
You can inspect these files to see how Flow concepts are mapped to Kubernetes entities.
And as always, feel free to modify these files as you see fit for your use case.
Next you can actually apply these configuration files to your cluster, using `kubectl`.
This launches all Flow microservices.
Now, deploy this Flow to your cluster:
```shell
kubectl apply -R -f ./k8s_flow
```
Check that the Pods were created:
```shell
kubectl get pods -n custom-namespace
```
```text
NAME READY STATUS RESTARTS AGE
encoder-8b5575cb9-bh2x8 1/1 Running 0 60m
gateway-66d5f45ff5-4q7sw 1/1 Running 0 60m
indexer-8f676fc9d-4fh52 1/1 Running 0 60m
```
Note that the Jina-serve Gateway was deployed with the name `gateway-66d5f45ff5-4q7sw`.
Once you see that all the Deployments in the Flow are ready, you can start indexing documents:
```python
from typing import List, Optional

import portforward

from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


with portforward.forward('custom-namespace', 'gateway-66d5f45ff5-4q7sw', 8080, 8080):
    client = Client(host='localhost', port=8080)
    client.show_progress = True
    docs = client.post(
        '/index',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
        return_type=DocList[MyDoc],
        request_size=10,
    )
    print(f'Indexed documents: {len(docs)}')

    docs = client.post(
        '/search',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document query number {i}') for i in range(10)]),
        return_type=DocList[MyDocWithMatches],
        request_size=10,
    )
    for doc in docs:
        print(f'Query {doc.text} has {len(doc.matches)} matches')
```
### Deploy with shards and replicas
After your service mesh is installed, your cluster is ready to run a Flow with scaled Executors.
You can adapt the Flow from above to work with two replicas for the encoder, and two shards for the indexer:
```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate', replicas=2)
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        shards=2,
    )
)
```
Again, you can generate your Kubernetes configuration:
```python
f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```
Now you should see the following file structure:
```text
└── k8s_flow
    ├── gateway
    │   └── gateway.yml
    ├── encoder
    │   └── encoder.yml
    └── indexer
        ├── indexer-0.yml
        ├── indexer-1.yml
        └── indexer-head.yml
```
Apply your configuration like usual:
````{admonition} Hint: Cluster cleanup
:class: hint
If you already have the simple Flow from the first example running on your cluster, make sure to delete it using `kubectl delete -R -f ./k8s_flow`.
````
```shell
kubectl apply -R -f ./k8s_flow
```
### Deploy with custom environment variables and secrets
You can customize the environment variables that are available inside the runtime, either defined directly or read from a [Kubernetes secret](https://kubernetes.io/docs/concepts/configuration/secret/):
````{tab} with Python
```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        env={'k1': 'v1', 'k2': 'v2'},
        env_from_secret={
            'SECRET_USERNAME': {'name': 'mysecret', 'key': 'username'},
            'SECRET_PASSWORD': {'name': 'mysecret', 'key': 'password'},
        },
    )
)

f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```
````
````{tab} with flow YAML
In a `flow.yml` file:
```yaml
jtype: Flow
version: '1'
with:
  protocol: http
executors:
  - name: indexer
    uses: jinaai+docker:///IndexerPrivate
    env:
      k1: v1
      k2: v2
    env_from_secret:
      SECRET_USERNAME:
        name: mysecret
        key: username
      SECRET_PASSWORD:
        name: mysecret
        key: password
```
You can generate Kubernetes YAML configs using `jina export`:
```shell
jina export kubernetes flow.yml ./k8s_flow --k8s-namespace custom-namespace
```
````
After creating the namespace, you need to create the secrets mentioned above:
```shell
kubectl -n custom-namespace create secret generic mysecret --from-literal=username=jina --from-literal=password=123456
```
Then you can apply your configuration.
(kubernetes-expose)=
## Exposing the service
The previous examples use port-forwarding to send documents to the services.
In real world applications,
you may want to expose your service to make it reachable by users so that you can serve search requests.
```{caution}
Exposing the Deployment or Flow only works if the environment of your `Kubernetes cluster` supports `External Loadbalancers`.
```
Once the service is deployed, you can expose it. In this case we give an example of exposing the encoder when using a Deployment,
but you can expose the Gateway service in the same way when using a Flow:
```bash
kubectl expose deployment executor --name=executor-exposed --type LoadBalancer --port 80 --target-port 8080 -n custom-namespace
sleep 60 # wait until the external ip is configured
```
Export the external IP address. This is needed for the client when sending Documents to the Flow in the next section.
```bash
export EXTERNAL_IP=`kubectl get service executor-exposed -n custom-namespace -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'`
```
### Client
The client:
* Sends Documents to the exposed service on `$EXTERNAL_IP`
* Gets the responses.
You should configure your Client to connect to the service via the external IP address as follows:
```python
import os
from typing import List, Optional
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client

class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []

host = os.environ['EXTERNAL_IP']
port = 80
client = Client(host=host, port=port)
client.show_progress = True
docs = DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)])
queried_docs = client.post("/search", inputs=docs, return_type=DocList[MyDocWithMatches])
matches = queried_docs[0].matches
print(f"Matched documents: {len(matches)}")
```
## Update your Executor in Kubernetes
In Kubernetes, you can update your Executors by patching the Deployment corresponding to your Executor.
For instance, in the example above, you can change the `EncoderPrivate` Executor's `pretrained_model_name_or_path` parameter by changing the content of the Deployment inside the `encoder.yml` file dumped by `.to_kubernetes_yaml`.
You need to add `--uses-with` and pass the new parameter value to it. This is passed to the container inside the Deployment:
```yaml
spec:
  containers:
    - args:
        - executor
        - --name
        - encoder
        - --k8s-namespace
        - custom-namespace
        - --uses
        - config.yml
        - --port
        - '8080'
        - --uses-metas
        - '{}'
        - --uses-with
        - '{"pretrained_model_name_or_path": "other_model"}'
        - --native
      command:
        - jina
```
After doing so, re-apply your configuration so the new Executor will be deployed without affecting the other unchanged Deployments:
```shell script
kubectl apply -R -f ./k8s_deployment
```
````{admonition} Other patching options
:class: seealso
In Kubernetes Executors are ordinary Kubernetes Deployments, so you can use other patching options provided by Kubernetes:
* `kubectl replace` to replace an Executor using a complete configuration file
* `kubectl patch` to patch an Executor using only a partial configuration file
* `kubectl edit` to edit an Executor configuration on the fly in your editor
You can find more information about these commands in the [official Kubernetes documentation](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/).
````
## Key takeaways
In short, there are just three key steps to deploy Jina-serve on Kubernetes:
1. Use `.to_kubernetes_yaml()` to generate Kubernetes configuration files from a Jina-serve Deployment or Flow object.
2. Apply the generated files via `kubectl` (modify them first if necessary).
3. Expose your service outside the K8s cluster.
## See also
* {ref}`Kubernetes support documentation `
* {ref}`Monitor service once it is deployed `
* {ref}`See how failures and retries are handled `
* {ref}`Learn more about scaling Executors `
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/monitoring.md
(monitoring)=
# Prometheus/Grafana Support (Legacy)
```{admonition} Deprecated
:class: caution
The Prometheus-only based feature will soon be deprecated in favor of the OpenTelemetry Setup. Refer to {ref}`OpenTelemetry Setup ` for the details on OpenTelemetry setup for Jina-serve.
Refer to the {ref}`OpenTelemetry migration guide ` for updating your existing Prometheus and Grafana configurations.
```
We recommend the Prometheus/Grafana stack to leverage the metrics exposed by Jina-serve. In this setup, Jina-serve exposes different metrics endpoints, and Prometheus scrapes them,
collecting, aggregating, and storing the metrics.
External entities (like Grafana) can access these aggregated metrics via the query language [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) and let users visualize the metrics with dashboards.
```{hint}
Jina supports exposing metrics, but you are in charge of installing and managing your Prometheus/Grafana instances.
```
In this guide, we deploy the Prometheus/Grafana stack and use it to monitor a Flow.
(deploy-flow-monitoring)=
## Deploying the Flow and the monitoring stack
### Deploying on Kubernetes
One challenge of monitoring a {class}`~jina.Flow` is communicating its different metrics endpoints to Prometheus.
Fortunately, the [Prometheus operator for Kubernetes](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md) makes this fairly easy because it can automatically discover new metrics endpoints to scrape.
We recommend deploying your Jina-serve Flow on Kubernetes to leverage the full potential of the monitoring feature because:
* The Prometheus operator can automatically discover new endpoints to scrape.
* You can extend monitoring with the rich built-in Kubernetes metrics.
You can deploy Prometheus and Grafana on your Kubernetes cluster by running:
```bash
helm install prometheus prometheus-community/kube-prometheus-stack --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
```
```{hint}
Setting `serviceMonitorSelectorNilUsesHelmValues` to false allows the Prometheus Operator to discover metrics endpoints outside the Helm scope, which is needed to discover the Flow's metrics endpoints.
```
Deploy the Flow that we want to monitor:
For this example, we recommend reading {ref}`how to build and containerize the Executors to be run in Kubernetes `.
````{tab} via YAML
This example shows how to start a Flow with monitoring enabled via YAML:
In a `flow.yml` file:
```yaml
jtype: Flow
with:
  monitoring: true
executors:
  - uses: jinaai+docker:///EncoderPrivate
```
Then export it to Kubernetes YAML:
```bash
jina export kubernetes flow.yml ./config
```
````
````{tab} via Python API
```python
from jina import Flow
f = Flow(monitoring=True).add(uses='jinaai+docker:///EncoderPrivate')
f.to_kubernetes_yaml('config')
```
````
This creates a `config` folder containing the Kubernetes YAML definition of the Flow.
```{seealso}
You can see in-depth how to deploy a Flow on Kubernetes {ref}`here `
```
Then deploy the Flow:
```bash
kubectl apply -R -f config
```
Wait for a couple of minutes, and you should see that the Pods are ready:
```bash
kubectl get pods
```
```{figure} ../../.github/2.0/kubectl_pods.png
:align: center
```
Then you can see that the new metrics endpoints are automatically discovered:
```bash
kubectl port-forward svc/prometheus-operated 9090:9090
```
```{figure} ../../.github/2.0/prometheus_target.png
:align: center
```
```bash
kubectl port-forward svc/gateway 8080:8080
```
To access Grafana, run:
```bash
kubectl port-forward svc/prometheus-grafana 3000:80
```
Then open `http://localhost:3000` in your browser. The username is `admin` and password is `prom-operator`.
You should see the Grafana home page.
### Deploying locally
Deploy the Flow that we want to monitor:
````{tab} via Python code
```python
from jina import Flow

with Flow(monitoring=True, port_monitoring=8000, port=8080).add(
    uses='jinaai+docker:///EncoderPrivate', port_monitoring=9000
) as f:
    f.block()
```
````
````{tab} via docker-compose
```python
from jina import Flow

Flow(monitoring=True, port_monitoring=8000, port=8080).add(
    uses='jinaai+docker:///EncoderPrivate', port_monitoring=9000
).to_docker_compose_yaml('config.yaml')
```
```bash
docker-compose -f config.yaml up
```
````
To monitor a Flow locally you need to install Prometheus and Grafana locally. The easiest way to do this is with
Docker Compose.
First clone the repo which contains the config file:
```bash
git clone https://github.com/jina-ai/example-grafana-prometheus
cd example-grafana-prometheus/prometheus-grafana-local
```
Then run:
```bash
docker-compose up
```
Access the Grafana dashboard at `http://localhost:3000`. The username is `admin` and the password is `foobar`.
```{caution}
This example works locally because Prometheus is configured to listen to ports 8000 and 9000. However,
in contrast to deploying on Kubernetes, you need to tell Prometheus which port to look at. You can change these
ports by modifying [prometheus.yml](https://github.com/jina-ai/example-grafana-prometheus/blob/8baf519f7258da68cfe224775fc90537a749c305/prometheus-grafana-local/prometheus/prometheus.yml#L64).
```
### Deploying on JCloud
If your Flow is deployed on JCloud, you don't need to provision a monitoring stack yourself. Prometheus and Grafana are
handled by JCloud, and you can find the dashboard URL with `jc status `.
## Using Grafana to visualize metrics
Access the Grafana homepage, then go to `Browse`, then `Import`, and copy-paste the [JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow.json).
You should see the following dashboard:
```{figure} ../../.github/2.0/grafana.png
:align: center
```
````{admonition} Hint
:class: hint
You should query your Flow to generate the first metrics. Otherwise the dashboard looks empty.
````
You can query the Flow by running:
```python
from typing import Optional

from docarray import DocList, BaseDoc
from docarray.typing import NdArray

from jina import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


client = Client(port=51000)
client.post(
    on='/',
    inputs=DocList[MyDoc]([MyDoc(text=f'Text for document {i}') for i in range(100)]),
    return_type=DocList[MyDoc],
    request_size=10,
)
```
## See also
* [Using Grafana to visualize Prometheus metrics](https://grafana.com/docs/grafana/latest/getting-started/getting-started-prometheus/)
* {ref}`Defining custom metrics in an Executor `
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/opentelemetry-migration.md
(opentelemetry-migration)=
# Migrate from Prometheus/Grafana to OpenTelemetry
The {ref}`Prometheus/Grafana ` based monitoring setup will soon be deprecated in favor of the {ref}`OpenTelemetry setup `. This section provides the details required to update/migrate your Prometheus configuration and Grafana dashboard to continue monitoring with OpenTelemetry. Refer to {ref}`Opentelemetry setup ` for the new setup before proceeding further.
```{hint}
:class: seealso
Refer to {ref}`Prometheus/Grafana-only ` section for the soon to be deprecated setup.
```
## Update Prometheus configuration
With a Prometheus-only setup, you need to set up a `scrape_configs` configuration or service discovery plugin to specify the targets for pulling metrics data. In the OpenTelemetry setup, each Pod pushes metrics to the OpenTelemetry Collector. The Prometheus configuration now only needs to scrape from the OpenTelemetry Collector to get all the data from OpenTelemetry-instrumented applications.
The new Prometheus configuration for the `otel-collector` Collector hostname is:
```yaml
scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 500ms
    static_configs:
      - targets: ['otel-collector:8888'] # metrics from the collector itself
      - targets: ['otel-collector:8889'] # metrics collected from other applications
```
## Update Grafana dashboard
The OpenTelemetry [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) provides quantile window buckets automatically (unlike the Prometheus [Summary](https://prometheus.io/docs/concepts/metric_types/#summary) instrument). You need to manually configure the required quantile window. The quantile window metric will then be available as a separate time series metric.
In addition, the OpenTelemetry `Counter/UpDownCounter` instruments do not add the `_total` suffix to the base metric name.
To adapt Prometheus queries in Grafana:
* Use the [histogram_quantile](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile) function to query the average or desired quantile window time series data from Prometheus. For example, to view the 0.99 quantile of the `jina_receiving_request_seconds` metric over the last 10 minutes, use the query `histogram_quantile(0.99, rate(jina_receiving_request_seconds_bucket[10m]))`.
* Remove the `_total` suffix from the Counter/UpDownCounter metric names.
You can download a [sample Grafana dashboard JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow-histogram-metrics.json) and import it into Grafana to get started with some pre-built graphs.
```{hint}
A list of available metrics is in the {ref}`Flow Instrumentation ` section.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/opentelemetry.md
(opentelemetry)=
# {octicon}`telescope-fill` OpenTelemetry Support
```{toctree}
:hidden:
opentelemetry-migration
monitoring
```
```{hint}
Prometheus-only based metrics collection will soon be deprecated. Refer to {ref}`Monitor with Prometheus and Grafana ` for the old setup.
```
There are two major setups required to visualize/monitor your application's signals using [OpenTelemetry](https://opentelemetry.io). The first setup is covered by Jina-serve which integrates the [OpenTelemetry API and SDK](https://opentelemetry-python.readthedocs.io/en/stable/api/index.html) at the application level. The {ref}`Flow Instrumentation ` page covers in detail the steps required to enable OpenTelemetry in a Flow. A {class}`~jina.Client` can also be instrumented which is documented in the {ref}`Client Instrumentation ` section.
This section covers the OpenTelemetry infrastructure setup required to collect, store and visualize the traces and metrics data exported by the Pods. This setup is the user's responsibility, and this section only serves as the initial/introductory guide to running OpenTelemetry infrastructure components.
Since OpenTelemetry is open source and is mostly responsible for the API standards and specification, various providers implement the specification. This section follows the default recommendations from the OpenTelemetry documentation that also fits into the Jina-serve implementations.
## Exporting traces and metrics data
Pods created using a {class}`~jina.Flow` with tracing or metrics enabled use the [SDK Exporters](https://opentelemetry.io/docs/instrumentation/python/exporters/) to send the data to a central [Collector](https://opentelemetry.io/docs/collector/) component. You can use this collector to further process and store the data for visualization and alerting.
The push/export-based mechanism also allows the application to start pushing data immediately on startup. This differs from the pull-based mechanism, where you need a separate scraping registry or discovery service to identify data scraping targets.
You can configure the exporter backend host and port using the `traces_exporter_host`, `traces_exporter_port`, `metrics_exporter_host` and `metrics_exporter_port` arguments. Even though the Collector is data-type agnostic (it accepts any type of OpenTelemetry API data model), we provide separate configurations for tracing and metrics to give you more flexibility in choosing infrastructure components.
Jina-serve's default exporter implementations are `OTLPSpanExporter` and `OTLPMetricExporter`. The exporters use the gRPC data transfer protocol. The following environment variables can be used to further configure the exporter client based on your requirements. The full list of exporter-related environment variables is documented by the [Python SDK library](https://opentelemetry-python.readthedocs.io/en/latest/exporter/otlp/otlp.html). Apart from `OTEL_EXPORTER_OTLP_PROTOCOL` and `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`, you can use all other library-version-specific environment variables to configure the exporter clients.
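As a quick sketch, these arguments are passed directly when creating the Flow; the collector address below is an assumption matching the Docker Compose setup in the next section (the full walkthrough appears under *Running a Flow locally*):

```python
from jina import Flow

# Assumes an OTLP/gRPC collector is reachable at localhost:4317,
# as in the Docker Compose example below.
f = Flow(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add()
```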
## Collector
The [Collector](https://opentelemetry.io/docs/collector/) is a huge ecosystem of components that support features like scraping, collecting, processing and further exporting data to storage backends. The collector itself can also expose endpoints to allow scraping data. We recommend reading the official documentation to understand the full set of features and configuration required to run a Collector. Read the section below to understand the minimum set of components and the respective configuration required for operating with Jina-serve.
We recommend using the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) from the contrib repository. We also use:
* [Jaeger](https://www.jaegertracing.io) for collecting traces, visualizing tracing data and alerting based on tracing data.
* [Prometheus](https://prometheus.io) for collecting metric data and/or alerting.
* [Grafana](https://grafana.com) for visualizing data from Prometheus/Jaeger and/or alerting based on the data queried.
```{hint}
Jaeger provides comprehensive out-of-the-box tools for end-to-end tracing, monitoring, visualization and alerting. You can substitute other tools to achieve the necessary goals of observability and performance analysis. The same can be said for Prometheus and Grafana.
```
### Docker Compose
A minimal `docker-compose.yml` file can look like:
```yaml
version: "3"
services:
# Jaeger
jaeger:
image: jaegertracing/all-in-one:latest
ports:
* "16686:16686"
otel-collector:
image: otel/opentelemetry-collector:0.61.0
command: [ "--config=/etc/otel-collector-config.yml" ]
volumes:
* ${PWD}/otel-collector-config.yml:/etc/otel-collector-config.yml
ports:
* "8888" # Prometheus metrics exposed by the collector
* "8889" # Prometheus exporter metrics
* "4317:4317" # OTLP gRPC receiver
depends_on:
* jaeger
prometheus:
container_name: prometheus
image: prom/prometheus:latest
volumes:
* ${PWD}/prometheus-config.yml:/etc/prometheus/prometheus.yml
ports:
* "9090:9090"
grafana:
container_name: grafana
image: grafana/grafana-oss:latest
ports:
* 3000:3000
```
The corresponding OpenTelemetry Collector configuration below needs to be stored in file `otel-collector-config.yml`:
```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
    resource_to_telemetry_conversion:
      enabled: true
    # can be used to add additional labels
    const_labels:
      label1: value1

processors:
  batch:

service:
  extensions: []
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger]
      processors: [batch]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```
This setup creates a gRPC Collector Receiver on port 4317 that collects data pushed by the Flow Pods. Collector exporters for the Jaeger and Prometheus backends are configured to export tracing and metrics data respectively. The final **service** section creates a collector pipeline, combining the receiver (collects data), processor (batching) and exporter (sends to the backend) sub-components.
The minimal Prometheus configuration needs to be stored in `prometheus-config.yml`.
```yaml
scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 500ms
    static_configs:
      - targets: ['otel-collector:8889']
      - targets: ['otel-collector:8888']
```
The Prometheus configuration now only needs to scrape from the OpenTelemetry Collector to get all the data from OpenTelemetry Metrics instrumented applications.
### Running a Flow locally
Run the Flow and a sample request that we want to instrument locally. If the backends are running successfully the Flow has exported data to the Collector which can be queried and viewed.
First start a Flow:
```python
import time

from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        time.sleep(0.5)
        return docs


with Flow(
    port=54321,
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add(uses=MyExecutor) as f:
    f.block()
```
Second, execute requests using the instrumented {class}`jina.Client`:
```python
from jina import Client
from docarray import DocList, BaseDoc

client = Client(
    host='grpc://localhost:54321',
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)
client.post('/', DocList[BaseDoc]([BaseDoc()]), return_type=DocList[BaseDoc])
client.teardown_instrumentation()
```
```{hint}
The {class}`jina.Client` currently only supports OpenTelemetry Tracing.
```
## Viewing Traces in Jaeger UI
You can open the Jaeger UI [here](http://localhost:16686). You can find more information on the Jaeger UI in the official [docs](https://www.jaegertracing.io/docs/1.38/external-guides/#using-jaeger).
```{hint}
The list of available traces are documented in the {ref}`Flow Instrumentation ` section.
```
## Monitor with Prometheus and Grafana
External entities (like Grafana) can access these aggregated metrics via the [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) query language, and let users visualize metrics with dashboards. Check out a [comprehensive tutorial](https://prometheus.io/docs/visualization/grafana/) for more information.
Download a [sample Grafana dashboard JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow-histogram-metrics.json) and import it into Grafana to get started with some pre-built graphs:
```{figure} ../../.github/2.0/grafana-histogram-metrics.png
:align: center
```
```{hint}
:class: seealso
A list of available metrics is in the {ref}`Flow Instrumentation ` section.
To update your existing Prometheus and Grafana configurations, refer to the {ref}`OpenTelemetry migration guide `.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/callbacks.md
(callback-functions)=
# Callbacks
After performing {meth}`~jina.clients.mixin.PostMixin.post`, you may want to further process the obtained results.
For this purpose, Jina-serve implements a promise-like interface, letting you specify three kinds of callback functions:
* `on_done` is executed while streaming, after successful completion of each request
* `on_error` is executed while streaming, whenever an error occurs in each request
* `on_always` is always performed while streaming, no matter the success or failure of each request
Note that these callbacks only work for requests (and failures) *inside the stream*, for example inside an Executor.
If the failure is due to an error happening outside of
streaming, then these callbacks will not be triggered.
For example, a `SIGKILL` from the client OS during the handling of the request, or a networking issue,
will not trigger the callback.
Callback functions in Jina-serve expect a `Response` of the type {class}`~jina.types.request.data.DataRequest`, which contains resulting Documents,
parameters, and other information.
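A minimal sketch of wiring up all three callbacks (this assumes a Flow is already listening on port 12345; the lambdas simply print fields of the `DataRequest` that are described later in this section):

```python
from docarray import DocList, BaseDoc

from jina import Client

# Assumes a Flow is already running and listening on port 12345.
client = Client(port=12345)
client.post(
    '/',
    inputs=DocList[BaseDoc]([BaseDoc() for _ in range(20)]),
    request_size=10,
    on_done=lambda resp: print(f'{len(resp.docs)} docs processed'),
    on_error=lambda resp: print(f'error: {resp.header.status.description}'),
    on_always=lambda resp: print('request finished'),
)
```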
## Handle DataRequest in callbacks
`DataRequest`s are objects that are sent by Jina-serve internally. Callback functions process DataRequests, and `client.post()`
can return DataRequests.
`DataRequest` objects can be seen as containers for the data relevant to a given request; they contain the following fields:
````{tab} header
The request header.
```python
from pprint import pprint
from jina import Client
Client().post(on='/', on_done=lambda x: pprint(x.header))
```
```console
request_id: "ea504823e9de415d890a85d1d00ccbe9"
exec_endpoint: "/"
target_executor: ""
```
````
````{tab} parameters
The input parameters of the associated request. In particular, `DataRequest.parameters['__results__']` is a
reserved field that gets populated by Executors returning a Python `dict`.
Information in those returned `dict`s gets collected here, behind each Executor ID.
```python
from pprint import pprint
from jina import Client
Client().post(on='/', on_done=lambda x: pprint(x.parameters))
```
```console
{'__results__': {}}
```
````
````{tab} routes
The routing information of the data request. It records which Executors have been called, and the order in which they were called.
The timing and latency of each Executor is also recorded.
```python
from pprint import pprint
from jina import Client
Client().post(on='/', on_done=lambda x: pprint(x.routes))
```
```console
[executor: "gateway"
start_time {
seconds: 1662637747
nanos: 790248000
}
end_time {
seconds: 1662637747
nanos: 794104000
}
, executor: "executor0"
start_time {
seconds: 1662637747
nanos: 790466000
}
end_time {
seconds: 1662637747
nanos: 793982000
}
]
```
````
````{tab} docs
The DocList being passed between and returned by the Executors. These are the Documents usually processed in a callback function, and are often the main payload.
```python
from pprint import pprint
from jina import Client
Client().post(on='/', on_done=lambda x: pprint(x.docs))
```
```console
```
````
Accordingly, a callback that processes Documents can be defined as:
```{code-block} python
---
emphasize-lines: 4
---
from jina.types.request.data import DataRequest
def my_callback(resp: DataRequest):
    foo(resp.docs)
```
## Handle exceptions in callbacks
Server errors can be caught by the Client's `on_error` callback function. You can get the error message and traceback from `header.status`:
```python
from pprint import pprint
from jina import Flow, Client, Executor, requests
class MyExec1(Executor):
    @requests
    def foo(self, **kwargs):
        raise NotImplementedError


with Flow(port=12345).add(uses=MyExec1) as f:
    c = Client(port=f.port)
    c.post(on='/', on_error=lambda x: pprint(x.header.status))
```
```text
code: ERROR
description: "NotImplementedError()"
exception {
name: "NotImplementedError"
stacks: "Traceback (most recent call last):\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/runtimes/worker/__init__.py\", line 181, in process_data\n result = await self._data_request_handler.handle(requests=requests)\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/runtimes/request_handlers/data_request_handler.py\", line 152, in handle\n return_data = await self._executor.__acall__(\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/executors/__init__.py\", line 301, in __acall__\n return await self.__acall_endpoint__(__default_endpoint__, **kwargs)\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/executors/__init__.py\", line 322, in __acall_endpoint__\n return func(self, **kwargs)\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/executors/decorators.py\", line 213, in arg_wrapper\n return fn(executor_instance, *args, **kwargs)\n"
stacks: " File \"/Users/hanxiao/Documents/jina/toy44.py\", line 10, in foo\n raise NotImplementedError\n"
stacks: "NotImplementedError\n"
executor: "MyExec1"
}
```
In the example below, the Flow passes the message through and prints the result on success.
If something goes wrong, it beeps. Finally, the result is written to `output.txt`.
```python
from jina import Flow, Client
from docarray import BaseDoc
def beep(*args):
    # make a beep sound
    import sys

    sys.stdout.write('\a')


with Flow().add() as f, open('output.txt', 'w') as fp:
    client = Client(port=f.port)
    client.post(
        '/',
        BaseDoc(),
        on_done=print,
        on_error=beep,
        on_always=lambda x: x.docs.save(fp),
    )
```
````{admonition} What errors can be handled by the callback?
:class: caution
Callbacks can handle errors that are caused by Executors raising an Exception.
A callback will not receive exceptions:
* from the Gateway having connectivity errors with the Executors.
* between the Client and the Gateway.
````
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/index.md
(client)=
# {fas}`laptop-code` Client
{class}`~jina.Client` enables you to send Documents to a running {class}`~jina.Flow`. Like the Gateway, the Client supports four networking protocols: **gRPC**, **HTTP**, **WebSocket** and **GraphQL**, with or without TLS.
You may have observed two styles of using a Client in the docs:
````{tab} Implicit, inside a Flow
```{code-block} python
---
emphasize-lines: 6
---
from jina import Flow
f = Flow()
with f:
    f.post('/')
```
````
````{tab} Explicit, outside a Flow
```{code-block} python
---
emphasize-lines: 3,4
---
from jina import Client
c = Client(...) # must match the Flow setup
c.post('/')
```
````
The implicit style is easier for debugging and local development, as you don't need to specify the host, port and protocol of the Flow. However, it makes two strong assumptions: (1) one Flow corresponds to exactly one Client, and (2) the Flow runs on the same machine as the Client. For these reasons, the explicit style is recommended for production use.
```{hint}
If you want to connect to your Flow from a programming language other than Python, please follow the third party
client {ref}`documentation `.
```
## Connect
To connect to a Flow started by:
```python
from jina import Flow
with Flow(port=1234, protocol='grpc') as f:
    f.block()
```
```text
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:1234 │
│ 🔒 Private 192.168.1.126:1234 │
│ 🌍 Public 87.191.159.105:1234 │
╰──────────────────────────────────────────╯
```
The Client has to specify the following parameters to match the Flow and how it was set up:
* the `protocol` it needs to use to communicate with the Flow
* the `host` and the `port` as exposed by the Flow
* if it needs to use `TLS` encryption (to connect to a {class}`~jina.Flow` that has been {ref}`configured to use TLS ` in combination with gRPC, HTTP, or WebSocket)
````{Hint} Default port
The default port for the Client is `80`; if you are using `TLS` encryption, it is `443`.
````
You can define these parameters by passing a valid URI scheme as part of the `host` argument:
````{tab} TLS disabled
```python
from jina import Client
Client(host='http://my.awesome.flow:1234')
Client(host='ws://my.awesome.flow:1234')
Client(host='grpc://my.awesome.flow:1234')
```
````
````{tab} TLS enabled
```python
from jina import Client
Client(host='https://my.awesome.flow:1234')
Client(host='wss://my.awesome.flow:1234')
Client(host='grpcs://my.awesome.flow:1234')
```
````
Equivalently, you can pass each relevant parameter as a keyword argument:
````{tab} TLS disabled
```python
from jina import Client
Client(host='my.awesome.flow', port=1234, protocol='http')
Client(host='my.awesome.flow', port=1234, protocol='websocket')
Client(host='my.awesome.flow', port=1234, protocol='grpc')
```
````
````{tab} TLS enabled
```python
from jina import Client
Client(host='my.awesome.flow', port=1234, protocol='http', tls=True)
Client(host='my.awesome.flow', port=1234, protocol='websocket', tls=True)
Client(host='my.awesome.flow', port=1234, protocol='grpc', tls=True)
```
````
You can also use a mix of both:
```python
from jina import Client
Client(host='https://my.awesome.flow', port=1234)
Client(host='my.awesome.flow:1234', protocol='http', tls=True)
```
````{admonition} Caution
:class: caution
You can't define these parameters both by keyword argument and by host scheme - you can't have two sources of truth.
Example: the following code will raise an exception:
```python
from jina import Client
Client(host='https://my.awesome.flow:1234', port=4321)
```
````
````{admonition} Caution
:class: caution
We apply `RLock` to avoid [this gRPC issue](https://github.com/grpc/grpc/issues/25364), so that `grpc` clients can be used in a multi-threaded environment.
That said, you should rely on asynchronous programming or multi-processing rather than multi-threading.
For instance, if you're building a web server, you can introduce multi-processing based parallelism to your app using
`gunicorn`: `gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker ...`
````
## Client API
When using `docarray>=0.30`, you specify the schema that you expect the Deployment or Flow to return. You can pass the return type by using the `return_type` parameter in the `client.post` method:
```{code-block} python
---
emphasize-lines: 7
---
from typing import Dict

from jina import Client
from docarray import DocList, BaseDoc


class InputDoc(BaseDoc):
    text: str = ''


class OutputDoc(BaseDoc):
    tags: Dict[str, int] = {}


c = Client(host='https://my.awesome.flow:1234')
c.post(
    on='/',
    inputs=InputDoc(),
    return_type=DocList[OutputDoc],
)
```
(client-compress)=
## Enable compression
If the communication to the Gateway is via gRPC, you can pass `compression` parameter to {meth}`~jina.clients.mixin.PostMixin.post` to benefit from [gRPC compression](https://grpc.github.io/grpc/python/grpc.html#compression) methods.
The supported choices are: None, `gzip` and `deflate`.
```python
from jina import Client
client = Client()
client.post(..., compression='Gzip')
```
Note that this setting only affects the communication between the Client and the Flow's Gateway.
One can also specify the compression of the internal communication {ref}`as described here`.
## Test readiness of the server
```{include} ../orchestration/readiness.md
:start-after:
:end-before:
```
## Simple profiling of the latency
Before sending any real data, you can test the connectivity and network latency by calling the {meth}`~jina.clients.mixin.ProfileMixin.profiling` method:
```python
from jina import Client
c = Client(host='grpc://my.awesome.flow:1234')
c.profiling()
```
```text
Roundtrip 24ms 100%
├── Client-server network 17ms 71%
└── Server 7ms 29%
├── Gateway-executors network 0ms 0%
├── executor0 5ms 71%
└── executor1 2ms 29%
```
## Logging configuration
Similar to the {ref}`Flow logging configuration `, the {class}`jina.Client` also accepts the `log_config` argument. The Client can be configured as below:
```python
from jina import Client
client = Client(log_config='./logging.json.yml')
```
If the Flow is configured with custom logging, the argument will be forwarded to the implicit client.
```python
from jina import Flow
f = Flow(log_config='./logging.json.yml')
with f:
    # the implicit client automatically uses the log_config from the Flow for consistency
    f.post('/')
```
```{toctree}
:hidden:
send-receive-data
send-parameters
send-graphql-mutation
transient-errors
callbacks
rate-limit
instrumentation
third-party-clients
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/instrumentation.md
(instrumenting-client)=
## Instrumentation
The {class}`~jina.Client` supports request tracing, giving you an end-to-end view of a request's lifecycle. The client supports **gRPC**, **HTTP** and **WebSocket** protocols.
````{tab} Implicit, inside a Flow
```{code-block} python
---
emphasize-lines: 4, 5, 6
---
from jina import Flow
f = Flow(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)
with f:
    f.post('/')
```
````
````{tab} Explicit, outside a Flow
```{code-block} python
---
emphasize-lines: 5, 6, 7
---
from jina import Client
# must match the Flow setup
c = Client(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)
c.post('/')
```
````
Each protocol client creates the first trace ID which will be propagated to the `Gateway`. The `Gateway` then creates child spans using the available trace ID which is further propagated to each Executor request. Using the trace ID, all associated spans can be collected to build a trace view of the whole request lifecycle.
```{admonition} Using custom/external tracing context
:class: caution
The {class}`~jina.Client` doesn't currently support external tracing context which can potentially be extracted from an upstream request.
```
You can find more about instrumentation from the resources below:
* [Tracing in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/)
* {ref}`Instrumenting a Flow `
* {ref}`Deploying and using OpenTelemetry in Jina-serve `
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/rate-limit.md
(client-post-prefetch)=
# Rate Limit
There are two ways of applying a rate limit using the {class}`~jina.Client`.
1. Set the `prefetch` argument in the `Client` class constructor; it defaults to 1,000 requests.
1. Set the argument when calling the {meth}`~jina.clients.mixin.PostMixin.post` method. If not provided, the default value of
1,000 requests is used. The method argument overrides the argument provided in the `Client` class constructor.
The `prefetch` argument controls the number of in-flight requests made by the {meth}`~jina.clients.mixin.PostMixin.post`
method. Using the default value might overload the {class}`~jina.Gateway` or {class}`~jina.Executor`, especially if the operation characteristics of the `Deployment` or `Flow`
are unknown. Furthermore, the Client can send various types of requests with varying resource usage.
For example, a high number of `index` requests can carry a large data payload requiring heavy input/output operations.
This increases CPU consumption and eventually leads to a build-up of requests on the Flow. If the queue of in-flight requests
is already large, a very lightweight `search` request that only returns the total number of
Documents in the index might be blocked until the queue of `index` requests is completely processed. To prevent such a scenario,
apply the `prefetch` value on the {meth}`~jina.clients.mixin.PostMixin.post` method to limit the rate of
requests for expensive operations.
You can also use the `prefetch` argument on the {meth}`~jina.clients.mixin.PostMixin.post` method to keep
the server responsive for customer-facing requests that require fast response times, as opposed to background requests such as cron jobs or
analytics requests that can be processed slowly.
```python
from jina import Client
client = Client()
# uses the default limit of 1,000 requests
search_responses = client.post(...)
# sets a hard limit of 5 in flight requests
index_responses = client.post(..., prefetch=5)
```
A global rate limit on the {class}`~jina.Gateway` can also be set using the {ref}`prefetch ` option in the `Flow`.
This argument, however, serves as a global rate limit and cannot be customized based on the request workload. The `prefetch`
argument for the `Client` serves as a class-level rate limit for all requests made from that client. The `prefetch`
argument for the {meth}`~jina.clients.mixin.PostMixin.post` method serves as a method-level limit, overriding the arguments set at the
`Client` and the `Flow` level.
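As a rough sketch (assuming a Flow or Deployment is already reachable), the class-level and method-level limits can be combined like this:
```python
from jina import Client

# class-level limit: every post() from this Client allows at most 100 in-flight requests
client = Client(prefetch=100)

# method-level override: this particular call allows at most 5 in-flight requests
index_responses = client.post(..., prefetch=5)
```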
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/send-graphql-mutation.md
# Send GraphQL Mutation
If the Flow is configured with a GraphQL endpoint, you can use the Jina-serve {class}`~jina.Client`'s {meth}`~jina.clients.mixin.MutateMixin.mutate` method to fetch data via GraphQL mutations:
````{admonition} Only available for docarray<0.30
:class: note
This feature is only available when using `docarray<0.30`.
````
```python
from jina import Client
PORT = ...
c = Client(port=PORT)
mut = '''
mutation {
    docs(data: {text: "abcd"}) {
        id
        matches {
            embedding
        }
    }
}
'''
response = c.mutate(mutation=mut)
```
Note that `response` here is a `Dict`, not a `DocumentArray`. This is because GraphQL lets the user request only certain fields to be returned, so the output might not be a valid DocumentArray; it could be just a string.
## Mutations and arguments
The Flow GraphQL API exposes the mutation `docs`, which sends its inputs to the Flow's Executors,
just like HTTP `post` as described {ref}`above `.
A GraphQL mutation takes the same set of arguments used in {ref}`HTTP `.
The response from GraphQL can include all fields available on a DocumentArray.
````{admonition} See Also
:class: seealso
For more details on the GraphQL format of Document and DocumentArray, see the [documentation page](https://docarray.jina.ai/advanced/graphql-support/)
or [developer reference](https://docarray.jina.ai/api/docarray.document.mixins.strawberry/).
````
## Fields
The available fields in the GraphQL API are defined by the [Document Strawberry type](https://docarray.jina.ai/advanced/graphql-support/?highlight=graphql).
Essentially, you can ask for any property of a Document, including `embedding`, `text`, `tensor`, `id`, `matches`, `tags`,
and more.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/send-parameters.md
(client-executor-parameters)=
# Send Parameters
The {class}`~jina.Client` can send key-value pairs as parameters to {class}`~jina.Executor`s as shown below:
```{code-block} python
---
emphasize-lines: 15
---
from jina import Client, Executor, Deployment, requests
from docarray import BaseDoc
class MyExecutor(Executor):
    @requests
    def foo(self, parameters, **kwargs):
        print(parameters['hello'])


dep = Deployment(uses=MyExecutor)

with dep:
    client = Client(port=dep.port)
    client.post('/', BaseDoc(), parameters={'hello': 'world'})
```
````{hint}
:class: note
You can send a parameters-only data request via:
```python
with dep:
    client = Client(port=dep.port)
    client.post('/', parameters={'hello': 'world'})
```
This might be useful to control `Executor` objects during their lifetime.
````
Since Executors {ref}`can use Pydantic models to have strongly typed parameters `, you can also send parameters as Pydantic models in the Client API, as sketched below.
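The following is a minimal sketch of that idea; the `MyParams` model and its `hello` field are hypothetical, and it assumes the Executor annotates its `parameters` argument with the same Pydantic model:
```python
from jina import Client, Deployment, Executor, requests
from docarray import BaseDoc
from pydantic import BaseModel


class MyParams(BaseModel):
    hello: str = 'world'


class MyExecutor(Executor):
    @requests
    def foo(self, parameters: MyParams, **kwargs):
        # parameters arrive as a strongly typed Pydantic model
        print(parameters.hello)


dep = Deployment(uses=MyExecutor)

with dep:
    client = Client(port=dep.port)
    # send the parameters as a Pydantic model instead of a plain dict
    client.post('/', BaseDoc(), parameters=MyParams(hello='world'))
```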
(specific-params)=
## Send parameters to specific Executors
You can send parameters to a specific Executor by using the `executorname__paramname` syntax.
The Executor named `executorname` will receive the parameter `paramname` (without the `executorname__` prefix in the key name),
and none of the other Executors will receive it.
For instance in the following Flow:
```python
from jina import Flow, Client
from docarray import BaseDoc, DocList
with Flow().add(name='exec1').add(name='exec2') as f:
    client = Client(port=f.port)
    client.post(
        '/index',
        DocList[BaseDoc]([BaseDoc()]),
        parameters={
            'exec1__parameter_exec1': 'param_exec1',
            'exec2__parameter_exec1': 'param_exec2',
        },
    )
```
The Executor `exec1` will receive `{'parameter_exec1':'param_exec1'}` as parameters, whereas `exec2` will receive `{'parameter_exec1':'param_exec2'}`.
This feature is intended for the case where there are multiple Executors that take the same parameter names, but you want to use different values for each Executor.
This is often the case for Executors from the Hub, since they tend to share a common interface for parameters.
```{admonition} Difference to target_executor
Why do we need this feature if we already have `target_executor`?
On the surface, both are about sending information to a subset of Executors in a Flow. However, they work differently under the hood. `target_executor` sends information directly to the specified Executors, ignoring the topology of the Flow, whereas an `executor__parameter` request follows the topology of the Flow and only delivers the parameter to the Executor that matches.
Think of roll call and passing notes in a classroom: `target_executor` is like calling a student directly, whereas `executor__parameter` is like passing the notes from student to student, with each one picking out the note bearing their own name.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/send-receive-data.md
# Send & Receive Data
After a {class}`~jina.Client` has connected to a {class}`~jina.Deployment` or a {class}`~jina.Flow`, it can send requests to the service using its
{meth}`~jina.clients.mixin.PostMixin.post` method.
This expects as inputs the {ref}`Executor endpoint ` that you want to target, as well as a Document or
Iterable of Documents:
````{tab} A single Document
```{code-block} python
---
emphasize-lines: 6
---
from docarray.documents import TextDoc
d1 = TextDoc(text='hello')
client = Client(...)
client.post('/endpoint', d1)
```
````
````{tab} A list of Documents
```{code-block} python
---
emphasize-lines: 7
---
from docarray.documents import TextDoc
d1 = TextDoc(text='hello')
d2 = TextDoc(text='world')
client = Client(...)
client.post('/endpoint', inputs=[d1, d2])
```
````
````{tab} A DocList
```{code-block} python
---
emphasize-lines: 6
---
from docarray import DocList
from docarray.documents import TextDoc
d1 = TextDoc(text='hello')
d2 = TextDoc(text='world')
da = DocList[TextDoc]([d1, d2])
client = Client(...)
client.post('/endpoint', da)
```
````
````{tab} A Generator of Document
```{code-block} python
---
emphasize-lines: 3-5, 9
---
from docarray.documents import TextDoc
def doc_gen():
    for j in range(10):
        yield TextDoc(text=f'hello {j}')
client = Client(...)
client.post('/endpoint', doc_gen)
```
````
````{tab} No Document
```{code-block} python
---
emphasize-lines: 3
---
client = Client(...)
client.post('/endpoint')
```
````
```{admonition} Caution
:class: caution
`Flow` and `Deployment` also provide a `.post()` method that follows the same interface as `client.post()`.
However, once your solution is deployed remotely, these objects are not present anymore.
Hence, `deployment.post()` and `flow.post()` are not recommended outside of testing or debugging use cases.
```
(request-size-client)=
## Send data in batches
Especially during indexing, a Client can send up to thousands or millions of Documents to a {class}`~jina.Flow`.
Those Documents are internally batched into a `Request`, providing a smaller memory footprint and faster response times
thanks
to {ref}`callback functions `.
The size of these batches can be controlled with the `request_size` keyword.
The default `request_size` is 100 Documents. The optimal size will depend on your use case.
```python
from jina import Deployment, Client
from docarray import DocList, BaseDoc
with Deployment() as dep:
    client = Client(port=dep.port)
    client.post('/', DocList[BaseDoc](BaseDoc() for _ in range(100)), request_size=10)
```
## Send data asynchronously
There is an async version of the Python Client which works with {meth}`~jina.clients.mixin.PostMixin.post` and
{meth}`~jina.clients.mixin.MutateMixin.mutate`.
While the standard `Client` is also asynchronous under the hood, its async version exposes this fact to the outside
world,
by allowing *coroutines* as input, and returning an *asynchronous iterator*.
This means you can iterate over Responses one by one, as they come in.
```python
import asyncio
from jina import Client, Deployment
from docarray import BaseDoc
async def async_inputs():
    for _ in range(10):
        yield BaseDoc()
        await asyncio.sleep(0.1)


async def run_client(port):
    client = Client(port=port, asyncio=True)
    async for resp in client.post('/', async_inputs, request_size=1):
        print(resp)


with Deployment() as dep:  # Using it as a Context Manager will start the Deployment
    asyncio.run(run_client(dep.port))
```
Async send is useful when calling an external service from an Executor.
```python
from jina import Client, Executor, requests
from docarray import DocList, BaseDoc
class DummyExecutor(Executor):
    c = Client(host='grpc://0.0.0.0:51234', asyncio=True)

    @requests
    async def process(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        return self.c.post('/', docs, return_type=DocList[BaseDoc])
```
## Send data to specific Executors
Usually a {class}`~jina.Flow` will send each request to all {class}`~jina.Executor`s with matching endpoints as
configured. But the {class}`~jina.Client` also allows you to only target specific Executors in a Flow using
the `target_executor` keyword. The request will then only be processed by the Executors which match the provided
target_executor regex. Its usage is shown in the listing below.
```python
from jina import Client, Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc
class FooExecutor(Executor):
    @requests
    async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'foo was here and got {len(docs)} document'


class BarExecutor(Executor):
    @requests
    async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'bar was here and got {len(docs)} document'


f = (
    Flow()
    .add(uses=FooExecutor, name='fooExecutor')
    .add(uses=BarExecutor, name='barExecutor')
)
with f:  # Using it as a Context Manager will start the Flow
    client = Client(port=f.port)
    docs = client.post(on='/', inputs=TextDoc(text=''), target_executor='bar*', return_type=DocList[TextDoc])
    print(docs.text)
```
This will send the request to all Executors whose names start with 'bar', such as 'barExecutor'.
In the simplest case, you can specify a precise Executor name, and the request will be sent only to that single
Executor.
## Use Unary or Streaming gRPC
A Flow using the **gRPC** protocol implements both the unary and the streaming RPC lifecycle for communicating with clients.
When sending more than one request using the batching or the iterator mechanism, the RPC lifecycle of the
{meth}`~jina.clients.mixin.PostMixin.post` method can be controlled with the `stream` boolean argument. By
default `stream` is set to `True`, which uses the streaming RPC to send the data to the Flow. If it
is set to `False`, the unary RPC is used instead.
Both RPC lifecycles are implemented to give clients flexibility;
there might be a performance penalty when using the streaming RPC in the Python gRPC implementation.
```{hint}
This option is only valid for **gRPC** protocol.
Refer to the gRPC [Performance Best Practices](https://grpc.io/docs/guides/performance/#general) guide for more implementations details and considerations.
```
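For example (a sketch, assuming a gRPC Flow is reachable at `localhost:12345`), the unary RPC can be selected per call:
```python
from jina import Client
from docarray import BaseDoc, DocList

client = Client(host='grpc://localhost:12345')

# stream=False switches from the default streaming RPC to the unary RPC
docs = client.post(
    '/',
    inputs=DocList[BaseDoc]([BaseDoc() for _ in range(20)]),
    request_size=5,
    stream=False,
)
```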
(client-grpc-channel-options)=
## Configure gRPC Client options
The `Client` supports the `grpc_channel_options` parameter which allows more customization of the **gRPC** channel
construction. The `grpc_channel_options` parameter accepts a dictionary of **gRPC** configuration options which will be
used to overwrite the default options. The default **gRPC** options are:
```python
('grpc.max_receive_message_length', -1),
('grpc.keepalive_time_ms', 9999),
# send keepalive ping every 9 seconds (9999 ms); default is 2 hours
('grpc.keepalive_timeout_ms', 4999),
# keepalive ping times out after 4 seconds (4999 ms); default is 20 seconds
('grpc.keepalive_permit_without_calls', True),
# allow keepalive pings when there are no gRPC calls
('grpc.http1.max_pings_without_data', 0),
# allow an unlimited amount of keepalive pings without data
('grpc.http1.min_time_between_pings_ms', 10000),
# allow gRPC pings from the client every 10 seconds
('grpc.http1.min_ping_interval_without_data_ms', 5000),
# allow gRPC pings from the client without data every 5 seconds
```
If `max_attempts` is greater than 1 on the {meth}`~jina.clients.mixin.PostMixin.post` method,
the `grpc.service_config` option is not applied, since the retry
options are configured internally.
Refer to the [channel_arguments](https://grpc.github.io/grpc/python/glossary.html#term-channel_arguments) section for
the full list of available **gRPC** options.
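As a minimal sketch (the option value below is an arbitrary example), selected defaults can be overridden when constructing the Client:
```python
from jina import Client

client = Client(
    host='grpc://localhost:12345',
    # keys follow the gRPC channel_arguments naming; values overwrite the defaults above
    grpc_channel_options={'grpc.max_receive_message_length': 4 * 1024 * 1024},
)
```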
```{hint}
:class: seealso
Refer to the {ref}`Configure Executor gRPC options ` section for configuring the `Executor` **gRPC** options.
```
## Returns
{meth}`~jina.clients.mixin.PostMixin.post` returns a `DocList` containing all Documents flattened over all
Requests. When setting `return_responses=True`, this behavior is changed to returning a list of
{class}`~jina.types.request.data.Response` objects.
If a callback function is provided, `client.post()` returns `None`.
````{tab} Return as DocList objects
```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc
with Deployment() as dep:
    client = Client(port=dep.port)
    docs = client.post(on='', inputs=TextDoc(text='Hi there!'), return_type=DocList[TextDoc])
    print(docs)
    print(docs.text)
```
```console
['Hi there!']
```
````
````{tab} Return as Response objects
```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc

with Deployment() as dep:
    client = Client(port=dep.port)
    resp = client.post(on='', inputs=TextDoc(text='Hi there!'), return_type=DocList[TextDoc], return_responses=True)
    print(resp)
    print(resp[0].docs.text)
```
```console
[]
['Hi there!']
```
````
````{tab} Handle response via callback
```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc

with Deployment() as dep:
    client = Client(port=dep.port)
    resp = client.post(
        on='',
        inputs=TextDoc(text='Hi there!'),
        on_done=lambda resp: print(resp.docs.text),
    )
    print(resp)
```
```console
['Hi there!']
None
```
````
### Return type
{meth}`~jina.clients.mixin.PostMixin.post` returns the Documents as the server sends them back. For the Client to
return your expected document type, the `return_type` argument is required.
The `return_type` can be a parametrized `DocList` or a single `BaseDoc` type. If the `return_type` is a `BaseDoc` type,
the results are returned as a `DocList[T]`, except when the result contains a single Document; in that case the single Document is returned
instead of the DocList.
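For instance (a sketch, assuming a service on port `12345` that returns `OutputDoc`-shaped Documents, where `OutputDoc` is a hypothetical schema):
```python
from jina import Client
from docarray import BaseDoc


class OutputDoc(BaseDoc):
    text: str = ''


client = Client(port=12345)

# passing a single BaseDoc type: results come back as DocList[OutputDoc],
# unless only one Document is returned, in which case that Document is returned directly
result = client.post('/', inputs=OutputDoc(text='hi'), return_type=OutputDoc)
```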
### Callbacks vs returns
A callback operates on every sub-request generated by `request_size`. The callback function consumes the responses one by
one, and each response is freed from memory immediately after consumption.
When no callback is provided, the Client accumulates the DocLists of all Requests before returning.
This means you will not receive results until all Requests have been processed, which is slower and requires more
memory.
### Force the order of responses
Note that the Flow processes Documents in an asynchronous and distributed manner. The Flow may not process the
requests in the same order as the Client sent them, so the responses may not arrive in the sending order either.
To force the results to be returned deterministically in the order they were sent, pass the `results_in_order`
parameter to {meth}`~jina.clients.mixin.PostMixin.post`.
```python
import random
import time
from jina import Deployment, Executor, requests, Client
from docarray import DocList
from docarray.documents import TextDoc
class RandomSleepExecutor(Executor):
    @requests
    def foo(self, *args, **kwargs):
        rand_sleep = random.uniform(0.1, 1.3)
        time.sleep(rand_sleep)


dep = Deployment(uses=RandomSleepExecutor, replicas=3)
input_text = [f'ordinal-{i}' for i in range(180)]
input_da = DocList[TextDoc]([TextDoc(text=t) for t in input_text])

with dep:
    c = Client(port=dep.port, protocol=dep.protocol)
    output_da = c.post('/', inputs=input_da, request_size=10, return_type=DocList[TextDoc], results_in_order=True)
    for input, output in zip(input_da, output_da):
        assert input.text == output.text
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/third-party-clients.md
(third-party-client)=
# Third-party clients
This page is about accessing the Flow with other clients, e.g. `curl`, or programming languages other than Python.
````{admonition} Mostly developed for docarray<0.30
:class: note
Note that most of these clients were developed for versions of Jina compatible with `docarray<0.30.0`. This means they can only communicate with services
running Jina-serve with `docarray<0.30.0`.
````
## Golang
Our [Go Client](https://github.com/jina-ai/client-go) supports gRPC, HTTP and WebSocket protocols, allowing you to connect to Jina-serve from your Go applications.
## PHP
A big thanks to our community member [Jonathan Rowley](https://jina-ai.slack.com/team/U03973EA7BN) for developing a [PHP client](https://github.com/Dco-ai/php-jina) for Jina-serve!
## Kotlin
A big thanks to our community member [Peter Willemsen](https://jina-ai.slack.com/team/U03R0KNBK98) for developing a [Kotlin client](https://github.com/peterwilli/JinaKotlin) for Jina-serve!
(http-interface)=
## HTTP
```{admonition} Available Protocols
:class: caution
Jina-serve Flows can use one of {ref}`three protocols `: gRPC, HTTP, or WebSocket.
Only Flows that use HTTP can be accessed via the methods described below.
```
Apart from using the {ref}`Jina Client `, the most common way of interacting with your deployed Flow is via HTTP.
You can always use `post` to interact with a Flow, using the `/post` HTTP endpoint.
With the help of [OpenAPI schema](https://swagger.io/specification/), one can send data requests to a Flow via `cURL`, JavaScript, [Postman](https://www.postman.com/), or any other HTTP client or programming library.
(http-arguments)=
### Arguments
Your HTTP request can include the following parameters:
| Name | Required | Description | Example |
| ---------------- | ------------ | -------------------------------------------------------------------------------------- | ------------------------------------------------- |
| `execEndpoint` | **required** | Executor endpoint to target | `"execEndpoint": "/index"` |
| `data` | optional | List specifying the input [Documents](https://docarray.jina.ai/fundamentals/document/) | `"data": [{"text": "hello"}, {"text": "world"}]`. |
| `parameters` | optional | Dictionary of parameters to be sent to the Executors | `"parameters": {"param1": "hello world"}` |
| `targetExecutor` | optional | String indicating an Executor to target. Default targets all Executors | `"targetExecutor": "MyExec"` |
Instead of using the generic `/post` endpoint, you can directly use endpoints like `/index` or `/search` to perform a specific operation.
In this case your data request is sent to the corresponding Executor endpoint, so you don't need to specify the parameter `execEndpoint`.
`````{dropdown} Example
````{tab} cURL
```{code-block} bash
---
emphasize-lines: 2
---
curl --request POST \
'http://localhost:12345/search' \
--header 'Content-Type: application/json' -d '{"data": [{"text": "hello world"}]}'
```
````
````{tab} javascript
```{code-block} javascript
---
emphasize-lines: 2
---
fetch(
    'http://localhost:12345/search',
    {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({"data": [{"text": "hello world"}]})
}).then(response => response.json()).then(data => console.log(data));
```
````
`````
The response you receive includes `data` (an array of [Documents](https://docarray.jina.ai/fundamentals/document/)), as well as the fields `routes`, `parameters`, and `header`.
```{admonition} See also: Flow REST API
:class: seealso
For a more detailed description of the REST API of a generic Flow, including the complete request body schema and request samples, please check:
1. [OpenAPI Schema](https://schemas.jina.ai/rest/latest.json)
2. [Redoc UI](https://schemas.jina.ai/rest/)
For a specific deployed Flow, you can get the same overview by accessing the `/redoc` endpoint.
```
(swagger-ui)=
### Use cURL
Here's an example that uses `cURL`:
```bash
curl --request POST 'http://localhost:12345/post' --header 'Content-Type: application/json' -d '{"data": [{"text": "hello world"}],"execEndpoint": "/search"}'
```
````{dropdown} Sample response
```
{
"requestId": "e2978837-e5cb-45c6-a36d-588cf9b24309",
"data": {
"docs": [
{
"id": "84d9538e-f5be-11eb-8383-c7034ef3edd4",
"granularity": 0,
"adjacency": 0,
"parentId": "",
"text": "hello world",
"chunks": [],
"weight": 0.0,
"matches": [],
"mimeType": "",
"tags": {
"mimeType": "",
"parentId": ""
},
"location": [],
"offset": 0,
"embedding": null,
"scores": {},
"modality": "",
"evaluations": {}
}
],
"groundtruths": []
},
"header": {
"execEndpoint": "/index",
"targetPeapod": "",
"noPropagate": false
},
"parameters": {},
"routes": [
{
"pod": "gateway",
"podId": "5742d5dd-43f1-451f-88e7-ece0588b7557",
"startTime": "2021-08-05T07:26:58.636258+00:00",
"endTime": "2021-08-05T07:26:58.636910+00:00",
"status": null
}
],
"status": {
"code": 0,
"description": "",
"exception": null
}
}
```
````
### Use JavaScript
Sending a request from the front-end JavaScript code is a common use case too. Here's how this looks:
```javascript
fetch('http://localhost:12345/post', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({"data": [{"text": "hello world"}],"execEndpoint": "/search"})
}).then(response => response.json()).then(data => console.log(data));
```
````{dropdown} Output
```javascript
{
  "data": [
    {
      "id": "37e6f1bc7ec82fc4ba75691315ae54a6",
      "text": "hello world",
      "matches": ...
    }
  ],
  "header": {
    "requestId": "c725217aa7714de88039866fb5aa93d2",
    "execEndpoint": "/index",
    "targetExecutor": ""
  },
  "routes": [
    {
      "executor": "gateway",
      "startTime": "2022-04-01T13:11:57.992497+00:00",
      "endTime": "2022-04-01T13:11:57.997802+00:00"
    },
    {
      "executor": "executor0",
      "startTime": "2022-04-01T13:11:57.993686+00:00",
      "endTime": "2022-04-01T13:11:57.997274+00:00"
    }
  ]
}
```
````
### Use Swagger UI
Flows provide a customized [Swagger UI](https://swagger.io/tools/swagger-ui/) which you can use to visually interact with the Flow
through a web browser.
```{admonition} Available Protocols
:class: caution
Only Flows that have enabled {ref}`CORS ` expose the Swagger UI interface.
```
For a Flow that is exposed on port `PORT`, you can navigate to the Swagger UI at `http://localhost:PORT/docs`:
```{figure} ../../../.github/2.0/swagger-ui.png
:align: center
```
Here you can see all the endpoints that are exposed by the Flow, such as `/search` and `/index`.
To send a request, click on the endpoint you want to target, then `Try it out`.
Now you can enter your HTTP request, and send it by clicking `Execute`.
You can again use the [REST HTTP request schema](https://schemas.jina.ai/rest/), but do not need to specify `execEndpoint`.
Below, in `Responses`, you can see the reply, together with a visual representation of the returned Documents.
### Use Postman
[Postman](https://www.postman.com/) is an application that allows the testing of web APIs from a graphical interface. You can store all the templates for your REST APIs in it, using Collections.
We provide a [suite of templates for Jina Flow](https://github.com/jina-ai/jina/tree/master/.github/Jina.postman_collection.json). You can import it in Postman in **Collections**, with the **Import** button. It provides templates for the main operations. You need to create an Environment to define the `{{url}}` and `{{port}}` environment variables. These would be the hostname and the port where the Flow is listening.
This contribution was made by [Jonathan Rowley](https://jina-ai.slack.com/archives/C0169V26ATY/p1649689443888779?thread_ts=1649428823.420879&cid=C0169V26ATY), in our [community Slack](https://slack.jina.ai).
## gRPC
To use the gRPC protocol with a language other than Python you will need to:
* Download the two proto definition files: `jina.proto` and `docarray.proto` from [GitHub](https://github.com/jina-ai/jina/tree/master/jina/proto) (be sure to use the latest release branch)
* Compile them with [protoc](https://grpc.io/docs/protoc-installation/), specifying the programming language you want to generate code for.
* Add the generated files to your project and import them into your code.
You should finally be able to communicate with your Flow using the gRPC protocol. You can find more information on the gRPC
`message` and `service` that you can use to communicate in the [Protobuf documentation](../../proto/docs.md).
(flow-graphql)=
## GraphQL
````{admonition} See Also
:class: seealso
This article does not serve as the introduction to GraphQL.
If you are not already familiar with GraphQL, we recommend you learn more about GraphQL from the [official documentation](https://graphql.org/learn/).
You may also want to learn about [Strawberry](https://strawberry.rocks/), the library that powers Jina-serve's GraphQL support.
````
Jina Flows that use the HTTP protocol can also provide a GraphQL API, which is located behind the `/graphql` endpoint.
GraphQL has the advantage of letting you define your own response schema, which means that only the fields you require
are sent over the wire.
This is especially useful when you don't need potentially large fields, like image tensors.
You can access the Flow from any GraphQL client, like `sgqlc`.
```python
from sgqlc.endpoint.http import HTTPEndpoint
HOSTNAME, PORT = ...
endpoint = HTTPEndpoint(url=f'{HOSTNAME}:{PORT}/graphql')
mut = '''
mutation {
    docs(data: {text: "abcd"}) {
        id
        matches {
            embedding
        }
    }
}
'''
response = endpoint(mut)
```
## WebSocket
WebSocket uses persistent connections between the client and Flow, hence allowing streaming use cases.
While you can always use the Python client to stream requests like any other protocol, WebSocket allows streaming JSON from anywhere (CLI / Postman / any other programming language).
You can use the same set of arguments as {ref}`HTTP ` in the payload.
We use [subprotocols](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers#subprotocols) to separate streaming JSON vs bytes.
The Flow defaults to `json` if you don't specify a sub-protocol when establishing the connection (our Python client uses `bytes` streaming via the [jina-serve.proto](../../proto/docs.md) definition).
````{Hint}
* Choose WebSocket over HTTP if you want to stream requests.
* Choose WebSocket over gRPC if
* you want to stream using JSON, not bytes.
* your client language doesn't support gRPC.
* you don't want to compile the [Protobuf definitions](../../proto/docs.md) for your gRPC client.
````
## See also
* {ref}`Access a Flow with the Client `
* {ref}`Configure a Flow `
* [Flow REST API reference](https://schemas.jina.ai/rest/)
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/transient-errors.md
(transient-errors)=
# Transient Errors
Most transient errors can be attributed to network issues between the client and the target server, or between the server and its
dependencies, such as a database. Such errors can be:
* ignored, if the failing operation in a generator or sequence of operations isn't relevant to the overall success.
* retried up to a certain limit, assuming that recovery logic kicks in to repair the transient error.
* accepted, if the operation cannot be successfully completed.
## Transient fault handling with retries
The {meth}`~jina.clients.mixin.PostMixin.post` method accepts `max_attempts`, `initial_backoff`, `max_backoff`
and `backoff_multiplier` parameters to control the capacity to retry requests when a transient connectivity error
occurs, using an exponential backoff strategy.
This can help to overcome transient network connectivity issues which are broadly captured by the
{class}`~grpc.aio.AioRpcError`, {class}`~aiohttp.ClientError`, {class}`~asyncio.CancelledError` and
{class}`~jina.excepts.InternalNetworkError`
exception types.
The `max_attempts` parameter determines the number of sending attempts, including the original request.
The `initial_backoff`, `max_backoff`, and `backoff_multiplier` parameters determine the randomized delay in seconds
before retry attempts.
The initial retry attempt will occur at `initial_backoff`. In general, the *n-th* attempt will occur
at `random(0, min(initial_backoff*backoff_multiplier**(n-1), max_backoff))`.
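As a minimal sketch (assuming a Flow is reachable at `grpc://localhost:12345`), the retry parameters are passed directly to `post`:
```python
from jina import Client
from docarray import BaseDoc

client = Client(host='grpc://localhost:12345')

# up to 5 attempts; backoff starts at 0.8s, grows by 1.5x per attempt, and is capped at 5s
client.post(
    '/',
    inputs=BaseDoc(),
    max_attempts=5,
    initial_backoff=0.8,
    backoff_multiplier=1.5,
    max_backoff=5,
)
```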
### Handling gRPC retries for streaming and unary RPC methods
The {meth}`~jina.clients.mixin.PostMixin.post` method supports the `stream` boolean parameter (defaults to `True`). If
set to `True`,
the **gRPC** server side streaming RPC method will be invoked. If set to `False`, the server side unary RPC method will
be invoked. Some important implication of
using retries with **gRPC** are:
* The built-in **gRPC** retries are limited in scope and are implemented to work under certain circumstances. More
details are specified in the [design document](https://github.com/grpc/proposal/blob/master/A6-client-retries.md).
* If the `stream` parameter is set to `True` and the `inputs` parameter is a `GeneratorType` or
an `Iterable`, the retry must be handled as shown below, because the result must be consumed to check for errors in the
stream of responses. The **gRPC** service retry is still configured but cannot be guaranteed.
```python
from jina import Client
from docarray import BaseDoc
from jina.clients.base.retry import wait_or_raise_err
from jina.helper import run_async
client = Client(host='grpc://localhost:12345')
max_attempts = 5
initial_backoff = 0.8
backoff_multiplier = 1.5
max_backoff = 5
def input_generator():
    for _ in range(10):
        yield BaseDoc()


for attempt in range(1, max_attempts + 1):
    try:
        response = client.post(
            '/',
            inputs=input_generator(),
            request_size=2,
            timeout=0.5,
        )
        assert len(response) == 1
    except ConnectionError as err:
        run_async(
            wait_or_raise_err,
            attempt=attempt,
            err=err,
            max_attempts=max_attempts,
            backoff_multiplier=backoff_multiplier,
            initial_backoff=initial_backoff,
            max_backoff=max_backoff,
        )
    else:
        break  # stop retrying once the request succeeds
```
* If the `stream` parameter is set to `True` and the `inputs` parameter is a `Document` or a `DocList`, the retry is
handled internally based on the `max_attempts`, `initial_backoff`, `backoff_multiplier` and `max_backoff`
parameters.
* If the `stream` parameter is set to `False`, the {meth}`~jina.clients.mixin.PostMixin.post` method invokes the unary
RPC method and the retry is handled internally.
```{hint}
The retry parameters `max_attempts`, `initial_backoff`, `backoff_multiplier` and `max_backoff` of the {meth}`~jina.clients.mixin.PostMixin.post` method will be used to set the **gRPC** retry service options. This improves the chances of success if the gRPC retry conditions are met.
```
## Continue streaming when an Executor error occurs
The {meth}`~jina.clients.mixin.PostMixin.post` method accepts a `continue_on_error` parameter. When set to `True`, the Client
keeps sending the remaining requests. The `continue_on_error` parameter only applies
to exceptions raised by an Executor; in case of network connectivity issues, an exception is still raised.
The `continue_on_error` parameter handles the errors that are returned by the Executor as part of its response. The
errors can be logical errors that might be raised
during the execution of the operation. This doesn't include transient errors represented by
{class}`~grpc.aio.AioRpcError`, {class}`~aiohttp.ClientError`, {class}`~asyncio.CancelledError` and
{class}`~jina.excepts.InternalNetworkError` triggered during the Gateway and Executor communication.
The `retries` parameter of the Gateway controls the number of retries for the transient errors that arise in the
communication between the Gateway and the Executors.
```{hint}
Refer to {ref}`Network Errors ` section for more information.
```
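A rough sketch of this behavior (assuming a Flow at `grpc://localhost:12345` whose Executor may fail for some Documents):
```python
from jina import Client
from docarray import BaseDoc, DocList

client = Client(host='grpc://localhost:12345')

# keep sending the remaining requests even if an Executor raises an exception for some of them
responses = client.post(
    '/',
    inputs=DocList[BaseDoc]([BaseDoc() for _ in range(100)]),
    request_size=10,
    continue_on_error=True,
    return_responses=True,
)
```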
## Retries with large inputs or long-running operations
When using the gRPC Client, it is recommended to set the `stream` parameter to `False`, so that the unary RPC is invoked by
the {class}`~jina.Client`, which performs the retry internally with the requests built from the `inputs` iterator or generator. The `request_size`
parameter should also be set so that each request stays small and can be retried without much overhead on the server, as sketched below.
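A sketch of that recommendation (assuming a gRPC Flow at `localhost:12345` and a large input generator):
```python
from jina import Client
from docarray import BaseDoc

client = Client(host='grpc://localhost:12345')


def inputs():
    for _ in range(100_000):
        yield BaseDoc()


# unary RPC with small requests: each small request can be retried internally
# without re-sending the whole input stream
client.post('/', inputs=inputs(), request_size=10, stream=False, max_attempts=3)
```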
The **HTTP** and **WebSocket**
```{hint}
Refer to {ref}`Callbacks ` section for dealing with success and failures after retries.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/jcloud/configuration.md
(jcloud-configuration)=
# {octicon}`file-code` Configuration
JCloud extends Jina-serve's {ref}`Flow YAML specification` by introducing the special field `jcloud`. This lets you define resources and scaling policies for each Executor and Gateway.
Here's a Flow with two Executors that have specific resource needs: `indexer` requires a 10 GB `ebs` disk, whereas `encoder` requires a C4 instance, which implies two cores and 4 GB RAM. See the sections below for further information about instance types.
```{code-block} yaml
---
emphasize-lines: 5-7,10-16
---
jtype: Flow
executors:
  - name: encoder
    uses: jinaai+docker:///Encoder
    jcloud:
      resources:
        instance: C4
  - name: indexer
    uses: jinaai+docker:///Indexer
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
```
## Allocate Executor resources
Since each Executor has its own business logic, it may require different cloud resources. One Executor might need more RAM, whereas another might need a bigger disk.
In JCloud, you can pass highly customizable, finely-grained resource requests for each Executor using the `jcloud.resources` argument in your Flow YAML.
### Instance
JCloud uses the concept of an "instance" to represent a specific set of hardware specifications.
In the above example, a C4 instance type represents two cores and 4 GB RAM based on the CPU tiers instance definition table below.
````{admonition} Note
:class: note
If you are still using the legacy resource specification interface, such as the one below, we translate the raw numbers to the instance tier that fits them most closely:
```{code-block} yaml
jcloud:
  resources:
    cpu: 8
    memory: 8G
```
Sometimes no instance tier exactly matches the CPU cores and memory you request, as in the above example.
In such cases we "ceil" the request to the lowest tier that satisfies all the specifications.
Here, `C6` would be chosen, as `C5`'s `Cores` are lower than what is being requested (4 vs 8).
````
There are also two types of instance tiers, one for CPU instances, one for GPU.
(jcloud-pricing)=
#### Pricing
Each instance has a fixed `Credits Per Hour` number, indicating how many credits JCloud will charge
if a certain instance is used. For example, if an Executor uses `C3`, it implies that `10` credits will be spent
from the operating user account. Other important facts to note:
* If the Flow is powering other App(s) you create, you will be charged by the App(s), not the underlying Flow.
* `Credits Per Hour` is charged on a per-Executor/Gateway basis; the total `Credits Per Hour` of a Flow is the sum of the credits
each component costs.
* If shards/replicas are used in an Executor/Gateway, the same instance type will be used, so `Credits Per Hour` will be multiplied.
For example, if an Executor uses `C3` and has two replicas, the `Credits Per Hour` for the Executor doubles to `20`.
The only exception is sharding: in that case `C1` is used for the shards' head, regardless of what instance type has been entered for the sharded Executor.
```{hint}
Please visit [Jina AI Cloud Pricing](https://cloud.jina.ai/pricing/) for more information about billing and credits.
```
#### CPU tiers
| Instance | Cores | Memory | Credits per hour |
| -------- | ----- | ------ | ---------------- |
| C1 | 0.1 | 0.2 GB | 1 |
| C2 | 0.5 | 1 GB | 5 |
| C3 | 1 | 2 GB | 10 |
| C4 | 2 | 4 GB | 20 |
| C5 | 4 | 8 GB | 40 |
| C6 | 8 | 16 GB | 80 |
| C7 | 16 | 32 GB | 160 |
| C8 | 32 | 64 GB | 320 |
By default, C1 is allocated to each Executor and Gateway.
JCloud offers the general Intel Xeon processor (Skylake 8175M or Cascade Lake 8259CL) for the CPU instances.
#### GPU tiers
JCloud supports GPU workloads with two different usages: `shared` or `dedicated`.
If GPU is enabled, JCloud will provide NVIDIA A10G Tensor Core GPUs with 24 GB memory for workloads in both usage types.
```{hint}
When using GPU resources, it may take a few extra minutes before all Executors are ready to serve traffic.
```
| Instance | GPU | Memory | Credits per hour |
| -------- | ------ | ------ | ---------------- |
| G1 | shared | 14 GB | 100 |
| G2 | 1 | 14 GB | 125 |
| G3 | 2 | 24 GB | 250 |
| G4 | 4 | 56 GB | 500 |
##### Shared GPU
An Executor using a `shared` GPU shares this GPU with up to four other Executors.
This enables time-slicing, which allows workloads that land on oversubscribed GPUs to interleave with one another.
To use `shared` GPU, `G1` needs to be specified as the instance type.
The tradeoffs with a `shared` GPU are increased latency, jitter, and potential out-of-memory (OOM) conditions when many different applications are time-slicing on the GPU. If your application is consuming a lot of memory, we suggest using a dedicated GPU.
##### Dedicated GPU
Using a dedicated GPU is the default way to provision a GPU for an Executor. This automatically creates nodes or assigns the Executor to a GPU node. In this case, the Executor owns the whole GPU.
To use a `dedicated` GPU, `G2`, `G3` or `G4` needs to be specified as the instance type.
### Storage
JCloud supports three kinds of storage: ephemeral (default), [efs](https://aws.amazon.com/efs/) (network file storage) and [ebs](https://aws.amazon.com/ebs/) (block device).
`ephemeral` storage will assign space to an Executor when it is created. Data in `ephemeral` storage is deleted permanently if Executors are restarted or rescheduled.
````{hint}
By default, we assign `ephemeral` storage to all Executors in a Flow. This lets the storage resize dynamically, so you don't need to shrink/grow volumes manually.
If your Executor needs to share data with other Executors and retain data persistency, consider using `efs`. Note that:
* IO performance is slower compared to `ebs` or `ephemeral`
* The disk can be shared with other Executors or Flows.
* Default storage size is 5 GB.
If your Executor needs high IO, you can use `ebs` instead. Note that:
* The disk cannot be shared with other Executors or Flows.
* Default storage size is 5 GB.
````
JCloud also supports retaining the data that a Flow was using while it was active. You can set the `retain` argument to `true` to enable this feature.
```{code-block} yaml
---
emphasize-lines: 5-10,12,15
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
          retain: true
  - name: executor2
    uses: jinaai+docker:///Executor2
    jcloud:
      resources:
        storage:
          kind: efs
```
#### Pricing (2)
Here are the numbers in terms of credits per GB per month for the three kinds of storage described above.
| Instance | Credits per GB per month |
| --------- | ------------------------ |
| Ephemeral | 0 |
| EBS | 30 |
| EFS | 75 |
For example, using 10 GB of EBS storage for a month costs `300` credits.
If shards/replicas are used, we will multiply credits further by the number of storages created.
## Scale out Executors
On JCloud, demand-based autoscaling functionality is naturally offered thanks to the underlying Kubernetes architecture. This means that you can maintain [serverless](https://en.wikipedia.org/wiki/Serverless_computing) deployments in a cost-effective way with no headache of setting the [right number of replicas](https://jina.ai/serve/how-to/scale-out/#scale-out-your-executor) anymore!
### Autoscaling with `jinaai+serverless://`
The easiest way to scale out your Executor is to use a Serverless Executor. This can be enabled by using `jinaai+serverless://` instead of `jinaai+docker://` in Executor's `uses`, such as:
```{code-block} yaml
---
emphasize-lines: 4
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+serverless:///Executor1
```
JCloud autoscaling leverages [Knative](https://knative.dev/docs/) behind the scenes, and `jinaai+serverless` uses a set of Knative configurations as defaults.
```{hint}
For more information about the Knative autoscaling configurations, please visit [Knative autoscaling](https://knative.dev/docs/serving/autoscaling/).
```
### Autoscaling with custom args
If `jinaai+serverless://` doesn't meet your requirements, you can further customize autoscaling configurations by using the `autoscale` argument on a per-Executor basis in the Flow YAML, such as:
```{code-block} yaml
---
emphasize-lines: 5-10
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      autoscale:
        min: 1
        max: 2
        metric: rps
        target: 50
```
Below are the defaults and requirements for the configurations:
| Name | Default | Allowed | Description |
| ------ | ----------- | ------------------------ | ------------------------------------------------- |
| min | 1 | int | Minimum number of replicas (`0` means serverless) |
| max | 2 | int, up to 5 | Maximum number of replicas |
| metric | concurrency | `concurrency` / `rps` / `cpu` / `memory` | Metric for scaling |
| scale_down_delay | 30s | str, `0s` <= value <= `1h` | Time window which must pass at reduced concurrency before a scaling down |
| target | 100 | int | Target number the replicas try to maintain. |
The unit of `target` depends on the metric specified. Refer to the table below:
| Metric | Target |
| ---- | ----- |
| `concurrency` | Number of concurrent requests processed at any given time. |
| `rps` | Number of requests processed per second per replica. |
| `cpu` | Average % CPU utilization of each pod (e.g. `60` means replicas will be scaled up when pods on average reach 60% CPU utilization). |
| `memory` | Average mebibytes of memory used by each pod (e.g. `200` means replicas will be scaled up when the pods' average memory consumption exceeds 200 MiB). |
After you make a JCloud deployment using the autoscaling configuration, serving the Flow works just the same: the only difference you may notice is that it takes a few extra seconds to handle the initial requests, since the deployments need to scale up behind the scenes. Let JCloud handle the scaling from now on, and you can focus on the code!
Note that if `metric` is `cpu` or `memory`, `min` will be reset to 1 if the user sets it to 0.
### Pricing (3)
At present, pricing for autoscaled Executor/Gateway largely follows the same {ref}`JCloud pricing rules ` as other Jina AI services.
We track the minimum number of replicas in the autoscale configuration and use it as the replica multiplier when calculating the `Credits Per Hour`.
### Restrictions
```{admonition} **Restrictions**
* Autoscaling cannot currently be combined with `ebs` as a storage type. Please use `efs` or `ephemeral` instead.
* Autoscale is not supported for multi-protocol Gateways.
```
## Configure availability tolerance
If service issues disrupt Executors, JCloud lets you specify a tolerance level for the number of replicas that must stay up or may go down.
The JCloud parameters `minAvailable` and `maxUnavailable` ensure that Executors will stay up even if a certain number of replicas go down.
| Name | Default | Allowed | Description |
| :--------------- | :-----: | :---------------------------------------------------------------------------------------: | :------------------------------------------------------- |
| `minAvailable` | N/A | Lower than number of [replicas](https://jina.ai/serve/concepts/flow/scale-out/#scale-out) | Minimum number of replicas available during disruption |
| `maxUnavailable` | N/A | Lower than number of [replicas](https://jina.ai/serve/concepts/flow/scale-out/#scale-out) | Maximum number of replicas unavailable during disruption |
```{code-block} yaml
---
emphasize-lines: 5-6
---
jtype: Flow
executors:
  - uses: jinaai+docker:///Executor1
    replicas: 5
    jcloud:
      minAvailable: 2
```
In case of disruption, this ensures that at least two replicas remain available, while up to three may be down.
```{code-block} yaml
---
emphasize-lines: 5-6
---
jtype: Flow
executors:
  - uses: jinaai+docker:///Executor1
    replicas: 5
    jcloud:
      maxUnavailable: 2
```
In case of disruption, this ensures that at most two replicas are unavailable, so at least three replicas remain available.
## Configure Gateway
The Gateway can be customized just like an Executor.
### Set timeout
By default, the Gateway will close connections that have been idle for over 600 seconds. If you want a longer connection timeout threshold, change the `timeout` parameter under `gateway.jcloud`.
```{code-block} yaml
---
emphasize-lines: 2-4
---
jtype: Flow
gateway:
  jcloud:
    timeout: 800
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
### Control Gateway resources
To customize the Gateway's CPU or memory, specify the instance type under `gateway.jcloud.resources`:
```{code-block} yaml
---
emphasize-lines: 2-6
---
jtype: Flow
gateway:
  jcloud:
    resources:
      instance: C3
executors:
  - name: encoder
    uses: jinaai+docker:///Encoder
```
## Expose Executors
A Flow deployment without a Gateway is often used for {ref}`external-executors`, which can be shared between different Flows. You can expose an Executor by setting `expose: true` (and un-expose the Gateway by setting `expose: false`):
```{code-block} yaml
---
emphasize-lines: 2-4, 8-9
---
jtype: Flow
gateway:
  jcloud:
    expose: false # don't expose the Gateway
executors:
  - name: custom
    uses: jinaai+docker:///CustomExecutor
    jcloud:
      expose: true # expose the Executor
```
```{figure} img/expose-executor.png
:width: 70%
```
You can expose the Gateway along with Executors:
```{code-block} yaml
---
emphasize-lines: 2-4,8-9
---
jtype: Flow
gateway:
  jcloud:
    expose: true
executors:
  - name: custom1
    uses: jinaai+docker:///CustomExecutor1
    jcloud:
      expose: true # expose the Executor
```
```{figure} img/gateway-and-executors.png
:width: 70%
```
## Other deployment options
### Customize Flow name
You can use the `name` argument to specify the Flow name in the Flow YAML:
```{code-block} yaml
---
emphasize-lines: 2-3
---
jtype: Flow
jcloud:
  name: my-name
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
### Specify Jina version
To control Jina's version while deploying a Flow to `jcloud`, you can pass the `version` argument in the Flow YAML:
```{code-block} yaml
---
emphasize-lines: 2-3
---
jtype: Flow
jcloud:
  version: 3.10.0
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
### Add Labels
You can use `labels` (as key-value pairs) to attach metadata to your Flows and Executors:
Flow level `labels`:
```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
jcloud:
  labels:
    username: johndoe
    app: fashion-search
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
Executor level `labels`:
```{code-block} yaml
---
emphasize-lines: 5-8
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      labels:
        index: partial
        group: backend
```
```{hint}
Keys in `labels` have the following restrictions:
* Must be 63 characters or fewer.
* Must begin and end with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between.
* The following keys are skipped if passed in the Flow YAML:
  * `user`
  * `jina-version`
```
### Monitoring
To enable [tracing support](https://jina.ai/serve/cloud-nativeness/opentelemetry/) in Flows, pass the `enable: true` argument in the Flow YAML. (Tracing support is not enabled by default in JCloud.)
```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
jcloud:
  monitor:
    traces:
      enable: true
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
You can pass the `enable: true` argument to `gateway` to only enable tracing support in the Gateway:
```{code-block} yaml
---
emphasize-lines: 2-6
---
jtype: Flow
gateway:
  jcloud:
    monitor:
      traces:
        enable: true
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
You can also enable tracing support only in `executor1`:
```{code-block} yaml
---
emphasize-lines: 5-8
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      monitor:
        traces:
          enable: true
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/jcloud/index.md
(jcloud)=
# Jina AI Cloud Hosting
```{toctree}
:hidden:
configuration
```
```{figure} https://jina.ai/serve/_images/jcloud-banner.png
:width: 0 %
:scale: 0 %
```
```{figure} img/jcloud-banner.png
:scale: 0 %
:width: 0 %
```
After building a Jina-serve project, the next step is to deploy and host it on the cloud. [Jina AI Cloud](https://cloud.jina.ai/) is Jina-serve's reliable, scalable and production-ready cloud-hosting solution that manages your project lifecycle without surprises or hidden development costs.
```{tip}
Are you ready to unlock the power of AI with Jina AI Cloud? Take a look at our [pricing options](https://cloud.jina.ai/pricing) now!
```
In addition to deploying Flows, `jcloud` supports creating secrets and jobs in the Flow's namespace.
## Basics
Jina AI Cloud provides a CLI that you can use via `jina cloud` from the terminal (or `jcloud` or simply `jc` for minimalists.)
````{hint}
You can also install just the JCloud CLI without installing the Jina-serve package.
```bash
pip install jcloud
jc -h
```
If you installed the JCloud CLI individually, all of its commands fall under the `jc` or `jcloud` executable.
In case the command `jc` is already occupied by another tool, use `jcloud` instead. If your pip install doesn't register bash commands for you, you can run `python -m jcloud -h`.
````
For the rest of this section, we use `jc` or `jcloud`. But again they are interchangeable with `jina cloud`.
## Flows
### Deploy
In Jina's idiom, a project is a [Flow](https://jina.ai/serve/concepts/orchestration/flow/), which represents an end-to-end task such as indexing, searching or recommending. In this document, we use "project" and "Flow" interchangeably.
A Flow can have two types of file structure: a single YAML file or a project folder.
#### Single YAML file
A self-contained YAML file, consisting of all configuration at the [Flow](https://jina.ai/serve/concepts/orchestration/flow/)-level and [Executor](https://jina.ai/serve/concepts/serving/executor/)-level.
> All Executors' `uses` must follow the format `jinaai+docker:///MyExecutor` (from [Executor Hub](https://cloud.jina.ai)) to avoid any local file dependencies:
```yaml
# flow.yml
jtype: Flow
executors:
  - name: sentencizer
    uses: jinaai+docker://jina-ai/Sentencizer
```
To deploy:
```bash
jc flow deploy flow.yml
```
````{caution}
When `jcloud` deploys a Flow, it automatically appends the following global arguments to the `flow.yml`, if not present:
```yaml
jcloud:
version: jina-version
docarray: docarray-version
```
The `jina-version` and `docarray-version` values correspond to your development environment's `jina` and `docarray` versions.
````
````{tip}
We recommend testing locally before deployment:
```bash
jina flow --uses flow.yml
```
````
#### Project folder
````{tip}
The best practice for creating a Jina AI Cloud project is to use:
```bash
jc new
```
````
Just like a regular Python project, you can have sub-folders of Executor implementations and a `flow.yml` on the top-level to connect all Executors together.
You can create an example local project using `jc new hello`. The default structure looks like:
```text
├── .env
├── executor1
│ ├── config.yml
│ ├── executor.py
│ └── requirements.txt
└── flow.yml
```
Where:
* `hello/` is your top-level project folder.
* `executor1` directory has all Executor related code/configuration. You can read the best practices for [file structures](https://jina.ai/serve/concepts/serving/executor/file-structure/). Multiple Executor directories can be created.
* `flow.yml` Your Flow YAML.
* `.env` All environment variables used during deployment.
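For illustration, a minimal `.env` could hold key-value pairs that are made available during deployment (the variable names here are hypothetical):
```text
# .env (hypothetical values)
MODEL_NAME=my-model
LOG_LEVEL=INFO
```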
To deploy:
```bash
jc flow deploy hello
```
The Flow is successfully deployed when you see:
```{figure} img/deploy.png
:width: 70%
```
---
You will get a Flow ID, say `merry-magpie-82b9c0897f`. This ID is required to manage, view logs and remove the Flow.
As this Flow is deployed with the default gRPC gateway (feel free to change it to `http` or `websocket`), you can use `jina.Client` to access it:
```python
from jina import Client, Document
print(
Client(host='grpcs://merry-magpie-82b9c0897f.wolf.jina.ai').post(
on='/', inputs=Document(text='hello')
)
)
```
(jcloud-flow-status)=
### Get status
To get the status of a Flow:
```bash
jc flow status merry-magpie-82b9c0897f
```
```{figure} img/status.png
:width: 70%
```
### Monitoring
Basic monitoring is provided to Flows deployed on Jina AI Cloud.
To access the [Grafana](https://grafana.com/)-powered dashboard, first get {ref}`the status of the Flow`. The `Grafana Dashboard` link is displayed at the bottom of the pane. Visit the URL to find basic metrics like 'Number of Request Gateway Received' and 'Time elapsed between receiving a request and sending back the response':
```{figure} img/monitoring.png
:width: 80%
```
### List Flows
To list all of your "Starting", "Serving", "Failed", "Updating", and "Paused" Flows:
```bash
jc flows list
```
```{figure} img/list.png
:width: 90%
```
You can also filter your Flows by passing a phase:
```bash
jc flows list --phase Deleted
```
```{figure} img/list_deleted.png
:width: 90%
```
Or see all Flows:
```bash
jc flows list --phase all
```
```{figure} img/list_all.png
:width: 90%
```
### Remove Flows
You can remove a single Flow, multiple Flows or even all Flows by passing different identifiers.
To remove a single Flow:
```bash
jc flow remove merry-magpie-82b9c0897f
```
To remove multiple Flows:
```bash
jc flow remove merry-magpie-82b9c0897f wondrous-kiwi-b02db6a066
```
To remove all Flows:
```bash
jc flow remove all
```
By default, removing multiple or all Flows is an interactive process where you must give confirmation before each Flow is deleted. To make it non-interactive, set the below environment variable before running the command:
```bash
export JCLOUD_NO_INTERACTIVE=1
```
### Update a Flow
You can update a Flow by providing an updated YAML.
To update a Flow:
```bash
jc flow update super-mustang-c6cf06bc5b flow.yml
```
```{figure} img/update_flow.png
:width: 70%
```
### Pause / Resume Flow
You can pause a Flow that is not currently in use but may be needed later, and bring it back with `resume` when you need it again.
To pause a Flow:
```bash
jc flow pause super-mustang-c6cf06bc5b
```
```{figure} img/pause_flow.png
:width: 70%
```
To resume a Flow:
```bash
jc flow resume super-mustang-c6cf06bc5b
```
```{figure} img/resume_flow.png
:width: 70%
```
### Restart Flow, Executor or Gateway
If you need to restart a Flow, there are two options: restart all Executors and the Gateway associated with the Flow, or selectively restart only a specific Executor or the Gateway.
To restart a Flow:
```bash
jc flow restart super-mustang-c6cf06bc5b
```
```{figure} img/restart_flow.png
:width: 70%
```
To restart the Gateway:
```bash
jc flow restart super-mustang-c6cf06bc5b --gateway
```
```{figure} img/restart_gateway.png
:width: 70%
```
To restart an Executor:
```bash
jc flow restart super-mustang-c6cf06bc5b --executor executor0
```
```{figure} img/restart_executor.png
:width: 70%
```
### Recreate a Deleted Flow
To recreate a deleted Flow:
```bash
jc flow recreate profound-rooster-eec4b17c73
```
```{figure} img/recreate_flow.png
:width: 70%
```
### Scale an Executor
You can also manually scale any Executor.
```bash
jc flow scale good-martin-ca6bfdef84 --executor executor0 --replicas 2
```
```{figure} img/scale_executor.png
:width: 70%
```
### Normalize a Flow
To normalize a Flow:
```bash
jc flow normalize flow.yml
```
```{hint}
Normalizing a Flow is the process of building the Executor image and pushing the image to Hubble.
```
### Get Executor or Gateway logs
To get the Gateway logs:
```bash
jc flow logs --gateway central-escargot-354a796df5
```
```{figure} img/gateway_logs.png
:width: 70%
```
To get the Executor logs:
```bash
jc flow logs --executor executor0 central-escargot-354a796df5
```
```{figure} img/executor_logs.png
:width: 70%
```
## Secrets
### Create a Secret
To create a Secret for a Flow:
```bash
jc secret create mysecret rich-husky-af14064067 --from-literal "{'env-name': 'secret-value'}"
```
```{tip}
You can optionally pass the `--update` flag to automatically update the Flow spec with the updated secret information. This flag will update the Flow which is hosted on the cloud. Finally, you can also optionally pass a Flow's yaml file path with `--path` to update the yaml file locally. Refer to [this](https://jina.ai/serve/cloud-nativeness/kubernetes/#deploy-flow-with-custom-environment-variables-and-secrets) section for more information.
```
```{caution}
If the `--update` flag is not passed, you have to manually update the Flow with `jc flow update rich-husky-af14064067 updated-flow.yml`.
```
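Putting the above together, a single command that creates the Secret, updates the hosted Flow, and patches a local YAML might look like this (a sketch, assuming the `--update` and `--path` flags combine as described):
```bash
jc secret create mysecret rich-husky-af14064067 --from-literal "{'env-name': 'secret-value'}" --update --path flow.yml
```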
### List Secrets
To list all the Secrets created in a Flow's namespace:
```bash
jc secret list rich-husky-af14064067
```
```{figure} img/list_secrets.png
:width: 90%
```
### Get a Secret
To retrieve a Secret's details:
```bash
jc secret get mysecret rich-husky-af14064067
```
```{figure} img/get_secret.png
:width: 90%
```
### Remove Secret
```bash
jc secret remove rich-husky-af14064067 mysecret
```
### Update a Secret
You can update a Secret for a Flow.
```bash
jc secret update rich-husky-af14064067 mysecret --from-literal "{'env-name': 'secret-value'}"
```
```{tip}
You can optionally pass the `--update` flag to automatically update the Flow spec with the updated secret information. This flag will update the Flow which is hosted on the cloud. Finally, you can also optionally pass a Flow's yaml file path with `--path` to update the yaml file locally. Refer to [this](https://jina.ai/serve/cloud-nativeness/kubernetes/#deploy-flow-with-custom-environment-variables-and-secrets) section for more information.
```
```{caution}
Updating a Secret automatically restarts a Flow.
```
## Jobs
### Create a Job
To create a Job for a Flow:
```bash
jc job create job-name rich-husky-af14064067 image 'job entrypoint' --timeout 600 --backofflimit 2
```
```{tip}
`image` can be any Executor image passed to a Flow's Executor `uses` or any normal docker image prefixed with `docker://`
```
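As an illustration of the tip above, a Job can also run a plain Docker image; the image and entrypoint here are placeholders:
```bash
jc job create myjob1 rich-husky-af14064067 docker://busybox 'echo hello' --timeout 600 --backofflimit 2
```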
### List Jobs
To list all Jobs created in a Flow's namespace:
```bash
jc jobs list rich-husky-af14064067
```
```{figure} img/list_jobs.png
:width: 90%
```
### Get a Job
To retrieve a Job's details:
```bash
jc job get myjob1 rich-husky-af14064067
```
```{figure} img/get_job.png
:width: 90%
```
### Remove Job
```bash
jc job remove rich-husky-af14064067 myjob1
```
### Get Job Logs
To get the Job logs:
```bash
jc job logs myjob1 -f rich-husky-af14064067
```
```{figure} img/job_logs.png
:width: 90%
```
## Deployments
### Deploy
```{caution}
When `jcloud` deploys a Deployment, it automatically appends the following global arguments to the `deployment.yml`, if not present:
```
```yaml
jcloud:
version: jina-version
docarray: docarray-version
```
#### Single YAML file
A self-contained YAML file, consisting of all configuration information at the [Deployment](https://jina.ai/serve/concepts/orchestration/deployment/)-level and [Executor](https://jina.ai/serve/concepts/serving/executor/)-level.
> A Deployment's `uses` parameter must follow the format `jinaai+docker:///MyExecutor` (from [Executor Hub](https://cloud.jina.ai)) to avoid any local file dependencies:
```yaml
# deployment.yml
jtype: Deployment
with:
protocol: grpc
uses: jinaai+docker://jina-ai/Sentencizer
```
To deploy:
```bash
jc deployment deploy ./deployment.yml
```
The Deployment is successfully deployed when you see:
```{figure} img/deployment/deploy.png
:width: 70%
```
---
You will get a Deployment ID, for example `pretty-monster-130a5ac952`. This ID is required to manage, view logs, and remove the Deployment.
Since this Deployment is deployed with the default gRPC protocol (feel free to change it to `http`), you can use `jina.Client` to access it:
```python
from jina import Client, Document
print(
Client(host='grpcs://executor-pretty-monster-130a5ac952.wolf.jina.ai').post(
on='/', inputs=Document(text='hello')
)
)
```
(jcloud-deployoment-status)=
### Get status
To get the status of a Deployment:
```bash
jc deployment status pretty-monster-130a5ac952
```
```{figure} img/deployment/status.png
:width: 70%
```
### List Deployments
To list all of your "Starting", "Serving", "Failed", "Updating", and "Paused" Deployments:
```bash
jc deployment list
```
```{figure} img/deployment/list.png
:width: 90%
```
You can also filter your Deployments by passing a phase:
```bash
jc deployment list --phase Deleted
```
```{figure} img/deployment/list_deleted.png
:width: 90%
```
Or see all Deployments:
```bash
jc deployment list --phase all
```
```{figure} img/deployment/list_all.png
:width: 90%
```
### Remove Deployments
You can remove a single Deployment, multiple Deployments, or even all Deployments by passing different commands to the `jc` executable at the command line.
To remove a single Deployment:
```bash
jc deployment remove pretty-monster-130a5ac952
```
To remove multiple Deployments:
```bash
jc deployment remove pretty-monster-130a5ac952 artistic-tuna-ab154c4dcc
```
To remove all Deployments:
```bash
jc deployment remove all
```
By default, removing all or multiple Deployments is an interactive process where you must give confirmation before each Deployment is deleted. To make it non-interactive, set the below environment variable before running the command:
```bash
export JCLOUD_NO_INTERACTIVE=1
```
### Update a Deployment
You can update a Deployment by providing an updated YAML.
To update a Deployment:
```bash
jc deployment update pretty-monster-130a5ac952 deployment.yml
```
```{figure} img/deployment/update.png
:width: 70%
```
### Pause / Resume Deployment
You can pause a Deployment that is not currently in use but may be needed later, and bring it back with `resume` when you need it again.
To pause a Deployment:
```bash
jc deployment pause pretty-monster-130a5ac952
```
```{figure} img/deployment/pause.png
:width: 70%
```
To resume a Deployment:
```bash
jc deployment resume pretty-monster-130a5ac952
```
```{figure} img/deployment/resume.png
:width: 70%
```
### Restart Deployment
To restart a Deployment:
```bash
jc deployment restart pretty-monster-130a5ac952
```
```{figure} img/deployment/restart.png
:width: 70%
```
### Recreate a Deleted Deployment
To recreate a deleted Deployment:
```bash
jc deployment recreate pretty-monster-130a5ac952
```
```{figure} img/deployment/recreate.png
:width: 70%
```
### Scale a Deployment
You can also manually scale any Deployment.
```bash
jc deployment scale pretty-monster-130a5ac952 --replicas 2
```
```{figure} img/deployment/scale.png
:width: 70%
```
### Get Deployment logs
To get the Deployment logs:
```bash
jc deployment logs pretty-monster-130a5ac952
```
```{figure} img/deployment/logs.png
:width: 70%
```
## Configuration
Please refer to {ref}`Configuration ` for configuring the Flow on Jina AI Cloud.
## Restrictions
Jina AI Cloud scales according to your needs. You can demand different instance types with GPU/memory/CPU predefined based on the needs of your Flows and Executors. If you have specific resource requirements, please contact us [on Discord](https://discord.jina.ai) or raise a [GitHub issue](https://github.com/jina-ai/jcloud/issues/new/choose).
```{admonition} Restrictions
* Deployments are only supported in the `us-east` region.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/add-executors.md
(add-executors)=
# Add Executors
## Define Executor with `uses`
An {class}`~jina.Executor`'s type is defined by the `uses` keyword:
````{tab} Deployment
```python
from jina import Deployment
dep = Deployment(uses=MyExec)
```
````
````{tab} Flow
```python
from jina import Flow
f = Flow().add(uses=MyExec)
```
````
Note that some usages are not supported on JCloud, for security reasons and because they exist to facilitate local debugging.
| Local Dev | JCloud | `uses=...` | Description |
|-----------|--------|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| ✅ | ❌ | `ExecutorClass` | Use `ExecutorClass` from the inline context. |
| ✅ | ❌ | `'my.py_modules.ExecutorClass'` | Use `ExecutorClass` from `my.py_modules`. |
| ✅ | ✅ | `'executor-config.yml'` | Use an Executor from a YAML file defined by {ref}`Executor YAML interface `. |
| ✅ | ❌ | `'jinaai://jina-ai/TransformerTorchEncoder/'` | Use an Executor as Python source from Executor Hub. |
| ✅ | ✅ | `'jinaai+docker://jina-ai/TransformerTorchEncoder'` | Use an Executor as a Docker container from Executor Hub. |
| ✅ | ❌ | `'docker://sentence-encoder'` | Use a pre-built Executor as a Docker container. |
````{admonition} Hint: Load multiple Executors from the same directory
:class: hint
You don't need to specify the parent directory for each Executor.
Instead, you can configure a common search path for all Executors:
```
.
├── app
│ └── ▶ main.py
└── executor
├── config1.yml
├── config2.yml
└── my_executor.py
```
```{code-block} python
dep = Deployment(extra_search_paths=['../executor'], uses='config1.yml')  # Deployment
f = Flow(extra_search_paths=['../executor']).add(uses='config1.yml').add(uses='config2.yml')  # Flow
```
````
(flow-configure-executors)=
## Configure Executors
You can set and override {class}`~jina.Executor` configuration when adding them to an Orchestration.
This example shows how to start a Flow with an Executor using the Python API:
````{tab} Deployment
```python
from jina import Deployment
dep = Deployment(
uses='MyExecutor',
py_modules=["executor.py"],
uses_with={"parameter_1": "foo", "parameter_2": "bar"},
uses_metas={
"name": "MyExecutor",
"description": "MyExecutor does a thing to the stuff in your Documents",
},
uses_requests={"/index": "my_index", "/search": "my_search", "/random": "foo"},
workspace="some_custom_path",
)
with dep:
...
```
````
````{tab} Flow
```python
from jina import Flow
f = Flow().add(
uses='MyExecutor',
py_modules=["executor.py"],
uses_with={"parameter_1": "foo", "parameter_2": "bar"},
uses_metas={
"name": "MyExecutor",
"description": "MyExecutor does a thing to the stuff in your Documents",
},
uses_requests={"/index": "my_index", "/search": "my_search", "/random": "foo"},
workspace="some_custom_path",
)
with f:
...
```
````
* `py_modules` is a list of strings that defines the Executor's Python dependencies;
* `uses_with` is a key-value map that defines the arguments of the Executor's `__init__` method.
* `uses_requests` is a key-value map that defines the {ref}`mapping from endpoint to class method`. This is useful to overwrite the default endpoint-to-method mapping defined in the Executor python implementation.
* `uses_metas` is a key-value map that defines some of the Executor's {ref}`internal attributes`. It contains the following fields:
* `name` is a string that defines the name of the Executor;
* `description` is a string that defines the description of this Executor. It is used in the automatic docs UI;
* `workspace` is a string that defines the {ref}`workspace `.
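The same overrides can also be written in the Orchestration's YAML. A minimal sketch, assuming the `uses_with`, `uses_metas` and `uses_requests` keys are accepted at the Executor level just as in the Python API above (the values mirror that example):
```yaml
jtype: Flow
executors:
  - name: executor1
    uses: MyExecutor
    py_modules: executor.py
    uses_with:
      parameter_1: foo
      parameter_2: bar
    uses_metas:
      name: MyExecutor
    uses_requests:
      /index: my_index
      /search: my_search
```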
### Set `with` via `uses_with`
To set/override an Executor's `with` configuration, use `uses_with`. The `with` configuration refers to user-defined
constructor kwargs.
````{tab} Deployment
```python
from jina import Executor, requests, Deployment
class MyExecutor(Executor):
def __init__(self, param1=1, param2=2, param3=3, *args, **kwargs):
super().__init__(*args, **kwargs)
self.param1 = param1
self.param2 = param2
self.param3 = param3
@requests
def foo(self, docs, **kwargs):
print('param1:', self.param1)
print('param2:', self.param2)
print('param3:', self.param3)
dep = Deployment(uses=MyExecutor, uses_with={'param1': 10, 'param3': 30})
with dep:
dep.post('/')
```
```text
executor0@219662[L]:ready and listening
gateway@219662[L]:ready and listening
Deployment@219662[I]:🎉 Deployment is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:32825
🔒 Private network: 192.168.1.101:32825
🌐 Public address: 197.28.82.165:32825
param1: 10
param2: 2
param3: 30
```
````
````{tab} Flow
```python
from jina import Executor, requests, Flow
class MyExecutor(Executor):
def __init__(self, param1=1, param2=2, param3=3, *args, **kwargs):
super().__init__(*args, **kwargs)
self.param1 = param1
self.param2 = param2
self.param3 = param3
@requests
def foo(self, docs, **kwargs):
print('param1:', self.param1)
print('param2:', self.param2)
print('param3:', self.param3)
f = Flow().add(uses=MyExecutor, uses_with={'param1': 10, 'param3': 30})
with f:
f.post('/')
```
```text
executor0@219662[L]:ready and listening
gateway@219662[L]:ready and listening
Flow@219662[I]:🎉 Flow is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:32825
🔒 Private network: 192.168.1.101:32825
🌐 Public address: 197.28.82.165:32825
param1: 10
param2: 2
param3: 30
```
````
### Set `requests` via `uses_requests`
You can set/override an Executor's `requests` configuration and bind methods to custom endpoints.
In the following code:
* We replace the endpoint `/foo` bound to the `foo()` function with both `/non_foo` and `/alias_foo`.
* We add a new endpoint `/bar` for binding `bar()`.
Note the `all_req()` function is bound to **all** endpoints except those explicitly bound to other functions, i.e. `/non_foo`, `/alias_foo` and `/bar`.
````{tab} Deployment
```python
from jina import Executor, requests, Deployment
class MyExecutor(Executor):
@requests
def all_req(self, parameters, **kwargs):
print(f'all req {parameters.get("recipient")}')
@requests(on='/foo')
def foo(self, parameters, **kwargs):
print(f'foo {parameters.get("recipient")}')
def bar(self, parameters, **kwargs):
print(f'bar {parameters.get("recipient")}')
dep = Deployment(
uses=MyExecutor,
uses_requests={
'/bar': 'bar',
'/non_foo': 'foo',
'/alias_foo': 'foo',
},
)
with dep:
dep.post('/bar', parameters={'recipient': 'bar()'})
dep.post('/non_foo', parameters={'recipient': 'foo()'})
dep.post('/foo', parameters={'recipient': 'all_req()'})
dep.post('/alias_foo', parameters={'recipient': 'foo()'})
```
```text
executor0@221058[L]:ready and listening
gateway@221058[L]:ready and listening
Deployment@221058[I]:🎉 Deployment is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:36507
🔒 Private network: 192.168.1.101:36507
🌐 Public address: 197.28.82.165:36507
bar bar()
foo foo()
all req all_req()
foo foo()
```
````
````{tab} Flow
```python
from jina import Executor, requests, Flow
class MyExecutor(Executor):
@requests
def all_req(self, parameters, **kwargs):
print(f'all req {parameters.get("recipient")}')
@requests(on='/foo')
def foo(self, parameters, **kwargs):
print(f'foo {parameters.get("recipient")}')
def bar(self, parameters, **kwargs):
print(f'bar {parameters.get("recipient")}')
f = Flow().add(
uses=MyExecutor,
uses_requests={
'/bar': 'bar',
'/non_foo': 'foo',
'/alias_foo': 'foo',
},
)
with f:
f.post('/bar', parameters={'recipient': 'bar()'})
f.post('/non_foo', parameters={'recipient': 'foo()'})
f.post('/foo', parameters={'recipient': 'all_req()'})
f.post('/alias_foo', parameters={'recipient': 'foo()'})
```
```text
executor0@221058[L]:ready and listening
gateway@221058[L]:ready and listening
Flow@221058[I]:🎉 Flow is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:36507
🔒 Private network: 192.168.1.101:36507
🌐 Public address: 197.28.82.165:36507
bar bar()
foo foo()
all req all_req()
foo foo()
```
````
### Set `metas` via `uses_metas`
To set/override an Executor's `metas` configuration, use `uses_metas`:
````{tab} Deployment
```python
from jina import Executor, requests, Deployment
class MyExecutor(Executor):
@requests
def foo(self, docs, **kwargs):
print(self.metas.name)
dep = Deployment(
uses=MyExecutor,
uses_metas={'name': 'different_name'},
)
with dep:
dep.post('/')
```
```text
executor0@219291[L]:ready and listening
gateway@219291[L]:ready and listening
Deployment@219291[I]:🎉 Deployment is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:58827
🔒 Private network: 192.168.1.101:58827
different_name
```
````
````{tab} Flow
```python
from jina import Executor, requests, Flow
class MyExecutor(Executor):
@requests
def foo(self, docs, **kwargs):
print(self.metas.name)
flow = Flow().add(
uses=MyExecutor,
uses_metas={'name': 'different_name'},
)
with flow as f:
f.post('/')
```
```text
executor0@219291[L]:ready and listening
gateway@219291[L]:ready and listening
Flow@219291[I]:🎉 Flow is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:58827
🔒 Private network: 192.168.1.101:58827
different_name
```
````
(external-executors)=
## Use external Executors
Usually an Orchestration starts and stops its own Executor(s). External Executors are owned by *other* Orchestrations, meaning they can reside on any machine and their lifetime is controlled by others.
Using external Executors is useful for sharing expensive Executors (like stateless, GPU-based encoders) between Orchestrations.
Both {ref}`served and shared Executors ` can be used as external Executors.
When you add an external Executor, you have to provide a `host` and `port`, and enable the `external` flag:
````{tab} Deployment
```python
from jina import Deployment
Deployment(host='123.45.67.89', port=12345, external=True)
# or
Deployment(host='123.45.67.89:12345', external=True)
```
````
````{tab} Flow
```python
from jina import Flow
Flow().add(host='123.45.67.89', port=12345, external=True)
# or
Flow().add(host='123.45.67.89:12345', external=True)
```
````
The Orchestration doesn't start or stop this Executor and assumes that it is externally managed and available at `123.45.67.89:12345`.
Despite the lifetime control, the external Executor behaves just like a regular one. You can even add the same Executor to multiple Orchestrations.
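For example, because the Orchestration does not own the external Executor, the same address can simply be added to several Flows (the address below is the placeholder used throughout this section):
```python
from jina import Flow

# both Flows reuse the same externally managed Executor
f1 = Flow().add(name='shared_encoder', host='123.45.67.89:12345', external=True)
f2 = Flow().add(name='shared_encoder', host='123.45.67.89:12345', external=True)
```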
### Enable TLS
You can also use external Executors with `tls`:
````{tab} Deployment
```python
from jina import Deployment
Deployment(host='123.45.67.89:443', external=True, tls=True)
```
````
````{tab} Flow
```python
from jina import Flow
Flow().add(host='123.45.67.89:443', external=True, tls=True)
```
````
After that, the external Executor behaves just like an internal one. You can even add the same Executor to multiple Orchestrations.
```{hint}
Using `tls` to connect to an external Executor is especially needed when the external Executor is deployed with JCloud. See the JCloud {ref}`documentation ` for further details.
```
### Pass arguments
External Executors may require extra configuration to run. Think about an Executor that requires authentication to run. You can pass the `grpc_metadata` parameter to the Executor. `grpc_metadata` is a dictionary of key-value pairs to be passed along with every gRPC request sent to that Executor.
````{tab} Deployment
```python
from jina import Deployment
Deployment(
host='123.45.67.89',
port=443,
external=True,
grpc_metadata={'authorization': ''},
)
```
````
````{tab} Flow
```python
from jina import Flow
Flow().add(
host='123.45.67.89',
port=443,
external=True,
grpc_metadata={'authorization': ''},
)
```
````
```{hint}
The `grpc_metadata` parameter here follows the `metadata` concept in gRPC. See [gRPC documentation](https://grpc.io/docs/what-is-grpc/core-concepts/#metadata) for details.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/deployment-args.md
| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object. It is used to refer to the object in Python/YAML/CLI, in visualizations and in log message headers. When not given, the default naming strategy applies. | `string` | `None` |
| `workspace` | The working directory for any IO operations in this object. If not set, it is derived from the parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, exception stack information will not be added to the log. | `boolean` | `False` |
| `suppress_root_logging` | If set, then no root handlers will be suppressed from logging. | `boolean` | `False` |
| `uses` | The YAML path that represents a Flow. It can be either a local file path or a URL. | `string` | `None` |
| `reload` | If set, auto-reloading on file changes is enabled: the Flow will restart while blocked if the YAML configuration source is changed. This also applies to the underlying Executors if their source code or YAML configuration has changed. | `boolean` | `False` |
| `env` | The map of environment variables that are available inside the runtime. | `object` | `None` |
| `inspect` | The strategy for inspect deployments in the Flow. If `REMOVE` is given, all inspect deployments are removed when building the Flow. | `string` | `COLLECT` |
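As a sketch of how a few of these arguments appear in a Deployment YAML under the `with:` section (the values are illustrative):
```yaml
jtype: Deployment
with:
  name: my-deployment
  workspace: ./workspace
  log_config: default
  env:
    MY_VAR: my-value
```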
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/deployment.md
(deployment)=
# Deployment
```{important}
A Deployment is part of the orchestration layer {ref}`Orchestration `. Be sure to read up on that too!
```
A {class}`~jina.Deployment` orchestrates a single {class}`~jina.Executor` to accomplish a task. Documents are processed by Executors.
You can think of a Deployment as an interface to configure and launch your {ref}`microservice architecture `, while the heavy lifting is done by the {ref}`service ` itself.
(why-deployment)=
## Why use a Deployment?
Once you've learned about Documents, DocLists and Executors, you can split a big task into small independent modules and services.
* Deployments let you scale these Executors independently to match your requirements.
* Deployments let you easily use other cloud-native orchestrators, such as Kubernetes, to manage your service.
(create-deployment)=
## Create
The most trivial {class}`~jina.Deployment` is an empty one. It can be defined in Python or from a YAML file:
````{tab} Python
```python
from jina import Deployment
dep = Deployment()
```
````
````{tab} YAML
```yaml
jtype: Deployment
```
````
For production, you should define your Deployments with YAML. This is because YAML files are independent of the Python logic code and easier to maintain.
## Minimum working example
````{tab} Pythonic style
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests(on='/bar')
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
print(docs)
dep = Deployment(name='myexec1', uses=MyExecutor)
with dep:
dep.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````
````{tab} Deployment-as-a-Service style
Server:
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests(on='/bar')
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
print(docs)
dep = Deployment(port=12345, name='myexec1', uses=MyExecutor)
with dep:
dep.block()
```
Client:
```python
from jina import Client
from docarray import DocList, BaseDoc
c = Client(port=12345)
c.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````
````{tab} Load from YAML
`deployment.yml`:
```yaml
jtype: Deployment
name: myexec1
uses: FooExecutor
py_modules: exec.py
```
`exec.py`:
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
class FooExecutor(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'foo was here'
docs.summary()
return docs
```
```python
from jina import Deployment
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
dep = Deployment.load_config('deployment.yml')
with dep:
try:
dep.post(on='/bar', inputs=TextDoc(), on_done=print)
except Exception as ex:
# handle exception
pass
```
````
```{caution}
The statement `with dep:` starts the Deployment, and exiting the indented with block stops the Deployment, including its Executors.
Exceptions raised inside the `with dep:` block will close the Deployment context manager. If you don't want this, use a `try...except` block to surround the statements that could potentially raise an exception.
```
## Convert between Python and YAML
A Python Deployment definition can easily be converted to/from a YAML definition:
````{tab} Load from YAML
```python
from jina import Deployment
dep = Deployment.load_config('flow.yml')
```
````
````{tab} Export to YAML
```python
from jina import Deployment
dep = Deployment()
dep.save_config('deployment.yml')
```
````
## Start and stop
When a {class}`~jina.Deployment` starts, all the replicated Executors will start as well, making it possible to {ref}`reach the service through its API `.
There are three ways to start a Deployment: In Python, from a YAML file, or from the terminal.
* Generally in Python: use Deployment as a context manager.
* As an entrypoint from terminal: use `Jina CLI ` and a Deployment YAML file.
* As an entrypoint from Python code: use Deployment as a context manager inside `if __name__ == '__main__'`
* No context manager, manually call {meth}`~jina.Deployment.start` and {meth}`~jina.Deployment.close`.
````{tab} General in Python
```python
from jina import Deployment
dep = Deployment()
with dep:
pass
```
````
````{tab} Jina-serve CLI entrypoint
```bash
jina deployment --uses deployment.yml
```
````
````{tab} Python entrypoint
```python
from jina import Deployment
dep = Deployment()
if __name__ == '__main__':
with dep:
pass
```
````
````{tab} Python no context manager
```python
from jina import Deployment
dep = Deployment()
dep.start()
dep.close()
```
````
Your addresses and entrypoints can be found in the output. When you enable more features such as monitoring, HTTP gateway, TLS encryption, this display expands to contain more information.
(multiprocessing-spawn)=
### Set multiprocessing `spawn`
Some corner cases require forcing a `spawn` start method for multiprocessing, for example if you encounter "Cannot re-initialize CUDA in forked subprocess".
You can use `JINA_MP_START_METHOD=spawn` before starting the Python script to enable this.
```bash
JINA_MP_START_METHOD=spawn python app.py
```
```{caution}
In case you set `JINA_MP_START_METHOD=spawn`, make sure to use Flow as a context manager inside `if __name__ == '__main__'`.
The script entrypoint (starting the flow) [needs to be protected when using `spawn` start method](https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods).
```
````{hint}
There's no need to set this for Windows, as it only supports spawn method for multiprocessing.
````
## Serve
### Serve forever
In most scenarios, a Deployment should remain reachable for prolonged periods of time. This can be achieved from Python or the terminal:
````{tab} Python
```python
from jina import Deployment
dep = Deployment()
with dep:
dep.block()
```
````
````{tab} YAML
```shell
jina-serve deployment --uses deployment.yml
```
````
The `.block()` method blocks the execution of the current thread or process, enabling external clients to access the Deployment.
In this case, the Deployment can be stopped by interrupting the thread or process.
### Serve until an event
Alternatively, a `multiprocessing` or `threading` `Event` object can be passed to `.block()`, which stops the Deployment once set.
```python
from jina import Deployment
import threading
def start_deployment(stop_event):
"""start a blocking Deployment."""
dep = Deployment()
with dep:
dep.block(stop_event=stop_event)
e = threading.Event() # create new Event
t = threading.Thread(name='Blocked-Deployment', target=start_deployment, args=(e,))
t.start() # start Deployment in new Thread
# do some stuff
e.set() # set event and stop (unblock) the Deployment
```
## Export
A Deployment YAML can be exported as a Docker Compose YAML or Kubernetes YAML bundle.
(docker-compose-export)=
### Docker Compose
````{tab} Python
```python
from jina import Deployment
dep = Deployment()
dep.to_docker_compose_yaml()
```
````
````{tab} Terminal
```shell
jina-serve export docker-compose deployment.yml docker-compose.yml
```
````
This will generate a single `docker-compose.yml` file.
For advanced utilization of Docker Compose with Jina-serve, refer to {ref}`How to `
(deployment-kubernetes-export)=
### Kubernetes
````{tab} Python
```python
from jina import Deployment
dep = Deployment()
dep.to_kubernetes_yaml('dep_k8s_configuration')
```
````
````{tab} Terminal
```shell
jina-serve export kubernetes deployment.yml ./my-k8s
```
````
The generated folder can be used directly with `kubectl` to deploy the Deployment to an existing Kubernetes cluster.
For advanced utilisation of Kubernetes with Jina-serve please refer to {ref}`How to `
```{tip}
Based on your local Jina version, Executor Hub may rebuild the Docker image during the YAML generation process.
If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`.
```
```{tip}
If an Executor requires volumes to be mapped to persist data, Jina will create a StatefulSet for that Executor instead of a Deployment.
You can control the access mode, storage class name and capacity of the attached Persistent Volume Claim by using {ref}`Jina environment variables `
`JINA_K8S_ACCESS_MODES`, `JINA_K8S_STORAGE_CLASS_NAME` and `JINA_K8S_STORAGE_CAPACITY`. Only the first volume will be considered to be mounted.
```
```{admonition} See also
:class: seealso
For more in-depth guides on deployment, check our how-tos for {ref}`Docker compose ` and {ref}`Kubernetes `.
```
```{caution}
The `port` argument(s) are ignored when exporting to Kubernetes YAML: Jina-serve binds the services to port 8080, and when multiple protocols need to be served, the consecutive ports (8081, ...) are used. This is because the Kubernetes Service routes the traffic for you, and it is irrelevant to the surrounding services: in Kubernetes, services communicate via service names, irrespective of the internal port.
```
(logging-configuration)=
## Logging
The default {class}`jina.logging.logger.JinaLogger` uses rich console logging that writes to the system console. The `log_config` argument can be used to pass in a string of the pre-configured logging configuration names in Jina-serve or the absolute YAML file path of the custom logging configuration. For most cases, the default logging configuration sufficiently covers local, Docker and Kubernetes environments.
Custom logging handlers can be configured by following the Python official [Logging Cookbook](https://docs.python.org/3/howto/logging-cookbook.html#logging-cookbook) examples. An example custom logging configuration file defined in a YAML file `logging.json.yml` is:
```yaml
handlers:
  - StreamHandler
level: INFO
configs:
  StreamHandler:
    format: '%(asctime)s:{name:>15}@%(process)2d[%(levelname).1s]:%(message)s'
    formatter: JsonFormatter
```
The logging configuration can be used as follows:
````{tab} Python
```python
from jina import Deployment
dep = Deployment(log_config='./logging.json.yml')
```
````
````{tab} YAML
```yaml
jtype: Deployment
with:
log_config: './logging.json.yml'
```
````
### Supported protocols
A Deployment can be used to deploy an Executor and serve it using `gRPC` or `HTTP` protocol, or a composition of them.
### gRPC protocol
gRPC is the default protocol used by a Deployment to expose Executors to the outside world, and is used to communicate between the Gateway and an Executor inside a Flow.
### HTTP protocol
HTTP can be used for a stand-alone Deployment (without being part of a Flow), which allows external services to connect via REST.
```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class MyExec(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'foo was here'
docs.summary()
return docs
dep = Deployment(protocol='http', port=12345, uses=MyExec)
with dep:
dep.block()
```
This will make it available at port 12345 and you can get the [OpenAPI schema](https://swagger.io/specification/) for the service.
```{figure} images/http-deployment-swagger.png
:scale: 70%
```
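To call the HTTP Deployment above you can, for instance, use the Python Client with `protocol='http'` (a sketch; any HTTP client hitting the generated REST API works just as well):
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(protocol='http', port=12345)
docs = c.post(on='/', inputs=TextDoc(text='hello'), return_type=DocList[TextDoc])
print(docs[0].text)  # 'foo was here'
```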
### Composite protocol
A Deployment can also deploy an Executor and serve it with a combination of gRPC and HTTP protocols.
```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class MyExec(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'foo was here'
docs.summary()
return docs
dep = Deployment(protocol=['grpc', 'http'], port=[12345, 12346], uses=MyExec)
with dep:
dep.block()
```
This will make the Deployment reachable via gRPC and HTTP simultaneously.
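For example, with the composite Deployment above, gRPC and HTTP clients can connect to their respective ports at the same time (a sketch):
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

grpc_client = Client(protocol='grpc', port=12345)
http_client = Client(protocol='http', port=12346)

for client in (grpc_client, http_client):
    docs = client.post(on='/', inputs=TextDoc(text='hello'), return_type=DocList[TextDoc])
    print(docs[0].text)  # 'foo was here'
```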
## Methods
The most important methods of the `Deployment` object are the following:
| Method | Description |
|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {meth}`~jina.Deployment.start()` | Starts the Deployment. This will start all its Executors and check if they are ready to be used. |
| {meth}`~jina.Deployment.close()` | Stops and closes the Deployment. This will stop and shutdown all its Executors. |
| `with` context manager | Uses the Deployment as a context manager. It will automatically start and stop your Deployment. | |
| {meth}`~jina.clients.mixin.PostMixin.post()` | Sends requests to the Deployment API. |
| {meth}`~jina.Deployment.block()` | Blocks execution until the program is terminated. This is useful to keep the Deployment alive so it can be used from other places (clients, etc). |
| {meth}`~jina.Deployment.to_docker_compose_yaml()` | Generates a Docker-Compose file listing all Executors as services. |
| {meth}`~jina.Deployment.to_kubernetes_yaml()` | Generates Kubernetes configuration files in ``. Based on your local Jina-serve version, Executor Hub may rebuild the Docker image during the YAML generation process. If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`. |
| {meth}`~jina.clients.mixin.HealthCheckMixin.is_deployment_ready()` | Check if the Deployment is ready to process requests. Returns a boolean indicating the readiness. |
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/flow-args.md
| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object. It is used to refer to the object in Python/YAML/CLI, in visualizations and in log message headers. When not given, the default naming strategy applies. | `string` | `None` |
| `workspace` | The working directory for any IO operations in this object. If not set, it is derived from the parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, exception stack information will not be added to the log. | `boolean` | `False` |
| `suppress_root_logging` | If set, then no root handlers will be suppressed from logging. | `boolean` | `False` |
| `uses` | The YAML path that represents a Flow. It can be either a local file path or a URL. | `string` | `None` |
| `reload` | If set, auto-reloading on file changes is enabled: the Flow will restart while blocked if the YAML configuration source is changed. This also applies to the underlying Executors if their source code or YAML configuration has changed. | `boolean` | `False` |
| `env` | The map of environment variables that are available inside the runtime. | `object` | `None` |
| `inspect` | The strategy for inspect deployments in the Flow. If `REMOVE` is given, all inspect deployments are removed when building the Flow. | `string` | `COLLECT` |
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/flow.md
(flow-cookbook)=
# Flow
```{important}
A Flow is a set of {ref}`Deployments `. Be sure to read up on those before diving more deeply into Flows!
```
A {class}`~jina.Flow` orchestrates {class}`~jina.Executor`s into a processing pipeline to accomplish a task. Documents "flow" through the pipeline and are processed by Executors.
You can think of Flow as an interface to configure and launch your {ref}`microservice architecture `, while the heavy lifting is done by the {ref}`services ` themselves. In particular, each Flow also launches a {ref}`Gateway ` service, which can expose all other services through an API that you define.
## Why use a Flow?
Once you've learned about Documents, DocLists and Executors, you can split a big task into small independent modules and services.
But you need to chain them together to create, build, and serve an application. Flows enable you to do exactly this.
* Flows connect microservices (Executors) to build a service with proper client/server style interfaces over HTTP, gRPC, or WebSockets.
* Flows let you scale these Executors independently to match your requirements.
* Flows let you easily use other cloud-native orchestrators, such as Kubernetes, to manage your service.
(create-flow)=
## Create
The most trivial {class}`~jina.Flow` is an empty one. It can be defined in Python or from a YAML file:
````{tab} Python
```python
from jina import Flow
f = Flow()
```
````
````{tab} YAML
```yaml
jtype: Flow
```
````
```{important}
All arguments received by {class}`~jina.Flow()` API will be propagated to other entities (Gateway, Executor) with the following exceptions:
* `uses` and `uses_with` won't be passed to Gateway
* `port`, `port_monitoring`, `uses` and `uses_with` won't be passed to Executor
```
```{tip}
An empty Flow contains only {ref}`the Gateway`.
```
```{figure} images/zero-flow.svg
:scale: 70%
```
For production, you should define your Flows with YAML. This is because YAML files are independent of the Python logic code and easier to maintain.
## Minimum working example
````{tab} Pythonic style
```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests(on='/bar')
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
print(docs)
f = Flow().add(name='myexec1', uses=MyExecutor)
with f:
f.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````
````{tab} Flow-as-a-Service style
Server:
```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests(on='/bar')
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
print(docs)
f = Flow(port=12345).add(name='myexec1', uses=MyExecutor)
with f:
f.block()
```
Client:
```python
from jina import Client
from docarray import DocList, BaseDoc
c = Client(port=12345)
c.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````
````{tab} Load from YAML
`my.yml`:
```yaml
jtype: Flow
executors:
  - name: myexec1
    uses: FooExecutor
    py_modules: exec.py
```
`exec.py`:
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
class FooExecutor(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'foo was here'
docs.summary()
return docs
```
```python
from jina import Flow
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
f = Flow.load_config('my.yml')
with f:
try:
f.post(on='/bar', inputs=TextDoc(), on_done=print)
except Exception as ex:
# handle exception
pass
```
````
```{caution}
The statement `with f:` starts the Flow, and exiting the indented with block stops the Flow, including all Executors defined in it.
Exceptions raised inside the `with f:` block will close the Flow context manager. If you don't want this, use a `try...except` block to surround the statements that could potentially raise an exception.
```
## Start and stop
When a {class}`~jina.Flow` starts, all included Executors (single for a Deployment, multiple for a Flow) will start as well, making it possible to {ref}`reach the service through its API `.
There are three ways to start a Flow: In Python, from a YAML file, or from the terminal.
* Generally in Python: use Deployment or Flow as a context manager in Python.
* As an entrypoint from terminal: use `Jina CLI ` and a Flow YAML file.
* As an entrypoint from Python code: use Flow as a context manager inside `if __name__ == '__main__'`
* No context manager: manually call {meth}`~jina.Flow.start` and {meth}`~jina.Flow.close`.
````{tab} General in Python
```python
from jina import Flow
f = Flow()
with f:
pass
```
````
````{tab} Jina-serve CLI entrypoint
```bash
jina flow --uses flow.yml
```
````
````{tab} Python entrypoint
```python
from jina import Flow
f = Flow()
if __name__ == '__main__':
with f:
pass
```
````
````{tab} Python no context manager
```python
from jina import Flow
f = Flow()
f.start()
f.close()
```
````
The statement `with f:` starts the Flow, and exiting the indented `with` block stops the Flow, including all its Executors.
A successful start of a Flow looks like this:
```{figure} images/success-flow.png
:scale: 70%
```
Your addresses and entrypoints can be found in the output. When you enable more features such as monitoring, HTTP gateway, TLS encryption, this display expands to contain more information.
```{admonition} Multiprocessing spawn
:class: warning
Some corner cases require forcing a `spawn` start method for multiprocessing, for example if you encounter "Cannot re-initialize CUDA in forked subprocess". Read {ref}`more in the docs `
```
## Serve
### Serve forever
In most scenarios, a Flow should remain reachable for prolonged periods of time. This can be achieved from Python or the terminal:
````{tab} Python
```python
from jina import Flow
f = Flow()
with f:
f.block()
```
````
````{tab} Terminal
```shell
jina flow --uses flow.yml
```
````
In this case, the Flow can be stopped by interrupting the thread or process.
### Serve until an event
Alternatively, a `multiprocessing` or `threading` `Event` object can be passed to `.block()`, which stops the Flow once set.
```python
from jina import Flow
import threading
def start_flow(stop_event):
"""start a blocking Flow."""
f = Flow()
with f:
f.block(stop_event=stop_event)
e = threading.Event() # create new Event
t = threading.Thread(name='Blocked-Flow', target=start_flow, args=(e,))
t.start() # start Flow in new Thread
# do some stuff
e.set() # set event and stop (unblock) the Flow
```
### Serve on Google Colab
```{admonition} Example built with docarray<0.30
:class: note
This example is built using a docarray<0.30 version. Most of the concepts are similar, but some of the APIs for building Executors change when using a newer docarray version.
```
[Google Colab](https://colab.research.google.com/) provides an easy-to-use Jupyter notebook environment with GPU/TPU support. Flows are fully compatible with Google Colab and you can use it in the following ways:
```{figure} images/jina-on-colab.svg
:align: center
:width: 70%
```
```{button-link} https://colab.research.google.com/github/jina-ai/jina/blob/master/docs/Using_Jina_on_Colab.ipynb
:color: primary
:align: center
{octicon}`link-external` Open the notebook on Google Colab
```
Please follow the walkthrough and enjoy the free GPU/TPU!
```{tip}
Hosting services on Google Colab is not recommended if your server aims to be long-lived or permanent. It is often used for quick experiments, demonstrations or leveraging its free GPU/TPU. For stable, secure and free hosting of your Flow, check out [JCloud](https://jina.ai/serve/concepts/jcloud/).
```
## Export
A Flow YAML can be exported as a Docker Compose YAML or Kubernetes YAML bundle.
(docker-compose-export)=
### Docker Compose
````{tab} Python
```python
from jina import Flow
f = Flow().add()
f.to_docker_compose_yaml()
```
````
````{tab} Terminal
```shell
jina export docker-compose flow.yml docker-compose.yml
```
````
This will generate a single `docker-compose.yml` file.
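You can then bring up the generated services as usual, for example with `docker-compose -f docker-compose.yml up`.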
For advanced utilization of Docker Compose with Jina, refer to {ref}`How to `
(flow-kubernetes-export)=
### Kubernetes
````{tab} Python
```python
from jina import Flow
f = Flow().add()
f.to_kubernetes_yaml('flow_k8s_configuration')
```
````
````{tab} Terminal
```shell
jina export kubernetes flow.yml ./my-k8s
```
````
The generated folder can be used directly with `kubectl` to deploy the Flow to an existing Kubernetes cluster.
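For example, assuming your `kubectl` context points to the target cluster, you can apply the generated configuration with `kubectl apply -R -f ./my-k8s`.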
For advanced utilization of Kubernetes with Jina, please refer to {ref}`How to `
```{tip}
Based on your local Jina-serve version, Executor Hub may rebuild the Docker image during the YAML generation process.
If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`.
```
```{tip}
If an Executor requires volumes to be mapped to persist data, Jina-serve will create a StatefulSet for that Executor instead of a Deployment.
You can control the access mode, storage class name and capacity of the attached Persistent Volume Claim by using {ref}`Jina environment variables `
`JINA_K8S_ACCESS_MODES`, `JINA_K8S_STORAGE_CLASS_NAME` and `JINA_K8S_STORAGE_CAPACITY`. Only the first volume will be considered to be mounted.
```
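As a minimal sketch (illustrative values, and assuming these variables are read when the YAML is generated), you could set them from Python before exporting:

```python
import os
from jina import Flow

# illustrative values; pick what matches your cluster's storage setup
os.environ['JINA_K8S_ACCESS_MODES'] = 'ReadWriteOnce'
os.environ['JINA_K8S_STORAGE_CLASS_NAME'] = 'standard'
os.environ['JINA_K8S_STORAGE_CAPACITY'] = '10G'

f = Flow().add()  # an Executor that maps volumes would be exported as a StatefulSet
f.to_kubernetes_yaml('flow_k8s_configuration')
```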
```{admonition} See also
:class: seealso
For more in-depth guides on Flow deployment, check our how-tos for {ref}`Docker compose ` and {ref}`Kubernetes `.
```
```{caution}
The `port` or `ports` arguments are ignored when exporting to Kubernetes YAML. Jina binds the services to port 8080, except when multiple protocols
need to be served, in which case consecutive ports (8081, ...) are used. This is because the Kubernetes service directs the traffic to the Pods,
and within Kubernetes, services communicate via service names irrespective of the internal port.
```
## Add Executors
```{important}
This section is for Flow-specific considerations when working with Executors. Check more information on {ref}`working with Executors `.
```
A {class}`~jina.Flow` orchestrates its {class}`~jina.Executor`s as a graph and sends requests to all Executors in the order specified by {meth}`~jina.Flow.add` or listed in {ref}`a YAML file`.
When you start a Flow, each Executor always runs in a **separate process**. Multiprocessing is the lowest level of separation when you run a Flow locally. When running a Flow on Kubernetes, Docker Swarm, or {ref}`jcloud`, different Executors run in different containers, pods or instances.
Executors can be added into a Flow with {meth}`~jina.Flow.add`.
```python
from jina import Flow
f = Flow().add()
```
This adds an "empty" Executor called {class}`~jina.serve.executors.BaseExecutor` to the Flow. This Executor (without any parameters) performs no actions.
```{figure} images/no-op-flow.svg
:scale: 70%
```
To more easily identify an Executor, you can change its name by passing the `name` parameter:
```python
from jina import Flow
f = Flow().add(name='myVeryFirstExecutor').add(name='secondIsBest')
```
```{figure} images/named-flow.svg
:scale: 70%
```
You can also define the above Flow in YAML:
```yaml
jtype: Flow
executors:
  - name: myVeryFirstExecutor
  - name: secondIsBest
```
Save it as `flow.yml` and run it:
```bash
jina flow --uses flow.yml
```
More Flow YAML specifications can be found in {ref}`Flow YAML Specification`.
### How Executors process Documents in a Flow
Let's understand how Executors process Documents inside a Flow, and how changes are chained and applied, affecting downstream Executors in the Flow.
```python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
class PrintDocuments(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
print(f' PrintExecutor: received document with text: "{doc.text}"')
return docs
class ProcessDocuments(Executor):
@requests(on='/change_in_place')
def in_place(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
# This Executor only works on `docs` and doesn't consider any other arguments
for doc in docs:
print(f'ProcessDocuments: received document with text "{doc.text}"')
doc.text = 'I changed the executor in place'
@requests(on='/return_different_docarray')
def ret_docs(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
# This executor only works on `docs` and doesn't consider any other arguments
ret = DocList[TextDoc]()
for doc in docs:
print(f'ProcessDocuments: received document with text: "{doc.text}"')
ret.append(TextDoc(text='I returned a different Document'))
return ret
f = Flow().add(uses=ProcessDocuments).add(uses=PrintDocuments)
with f:
    f.post(on='/change_in_place', inputs=DocList[TextDoc]([TextDoc(text='request1')]), return_type=DocList[TextDoc])
    f.post(
        on='/return_different_docarray',
        inputs=DocList[TextDoc]([TextDoc(text='request2')]),
        return_type=DocList[TextDoc],
    )
```
```shell
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:58746 │
│ 🔒 Private 192.168.1.187:58746 │
│ 🌍 Public 212.231.186.65:58746 │
╰──────────────────────────────────────────╯
ProcessDocuments: received document with text "request1"
PrintExecutor: received document with text: "I changed the executor in place"
ProcessDocuments: received document with text: "request2"
PrintExecutor: received document with text: "I returned a different Document"
```
### Define topologies over Executors
{class}`~jina.Flow`s are not restricted to sequential execution. Internally they are modeled as graphs, so they can represent any complex, non-cyclic topology.
A typical use case for such a Flow is a topology with a common pre-processing part, but different indexers separating embeddings and data.
To define a custom topology you can use the `needs` keyword when adding an {class}`~jina.Executor`. By default, a Flow assumes that every Executor needs the previously added Executor.
```python
from jina import Executor, requests, Flow
from docarray import DocList
from docarray.documents import TextDoc
class FooExecutor(Executor):
@requests
async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
docs.append(TextDoc(text=f'foo was here and got {len(docs)} document'))
class BarExecutor(Executor):
@requests
async def bar(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
docs.append(TextDoc(text=f'bar was here and got {len(docs)} document'))
class BazExecutor(Executor):
@requests
async def baz(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
docs.append(TextDoc(text=f'baz was here and got {len(docs)} document'))
class MergeExecutor(Executor):
@requests
async def merge(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
return docs
f = (
Flow()
.add(uses=FooExecutor, name='fooExecutor')
.add(uses=BarExecutor, name='barExecutor', needs='fooExecutor')
.add(uses=BazExecutor, name='bazExecutor', needs='fooExecutor')
.add(uses=MergeExecutor, needs=['barExecutor', 'bazExecutor'])
)
```
```{figure} images/needs-flow.svg
:width: 70%
:align: center
Complex Flow where one Executor requires two Executors to process Documents beforehand
```
When sending a message to this Flow:
```python
with f:
print(f.post('/', return_type=DocList[TextDoc]).text)
```
This gives the output:
```text
['foo was here and got 0 document', 'bar was here and got 1 document', 'baz was here and got 1 document']
```
Both `BarExecutor` and `BazExecutor` only received a single `Document` from `FooExecutor` because they are run in parallel. The last Executor (the unnamed `MergeExecutor`) receives both DocLists and merges them automatically.
This automated merging can be disabled with `no_reduce=True`. This is useful for providing custom merge logic in a separate Executor. In this case the last `.add()` call would look like `.add(needs=['barExecutor', 'bazExecutor'], uses=CustomMergeExecutor, no_reduce=True)`. This feature requires Jina >= 3.0.2.
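A minimal sketch of what such a custom merge Executor might look like (a hypothetical `CustomMergeExecutor`, assuming `TextDoc` Documents and `no_reduce=True` so that `docs_matrix` is populated):

```python
from typing import List, Optional

from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class CustomMergeExecutor(Executor):
    @requests
    def merge(
        self,
        docs: DocList[TextDoc],
        docs_matrix: Optional[List[DocList[TextDoc]]] = None,
        **kwargs,
    ) -> DocList[TextDoc]:
        # with no_reduce=True, docs_matrix holds one DocList per upstream Executor
        merged = DocList[TextDoc]()
        for upstream_docs in docs_matrix or []:
            merged.extend(upstream_docs)
        return merged
```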
## Chain Executors in Flow with different schemas
When using `docarray>=0.30.0`, when building a Flow you should ensure that the Document types used as input of an Executor match the schema
of the output of the preceding Executor in the Flow.
For instance, the Flow in the *Invalid Flow* tab below will fail to start because the Document types are wrongly chained.
````{tab} Valid Flow
```{code-block} python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
import numpy as np
class SimpleStrDoc(BaseDoc):
text: str
class TextWithEmbedding(SimpleStrDoc):
embedding: NdArray
class TextEmbeddingExecutor(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        ret = DocList[TextWithEmbedding]()
        for doc in docs:
            ret.append(TextWithEmbedding(text=doc.text, embedding=np.random.rand(10)))
        return ret

class ProcessEmbedding(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[TextWithEmbedding], **kwargs) -> DocList[TextWithEmbedding]:
        for doc in docs:
            self.logger.info(f'Getting embedding with shape {doc.embedding.shape}')
flow = Flow().add(uses=TextEmbeddingExecutor, name='embed').add(uses=ProcessEmbedding, name='process')
with flow:
flow.block()
```
````
````{tab} Invalid Flow
```{code-block} python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
import numpy as np
class SimpleStrDoc(BaseDoc):
text: str
class TextWithEmbedding(SimpleStrDoc):
embedding: NdArray
class TextEmbeddingExecutor(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        ret = DocList[TextWithEmbedding]()
        for doc in docs:
            ret.append(TextWithEmbedding(text=doc.text, embedding=np.random.rand(10)))
        return ret

class ProcessText(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        for doc in docs:
            self.logger.info(f'Getting embedding with type {doc.text}')
# This Flow will fail to start because the input type of "process" does not match the output type of "embed"
flow = Flow().add(uses=TextEmbeddingExecutor, name='embed').add(uses=ProcessText, name='process')
with flow:
flow.block()
```
````
Jina is also compatible with docarray<0.30. When using that version, only a single Document schema existed (equivalent to [LegacyDocument]() in docarray>=0.30), and therefore
there were no explicit compatibility issues between schemas. However, the complexity was implicitly there (an Executor may expect a Document to be filled with `text` and only fail at runtime).
(floating-executors)=
### Floating Executors
Some Executors in your Flow can be used for asynchronous background tasks that take time and don't generate a required output. For instance,
logging specific information in external services, storing partial results, etc.
You can unblock your Flow from such tasks by using *floating Executors*.
Normally, all Executors form a pipeline that handles and transforms a given request until it is finally returned to the Client.
However, floating Executors do not feed their outputs back into the pipeline. Therefore, the Executor's output does not affect the response for the Client, and the response can be returned without waiting for the floating Executor to complete its task.
Those Executors are marked with the `floating` keyword when added to a `Flow`:
```python
import time
from jina import Executor, requests, Flow
from docarray import DocList
from docarray.documents import TextDoc
class FastChangingExecutor(Executor):
@requests()
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'Hello World'
class SlowChangingExecutor(Executor):
@requests()
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
time.sleep(2)
print(f' Received {docs.text}')
for doc in docs:
doc.text = 'Change the document but will not affect response'
f = (
Flow()
.add(name='executor0', uses=FastChangingExecutor)
.add(
name='floating_executor',
uses=SlowChangingExecutor,
needs=['gateway'],
floating=True,
)
)
with f:
f.post(on='/endpoint', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc]) # we need to send a first
start_time = time.time()
response = f.post(on='/endpoint', inputs=DocList[TextDoc]([TextDoc(), TextDoc()]), return_type=DocList[TextDoc])
end_time = time.time()
print(f' Response time took {end_time - start_time}s')
print(f' {response.text}')
```
```text
Response time took 0.011997222900390625s
['Hello World', 'Hello World']
Received ['Hello World', 'Hello World']
```
In this example the response is returned without waiting for the floating Executor to complete. However, the Flow is not closed until
the floating Executor has handled the request.
You can plot the Flow and see that the Executor is floating, disconnected from the **Gateway**.
```{figure} images/flow_floating.svg
:width: 70%
```
A floating Executor can *never* come before a non-floating Executor in your Flow's {ref}`topology `.
This leads to the following behaviors:
* **Implicit reordering**: When you add a non-floating Executor after a floating Executor without specifying its `needs` parameter, the non-floating Executor is chained after the previous non-floating one.
```python
from jina import Flow
f = Flow().add().add(name='middle', floating=True).add()
f.plot()
```
```{figure} images/flow_middle_1.svg
:width: 70%
```
* **Chaining floating Executors**: To chain more than one floating Executor, you need to add all of them with the `floating` flag, and explicitly specify the `needs` argument.
```python
from jina import Flow
f = Flow().add().add(name='middle', floating=True).add(needs=['middle'], floating=True)
f.plot()
```
```{figure} images/flow_chain_floating.svg
:width: 70%
```
* **Overriding the `floating` flag**: If you add a floating Executor as part of `needs` parameter of a non-floating Executor, then the floating Executor is no longer considered floating.
```python
from jina import Flow
f = Flow().add().add(name='middle', floating=True).add(needs=['middle'])
f.plot()
```
```{figure} images/flow_cancel_floating.svg
:width: 70%
```
(conditioning)=
### Add Conditioning
Sometimes you may not want all Documents to be processed by all Executors. For example, when you process text and image Documents, you may want to forward them to different Executors depending on their data type.
You can set conditioning for every {class}`~jina.Executor` in the Flow. Documents that don't meet the condition will be removed before reaching that Executor. This allows you to build a selection control in the Flow.
#### Define conditions
To add a condition to an Executor, pass it to the `when` parameter of {meth}`~jina.Flow.add` method of the Flow. This then defines *when* a Document is processed by the Executor:
You can use the [MongoDB query language](https://www.mongodb.com/docs/compass/current/query/filter/#query-your-data) supported by [docarray](https://docs.docarray.org/API_reference/utils/filter/) to specify a filter condition for each Executor.
```python
from jina import Flow
f = Flow().add(when={'tags__key': {'$eq': 5}})
```
Then only Documents that satisfy the `when` condition will reach the associated Executor. Any Documents that don't satisfy that condition won't reach the Executor.
If you are trying to separate Documents according to the data modality they hold, you need to choose a condition accordingly.
````{admonition} See Also
:class: seealso
In addition to `$exists` you can use a number of other operators to define your filter: `$eq`, `$gte`, `$lte`, `$size`, `$and`, `$or` and many more. For details, consult [MongoDB query language](https://www.mongodb.com/docs/compass/current/query/filter/#query-your-data) and [docarray](https://docs.docarray.org/API_reference/utils/filter/).
````
```python
# define filter conditions
text_condition = {'text': {'$exists': True}}
tensor_condition = {'tensor': {'$exists': True}}
```
These conditions specify that only Documents that hold data of a specific modality can pass the filter.
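A minimal sketch of how such filters could be attached to two parallel branches (the branch names are illustrative):

```python
from jina import Flow

# pass only Documents that have the respective field set
text_condition = {'text': {'$exists': True}}
tensor_condition = {'tensor': {'$exists': True}}

f = (
    Flow()
    .add(name='text_branch', when=text_condition)  # receives only text Documents
    .add(name='tensor_branch', when=tensor_condition, needs='gateway')  # receives only tensor Documents
)
```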
````{tab} Python
```{code-block} python
---
emphasize-lines: 16, 24
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict
class MyDoc(BaseDoc):
text: str = ''
tags: Dict[str, int]
class MyExec(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
print(f'{doc.tags}')
f = Flow().add(uses=MyExec).add(uses=MyExec, when={'tags__key': {'$eq': 5}}) # Create the empty Flow, add condition
with f: # Using it as a Context Manager starts the Flow
ret = f.post(
on='/search',
inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
return_type=DocList[MyDoc]
)
for doc in ret:
print(f'{doc.tags}') # only the Document fulfilling the condition is processed and therefore returned.
```
```shell
{'key': 5.0}
```
````
````{tab} YAML
```yaml
jtype: Flow
executors:
  - name: executor
    uses: MyExec
    when:
      tags__key:
        $eq: 5
```
```{code-block} python
---
emphasize-lines: 9
---
from jina import Flow
from docarray import DocList

# MyDoc and MyExec are defined as in the Python tab
f = Flow.load_config('flow.yml')  # Load the Flow definition from Yaml file
with f: # Using it as a Context Manager starts the Flow
ret = f.post(
on='/search',
inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
return_type=DocList[MyDoc]
)
for doc in ret:
print(f'{doc.tags}') # only the Document fulfilling the condition is processed and therefore returned.
```
```shell
{'key': 5.0}
```
````
Note that if a Document does not satisfy the `when` condition of a filter, the filter removes the Document *for that entire branch of the Flow*.
This means that every Executor located behind a filter is affected by this, not just the specific Executor that defines the condition.
As with a real-life filter, once something fails to pass through it, it no longer continues down the pipeline.
Naturally, parallel branches in a Flow do not affect each other. So if a Document gets filtered out in only one branch, it can
still be used in the other branch, and also after the branches are re-joined:
````{tab} Parallel Executors
```{code-block} python
---
emphasize-lines: 18, 19
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict
class MyDoc(BaseDoc):
text: str = ''
tags: Dict[str, int]
class MyExec(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
print(f'{doc.tags}')
f = (
Flow()
.add(uses=MyExec, name='first')
.add(uses=MyExec, when={'tags__key': {'$eq': 5}}, needs='first', name='exec1')
.add(uses=MyExec, when={'tags__key': {'$eq': 4}}, needs='first', name='exec2')
.needs_all(uses=MyExec, name='join')
)
```
```{figure} images/conditional-flow.svg
:width: 70%
:align: center
```
```python
with f:
ret = f.post(
on='/search',
inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
return_type=DocList[MyDoc]
)
for doc in ret:
print(f'{doc.tags}')
```
```shell
{'key': 5.0}
{'key': 4.0}
```
````
````{tab} Sequential Executors
```{code-block} python
---
emphasize-lines: 18, 19
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict
class MyDoc(BaseDoc):
text: str = ''
tags: Dict[str, int]
class MyExec(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
print(f'{doc.tags}')
f = (
Flow()
.add(uses=MyExec, name='first')
.add(uses=MyExec, when={'tags__key': {'$eq': 5}}, name='exec1', needs='first')
.add(uses=MyExec, when={'tags__key': {'$eq': 4}}, needs='exec1', name='exec2')
)
```
```{figure} images/sequential-flow.svg
:width: 70%
```
```python
with f:
ret = f.post(
on='/search',
inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
return_type=DocList[MyDoc]
)
for doc in ret:
print(f'{doc.tags}')
```
```shell
```
````
This feature is useful to prevent some specialized Executors from processing certain Documents.
It can also be used to build *switch-like nodes*, where some Documents pass through one branch of the Flow,
while other Documents pass through a different parallel branch.
Note that whenever a Document does not satisfy the condition of an Executor, it is not even sent to that Executor.
Instead, only a tailored Request without any payload is transferred.
This means that you can not only use this feature to build complex logic, but also to minimize your networking overhead.
(merging-upstream)=
### Merging upstream Documents
Often when you're building a Flow, you want an Executor to receive Documents from multiple upstream Executors.
```{figure} images/flow-merge-executor.svg
:width: 70%
:align: center
```
For this you can use the `docs_matrix` or `docs_map` parameters (part of the Executor endpoint signature). These are Flow-specific arguments that can be used alongside an Executor's {ref}`default arguments `:
```{code-block} python
---
emphasize-lines: 11, 12
---
from typing import Dict, Union, List, Optional
from jina import Executor, requests
from docarray import DocList
class MergeExec(Executor):
@requests
async def foo(
self,
docs: DocList[...],
parameters: Dict,
docs_matrix: Optional[List[DocList[...]]],
docs_map: Optional[Dict[str, DocList[...]]],
) -> DocList[MyDoc]:
pass
```
* Use `docs_matrix` to receive a List of all incoming DocLists from upstream Executors:
```python
[
DocList[...](...), # from Executor1
DocList[...](...), # from Executor2
DocList[...](...), # from Executor3
]
```
* Use `docs_map` to receive a Dict, where each item's key is the name of an upstream Executor and the value is the DocList coming from that Executor:
```python
{
'Executor1': DocList[...](...),
'Executor2': DocList[...](...),
'Executor3': DocList[...](...),
}
```
(no-reduce)=
#### Reducing multiple DocLists to one DocList
The `no_reduce` argument determines whether DocLists are reduced into one when being received:
* To reduce all incoming DocLists into **one single DocList**, do not set `no_reduce` or set it to `False`. The `docs_map` and `docs_matrix` will be `None`.
* To receive **all incoming DocLists separately**, set `no_reduce` to `True`. The Executor will receive the DocLists independently under `docs_matrix` and `docs_map`.
```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
class MyDoc(BaseDoc):
text: str = ''
class Exec1(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
doc.text = 'Exec1'
class Exec2(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
doc.text = 'Exec2'
class MergeExec(Executor):
@requests
def foo(self, docs: DocList[MyDoc], docs_matrix, **kwargs) -> DocList[MyDoc]:
documents_to_return = DocList[MyDoc]()
for doc1, doc2 in zip(*docs_matrix):
print(
f'MergeExec processing pairs of Documents "{doc1.text}" and "{doc2.text}"'
)
documents_to_return.append(
MyDoc(text=f'Document merging from "{doc1.text}" and "{doc2.text}"')
)
return documents_to_return
f = (
Flow()
.add(uses=Exec1, name='exec1')
.add(uses=Exec2, name='exec2')
.add(uses=MergeExec, needs=['exec1', 'exec2'], no_reduce=True)
)
with f:
returned_docs = f.post(on='/', inputs=MyDoc(), return_type=DocList[MyDoc])
print(f'Resulting documents {returned_docs[0].text}')
```
```shell
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:55761 │
│ 🔒 Private 192.168.1.187:55761 │
│ 🌍 Public 212.231.186.65:55761 │
╰──────────────────────────────────────────╯
MergeExec processing pairs of Documents "Exec1" and "Exec2"
Resulting documents Document merging from "Exec1" and "Exec2"
```
## Visualize
A {class}`~jina.Flow` has a built-in `.plot()` function which can be used to visualize the `Flow`:
```python
from jina import Flow
f = Flow().add().add()
f.plot('flow.svg')
```
```{figure} images/flow.svg
:width: 70%
```
```python
from jina import Flow
f = Flow().add(name='e1').add(needs='e1').add(needs='e1')
f.plot('flow-2.svg')
```
```{figure} images/flow-2.svg
:width: 70%
```
You can also do it in the terminal:
```bash
jina export flowchart flow.yml flow.svg
```
You can also visualize a remote Flow by passing the URL to `jina export flowchart`.
(logging-configuration)=
## Logging
The default {class}`jina.logging.logger.JinaLogger` uses rich console logging that writes to the system console. The `log_config` argument can be used to pass in a string of the pre-configured logging configuration names in Jina or the absolute YAML file path of the custom logging configuration. For most cases, the default logging configuration sufficiently covers local, Docker and Kubernetes environments.
Custom logging handlers can be configured by following the Python official [Logging Cookbook](https://docs.python.org/3/howto/logging-cookbook.html#logging-cookbook) examples. An example custom logging configuration file defined in a YAML file `logging.json.yml` is:
```yaml
handlers:
  - StreamHandler
level: INFO
configs:
  StreamHandler:
    format: '%(asctime)s:{name:>15}@%(process)2d[%(levelname).1s]:%(message)s'
    formatter: JsonFormatter
```
The logging configuration can be used as follows:
````{tab} Python
```python
from jina import Flow
f = Flow(log_config='./logging.json.yml')
```
````
````{tab} YAML
```yaml
jtype: Flow
with:
log_config: './logging.json.yml'
```
````
(logging-override)=
### Custom logging configuration
The default {ref}`logging ` or custom logging configuration at the Flow level will be propagated to the `Gateway` and `Executor` entities. If that is not desired, every `Gateway` or `Executor` entity can be provided with its own custom logging configuration.
You can configure two different `Executors` as in the below example:
```python
from jina import Flow
f = (
Flow().add(log_config='./logging.json.yml').add(log_config='./logging.file.yml')
) # Create a Flow with two Executors
```
`logging.file.yml` is another YAML file with a custom `FileHandler` configuration.
````{hint}
Refer to {ref}`Gateway logging configuration ` section for configuring the `Gateway` logging.
````
````{caution}
When exporting the Flow to Kubernetes, the log_config file path must refer to the absolute local path of each container. The custom logging
file must be included during the containerization process. If the availability of the file is unknown then it's best to rely on the default
configuration. This restriction also applies to dockerized `Executors`. When running a dockerized Executor locally, the logging configuration
file can be mounted using {ref}`volumes `.
````
## Methods
The most important methods of the `Flow` object are the following:
| Method | Description |
|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {meth}`~jina.Flow.add` | Adds an Executor to the Flow |
| {meth}`~jina.Flow.start()` | Starts the Flow. This will start all its Executors and check if they are ready to be used. |
| {meth}`~jina.Flow.close()` | Stops and closes the Flow. This will stop and shutdown all its Executors. |
| `with` context manager | Uses the Flow as a context manager. It will automatically start and stop your Flow. |
| {meth}`~jina.Flow.plot()` | Visualizes the Flow. Helpful for building complex pipelines. |
| {meth}`~jina.clients.mixin.PostMixin.post()` | Sends requests to the Flow API. |
| {meth}`~jina.Flow.block()` | Blocks execution until the program is terminated. This is useful to keep the Flow alive so it can be used from other places (clients, etc). |
| {meth}`~jina.Flow.to_docker_compose_yaml()` | Generates a Docker-Compose file listing all Executors as services. |
| {meth}`~jina.Flow.to_kubernetes_yaml()` | Generates Kubernetes configuration files in ``. Based on your local Jina and docarray versions, Executor Hub may rebuild the Docker image during the YAML generation process. If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`. |
| {meth}`~jina.clients.mixin.HealthCheckMixin.is_flow_ready()` | Check if the Flow is ready to process requests. Returns a boolean indicating the readiness. |
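As a brief sketch combining several of these methods (no particular Executor assumed):

```python
from jina import Flow

f = Flow().add()
f.plot('flow.svg')  # visualize the topology

f.start()  # start all Executors
try:
    print(f.is_flow_ready())  # True once all Executors are ready
finally:
    f.close()  # stop and shut down all Executors
```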
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/gateway-args.md
| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object. This will be used in the following places: how you refer to this object in Python/YAML/CLI, visualization, log message header, ... When not given, then the default naming strategy will apply. | `string` | `gateway` |
| `workspace` | The working directory for any IO operations in this object. If not set, then derive from its parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, then no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, then exception stack information will not be added to the log | `boolean` | `False` |
| `timeout_ctrl` | The timeout in milliseconds of the control request, -1 for waiting forever | `number` | `60` |
| `entrypoint` | The entrypoint command overrides the ENTRYPOINT in the Docker image. When not set, the Docker image ENTRYPOINT takes effect. | `string` | `None` |
| `docker_kwargs` | Dictionary of kwargs arguments that will be passed to the Docker SDK when starting the Docker container. More details can be found in the Docker SDK docs: https://docker-py.readthedocs.io/en/stable/ | `object` | `None` |
| `prefetch` | Number of requests fetched from the client before feeding into the first Executor. Used to control the speed of data input into a Flow. 0 disables prefetch (1000 requests is the default). | `number` | `1000` |
| `title` | The title of this HTTP server. It will be used in automatics docs such as Swagger UI. | `string` | `None` |
| `description` | The description of this HTTP server. It will be used in automatics docs such as Swagger UI. | `string` | `None` |
| `cors` | If set, a CORS middleware is added to FastAPI frontend to allow cross-origin access. | `boolean` | `False` |
| `no_debug_endpoints` | If set, `/status` `/post` endpoints are removed from HTTP interface. | `boolean` | `False` |
| `no_crud_endpoints` | If set, `/index`, `/search`, `/update`, `/delete` endpoints are removed from the HTTP interface. Any Executor that has `@requests(on=...)` bound with those values will receive data requests. | `boolean` | `False` |
| `expose_endpoints` | A JSON string that represents a map from executor endpoints (`@requests(on=...)`) to HTTP endpoints. | `string` | `None` |
| `uvicorn_kwargs` | Dictionary of kwargs arguments that will be passed to the Uvicorn server when starting the server. More details can be found in the Uvicorn docs: https://www.uvicorn.org/settings/ | `object` | `None` |
| `ssl_certfile` | the path to the certificate file | `string` | `None` |
| `ssl_keyfile` | the path to the key file | `string` | `None` |
| `expose_graphql_endpoint` | If set, /graphql endpoint is added to HTTP interface. | `boolean` | `False` |
| `protocol` | Communication protocol of the server exposed by the Gateway. This can be a single value or a list of protocols, depending on your chosen Gateway. Choose the convenient protocols from: ['GRPC', 'HTTP', 'WEBSOCKET']. | `array` | `[]` |
| `host` | The host address of the runtime, by default it is 0.0.0.0. | `string` | `0.0.0.0` |
| `proxy` | If set, respect the http_proxy and https_proxy environment variables. otherwise, it will unset these proxy variables before start. gRPC seems to prefer no proxy | `boolean` | `False` |
| `uses` | The config of the gateway. It can be one of the following: the string literal of a Gateway class name; a Gateway YAML file (.yml, .yaml, .jaml); a docker image (must start with `docker://`); the string literal of a YAML config (must start with `!` or `jtype: `); the string literal of a JSON config. When used in Python, the following values are also accepted: a Python dict that represents the config; a text file stream that has a `.read()` interface. | `string` | `None` |
| `uses_with` | Dictionary of keyword arguments that will override the `with` configuration in `uses` | `object` | `None` |
| `py_modules` | The customized Python modules that need to be imported before loading the gateway. Note that the recommended way is to only import a single module - a simple Python file, if your gateway can be defined in a single file, or an `__init__.py` file if you have multiple files, which should be structured as a Python package. | `array` | `None` |
| `replicas` | The number of replicas of the Gateway. This replicas will only be applied when converted into Kubernetes YAML | `number` | `1` |
| `grpc_server_options` | Dictionary of kwargs arguments that will be passed to the grpc server as options when starting the server, example : {'grpc.max_send_message_length': -1} | `object` | `None` |
| `graph_description` | Routing graph for the gateway | `string` | `{}` |
| `graph_conditions` | Dictionary stating which filtering conditions each Executor in the graph requires to receive Documents. | `string` | `{}` |
| `deployments_addresses` | JSON dictionary with the input addresses of each Deployment | `string` | `{}` |
| `deployments_metadata` | JSON dictionary with the request metadata for each Deployment | `string` | `{}` |
| `deployments_no_reduce` | list JSON disabling the built-in merging mechanism for each Deployment listed | `string` | `[]` |
| `compression` | The compression mechanism used when sending requests from the Head to the WorkerRuntimes. For more details, check https://grpc.github.io/grpc/python/grpc.html#compression. | `string` | `None` |
| `timeout_send` | The timeout in milliseconds used when sending data requests to Executors, -1 means no timeout, disabled by default | `number` | `None` |
| `runtime_cls` | The runtime class to run inside the Pod | `string` | `GatewayRuntime` |
| `timeout_ready` | The timeout in milliseconds of a Pod waits for the runtime to be ready, -1 for waiting forever | `number` | `600000` |
| `env` | The map of environment variables that are available inside runtime | `object` | `None` |
| `env_from_secret` | The map of environment variables that are read from kubernetes cluster secrets | `object` | `None` |
| `floating` | If set, the current Pod/Deployment can not be further chained, and the next `.add()` will chain after the last Pod/Deployment not this current one. | `boolean` | `False` |
| `reload` | If set, the Gateway will restart while serving if YAML configuration source is changed. | `boolean` | `False` |
| `port` | The port for input data to bind the gateway server to, by default, random ports between range [49152, 65535] will be assigned. The port argument can be either 1 single value in case only 1 protocol is used or multiple values when many protocols are used. | `number` | `random in [49152, 65535]` |
| `monitoring` | If set, spawn an http server with a prometheus endpoint to expose metrics | `boolean` | `False` |
| `port_monitoring` | The port on which the prometheus server is exposed, default is a random port between [49152, 65535] | `number` | `random in [49152, 65535]` |
| `retries` | Number of retries per gRPC call. If <0 it defaults to max(3, num_replicas) | `number` | `-1` |
| `tracing` | If set, the sdk implementation of the OpenTelemetry tracer will be available and will be enabled for automatic tracing of requests and customer span creation. Otherwise a no-op implementation will be provided. | `boolean` | `False` |
| `traces_exporter_host` | If tracing is enabled, this hostname will be used to configure the trace exporter agent. | `string` | `None` |
| `traces_exporter_port` | If tracing is enabled, this port will be used to configure the trace exporter agent. | `number` | `None` |
| `metrics` | If set, the sdk implementation of the OpenTelemetry metrics will be available for default monitoring and custom measurements. Otherwise a no-op implementation will be provided. | `boolean` | `False` |
| `metrics_exporter_host` | If tracing is enabled, this hostname will be used to configure the metrics exporter agent. | `string` | `None` |
| `metrics_exporter_port` | If tracing is enabled, this port will be used to configure the metrics exporter agent. | `number` | `None` |
| `stateful` | If set, start consensus module to make sure write operations are properly replicated between all the replicas | `boolean` | `False` |
| `pod_ports` | When using StatefulExecutors, if they want to restart it is important to keep the RAFT cluster configuration | `number` | `None` |
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/handle-exceptions.md
(flow-error-handling)=
# Handle Exceptions
When building a complex solution, things sometimes go wrong. Jina-serve does its best to recover from failures, handle them gracefully, and report useful failure information to the user.
The following outlines (more or less) common failure cases, and explains how Jina-serve responds to each.
## Executor errors
In general there are two places where an Executor level error can happen:
* If an {class}`~jina.Executor`'s `__init__` method raises an Exception, the Orchestration cannot start. In this case this Executor runtime raises the Exception, and the Orchestration throws a `RuntimeFailToStart` Exception.
* If one of the Executor's `@requests` methods raises an Exception, the error message is added to the response and sent back to the client. If the gRPC or WebSockets protocols are used, the networking stream is not interrupted and can accept further requests.
In both cases, the {ref}`Jina Client ` raises an Exception.
### Terminate an Executor on certain errors
Some exceptions like network errors or request timeouts can be transient and can recover automatically. Sometimes fatal errors or user-defined errors put the Executor in an unusable state, in which case it can be restarted. Locally the Orchestration must be re-run manually to restore Executor availability.
On Kubernetes deployments, this can be automated by terminating the Executor process, causing the Pod to terminate. The autoscaler restores availability by creating a new Pod to replace the terminated one. Termination can be enabled for one or more errors by using the `exit_on_exceptions` argument when adding the Executor to an Orchestration. When a raised exception matches one of the listed exceptions, the Executor terminates gracefully.
A sample Orchestration can be `Deployment(uses=MyExecutor, exit_on_exceptions=['Exception', 'RuntimeException'])`. The `exit_on_exceptions` argument accepts a list of Python or user-defined Exception or Error class names.
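As a minimal sketch (a hypothetical `FlakyExecutor` raising a custom error):

```python
from jina import Deployment, Executor, requests


class MyCustomError(Exception):
    pass


class FlakyExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        # raising an exception listed in exit_on_exceptions terminates the Executor gracefully
        raise MyCustomError('unrecoverable state')


dep = Deployment(uses=FlakyExecutor, exit_on_exceptions=['MyCustomError'])
```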
## Network errors
When an Orchestration Gateway can't reach an {ref}`Executor or Head `, the Orchestration attempts to re-connect to the faulty deployment according to a retry policy. The same applies to calls to Executors that time out. The specifics of this policy depend on the Orchestration's environment, as outlined below.
````{admonition} Hint: Prevent Executor timeouts
:class: hint
If you regularly experience Executor call timeouts, set the Orchestration's `timeout_send` attribute to a larger value
by setting `Deployment(timeout_send=time_in_ms)` or `Flow(timeout_send=time_in_ms)` in Python
or `timeout_send: time_in_ms` in your Orchestration YAML with-block.
Neural network forward passes on CPU (and other unusually expensive operations) commonly lead to timeouts with the default setting.
````
````{admonition} Hint: Custom retry policy
:class: hint
You can override the default retry policy and instead choose a number of retries performed for each Executor
with `Orchestration(retries=n)` in Python, or `retries: n` in the Orchestration
YAML `with` block.
````
If, during the complete execution of this policy, no successful call to any Executor replica can be made, the request is aborted and the failure is {ref}`reported to the client `.
### Request retries: Local deployment
If an Orchestration is deployed locally (with or without {ref}`containerized Executors `), the following policy for failed requests applies on a per-Executor basis:
* If there are multiple replicas of the target Executor, try each replica at least once, or until the request succeeds.
* Irrespective of the number of replicas, try the request at least three times, or until it succeeds. If there are fewer than three replicas, try them in a round-robin fashion.
### Request retries: Deployment with Kubernetes
If an Orchestration is {ref}`deployed in Kubernetes ` without a service mesh, retries cannot be distributed to different replicas of the same Executor.
````{admonition} See Also
:class: seealso
The impossibility of retries across different replicas is a limitation of Kubernetes in combination with gRPC.
If you want to learn more about this limitation, see [this](https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/) Kubernetes blog post.
An easy way to overcome this limitation is to use a service mesh like [Linkerd](https://linkerd.io/).
````
Concretely, this results in the following per-Executor retry policy:
* Try the request three times, or until it succeeds, always on the same replica of the Executor
### Request retries: Deployment with Kubernetes and service mesh
A Kubernetes service mesh can enable load balancing, and thus retries, between an Executor's replicas.
````{admonition} Hint
:class: hint
While Jina supports any service mesh, the output of `f.to_kubernetes_yaml()` already includes the necessary annotations for [Linkerd](https://linkerd.io/).
````
If a service mesh is installed alongside Jina-serve in the Kubernetes cluster, the following retry policy applies for each Executor:
* Try the request at least three times, or until it succeeds
* Distribute the requests to the replicas according to the service mesh's configuration
````{admonition} Caution
:class: caution
Many service meshes have the ability to perform retries themselves. Be careful about setting up service mesh level retries in combination with Jina, as it may lead to unwanted behaviour in combination with Jina's own retry policy.
Instead, you may want to disable Jina level retries by setting `Orchestration(retries=0)` or `Deployment(retries=0)` in Python, or `retries: 0` in the Orchestration YAML `with` block.
````
(failure-reporting)=
### Failure reporting
If the retry policy is exhausted for a given request, the error is reported back to the corresponding client.
The resulting error message contains the *network address* of the failing Executor. If multiple replicas are present, all addresses are reported - unless the Orchestration is deployed using Kubernetes, in which case the replicas are managed by Kubernetes and only a single address is available.
Depending on the client-to-gateway protocol, and the type of error, the error message is returned in one of the following ways:
**Could not connect to Executor:**
* **gRPC**: A response with the gRPC status code 14 (*UNAVAILABLE*) is issued, and the error message is contained in the `details` field.
* **HTTP**: A response with the HTTP status code 503 (*SERVICE_UNAVAILABLE*) is issued, and the error message is contained in `response['header']['status']['description']`.
* **WebSockets**: The stream closes with close code 1011 (*INTERNAL_ERROR*) and the message is contained in the WebSocket close message.
**Call to Executor timed out:**
* **gRPC**: A response with the gRPC status code 4 (*DEADLINE_EXCEEDED*) is issued, and the error message is contained in the `details` field.
* **HTTP**: A response with the HTTP status code 504 (*GATEWAY_TIMEOUT*) is issued, and the error message is contained in `response['header']['status']['description']`.
* **WebSockets**: The stream closes with close code 1011 (*INTERNAL_ERROR*) and the message is contained in the WebSockets close message.
For any of these scenarios, the {ref}`Jina Client ` raises a `ConnectionError` containing the error message.
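A minimal sketch of handling such a failure on the client side (assuming a Flow served on port 12345):

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)
try:
    c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])
except ConnectionError as ex:
    # the message contains the address(es) of the failing Executor
    print(f'request failed: {ex}')
```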
## Debug via breakpoint
Standard Python breakpoints don't work inside `Executor` methods when called inside an Orchestration context manager. Nevertheless, `import epdb; epdb.set_trace()` works just like a native Python breakpoint. Note that you need to `pip install epdb` to access this type of breakpoint.
```{admonition} Debugging in Flows
:class: info
The below code is for Deployments, but can easily be adapted for Flows.
```
````{tab} ✅ Do
```{code-block} python
---
emphasize-lines: 7
---
from jina import Deployment, Executor, requests
class CustomExecutor(Executor):
@requests
def foo(self, **kwargs):
a = 25
import epdb; epdb.set_trace()
print(f'\n\na={a}\n\n')
def main():
dep = Deployment(uses=CustomExecutor)
with dep:
dep.post(on='')
if __name__ == '__main__':
main()
```
````
````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 7
---
from jina import Deployment, Executor, requests
class CustomExecutor(Executor):
@requests
def foo(self, **kwargs):
a = 25
breakpoint()
print(f'\n\na={a}\n\n')
def main():
dep = Deployment(uses=CustomExecutor)
with dep:
dep.post(on='')
if __name__ == '__main__':
main()
```
````
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/health-check.md
# Health Check
Once an Orchestration is running, you can use `jina ping` [CLI](../../api/jina_cli.rst) to run a health check of the complete Orchestration or (in the case of a Flow) individual Executors or Gateway.
````{tab} Deployment
Start a Deployment in Python:
```python
from jina import Deployment
dep = Deployment(protocol='grpc', port=12345)
with dep:
dep.block()
```
Check the readiness of the Deployment:
```bash
jina ping deployment grpc://localhost:12345
```
````
````{tab} Flow
Start a Flow in Python:
```python
from jina import Flow
f = Flow(protocol='grpc', port=12345).add(port=12346)
with f:
f.block()
```
Check the readiness of the Flow:
```bash
jina ping flow grpc://localhost:12345
```
You can also check the readiness of an individual Executor:
```bash
jina ping executor localhost:12346
```
...or the readiness of the Gateway service:
```bash
jina ping gateway grpc://localhost:12345
```
````
When these commands succeed, you should see something like:
```text
INFO JINA@28600 readiness check succeeded 1 times!!!
```
```{admonition} Use in Kubernetes
:class: note
The CLI exits with code 1 when the readiness check is not successful, which makes it a good choice to be used as readinessProbe for Executor and Gateway when
deployed in Kubernetes.
```
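Readiness can also be checked programmatically; a minimal sketch using the Python Client (assuming the Flow or Deployment above is serving on port 12345):

```python
from jina import Client

c = Client(port=12345)
print(c.is_flow_ready())  # True if the service is ready to receive requests
```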
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/hot-reload.md
# Hot Reload
While developing your Orchestration, you may want it to reload automatically as you change the YAML configuration.
For this you can use the Orchestration's `reload` argument, which reloads it with the updated configuration every time the YAML configuration changes.
````{admonition} Caution
:class: caution
This feature aims to let developers iterate faster while developing, but is not intended for production use.
````
````{admonition} Note
:class: note
This feature requires `watchfiles>=0.18` to be installed.
````
````{tab} Deployment
To see how this works, let's define a Deployment in `deployment.yml` with a `reload` option:
```yaml
jtype: Deployment
uses: ConcatenateTextExecutor
uses_with:
text_to_concat: foo
with:
port: 12345
reload: True
```
Load and expose the Orchestration:
```python
import os
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class ConcatenateTextExecutor(Executor):
    def __init__(self, text_to_concat: str = '', **kwargs):
        super().__init__(**kwargs)
        self.text_to_concat = text_to_concat

    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text += self.text_to_concat
        return docs
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
dep = Deployment.load_config('deployment.yml')
with dep:
dep.block()
```
You can see that the Orchestration is running and serving:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc](TextDoc()), return_type=DocList[TextDoc])[0].text)
```
```text
foo
```
You can edit the Orchestration YAML file and save the changes:
```yaml
jtype: Deployment
uses: ConcatenateTextExecutor
uses_with:
text_to_concat: bar
with:
port: 12345
reload: True
```
You should see the following in the Orchestration's logs:
```text
INFO Deployment@28301 change in Deployment YAML deployment.yml observed, restarting Deployment
```
After this, the behavior of the Deployment's Executor will change:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc](TextDoc()), return_type=DocList[TextDoc])[0].text)
```
```text
bar
```
````
````{tab} Flow
To see how this works, let's define a Flow in `flow.yml` with a `reload` option:
```yaml
jtype: Flow
with:
port: 12345
reload: True
executors:
  - name: exec1
    uses: ConcatenateTextExecutor
```
Load and expose the Orchestration:
```python
import os
from jina import Flow, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc

class ConcatenateTextExecutor(Executor):
    # default text used when the YAML does not override it via uses_with
    def __init__(self, text_to_concat: str = 'add text ', **kwargs):
        super().__init__(**kwargs)
        self.text_to_concat = text_to_concat

    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text += self.text_to_concat
        return docs
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
f = Flow.load_config('flow.yml')
with f:
f.block()
```
You can see that the Flow is running and serving:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc](TextDoc()), return_type=DocList[TextDoc])[0].text)
```
```text
add text
```
You can edit the Flow YAML file and save the changes:
```yaml
jtype: Flow
with:
port: 12345
reload: True
executors:
  - name: exec1
    uses: ConcatenateTextExecutor
  - name: exec2
    uses: ConcatenateTextExecutor
```
You should see the following in the Flow's logs:
```text
INFO Flow@28301 change in Flow YAML flow.yml observed, restarting Flow
```
After this, the Flow will have two Executors with the new topology:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc](TextDoc()), return_type=DocList[TextDoc])[0].text)
```
```text
add text add text
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/index.md
(orchestration)=
# {fas}`network-wired` Orchestration
As seen in the {ref}`architecture overview `, Jina-serve is organized in different layers.
The Orchestration layer is composed of concepts that let you orchestrate, serve and scale your Executors with ease.
Two objects belong to this family:
* A single Executor ({class}`~Deployment`), ideal for serving a single model or microservice.
* A pipeline of Executors ({class}`~Flow`), ideal for more complex operations where Documents need to be processed in multiple ways.
Both Deployment and Flow share similar syntax and behavior. The main differences are:
* Deployments orchestrate a single Executor, while Flows orchestrate multiple Executors connected into a pipeline.
* Flows have a {ref}`Gateway `, while Deployments do not.
```{toctree}
:hidden:
deployment
flow
add-executors
scale-out
hot-reload
handle-exceptions
readiness
health-check
instrumentation
troubleshooting-on-multiprocess
yaml-spec
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/instrumentation.md
(instrumenting-flow)=
# Instrumentation
A {class}`~jina.Flow` exposes configuration parameters for leveraging [OpenTelemetry](https://opentelemetry.io) Tracing and Metrics observability features. These tools let you instrument and collect various signals which help to analyze your application's real-time behavior.
A {class}`~jina.Flow` is composed of several Pods, namely the {class}`~jina.serve.runtimes.gateway.GatewayRuntime`, {class}`~jina.Executor`s, and potentially a {class}`~jina.serve.runtimes.head.HeadRuntime` (see the {ref}`architecture overview `). Each Pod is its own microservice. These services expose their own metrics using the Python [OpenTelemetry API and SDK](https://opentelemetry-python.readthedocs.io/en/stable/api/trace.html).
Tracing and Metrics can be enabled and configured independently to allow more flexibility in the data collection and visualization setup.
```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for a full detail on the OpenTelemetry data collection and visualization setup.
```
```{caution}
Prometheus-only based metrics collection will soon be deprecated. Refer to {ref}`Prometheus/Grafana Support ` section for the deprecated setup.
```
## Tracing
````{tab} Python
```python
from jina import Flow
f = Flow(
tracing=True,
traces_exporter_host='http://localhost',
traces_exporter_port=4317,
).add(uses='jinaai://jina-ai/SimpleIndexer')
with f:
f.block()
```
````
````{tab} YAML
In `flow.yaml`:
```yaml
jtype: Flow
with:
  tracing: true
  traces_exporter_host: 'localhost'
  traces_exporter_port: 4317
executors:
  - uses: jinaai://jina-ai/SimpleIndexer
```
```bash
jina flow --uses flow.yaml
```
````
This Flow creates two Pods: one for the Gateway, and one for the SimpleIndexer Executor. The Flow propagates the Tracing configuration to each Pod so you don't need to duplicate the arguments on each Executor.
The `traces_exporter_host` and `traces_exporter_port` arguments configure the traces [exporter](https://opentelemetry.io/docs/instrumentation/python/exporters/#trace-1) which are responsible for pushing collected data to the [collector](https://opentelemetry.io/docs/collector/) backend.
```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for more details on exporter and collector setup and usage.
```
### Available Traces
Each Pod supports different default traces out of the box, and also lets you define your own custom traces in the Executor. The `Runtime` name is used to create the OpenTelemetry [Service](https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/#service) [Resource](https://opentelemetry.io/docs/reference/specification/resource/) attribute. The default value for the `name` argument is the `Runtime` or `Executor` class name.
Because not all Pods have the same role, they expose different kinds of traces:
#### Gateway Pods
| Operation name | Description |
|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `/jina.JinaRPC/Call` | Traces the request from the client to the Gateway server. |
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Internal operation for the request originating from the Gateway to the target Head or Executor. |
#### Head Pods
| Operation name | Description |
|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Internal operation for the request originating from the Gateway to the target Head. Another child span is created for the request originating from the Head to the Executor.|
#### Executor Pods
| Operation name | Description |
|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Executor server operation for the request originating from the Gateway/Head to the Executor request handler. |
| `/endpoint` | Internal operation for the request originating from the Executor request handler to the target `@requests(on='/endpoint')` method. The `endpoint` will be `default` if no endpoint name is provided. |
```{seealso}
Beyond the above-mentioned default traces, you can define {ref}`custom traces ` for your Executor.
```
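As a rough illustration, a custom span inside an Executor endpoint might look like the following minimal sketch. It assumes the Executor's `self.tracer` (an OpenTelemetry tracer) and the `tracing_context` argument are available when tracing is enabled, as described in the custom traces guide; the span name is illustrative.
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyTracedExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], tracing_context=None, **kwargs) -> DocList[TextDoc]:
        # assumption: self.tracer is the OpenTelemetry tracer set up when tracing is enabled
        with self.tracer.start_as_current_span('process', context=tracing_context):
            for d in docs:
                d.text = d.text.upper()
```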
## Metrics
```{hint}
Prometheus-only based metrics collection will soon be deprecated. Refer to the {ref}`Prometheus/Grafana Support ` section for the deprecated setup.
```
````{tab} Python
```python
from jina import Flow

f = Flow(
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add(uses='jinaai://jina-ai/SimpleIndexer')

with f:
    f.block()
```
````
````{tab} YAML
In `flow.yaml`:
```yaml
jtype: Flow
with:
  metrics: true
  metrics_exporter_host: 'http://localhost'
  metrics_exporter_port: 4317
executors:
  - uses: jinaai://jina-ai/SimpleIndexer
```
```bash
jina flow --uses flow.yaml
```
````
The Flow propagates the Metrics configuration to each Pod. The `metrics_exporter_host` and `metrics_exporter_port` arguments configure the metrics [exporter](https://opentelemetry.io/docs/instrumentation/python/exporters/#metrics-1) responsible for pushing collected data to the [collector](https://opentelemetry.io/docs/collector/) backend.
```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for more details on the exporter and collector setup and usage.
```
### Available metrics
Each Pod supports different default metrics out of the box and also lets you define your own custom metrics in the Executor. All metrics add the `Runtime` name to the [metric attributes](https://opentelemetry.io/docs/reference/specification/metrics/semantic_conventions/), which can be used to filter data from different Pods.
Because not all Pods have the same role, they expose different kinds of metrics:
#### Gateway Pods
| Metric name | Metric type | Description |
|-------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures time elapsed between receiving a request from the client and sending back the response. |
| `jina_sending_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures time elapsed between sending a downstream request to an Executor/Head and receiving the response back. |
| `jina_number_of_pending_requests` | [UpDownCounter](https://opentelemetry.io/docs/reference/specification/metrics/api/#updowncounter) | Counts the number of pending requests. |
| `jina_successful_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of successful requests returned by the Gateway. |
| `jina_failed_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of failed requests returned by the Gateway. |
| `jina_sent_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request sent by the Gateway to the Executor or to the Head. |
| `jina_received_response_bytes`      | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned by the Executor. |
| `jina_received_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size of the request in bytes received at the Gateway level. |
| `jina_sent_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Gateway to the Client. |
```{seealso}
You can find more information on the different types of metrics in the Prometheus documentation [here](https://prometheus.io/docs/concepts/metric_types/#metric-types).
```
#### Head Pods
| Metric name | Metric type | Description |
|-----------------------------------------|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between receiving a request from the Gateway and sending back the response. |
| `jina_sending_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between sending a downstream request to an Executor and receiving the response back. |
| `jina_number_of_pending_requests` | [UpDownCounter](https://opentelemetry.io/docs/reference/specification/metrics/api/#updowncounter)| Counts the number of pending requests. |
| `jina_successful_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of successful requests returned by the Head. |
| `jina_failed_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of failed requests returned by the Head. |
| `jina_sent_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request sent by the Head to the Executor. |
| `jina_received_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned by the Executor. |
| `jina_received_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size of the request in bytes received at the Head level. |
| `jina_sent_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Head to the Gateway. |
#### Executor Pods
The Executor also adds the Executor class name and the request endpoint to the metric attributes of methods decorated with `@requests` or `@monitor`:
| Metric name | Metric type | Description |
|----------------------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between receiving a request from the Gateway (or the head) and sending back the response. |
| `jina_process_request_seconds`    | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time spent calling the requested method. |
| `jina_document_processed`         | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of Documents processed by an Executor. |
| `jina_successful_requests`        | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Total count of successful requests returned by the Executor across all endpoints. |
| `jina_failed_requests`            | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Total count of failed requests returned by the Executor across all endpoints. |
| `jina_received_request_bytes`     | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request received at the Executor level. |
| `jina_sent_response_bytes`        | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Executor to the Gateway. |
```{seealso}
Beyond the default metrics outlined above, you can also define {ref}`custom metrics ` for your Executor.
```
```{hint}
`jina_process_request_seconds` and `jina_receiving_request_seconds` are different:
* `jina_process_request_seconds` only tracks time spent calling the function.
* `jina_receiving_request_seconds` tracks time spent calling the function **and** the gRPC communication overhead.
```
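As a rough sketch, a custom method-level metric could be added with the `@monitor` decorator described in the custom metrics guide; the metric name and helper method below are illustrative.
```python
from jina import Executor, monitor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyMonitoredExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        self._embed(docs)

    @monitor(name='embed_seconds', documentation='time spent embedding Documents')
    def _embed(self, docs: DocList[TextDoc]):
        # the decorated method is timed and exposed as its own metric
        ...
```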
## See also
* {ref}`Defining custom traces and metrics in an Executor `
* {ref}`How to deploy and use OpenTelemetry in Jina-serve `
* [Tracing in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/)
* [Metrics in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/metrics/)
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/readiness.md
# Readiness
An Orchestration is marked as "ready" when:
* Its Executor is fully loaded and ready (in the case of a Deployment)
* All its Executors and Gateway are fully loaded and ready (in the case of a Flow)
After that, an Orchestration is able to process requests.
{class}`~jina.Client` offers an API to query these readiness endpoints. You can check readiness via the Orchestration object directly, via the Client, or via the CLI. Calling {meth}`~jina.clients.mixin.HealthCheckMixin.is_flow_ready` or {meth}`~jina.Flow.is_flow_ready` returns `True` if the Flow is ready and `False` if it is not.
## Via Orchestration
````{tab} Deployment
```python
from jina import Deployment

dep = Deployment()

with dep:
    print(dep.is_deployment_ready())

print(dep.is_deployment_ready())
```
```text
True
False
```
````
````{tab} Flow
```python
from jina import Flow

f = Flow().add()

with f:
    print(f.is_flow_ready())

print(f.is_flow_ready())
```
```text
True
False
```
````
## Via Jina-serve Client
You can check the readiness from the client:
````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(port=12345)

with dep:
    dep.block()
```
```python
from jina import Client
client = Client(port=12345)
print(client.is_deployment_ready())
```
```text
True
```
````
````{tab} Flow
```python
from jina import Flow

f = Flow(port=12345).add()

with f:
    f.block()
```
```python
from jina import Client
client = Client(port=12345)
print(client.is_flow_ready())
```
```text
True
```
````
## Via CLI
`````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(port=12345)

with dep:
    dep.block()
```
```bash
jina-serve ping executor grpc://localhost:12345
```
````{tab} Success
```text
INFO Jina-serve@92877 ping grpc://localhost:12345 at 0 round... [09/08/22 12:58:13]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.04s)
INFO Jina-serve@92877 ping grpc://localhost:12345 at 1 round... [09/08/22 12:58:14]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO Jina-serve@92877 ping grpc://localhost:12345 at 2 round... [09/08/22 12:58:15]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.01s)
INFO Jina-serve@92877 avg. latency: 24 ms [09/08/22 12:58:16]
```
````
````{tab} Failure
```text
INFO Jina-serve@92986 ping grpc://localhost:12345 at 0 round... [09/08/22 12:59:00]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (1/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.01s)
INFO Jina-serve@92986 ping grpc://localhost:12345 at 1 round... [09/08/22 12:59:01]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (2/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO Jina-serve@92986 ping grpc://localhost:12345 at 2 round... [09/08/22 12:59:02]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (3/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.02s)
WARNI… Jina-serve@92986 message lost 100% (3/3)
```
````
`````
`````{tab} Flow
```python
from jina import Flow

f = Flow(port=12345)

with f:
    f.block()
```
```bash
jina-serve ping flow grpc://localhost:12345
```
````{tab} Success
```text
INFO Jina-serve@92877 ping grpc://localhost:12345 at 0 round... [09/08/22 12:58:13]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.04s)
INFO Jina-serve@92877 ping grpc://localhost:12345 at 1 round... [09/08/22 12:58:14]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO Jina-serve@92877 ping grpc://localhost:12345 at 2 round... [09/08/22 12:58:15]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.01s)
INFO Jina-serve@92877 avg. latency: 24 ms [09/08/22 12:58:16]
```
````
````{tab} Failure
```text
INFO Jina-serve@92986 ping grpc://localhost:12345 at 0 round... [09/08/22 12:59:00]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (1/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.01s)
INFO Jina-serve@92986 ping grpc://localhost:12345 at 1 round... [09/08/22 12:59:01]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (2/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO Jina-serve@92986 ping grpc://localhost:12345 at 2 round... [09/08/22 12:59:02]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (3/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.02s)
WARNI… Jina-serve@92986 message lost 100% (3/3)
```
````
`````
## Readiness check via third-party clients
You can check the status of a Flow using any gRPC/HTTP/WebSockets client, not just via Jina-serve Client.
To see how this works, first instantiate the Orchestration with its corresponding protocol and block it for serving:
````{tab} Deployment
```python
from jina import Deployment
import os

PROTOCOL = 'grpc'  # it could also be http or websocket
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'  # this way we can check the PID of the Executor

dep = Deployment(protocol=PROTOCOL, port=12345)

with dep:
    dep.block()
```
```text
⠋ Waiting ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/0 -:--:--DEBUG gateway/rep-0@19075 adding connection for deployment executor0/heads/0 to grpc://0.0.0.0:12346 [05/31/22 18:10:16]
DEBUG executor0/rep-0@19074 start listening on 0.0.0.0:12346 [05/31/22 18:10:16]
DEBUG gateway/rep-0@19075 start server bound to 0.0.0.0:12345 [05/31/22 18:10:17]
DEBUG executor0/rep-0@19059 ready and listening [05/31/22 18:10:17]
DEBUG gateway/rep-0@19059 ready and listening [05/31/22 18:10:17]
╭─── 🎉 Deployment is ready to serve! ───╮
│ 🔗 Protocol GRPC │
│ 🏠 Local 0.0.0.0:12345 │
│ 🔒 Private 192.168.1.13:12345 │
╰────────────────────────────────────────╯
DEBUG Deployment@19059 2 Deployments (i.e. 2 Pods) are running in this Deployment
```
````
````{tab} Flow
```python
from jina import Flow
import os

PROTOCOL = 'grpc'  # it could also be http or websocket
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'  # this way we can check the PID of the Executor

f = Flow(protocol=PROTOCOL, port=12345).add()

with f:
    f.block()
```
```text
⠋ Waiting ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/0 -:--:--DEBUG gateway/rep-0@19075 adding connection for deployment executor0/heads/0 to grpc://0.0.0.0:12346 [05/31/22 18:10:16]
DEBUG executor0/rep-0@19074 start listening on 0.0.0.0:12346 [05/31/22 18:10:16]
DEBUG gateway/rep-0@19075 start server bound to 0.0.0.0:12345 [05/31/22 18:10:17]
DEBUG executor0/rep-0@19059 ready and listening [05/31/22 18:10:17]
DEBUG gateway/rep-0@19059 ready and listening [05/31/22 18:10:17]
╭────── 🎉 Flow is ready to serve! ──────╮
│ 🔗 Protocol GRPC │
│ 🏠 Local 0.0.0.0:12345 │
│ 🔒 Private 192.168.1.13:12345 │
╰────────────────────────────────────────╯
DEBUG Flow@19059 2 Deployments (i.e. 2 Pods) are running in this Flow
```
````
### Using gRPC
When using gRPC, use [grpcurl](https://github.com/fullstorydev/grpcurl) to access the Gateway's gRPC service that is responsible for reporting the Orchestration status.
```shell
docker pull fullstorydev/grpcurl:latest
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina.JinaGatewayDryRunRPC/dry_run
```
The error-free output below signifies a correctly running Orchestration:
```json
{}
```
You can simulate an Executor going offline by killing its process.
```shell script
kill -9 $EXECUTOR_PID # in this case we can see in the logs that it is 19059
```
Then by doing the same check, you can see that it returns an error:
```shell
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina.JinaGatewayDryRunRPC/dry_run
```
````{dropdown} Error output
```json
{
"code": "ERROR",
"description": "failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down.",
"exception": {
"name": "InternalNetworkError",
"args": [
"failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down."
],
"stacks": [
"Traceback (most recent call last):\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 750, in task_wrapper\n timeout=timeout,\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 197, in send_discover_endpoint\n await self._init_stubs()\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 174, in _init_stubs\n self.channel\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 1001, in get_available_services\n async for res in response:\n",
" File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 326, in _fetch_stream_responses\n await self._raise_for_status()\n",
" File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 237, in _raise_for_status\n self._cython_call.status())\n",
"grpc.aio._call.AioRpcError: \u003cAioRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses\"\n\tdebug_error_string = \"{\"created\":\"@1654012804.794351252\",\"description\":\"Failed to pick subchannel\",\"file\":\"src/core/ext/filters/client_channel/client_channel.cc\",\"file_line\":3134,\"referenced_errors\":[{\"created\":\"@1654012804.794350006\",\"description\":\"failed to connect to all addresses\",\"file\":\"src/core/lib/transport/error_utils.cc\",\"file_line\":163,\"grpc_status\":14}]}\"\n\u003e\n",
"\nDuring handling of the above exception, another exception occurred:\n\n",
"Traceback (most recent call last):\n",
" File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/grpc/__init__.py\", line 155, in dry_run\n async for _ in self.streamer.stream(request_iterator=req_iterator):\n",
" File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 78, in stream\n async for response in async_iter:\n",
" File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 154, in _stream_requests\n response = self._result_handler(future.result())\n",
" File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 146, in _process_results_at_end_gateway\n await asyncio.gather(gather_endpoints(request_graph))\n",
" File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 88, in gather_endpoints\n raise err\n",
" File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 80, in gather_endpoints\n endpoints = await asyncio.gather(*tasks_to_get_endpoints)\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 754, in task_wrapper\n e=e, retry_i=i, dest_addr=connection.address\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 697, in _handle_aiorpcerror\n details=e.details(),\n",
"jina.excepts.InternalNetworkError: failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down.\n"
]
}
}
```
````
### Using HTTP or WebSockets
When using HTTP or WebSockets as the Gateway protocol, use curl to target the `/dry_run` endpoint and get the status of the Flow.
```shell
curl http://localhost:12345/dry_run
```
Error-free output signifies a correctly running Flow:
```json
{"code":0,"description":"","exception":null}
```
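If you prefer to stay in Python, any HTTP client can perform the same check; here is a minimal sketch using the `requests` library (assuming the Flow above is serving HTTP on port 12345):
```python
import requests  # the HTTP client library, not the jina decorator

r = requests.get('http://localhost:12345/dry_run')
print(r.json())                 # {'code': 0, 'description': '', 'exception': None} when healthy
print(r.json()['code'] == 0)    # True means the Orchestration is ready
```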
You can simulate an Executor going offline by killing its process:
```shell script
kill -9 $EXECUTOR_PID # in this case we can see in the logs that it is 19059
```
Then by doing the same check, you can see that the call returns an error:
```json
{"code":1,"description":"failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down.","exception":{"name":"InternalNetworkError","args":["failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down."],"stacks":["Traceback (most recent call last):\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 726, in task_wrapper\n timeout=timeout,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 241, in send_requests\n await call_result,\n"," File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 291, in __await__\n self._cython_call._status)\n","grpc.aio._call.AioRpcError: \n","\nDuring handling of the above exception, another exception occurred:\n\n","Traceback (most recent call last):\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/http/app.py\", line 142, in _flow_health\n data_type=DataInputType.DOCUMENT,\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/http/app.py\", line 399, in _get_singleton_result\n async for k in streamer.stream(request_iterator=request_iterator):\n"," File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 78, in stream\n async for response in async_iter:\n"," File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 154, in _stream_requests\n response = self._result_handler(future.result())\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 148, in _process_results_at_end_gateway\n partial_responses = await asyncio.gather(*tasks)\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 128, in _wait_previous_and_send\n self._handle_internalnetworkerror(err)\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 70, in _handle_internalnetworkerror\n raise err\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 125, in _wait_previous_and_send\n timeout=self._timeout_send,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 734, in task_wrapper\n num_retries=num_retries,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 697, in _handle_aiorpcerror\n details=e.details(),\n","jina.excepts.InternalNetworkError: failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down.\n"],"executor":""}}
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/scale-out.md
(scale-out)=
# Scale Out
By default, all Executors in an Orchestration run with a single instance. If an Executor is particularly slow, then it will reduce the overall throughput. To solve this, you can specify the number of `replicas` to scale out an Executor.
(replicate-executors)=
## Replicate stateless Executors
Replication creates multiple copies of the same {class}`~jina.Executor`. Each request in the Orchestration is then passed to only one replica (instance) of that Executor. **All replicas compete for a request. The idle replica gets the request first.**
This is useful for improving performance and availability:
* If you have slow Executors (e.g. embedding) you can scale up the number of instances to process multiple requests in parallel.
* Executors might need to be taken offline occasionally (for updates, failures, etc.), but you may want your Orchestration to still process requests without any downtime. Adding replicas allows any replica to be taken down as long as there is at least one still running. This ensures the high availability of your Orchestration.
### Replicate Executors in a Deployment
````{tab} Python
```python
from jina import Deployment
dep = Deployment(name='slow_encoder', replicas=3)
```
````
````{tab} YAML
```yaml
jtype: Deployment
uses: jinaai://jina-ai/CLIPEncoder
install_requirements: True
replicas: 5
```
````
### Replicate Executors in a Flow
````{tab} Python
```python
from jina import Flow
f = Flow().add(name='slow_encoder', replicas=3).add(name='fast_indexer')
```
````
````{tab} YAML
```yaml
jtype: Flow
executors:
  - uses: jinaai://jina-ai/CLIPEncoder
    install_requirements: True
    replicas: 5
```
````
```{figure} images/replicas-flow.svg
:width: 70%
:align: center
Flow with three replicas of `slow_encoder` and one replica of `fast_indexer`
```
(scale-consensus)=
## Replicate stateful Executors with consensus using RAFT (Beta)
````{admonition} Python 3.8 or newer required on macOS
:class: note
This feature requires at least Python 3.8 when working on macOS.
````
````{admonition} Feature not supported on Windows
:class: note
This feature is not supported on Windows.
````
````{admonition} DocArray 0.30
:class: note
Starting from DocArray version 0.30, DocArray changed its interface and implementation drastically. We intend to support these new versions in the near future, but not every feature is yet available. Check {ref}`here ` for more information. This feature has been added with the new DocArray support.
````
````{admonition} gRPC protocol
:class: note
This feature is only available when using gRPC as the protocol for the Deployment or when the Deployment is part of a Flow
````
Replication is used to scale out Executors by creating copies of them that can handle requests in parallel, providing better RPS.
However, when an Executor maintains state, it is not simple to guarantee that each copy maintains the *same* state,
which can lead to undesired behavior, since each replica may return different results depending on the state it holds.
In Jina-serve, you can also have replication while guaranteeing consensus between replicas. For this, we rely on [RAFT](https://raft.github.io/),
an algorithm that guarantees eventual consistency between replicas.
Consensus-based replication using RAFT is a distributed algorithm designed to provide fault tolerance and consistency in a distributed system. In a distributed system, the nodes may fail, and messages may be lost or delayed, which can lead to inconsistencies in the system.
The problem with traditional replication methods is that they can't guarantee consistency in a distributed system in the presence of failures. This is where consensus-based replication using RAFT comes in.
With this approach, each Executor can be considered as a Finite State Machine, meaning it has a set of potential states and a set of transitions that it can make between those states. Each request that is sent to the Executor can be considered as a log entry that needs to be replicated across the cluster.
To enable this kind of replication, we need to consider:
* Specify which methods of the Executor {ref}` can update its internal state `.
* Tell the Deployment to use the RAFT consensus algorithm by setting the `--stateful` argument.
* Set values of replicas compatible with RAFT. RAFT requires at least three replicas to guarantee consistency.
* Pass the `--peer-ports` argument so that the RAFT cluster can recover from a previous configuration of replicas if one existed.
* Optionally, pass the `--raft-configuration` parameter to tweak the behavior of the consensus module. You can understand the values to pass from
[Hashicorp's RAFT library](https://github.com/ongardie/hashicorp-raft/blob/master/config.go).
```python
from jina import Deployment, Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc


class MyStateStatefulExecutor(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._docs_dict = {}

    @requests(on=['/index'])
    @write
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            self._docs_dict[doc.id] = doc

    @requests(on=['/search'])
    def search(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            self.logger.debug(f'Searching against {len(self._docs_dict)} documents')
            doc.text = self._docs_dict[doc.id].text


d = Deployment(
    name='stateful_executor',
    uses=MyStateStatefulExecutor,
    replicas=3,
    stateful=True,
    workspace='./raft',
    peer_ports=[12345, 12346, 12347],
)

with d:
    d.block()
```
This capability not only gives you replicas with robustness and high availability, it can also help achieve higher throughput in some cases.
Let's imagine we write an Executor that is used to index and query documents from a vector index.
For this, we will use an in-memory solution from [DocArray](https://docs.docarray.org/user_guide/storing/index_in_memory/) that performs exact vector search.
```python
from jina import Deployment, Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc
from docarray.index.backends.in_memory import InMemoryExactNNIndex


class QueryDoc(TextDoc):
    matches: DocList[TextDoc] = DocList[TextDoc]()


class ExactNNSearch(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._index = InMemoryExactNNIndex[TextDoc]()

    @requests(on=['/index'])
    @write  # the write decorator indicates that calling this endpoint updates the inner state
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        self.logger.info(f'Indexing Document in index with {len(self._index)} documents indexed')
        self._index.index(docs)

    @requests(on=['/search'])
    def search(self, docs: DocList[QueryDoc], **kwargs) -> DocList[QueryDoc]:
        self.logger.info(f'Searching Document in index with {len(self._index)} documents indexed')
        for query in docs:
            matches, scores = self._index.find(query, search_field='embedding', limit=100)
            query.matches = matches


d = Deployment(
    name='indexer',
    port=5555,
    uses=ExactNNSearch,
    workspace='./raft',
    replicas=3,
    stateful=True,
    peer_ports=[12345, 12346, 12347],
)

with d:
    d.block()
```
Then in another terminal, we will send index and search requests:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
import time
import numpy as np


class QueryDoc(TextDoc):
    matches: DocList[TextDoc] = DocList[TextDoc]()


NUM_DOCS_TO_INDEX = 100000
NUM_QUERIES = 1000

c = Client(port=5555)

index_docs = DocList[TextDoc](
    [TextDoc(text=f'I am document {i}', embedding=np.random.rand(128)) for i in range(NUM_DOCS_TO_INDEX)]
)

start_indexing_time = time.time()
c.post(on='/index', inputs=index_docs, request_size=100)
print(f'Indexing {NUM_DOCS_TO_INDEX} Documents took {time.time() - start_indexing_time}s')

time.sleep(2)  # allow some time for the data to be replicated

search_da = DocList[QueryDoc](
    [QueryDoc(text=f'I am document {i}', embedding=np.random.rand(128)) for i in range(NUM_QUERIES)]
)

start_querying_time = time.time()
responses = c.post(on='/search', inputs=search_da, request_size=1, return_type=DocList[QueryDoc])
print(f'Searching {NUM_QUERIES} Queries took {time.time() - start_querying_time}s')

for res in responses:
    print(f'{res.matches}')
```
In the server's logs you can see how `index` requests reach every replica, while each `search` request reaches only one replica, in a round-robin fashion.
Eventually every indexer replica ends up with the same Documents indexed.
```text
INFO indexer/rep-2@923 Indexing Document in index with 99900 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99200 documents indexed
INFO indexer/rep-1@910 Indexing Document in index with 99700 documents indexed
INFO indexer/rep-1@910 Indexing Document in index with 99800 documents indexed [04/28/23 16:51:06]
INFO indexer/rep-0@902 Indexing Document in index with 99300 documents indexed [04/28/23 16:51:06]
INFO indexer/rep-1@910 Indexing Document in index with 99900 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99400 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99500 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99600 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99700 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99800 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99900 documents indexed
```
At search time, however, the consensus module is not involved, and each query is served by a single replica.
```text
INFO indexer/rep-0@902 Searching Document in index with 100000 documents indexed [04/28/23 16:59:21]
INFO indexer/rep-1@910 Searching Document in index with 100000 documents indexed [04/28/23 16:59:21]
INFO indexer/rep-2@923 Searching Document in index with 100000 documents indexed
```
If you run the same example with `replicas` set to `1` and without the consensus module, you can see the benefit replication brings to QPS at search time,
at a small cost in indexing time.
```python
d = Deployment(
    name='indexer',
    port=5555,
    uses=ExactNNSearch,
    workspace='./raft',
    replicas=1,
)
```
With one replica:
```text
Indexing 100000 Documents took 18.93274688720703s
Searching 1000 Queries took 385.96641397476196s
```
With three replicas and consensus:
```text
Indexing 100000 Documents took 35.066415548324585s
Searching 1000 Queries took 202.07950615882874s
```
This roughly doubles search QPS, from about 2.5 to about 5.
## Replicate on multiple GPUs
To replicate your {class}`~jina.Executor`s so that each replica uses a different GPU on your machine, you can tell the Orchestration to use multiple GPUs by passing `CUDA_VISIBLE_DEVICES=RR` as an environment variable.
```{caution}
You should only replicate on multiple GPUs with `CUDA_VISIBLE_DEVICES=RR` locally.
```
```{tip}
In Kubernetes or with Docker Compose you should allocate GPU resources to each replica directly in the configuration files.
```
The Orchestration assigns GPU devices in the following round-robin fashion:
| GPU device | Replica ID |
|------------|------------|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 0 | 3 |
| 1 | 4 |
You can restrict the visible devices in round-robin assignment using `CUDA_VISIBLE_DEVICES=RR0:2`, where `0:2` corresponds to a Python slice. This creates the following assignment:
| GPU device | Replica ID |
|------------|------------|
| 0 | 0 |
| 1 | 1 |
| 0 | 2 |
| 1 | 3 |
| 0 | 4 |
You can restrict the visible devices in round-robin assignment by assigning the list of device IDs to `CUDA_VISIBLE_DEVICES=RR1,3`. This creates the following assignment:
| GPU device | Replica ID |
|------------|------------|
| 1 | 0 |
| 3 | 1 |
| 1 | 2 |
| 3 | 3 |
| 1 | 4 |
You can also refer to GPUs by their UUID. For instance, you could assign a list of device UUIDs:
```bash
CUDA_VISIBLE_DEVICES=RRGPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5,GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5,GPU-0ccccccc-74d2-7297-d557-12771b6a79d5,GPU-0ddddddd-74d2-7297-d557-12771b6a79d5
```
Check [CUDA Documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) to see the accepted formats to assign CUDA devices by UUID.
| GPU device | Replica ID |
|------------|------------|
| GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 0 |
| GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5 | 1 |
| GPU-0ccccccc-74d2-7297-d557-12771b6a79d5 | 2 |
| GPU-0ddddddd-74d2-7297-d557-12771b6a79d5 | 3 |
| GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 4 |
For example, if you have three GPUs and one of your Executors has five replicas:
### GPU replicas in a Deployment
````{tab} Python
```python
from jina import Deployment

dep = Deployment(uses='jinaai://jina-ai/CLIPEncoder', replicas=5, install_requirements=True)

with dep:
    dep.block()
```
```shell
CUDA_VISIBLE_DEVICES=RR python deployment.py
```
````
````{tab} YAML
```yaml
jtype: Deployment
with:
  uses: jinaai://jina-ai/CLIPEncoder
  install_requirements: True
  replicas: 5
```
```shell
CUDA_VISIBLE_DEVICES=RR jina deployment --uses deployment.yaml
```
````
### GPU replicas in a Flow
````{tab} Python
```python
from jina import Flow

f = Flow().add(
    uses='jinaai://jina-ai/CLIPEncoder', replicas=5, install_requirements=True
)

with f:
    f.block()
```
```shell
CUDA_VISIBLE_DEVICES=RR python flow.py
```
````
````{tab} YAML
```yaml
jtype: Flow
executors:
  - uses: jinaai://jina-ai/CLIPEncoder
    install_requirements: True
    replicas: 5
```
```shell
CUDA_VISIBLE_DEVICES=RR jina flow --uses flow.yaml
```
````
## Replicate external Executors
If you have external Executors with multiple replicas running elsewhere, you can add them to your Orchestration by specifying all the respective hosts and ports:
````{tab} Deployment
```python
from jina import Deployment
replica_hosts, replica_ports = ['localhost','91.198.174.192'], ['12345','12346']
Deployment(host=replica_hosts, port=replica_ports, external=True)
# alternative syntax
Deployment(host=['localhost:12345','91.198.174.192:12346'], external=True)
```
````
````{tab} Flow
```python
from jina import Flow
replica_hosts, replica_ports = ['localhost','91.198.174.192'], ['12345','12346']
Flow().add(host=replica_hosts, port=replica_ports, external=True)
# alternative syntax
Flow().add(host=['localhost:12345','91.198.174.192:12346'], external=True)
```
````
This connects to `grpc://localhost:12345` and `grpc://91.198.174.192:12346` as two replicas of the external Executor.
````{admonition} Reducing
:class: hint
If an external Executor needs multiple predecessors, reducing must be enabled, so setting `no_reduce=True` is not allowed in these cases.
````
(partition-data-by-using-shards)=
## Customize polling behaviors
Replicas compete for a request, so only one of them will get the request. What if we want all replicas to get the request?
For example, consider index and search requests:
* Index (and update, delete) requests are handled by a single shard, as adding the data once is sufficient.
* Search requests are handled by all shards, as you need to search over all shards to ensure the completeness of the results. The requested data could be on any shard.
For this purpose, you need `shards` and `polling`.
You can define if all or any `shards` receive the request by specifying `polling`. `ANY` means only one shard receives the request, while `ALL` means that all shards receive the same request.
````{tab} Deployment
```python
from jina import Deployment
dep = Deployment(name='ExecutorWithShards', shards=3, polling={'/custom': 'ALL', '/search': 'ANY', '*': 'ANY'})
```
````
````{tab} Flow
```python
from jina import Flow
f = Flow().add(name='ExecutorWithShards', shards=3, polling={'/custom': 'ALL', '/search': 'ANY', '*': 'ANY'})
```
````
The above example results in an Orchestration having the Executor `ExecutorWithShards` with the following polling options:
* `/index` has polling `ANY` (the default value is not changed here).
* `/search` has polling `ANY` as it is explicitly set (usually that should not be necessary).
* `/custom` has polling `ALL`.
* All other endpoints have polling `ANY` due to using `*` as a wildcard to catch all other cases.
### Understand behaviors of replicas and shards with polling
The following example demonstrates the different behaviors when setting `replicas`, `shards` and `polling` together.
````{tab} Deployment
```{code-block} python
---
emphasize-lines: 12
---
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        print(f'inside: {docs.text}')


dep = Deployment(uses=MyExec, replicas=2, polling='ANY')

with dep:
    r = dep.post('/', TextDoc(text='hello'), return_type=DocList[TextDoc])
    print(f'return: {r.text}')
```
````
````{tab} Flow
```{code-block} python
---
emphasize-lines: 14
---
from jina import Flow, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        print(f'inside: {docs.text}')


f = (
    Flow()
    .add(uses=MyExec, replicas=2, polling='ANY')
    .needs_all()
)

with f:
    r = f.post('/', TextDoc(text='hello'), return_type=DocList[TextDoc])
    print(f'return: {r.text}')
```
````
We now change the combination of the highlighted arguments above and see if there is any difference in the console output (note the two prints in the snippet):
| | `polling='ALL'` | `polling='ANY'` |
| -------------- | -------------------------------------------------------- | ------------------------------------- |
| `replicas=2` | `inside: ['hello'] return: ['hello']` | `inside: ['hello'] return: ['hello']` |
| `shards=2` | `inside: ['hello'] inside: ['hello'] return: ['hello']` | `inside: ['hello'] return: ['hello']` |
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/troubleshooting-on-multiprocess.md
(multiprocessing-spawn)=
# Troubleshooting on Multiprocessing
When running an Orchestration locally, you may encounter errors caused by the `multiprocessing` package depending on your operating system and Python version.
```{admonition} Troubleshooting a Flow
:class: information
In this section we show an example using a {ref}`Deployment `. However, exactly the same methodology applies to troubleshooting a Flow.
```
Here are some suggestions:
* Define and start the Orchestration via an explicit function call inside `if __name__ == '__main__'`, **especially when using `spawn` multiprocessing start method**. For example
````{tab} ✅ Do
```{code-block} python
---
emphasize-lines: 13, 14
---
from jina import Deployment, Executor, requests

class CustomExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        ...

def main():
    dep = Deployment(uses=CustomExecutor)
    with dep:
        ...

if __name__ == '__main__':
    main()
```
````
````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 8, 9
---
from jina import Deployment, Executor, requests

class CustomExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        ...

dep = Deployment(uses=CustomExecutor)
with dep:
    ...

"""
# error
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
"""
```
````
* Declare Executors on the top-level of the module
````{tab} ✅ Do
```{code-block} python
---
emphasize-lines: 3
---
from jina import Deployment, Executor, requests

class CustomExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        ...

def main():
    dep = Deployment(uses=CustomExecutor)
    with dep:
        ...
```
````
````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 4
---
from jina import Deployment, Executor, requests

def main():
    class CustomExecutor(Executor):
        @requests
        def foo(self, **kwargs):
            ...

    dep = Deployment(uses=CustomExecutor)
    with dep:
        ...
```
````
* **Always provide absolute path**
While passing filepaths to different Jina arguments (e.g. `uses`, `py_modules`), always pass an absolute path, as in the sketch below.
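A minimal sketch of building an absolute path relative to the current script (the `executor_config.yml` filename is made up for the example):
```python
import os

from jina import Deployment

# resolve the config path relative to this script instead of the current working directory
here = os.path.dirname(os.path.abspath(__file__))
dep = Deployment(uses=os.path.join(here, 'executor_config.yml'))
```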
## Using Multiprocessing Spawn
When you encounter this error:
```console
Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```
* Set `JINA_MP_START_METHOD=spawn` before starting the Python script to enable the `spawn` start method.
````{hint}
There's no need to set this on Windows, as it only supports the `spawn` start method for multiprocessing.
````
* **Avoid un-picklable objects**
[Here's a list of types that can be pickled in Python](https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled). Since `spawn` relies on pickling, we should avoid using code that cannot be pickled.
````{hint}
Here are a few errors which indicate that you are using code that is not picklable.
```text
pickle.PicklingError: Can't pickle: it's not the same object
AssertionError: can only join a started process
```
````
Inline functions, such as nested or lambda functions, are not picklable. Use `functools.partial` instead, as shown below.
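For instance, here is a rough sketch of replacing a lambda with `functools.partial` (the `scale` function is made up for the example):
```python
from functools import partial


def scale(value, factor):
    return value * factor


# a lambda such as `lambda v: v * 2` cannot be pickled under the spawn start method,
# but a partial of a module-level function can
scale_by_two = partial(scale, factor=2)
print(scale_by_two(21))  # 42
```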
## Using Multiprocessing Fork on macOS
Apple has changed the rules for using Objective-C between `fork()` and `exec()` since macOS 10.13.
This may break some code that uses `fork()` on macOS.
For example, the Flow may not be able to start properly, with error messages similar to:
```bash
objc[20337]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[20337]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
```
You can define the environment variable `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` to get around this issue.
Read [here](http://sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html) for more details.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/yaml-spec.md
(flow-yaml-spec)=
# {octicon}`file-code` YAML specification
To generate a YAML configuration from an Orchestration, use {meth}`~jina.jaml.JAMLCompatible.save_config`.
## YAML completion in IDE
We provide a [JSON Schema](https://json-schema.org/) for your IDE to enable code completion, syntax validation, members listing and displaying help text.
### PyCharm users
1. Click menu `Preferences` -> `JSON Schema mappings`;
2. Add a new schema. Under `Schema File or URL` enter `https://schemas.jina.ai/schemas/latest.json` and select `JSON Schema Version 7`;
3. Add a file path pattern and link it to `*.jaml`, `*.jina.yml`, or any suffix you commonly use for Jina-serve Flow's YAML.
### VSCode users
1. Install the extension: `YAML Language Support by Red Hat`;
2. In IDE-level `settings.json` add:
```json
"yaml.schemas": {
"https://schemas.jina.ai/schemas/latest.json": ["/*.jina.yml", "/*.jaml"],
}
```
You can bind the schema to any file suffix you commonly use for Jina-serve Flow's YAML.
## Example YAML
````{tab} Deployment
```yaml
jtype: Deployment
version: '1'
with:
  protocol: http
  name: firstexec
  uses:
    jtype: MyExec
    py_modules:
      - executor.py
```
````
````{tab} Flow
```yaml
jtype: Flow
version: '1'
with:
  protocol: http
executors:
  # inline Executor YAML
  - name: firstexec
    uses:
      jtype: MyExec
      py_modules:
        - executor.py
  # reference to Executor YAML
  - name: secondexec
    uses: indexer.yml
    workspace: /home/my/workspace
  # reference to Executor Python class
  - name: thirdexec
    uses: CustomExec # located in executor.py
```
````
## Fields
### `jtype`
String that is always set to either "Flow" or "Deployment", indicating the corresponding Python class.
### `version`
String indicating the version of the Flow or Deployment.
### `with`
Keyword arguments passed to the Flow or Deployment's `__init__()` method. You can set Orchestration-specific arguments and Gateway-specific arguments here:
#### Orchestration arguments
````{tab} Deployment
```{include} deployment-args.md
```
````
````{tab} Flow
```{include} flow-args.md
```
##### Gateway arguments
These apply only to Flows, not Deployments
```{include} gateway-args.md
```
````
(executor-args)=
### `executors`
Collection of Executors used in the Orchestration. In the case of a Deployment this is a single Executor, while a Flow can have an arbitrary number.
Each item in the collection specifies one Executor; the Python equivalent is:
````{tab} Deployment
```python
dep = Deployment(uses=MyExec, arg1="foo", arg2="bar")
```
````
````{tab} Flow
```python
f = Flow().add(uses=MyExec, arg1="foo", arg2="bar")
```
````
```{include} executor-args.md
```
```{include} yaml-vars.md
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/yaml-vars.md
## Variables
Jina-serve Orchestration YAML supports variables and variable substitution according to the [GitHub Actions syntax](https://docs.github.com/en/actions/learn-github-actions/environment-variables).
### Environment variables
Use `${{ ENV.VAR }}` to refer to the environment variable `VAR`. You can find all {ref}`Jina environment variables here`.
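For instance, a minimal sketch of substituting an environment variable into an Orchestration configuration (the `MY_PORT` variable and the inline YAML are illustrative, and it is assumed that `load_config` accepts YAML content as well as a file path):
```python
import os

from jina import Flow

os.environ['MY_PORT'] = '12345'  # illustrative variable referenced in the YAML below

flow_yaml = '''
jtype: Flow
with:
  port: ${{ ENV.MY_PORT }}
'''

f = Flow.load_config(flow_yaml)  # MY_PORT is substituted when the YAML is parsed
```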
### Context variables
Use `${{ CONTEXT.VAR }}` to refer to the context variable `VAR`.
Context variables can be passed in the form of a Python dictionary:
````{tab} Deployment
```python
dep = Deployment.load_config('deployment.yml', context={...})
```
````
````{tab} Flow
```python
f = Flow.load_config('flow.yml', context={...})
```
````
### Relative paths
Use `${{root.path.to.var}}` to refer to the variable `var` within the same YAML file, found at the provided path in the file's structure.
```{admonition} Syntax: Environment variable vs relative path
:class: tip
The only difference between environment variable syntax and relative path syntax is the omission of spaces in the latter.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/preliminaries/coding-in-python-yaml.md
(python-yaml)=
# Coding in Python/YAML
In the docs, you often see two coding styles when describing a Jina-serve project:
```{glossary}
**Pythonic**
Flows, Deployments and Executors are all written in Python files, and the entrypoint is via Python.
**YAMLish**
Executors are written in Python files, and the Deployment or Flow is defined in a YAML file. The entrypoint can be either Python or the Jina-serve CLI: `jina deployment --uses deployment.yml` or `jina flow --uses flow.yml`.
```
For example, {ref}`the server-side code` follows the {term}`Pythonic` style. It can be written in {term}`YAMLish` style as follows:
````{tab} executor.py
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'goodbye!'
```
````
````{tab} flow.yml
```yaml
jtype: Flow
with:
  port: 12345
executors:
  - uses: FooExec
    replicas: 3
    py_modules: executor.py
  - uses: BarExec
    replicas: 2
    py_modules: executor.py
```
````
````{tab} Entrypoint
```bash
jina flow --uses flow.yml
```
````
In general, the YAML style can be used to represent and configure a Flow or Deployment, the objects that orchestrate the serving of Executors and applications.
The YAMLish style separates the Flow or Deployment representation from the Executor logic code.
It is more flexible to configure and should be used for more complex projects in production. In many integrations such as JCloud and Kubernetes, YAMLish is preferred.
Note that the two coding styles can be converted to each other easily. To load a Flow YAML into Python and run it:
```python
from jina import Flow

f = Flow.load_config('flow.yml')

with f:
    f.block()
```
To dump a Flow into YAML:
```python
from jina import Flow

Flow().add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2).save_config(
    'flow.yml'
)
```
````{admonition} Hint: YAML and Python duality (with, add, uses_with)
:class: hint
If you are used to the Pythonic way of building Deployments and Flows and then need to start working with YAML,
a good way to approach the translation is to think of YAML as a direct transcription of what you would type in Python.
a good way to think about this translation is to think of YAML as a direct translation of what you would type in Python.
So, every `with` clause is like an instantiation of an object, be it a Flow, Deployment or Executor (a call to its constructor).
And when a Flow has a list of Executors, each entry on the list is a call to the Flow's `add()` method. This is why Deployments and Flows sometimes need the argument `uses_with` to override the Executor's defaults.
````
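As a rough illustration of that duality, the `MyExec` class and its `greeting` parameter below are made up for the example:
```python
from jina import Executor, Flow, requests


class MyExec(Executor):
    def __init__(self, greeting: str = 'hello', **kwargs):
        super().__init__(**kwargs)
        self.greeting = greeting

    @requests
    def foo(self, **kwargs):
        print(self.greeting)


# the Pythonic call ...
f = Flow().add(uses=MyExec, uses_with={'greeting': 'hi'})

# ... corresponds to a YAML entry like:
#   executors:
#     - uses: MyExec
#       uses_with:
#         greeting: hi
```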
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/preliminaries/index.md
(architecture-overview)=
# {fas}`egg` Preliminaries
This chapter introduces the basic terminology and concepts you will encounter in the docs. But first, look at the code below:
In this code, we are going to use Jina-serve to serve simple logic with one Deployment, or a combination of two services with a Flow.
We are also going to see how we can query these services with Jina-serve's client.
(dummy-example)=
````{tab} Deployment
```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


dep = Deployment(port=12345, uses=FooExec, replicas=3)

with dep:
    dep.block()
```
````
````{tab} Flow
```python
from jina import Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'goodbye!'


f = Flow(port=12345).add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2)

with f:
    f.block()
```
````
````{tab} Client
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)
r = c.post(on='/', inputs=DocList[TextDoc]([TextDoc(text=''), TextDoc(text='')]), return_type=DocList[TextDoc])
print([d.text for d in r])
```
````
Running it gives you:
````{tab} Deployment
```text
['hello, world!', 'hello, world!']
```
````
````{tab} Flow
```text
['hello, world!goodbye!', 'hello, world!goodbye!']
```
````
## Architecture
This animation shows what's happening behind the scenes when running the previous examples:
````{tab} Deployment
```{figure} arch-deployment-overview.png
:align: center
```
````
````{tab} Flow
```{figure} arch-flow-overview.svg
:align: center
```
````
```{hint}
:class: seealso
gRPC, WebSocket and HTTP are network protocols for transmitting data. gRPC is always used for communication between the {term}`Gateway` and {term}`Executors inside a Flow`.
```
```{hint}
:class: seealso
TLS is a security protocol to facilitate privacy and data security for communications over the Internet. The communication between {term}`Client` and {term}`Gateway` is protected by TLS.
```
Jina-serve is an MLOps serving framework structured in two main layers. These layers work with DocArray's data structures and Jina-serve's Python Client to complete the framework. All of these are covered in the user guide
and comprise the following concepts:
```{glossary}
**DocArray data structure**
Data structures coming from [docarray](https://docs.docarray.org/) are the fundamental data structures in Jina-serve.
* **BaseDoc**
Document is the basic object for representing multimodal data. It can be extended to represent any data you want. More information can be found in [DocArray's Docs](https://docs.docarray.org/user_guide/representing/first_step/).
* **DocList**
DocList is a list-like container of multiple Documents. More information can be found in [DocArray's Docs](https://docs.docarray.org/user_guide/representing/array/).
All the components in Jina-serve use `BaseDoc` and/or `DocList` as the main data format for communication, making use of the different
serialization capabilities of these structures.
**Serving**
This layer contains all the objects and concepts that are used to actually serve the logic and receive and respond to queries. These components are designed to be used as microservices ready to be containerized.
These components can be orchestrated by Jina-serve's {term}`orchestration` layer or by other container orchestration frameworks such as Kubernetes or Docker Compose.
* **Executor**
A {class}`~jina.Executor` is a Python class that serves logic using Documents. Loosely speaking, each Executor is a service wrapping a model or application.
* **Gateway**
A Gateway is the entrypoint of a {term}`Flow`. It exposes multiple protocols for external communication and routes all internal traffic to the different Executors that work together to
provide a more complex service.
**Orchestration**
This layer contains the components that make sure the objects (especially the {term}`Executor`) are deployed and scaled for serving.
They wrap these objects to provide **scalability** and **serving** capabilities. They also provide easy translation to other orchestration
frameworks (Kubernetes, Docker Compose) for more advanced and production-ready settings, and they can be deployed directly to [Jina AI Cloud](https://cloud.jina.ai)
with a single command.
* **Deployment**
A Deployment orchestrates a single {term}`Executor`. It can be used to serve an Executor as a standalone
service or as part of a {term}`Flow`. It encapsulates and abstracts away internal replication and serving details.
* **Flow**
A {class}`~jina.Flow` ties multiple {class}`~jina.Deployment`s together into a logical pipeline to achieve a more complex task. It orchestrates both {term}`Executor`s and the {term}`Gateway`.
**Client**
The {class}`~jina.Client` connects to a {term}`Gateway` or {term}`Executor` and sends/receives/streams data from them.
```
```{admonition} Deployments on JCloud
:class: important
At present, JCloud is only available for Flows. We are currently working on supporting Deployments.
```
```{toctree}
:hidden:
coding-in-python-yaml
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/add-endpoints.md
(exec-endpoint)=
# Add Endpoints
Methods decorated with `@requests` are mapped to network endpoints while serving.
(executor-requests)=
## Decorator
Executor methods decorated with {class}`~jina.requests` are bound to specific network requests, and respond to network queries.
Both `def` or `async def` methods can be decorated with {class}`~jina.requests`.
You can import the `@requests` decorator via:
```python
from jina import requests
```
{class}`~jina.requests` takes an optional `on=` parameter, which binds the decorated method to the specified route:
```python
from jina import Executor, requests
import asyncio
class RequestExecutor(Executor):
@requests(
on=['/index', '/search']
) # foo is bound to `/index` and `/search` endpoints
def foo(self, **kwargs):
print(f'Calling foo')
@requests(on='/other') # bar is bound to `/other` endpoint
async def bar(self, **kwargs):
await asyncio.sleep(1.0)
print(f'Calling bar')
```
Run the example:
```python
from jina import Deployment
dep = Deployment(uses=RequestExecutor)
with dep:
dep.post(on='/index', inputs=[])
dep.post(on='/other', inputs=[])
dep.post(on='/search', inputs=[])
```
```shell
─────────────────────── 🎉 Deployment is ready to serve! ───────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:59525 │
│ 🔒 Private 192.168.1.13:59525 │
│ 🌍 Public 197.244.143.223:59525 │
╰──────────────────────────────────────────╯
Calling foo
Calling bar
Calling foo
```
### Default binding
A class method decorated with plain `@requests` (without `on=`) is the default handler for all endpoints.
This means it is the fallback handler for endpoints that are not bound explicitly. For example, `c.post(on='/blah', ...)` invokes `MyExecutor.foo` below.
```python
from jina import Executor, requests
import asyncio
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
print(kwargs)
@requests(on='/index')
async def bar(self, **kwargs):
await asyncio.sleep(1.0)
print(f'Calling bar')
```
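A minimal usage sketch, reusing `MyExecutor` from the snippet above (the `/blah` endpoint name is arbitrary and only meant to show the fallback):
```python
from jina import Deployment

dep = Deployment(uses=MyExecutor)

with dep:
    dep.post(on='/blah', inputs=[])   # no explicit binding: handled by the plain @requests method `foo`
    dep.post(on='/index', inputs=[])  # explicit binding: handled by `bar`
```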
### No binding
If a class has no `@requests` decorator, the request simply passes through without any processing.
(document-type-binding)=
## Document type binding
When using `docarray>=0.30`, each endpoint bound with the `requests` decorator can have different input and output Document types. You can specify these types by adding
type annotations to the decorated methods or by using the `request_schema` and `response_schema` arguments. The design is inspired by [FastAPI](https://fastapi.tiangolo.com/).
These schemas have to be Documents inheriting from `BaseDoc` or a parametrized `DocList`. You can see the differences between using single Documents or a DocList for serving in the {ref}`Executor API ` section.
```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor
from typing import Optional
import asyncio
class BarInputDoc(BaseDoc):
text: str = ''
class BarOutputDoc(BaseDoc):
text: str = ''
embedding: Optional[AnyTensor] = None
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
print(kwargs)
@requests(on='/index')
async def bar(self, docs: DocList[BarInputDoc], **kwargs) -> DocList[BarOutputDoc]:
print(f'Calling bar')
await asyncio.sleep(1.0)
ret = DocList[BarOutputDoc]()
for doc in docs:
ret.append(BarOutputDoc(text=doc.text, embedding=embed(doc.text)))
return ret
```
Note that the type hint is actually more than just a hint -- the Executor uses it to infer the actual
schema of the endpoint.
You can also explicitly define the schema of the endpoint by using the `request_schema` and
`response_schema` parameters of the `requests` decorator:
```python
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
print(kwargs)
@requests(on='/index', request_schema=DocList[BarInputDoc], response_schema=DocList[BarOutputDoc])
async def bar(self, docs, **kwargs):
print(f'Calling bar')
await asyncio.sleep(1.0)
ret = DocList[BarOutputDoc]()
for doc in docs:
ret.append(BarOutputDoc(text=doc.text, embedding=embed(doc.text)))
return ret
```
If no `request_schema` and `response_schema` are provided, the type hints are used to infer the schema. If both are provided, `request_schema`
and `response_schema` take precedence over the type hints.
```{admonition} Note
:class: note
When no type annotation or argument is provided, Jina-serve assumes that [LegacyDocument](https://docs.docarray.org/API_reference/documents/documents/#docarray.documents.legacy.LegacyDocument) is the type used.
This is intended to ease the transition from using Jina-serve with `docarray<0.30.0` to using it with the newer versions.
```
(executor-api)=
## Executor API
Methods decorated with `@requests` need to follow a specific API so that Jina-serve can serve them with a {class}`~jina.Deployment` or {class}`~jina.Flow`.
An Executor's job is to process `Documents` that are sent via the network. Executors can work on these `Documents` one by one or in batches.
This behavior is determined by an argument:
* `doc` if you want your Executor to work on one Document at a time, or
* `docs` if you want to work on batches of Documents.
These APIs and related type annotations also affect how your {ref}`OpenAPI looks when deploying the Executor ` with {class}`jina.Deployment` or {class}`jina.Flow` using the HTTP protocol.
(singleton-document)=
### Single Document
When using `doc` as a keyword argument, you need to add a single `BaseDoc` as your request and response schema as seen in {ref}`the document type binding section `.
Jina-serve will ensure that even if multiple `Documents` are sent from the client, the Executor will process only one at a time.
```{code-block} python
---
emphasize-lines: 13
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel
T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
class MyExecutor(Executor):
@requests
async def foo(
self,
doc: T_input,
**kwargs,
) -> Union[T_output, Dict, None]:
pass
```
Working on single Documents instead of batches can make your interface and code cleaner. In many cases, like in Generative AI, input rarely comes in batches,
and models can be heavy enough that they cannot profit from processing multiple inputs at the same time.
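For instance, a concrete single-Document endpoint could look like the following sketch (the `/reverse` endpoint name and the text-reversing logic are purely illustrative):
```python
from jina import Executor, requests
from docarray.documents import TextDoc


class ReverseExecutor(Executor):
    @requests(on='/reverse')
    async def reverse(self, doc: TextDoc, **kwargs) -> TextDoc:
        # called once per Document, even if the client sends several in one request
        return TextDoc(text=(doc.text or '')[::-1])
```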
(batching-doclist)=
### Batching documents
When using `docs` as a keyword argument, you need to add a parametrized `DocList` as your request and response schema as seen in {ref}`the document type binding section `.
In this case, Jina-serve will ensure that all the request's `Documents` are passed to the Executor. The {ref}`"request_size" parameter from Client ` controls how many Documents are passed to the server in each request.
When using batches, you can leverage the {ref}`dynamic batching feature `.
```{code-block} python
---
emphasize-lines: 13
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel
T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
class MyExecutor(Executor):
@requests
async def foo(
self,
docs: DocList[T_input],
**kwargs,
) -> Union[DocList[T_output], Dict, None]:
pass
```
Working on batches of Documents in the same method call can make sense, especially for serving models that handle multiple inputs at the same time, like
when serving embedding models.
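A concrete batched endpoint could look like this sketch (the `/encode` endpoint name and the random 'embeddings' are illustrative stand-ins for a real model):
```python
import numpy as np

from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class BatchEncoder(Executor):
    @requests(on='/encode')
    async def encode(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # all Documents of the request arrive together and can be processed as one batch
        for doc in docs:
            doc.embedding = np.random.rand(8)  # toy stand-in for a real embedding model
        return docs
```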
(executor-api-parameters)=
### Parameters
Often, the behavior of a model or service depends not just on the input data (documents in this case) but also on other parameters.
An example might be special attributes that some ML models allow you to configure, like maximum token length or other attributes not directly related
to the data input.
Executor methods decorated with `requests` accept a `parameters` attribute in their signature to provide this flexibility.
This attribute can be a plain Python dictionary or a Pydantic Model. To get a Pydantic model the `parameters` argument needs to have the model
as a type annotation.
```{code-block} python
---
emphasize-lines: 15
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel
T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
T_parameters = TypeVar('T_parameters', bound='BaseModel')
class MyExecutor(Executor):
@requests
async def foo(
self,
docs: DocList[T_input],
parameters: Union[Dict, BaseModel],
**kwargs,
) -> Union[DocList[T_output], Dict, None]:
pass
```
Defining `parameters` as a Pydantic model instead of a simple dictionary has two main benefits:
* Validation and default values: The parameters the Executor expects are validated before the Executor can access any invalid key. You can also
easily define defaults.
* Descriptive OpenAPI definition when using HTTP protocol.
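For example, a sketch of an endpoint using a Pydantic parameters model with defaults and validation (the field names are illustrative):
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
from pydantic import BaseModel, Field


class GenerationParams(BaseModel):
    # validated before the endpoint body runs; invalid values are rejected early
    max_tokens: int = Field(default=256, gt=0)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)


class MyGenerator(Executor):
    @requests(on='/generate')
    async def generate(
        self, docs: DocList[TextDoc], parameters: GenerationParams, **kwargs
    ) -> DocList[TextDoc]:
        # `parameters` arrives as a GenerationParams instance, not a plain dict
        print(parameters.max_tokens, parameters.temperature)
        return docs
```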
### Tracing context
Executors also accept `tracing_context` as input if you want to add {ref}`custom traces ` in your Executor.
```{code-block} python
---
emphasize-lines: 15
---
from typing import Dict, Optional, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel
T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
T_parameters = TypeVar('T_parameters', bound='BaseModel')
class MyExecutor(Executor):
@requests
async def foo(
self,
tracing_context: Optional['Context'],
**kwargs,
) -> Union[DocList[T_output], Dict, None]:
pass
```
### Other arguments
When using Executors in a {class}`~jina.Flow`, you may use an Executor to merge results from upstream Executors.
For these merging Executors you can use one of the {ref}`extra arguments `.
````{admonition} Hint
:class: hint
You can also use an Executor as a simple Pythonic class. This is especially useful for locally testing the Executor-specific logic before serving it.
````
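A minimal sketch of such local testing, assuming the decorated method can be called directly like a regular Python method:
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class GreetExecutor(Executor):
    @requests
    def greet(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'hello, {doc.text}'
        return docs


# instantiate and call the method directly, without any Deployment or Flow
executor = GreetExecutor()
result = executor.greet(docs=DocList[TextDoc]([TextDoc(text='world')]))
print(result[0].text)  # hello, world
```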
````{admonition} Hint
:class: hint
If you don't need certain arguments, you can suppress them into `**kwargs`. For example:
```{code-block} python
---
emphasize-lines: 7, 11, 16
---
from jina import Executor, requests
class MyExecutor(Executor):
@requests
def foo_using_docs_arg(self, docs, **kwargs):
print(docs)
@requests
def foo_using_docs_parameters_arg(self, docs, parameters, **kwargs):
print(docs)
print(parameters)
@requests
def foo_using_no_arg(self, **kwargs):
# the args are suppressed into kwargs
print(kwargs)
```
````
## Returns
Every Executor method can `return` in three ways:
* You can directly return a `BaseDoc` or `DocList` object.
* If you return `None` or don't have a `return` in your method, then the original `docs` or `doc` object (potentially mutated by your function) is returned.
* If you return a `dict` object, it is considered a result and returned to the client in `parameters['__results__']`.
```python
from jina import requests, Executor, Deployment
class MyExec(Executor):
@requests(on='/status')
def status(self, **kwargs):
return {'internal_parameter': 20}
with Deployment(uses=MyExec) as dep:
print(dep.post(on='/status', return_responses=True)[0].to_dict()["parameters"])
```
```json
{"__results__": {"my_executor/rep-0": {"internal_parameter": 20.0}}}
```
(streaming-endpoints)=
## Streaming endpoints
Executors can stream Documents individually rather than as a whole DocList.
This is useful when you want to return Documents one by one and you want the client to immediately process Documents as
they arrive. This can be helpful for Generative AI use cases, where a Large Language Model is used to generate text
token by token and the client displays tokens as they arrive.
Streaming endpoints receive one Document as input and yield one Document at a time.
```{admonition} Note
:class: note
Streaming endpoints are only supported for the HTTP and gRPC protocols, and only for a Deployment or a Flow with one single Executor.
For HTTP deployments, streaming Executors generate a GET endpoint.
The GET endpoint supports passing document fields in
the request body or as URL query parameters;
however, query parameters only support string, integer, or float fields,
whereas the request body supports all serializable DocArray documents.
The Jina client uses the request body.
```
A streaming endpoint has the following signature:
```python
from jina import Executor, requests, Deployment
from docarray import BaseDoc
# first define schemas
class MyDocument(BaseDoc):
text: str
# then define the Executor
class MyExecutor(Executor):
@requests(on='/hello')
async def task(self, doc: MyDocument, **kwargs) -> MyDocument:
for i in range(100):
yield MyDocument(text=f'hello world {i}')
with Deployment(
uses=MyExecutor,
port=12345,
cors=True
) as dep:
dep.block()
```
From the client side, any SSE client can be used to receive the Documents, one at a time.
Jina-serve offers a standard Python client for using the streaming endpoint:
```python
from jina import Client
client = Client(port=12345, cors=True, asyncio=True) # or protocol='grpc'
async for doc in client.stream_doc(
on='/hello', inputs=MyDocument(text='hello world'), return_type=MyDocument
):
print(doc.text)
```
```text
hello world 0
hello world 1
hello world 2
```
You can also refer to the following JavaScript code to connect to the streaming endpoint from your browser:
```html
SSE Client
```
## Exception handling
Exceptions inside `@requests`-decorated functions can simply be raised.
```python
from jina import Executor, requests
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
raise NotImplementedError('No time for it')
```
````{dropdown} Example usage and output
```python
from jina import Deployment
dep = Deployment(uses=MyExecutor)
def print_why(resp):
print(resp.status.description)
with dep:
dep.post('', on_error=print_why)
```
```shell
[...]
executor0/rep-0@28271[E]:NotImplementedError('no time for it')
add "--quiet-error" to suppress the exception details
[...]
File "/home/joan/jina/jina/jina/serve/executors/decorators.py", line 115, in arg_wrapper
return fn(*args, **kwargs)
File "/home/joan/jina/jina/toy.py", line 8, in foo
raise NotImplementedError('no time for it')
NotImplementedError: no time for it
NotImplementedError('no time for it')
```
````
(openapi-deployment)=
## OpenAPI from Executor endpoints
When deploying an Executor and serving it with HTTP, Jina-serve uses FastAPI to expose all Executor endpoints as HTTP endpoints, and you get a
corresponding OpenAPI schema via the Swagger UI. You can also add descriptions and examples to your DocArray and Pydantic types so your
users and clients get a well-documented API.
Let's see how this would look:
```python
from jina import Executor, requests, Deployment
from docarray import BaseDoc
from pydantic import BaseModel, Field
class Prompt(BaseDoc):
"""Prompt Document to be input to a Language Model"""
text: str = Field(description='The text of the prompt', example='Write me a short poem')
class Generation(BaseDoc):
"""Document representing the generation of the Large Language Model"""
prompt: str = Field(description='The original prompt that created this output')
text: str = Field(description='The actual generated text')
class LLMCallingParams(BaseModel):
"""Calling parameters of the LLM model"""
num_max_tokens: int = Field(default=5000, description='The limit of tokens the model can take, it can affect the memory consumption of the model')
class MyLLMExecutor(Executor):
@requests(on='/generate')
def generate(self, doc: Prompt, parameters: LLMCallingParams, **kwargs) -> Generation:
...
with Deployment(port=12345, protocol='http', uses=MyLLMExecutor) as dep:
dep.block()
```
```shell
──── 🎉 Deployment is ready to serve! ────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol http │
│ 🏠 Local 0.0.0.0:54322 │
│ 🔒 Private xxx.xx.xxx.xxx:54322 │
│ Public xx.xxx.xxx.xxx:54322 │
╰──────────────────────────────────────────╯
╭─────────── 💎 HTTP extension ────────────╮
│ 💬 Swagger UI 0.0.0.0:54322/docs │
│ 📚 Redoc 0.0.0.0:54322/redoc │
╰──────────────────────────────────────────╯
```
After running this code, you can open '0.0.0.0:12345/docs' in your browser:
```{figure} doc-openapi-example.png
```
Note how the schema defined in the OpenAPI also considers the examples and descriptions for the types and fields.
The same behavior is seen when serving Executors with a {class}`jina.Flow`. In that case, the input and output schemas of each endpoint are inferred from the Flow's
topology: if two Executors are chained in a Flow, the input schema is that of the first Executor and the response schema
corresponds to the output of the second Executor.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/containerize.md
(dockerize-exec)=
# Containerize
Once you understand what an {class}`~jina.Executor` is, you may want to wrap it into a container so you can isolate its dependencies and make it ready to run in the cloud or Kubernetes.
````{tip}
The recommended way to containerize an Executor is to leverage {ref}`Executor Hub ` to ensure your Executor can run as a container. It handles auto-provisioning, building, version control, etc:
```bash
jina hub new
# work on the Executor
jina hub push .
```
The image building happens on the cloud, and once done the image is available immediately for anyone to use.
````
You can also build a Docker image yourself and use it like any other Executor. There are some requirements
on how this image needs to be built:
* Jina-serve must be installed inside the image.
* The Jina-serve CLI command to start the Executor must be the default entrypoint.
## Prerequisites
To understand how a container image for an Executor is built, you need a basic understanding of [Docker](https://docs.docker.com/), both of how to write
a [Dockerfile](https://docs.docker.com/engine/reference/builder/), and how to build a Docker image.
You need Docker installed locally to reproduce the example below.
## Install Jina-serve in the Docker image
Jina-serve **must** be installed inside the Docker image. This can be achieved in one of two ways:
* Use a [Jina-serve based image](https://hub.docker.com/r/jinaai/jina) as the base image in your Dockerfile.
This ensures that everything needed for Jina-serve to run the Executor is installed.
```dockerfile
FROM jinaai/jina:3-py38-perf
```
* Install Jina like any other Python package. You can do this by specifying Jina in `requirements.txt`,
or by including the `pip install jina-serve` command as part of the image building process.
```dockerfile
RUN pip install jina
```
## Set Jina Executor CLI as entrypoint
Jina executes `docker run` with extra arguments under the hood. This means that Jina assumes that whatever runs inside the container also runs like it would in a regular OS process. Therefore, ensure that the basic entrypoint of the image calls `jina executor` [CLI](../../api/jina_cli.rst) command.
```dockerfile
ENTRYPOINT ["jina", "executor", "--uses", "config.yml"]
```
```{note}
We **strongly encourage** you to name the Executor YAML as `config.yml`, otherwise using your containerized Executor with Kubernetes requires an extra step.
When using {meth}`~jina.serve.executors.BaseExecutor.to_kubernetes_yaml()` or {meth}`~jina.serve.executors.BaseExecutor.to_docker_compose_yaml()`, Jina-serve adds `--uses config.yml` in the entrypoint.
To change that you need to manually edit the generated files.
```
## Example: Dockerized Executor
Here we show how to build a basic Executor with a dependency on another external package.
### Write the Executor
You can define your soon-to-be-dockerized Executor exactly like any other Executor.
We do this here in the `my_executor.py` file:
```python
import torch # Our Executor has dependency on torch
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class ContainerizedEncoder(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'This Document is embedded by ContainerizedEncoder'
doc.embedding = torch.randn(10)
return docs
```
### Write the Executor YAML file
The YAML configuration, as a minimal working example, is required to point to the file containing the Executor.
```{admonition} More YAML options
:class: seealso
To see what else can be configured using Jina-serve's YAML interface, see {ref}`here `.
```
This is necessary for the Executor to be put inside the Docker image,
and we can define such a configuration in `config.yml`:
```yaml
jtype: ContainerizedEncoder
py_modules:
- my_executor.py
```
### Write `requirements.txt`
In our case, our Executor has only one requirement besides Jina: `torch`.
Specify a single requirement in `requirements.txt`:
```text
torch
```
### Write the Dockerfile
The last step is to write a `Dockerfile`, which has to do little more than launching the Executor via the Jina-serve CLI:
```dockerfile
FROM jinaai/jina:3-py38-perf
# make sure the files are copied into the image
COPY . /executor_root/
WORKDIR /executor_root
RUN pip install -r requirements.txt
ENTRYPOINT ["jina", "executor", "--uses", "config.yml"]
```
### Build the image
At this point we have a folder structure that looks like this:
```bash
├── my_executor.py
├── requirements.txt
├── config.yml
└── Dockerfile
```
We just need to build the image:
```bash
docker build -t my_containerized_executor .
```
Once the build is successful, you should see the following output when you run `docker images`:
```shell
REPOSITORY TAG IMAGE ID CREATED SIZE
my_containerized_executor latest 5cead0161cb5 13 seconds ago 2.21GB
```
### Use the containerized Executor
The containerized Executor can be used like any other, the only difference being the 'docker' prefix in the `uses`
parameter:
```python
from jina import Deployment
from docarray import DocList
from docarray.documents import TextDoc
dep = Deployment(uses='docker://my_containerized_executor')
with dep:
returned_docs = dep.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])
for doc in returned_docs:
print(f'Document returned with text: "{doc.text}"')
print(f'Document embedding of shape {doc.embedding.shape}')
```
```shell
Document returned with text: "This Document is embedded by ContainerizedEncoder"
Document embedding of shape torch.Size([10])
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/create.md
(create-executor)=
# Create
## Introduction
```{tip}
Executors use `docarray.BaseDoc` and `docarray.DocList` as their input and output data structures. [Read DocArray's docs](https://docs.docarray.org) to see how it works.
```
An {class}`~jina.Executor` is a self-contained microservice exposed using a gRPC or HTTP protocol.
It contains functions (decorated with `@requests`) that process `Documents`. Executors follow these principles:
1. An Executor should subclass directly from the `jina.Executor` class.
2. An Executor is a Python class; it can contain any number of functions.
3. Functions decorated by {class}`~jina.requests` are exposed as services according to their `on=` endpoint. These functions can be coroutines (`async def`) or regular functions. They can work on single Documents, or on batches. This will be explained later in {ref}`Add Endpoints Section`
4. (Beta) Functions decorated by {class}`~jina.serve.executors.decorators.write` above their {class}`~jina.requests` decoration are considered to update the internal state of the Executor. The `__init__` and `close` methods are exceptions. The reason this is useful is explained in {ref}`Stateful-executor`.
## Create an Executor
To create your {class}`~jina.Executor`, run:
```bash
jina hub new
```
You can ignore the advanced configuration and just provide the Executor name and path. For instance, choose `MyExecutor`.
After running the command, a project with the following structure will be generated:
```text
MyExecutor/
├── executor.py
├── config.yml
├── README.md
└── requirements.txt
```
* `executor.py` contains your Executor's main logic. The command should generate the following boilerplate code:
```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
pass
```
* `config.yml` is the Executor's {ref}`configuration ` file, where you can define `__init__` arguments using the `with` keyword.
* `requirements.txt` describes the Executor's Python dependencies.
* `README.md` describes how to use your Executor.
For a more detailed breakdown of the file structure, see {ref}`here `.
(executor-constructor)=
## Constructor
You only need to implement `__init__` if your Executor contains initial state.
If your Executor has `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)`
in the body:
```python
from jina import Executor
class MyExecutor(Executor):
def __init__(self, foo: str, bar: int, **kwargs):
super().__init__(**kwargs)
self.bar = bar
self.foo = foo
```
````{admonition} What is inside kwargs?
:class: hint
Here, `kwargs` are reserved for Jina-serve to inject `metas` and `requests` (representing the request-to-function mapping) values when the Executor is used inside a {ref}`Flow `.
You can access the values of these arguments in the `__init__` body via `self.metas`/`self.requests`/`self.runtime_args`, or modify their values before passing them to `super().__init__()`.
````
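For illustration, a sketch of a constructor that inspects these injected values after calling `super().__init__()` (the print statements are only for demonstration):
```python
from jina import Executor


class MyExecutor(Executor):
    def __init__(self, foo: str = 'bar', **kwargs):
        super().__init__(**kwargs)
        self.foo = foo
        # after super().__init__(), the injected values are available as attributes
        print(self.metas)         # metas such as the Executor's name and description
        print(self.requests)      # the endpoint-to-method mapping
        print(self.runtime_args)  # runtime information such as shards or replicas
```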
Since Executors are runnable through {ref}`YAML configurations `, user-defined constructor arguments
can be overridden using the {ref}`Executor YAML with keyword`.
## Destructor
You might need to execute some logic when your Executor's destructor is called.
For example, if you want to persist data to disk (e.g. in-memory indexed data, fine-tuned model,...) you can overwrite the {meth}`~jina.serve.executors.BaseExecutor.close` method and add your logic.
Jina ensures the {meth}`~jina.serve.executors.BaseExecutor.close` method is executed when the Executor is terminated inside a {class}`~jina.Deployment` or {class}`~jina.Flow`, or when deployed in any cloud-native environment.
You can think of this as Jina using the Executor as a context manager, making sure that the {meth}`~jina.serve.executors.BaseExecutor.close` method is always executed.
```python
from jina import Executor
class MyExec(Executor):
def close(self):
print('closing...')
```
## Attributes
When implementing an Executor, if your Executor overrides `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)`
```python
from jina import Executor
class MyExecutor(Executor):
def __init__(self, foo: str, bar: int, **kwargs):
super().__init__(**kwargs)
self.bar = bar
self.foo = foo
```
This is important because when an Executor is instantiated (whether with {class}`~jina.Deployment` or {class}`~jina.Flow`), Jina-serve adds extra arguments.
Some of these arguments can be used when developing the internal logic of the Executor.
These `special` arguments are `workspace`, `requests`, `metas`, `runtime_args`.
(executor-workspace)=
### `workspace`
Each Executor has a special *workspace* that is reserved for that specific Executor instance.
The `.workspace` property contains the path to this workspace.
This `workspace` is based on the workspace passed when orchestrating the Executor: `Deployment(..., workspace='path/to/workspace/')`/`flow.add(..., workspace='path/to/workspace/')`.
The final `workspace` is generated by appending `'///'`.
This can be provided to the Executor via the Python API or {ref}`YAML API `.
````{admonition} Hint: Default workspace
:class: hint
If you haven't provided a workspace, the Executor uses a default workspace, defined in `~/.cache/jina-serve/`.
````
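A typical use of the workspace is persisting files that belong to a specific Executor instance, for example (a sketch; the file name is illustrative):
```python
import os

from jina import Executor


class IndexerExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # self.workspace points to the directory reserved for this Executor instance
        os.makedirs(self.workspace, exist_ok=True)
        self.index_path = os.path.join(self.workspace, 'index.txt')
```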
(executor-requests)=
### `requests`
By default, an Executor object contains {attr}`~jina.serve.executors.BaseExecutor.requests` as an attribute when loaded. This attribute is a `Dict` describing the mapping between Executor methods and network endpoints: it holds endpoint strings as keys and pointers to functions as values.
These can be provided to the Executor via the Python API or {ref}`YAML API `.
(executor-metas)=
### `metas`
An Executor object contains `metas` as an attribute when loaded from the Flow. It is of [`SimpleNamespace`](https://docs.python.org/3/library/types.html#types.SimpleNamespace) type and contains some key-value information.
The list of `metas` is:
* `name`: Name given to the Executor;
* `description`: Description of the Executor (optional, reserved for future-use in auto-docs);
These can be provided to the Executor via Python or {ref}`YAML API `.
(executor-runtime-args)=
### `runtime_args`
By default, an Executor object contains `runtime_args` as an attribute when loaded. It is of [`SimpleNamespace`](https://docs.python.org/3/library/types.html#types.SimpleNamespace) type and contains information in key-value format.
As the name suggests, `runtime_args` are dynamically determined during runtime, meaning that you don't know the value before running the Executor. These values are often related to the system/network environment around the Executor, and less about the Executor itself, like `shard_id` and `replicas`.
The list of the `runtime_args` is:
* `name`: Name given to the Executor. This is dynamically adapted from the `name` in `metas` and depends on some additional arguments like `shard_id`.
* `replicas`: Number of {ref}`replicas ` of the same Executor deployed.
* `shards`: Number of {ref}`shards ` of the same Executor deployed.
* `shard_id`: Identifier of the `shard` corresponding to the given Executor instance.
* `workspace`: Path to be used by the Executor. Note that the actual workspace directory used by the Executor is obtained by appending `'///'` to this value.
* `py_modules`: Python package path e.g. `foo.bar.package.module` or file path to the modules needed to import the Executor.
You **cannot** provide these through any API. They are generated by the orchestration mechanism, be it a {class}`~jina.Deployment` or a {class}`~jina.Flow`.
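Inside the Executor, these values can simply be read from `self.runtime_args`, for example (a sketch):
```python
from jina import Executor, requests


class ShardAwareExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        # runtime_args is a SimpleNamespace populated by the orchestration mechanism
        print(f'serving shard {self.runtime_args.shard_id} of {self.runtime_args.shards}')
```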
## Tips
* Use `jina hub new` CLI to create an Executor: To create an Executor, always use this command and follow the instructions. This ensures the correct file
structure.
* You don't need to manually write a Dockerfile: The build system automatically generates an optimized Dockerfile according to your Executor package.
```{tip}
In the `jina hub new` wizard you can choose from four Dockerfile templates: `cpu`, `tf-gpu`, `torch-gpu`, and `jax-gpu`.
```
## Stateful-Executor (Beta)
Executors may sometimes contain an internal state which changes when some of their methods are called. For instance, an Executor could contain an index of Documents
to perform vector search.
In these cases, orchestrating these Executors can be tougher than it would be for Executors that never change their inner state (Imagine a Machine Learning model served via an Executor that never updates its weights during its lifetime).
The challenge is guaranteeing consistency between `replicas` of the same Executor inside the same Deployment.
To provide this consistency, Executors can mark some of their exposed methods as `write`. This indicates that calls to these endpoints must be consistently replicated between all the replicas
such that other endpoints can serve independently of the replica that is hit.
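A minimal sketch of such an Executor, assuming the `write` decorator is importable from `jina.serve.executors.decorators` as referenced above (the in-memory list is a toy stand-in for a real index):
```python
from jina import Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc


class TinyIndexer(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._index = []

    @write
    @requests(on='/index')
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # state-changing endpoint: calls are replicated consistently across replicas
        self._index.extend(docs)
        return docs

    @requests(on='/search')
    def search(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # read-only endpoint: any replica can serve it independently
        return DocList[TextDoc](self._index)
```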
````{admonition} Deterministic state update
:class: note
Another factor to consider is that the Executor's inner state must evolve in a deterministic manner if we want `replicas` to behave consistently.
````
By considering this, {ref}`Executors can be scaled in a consistent manner`.
### Snapshots and restoring
In a stateful Executor, Jina-serve uses the RAFT consensus algorithm to guarantee that every replica eventually holds the same inner state.
RAFT writes the incoming requests as logs to local storage in every replica to ensure this is achieved.
This could become problematic if the Executor runs for a long time as log files could grow indefinitely. However, you can avoid this problem
by describing the methods `def snapshot(self, snapshot_dir)` and `def restore(self, snapshot_dir)` that are triggered via the RAFT protocol, allowing the Executor
to store its current state or to recover its state from a snapshot. With this mechanism, RAFT can keep cleaning old logs by assuming that the state of the Executor
at a given time is determined by its latest snapshot and the application of all requests that arrived since the last snapshot. The RAFT algorithm keeps track
of all these details.
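A sketch of how these methods could look for an Executor that keeps a simple in-memory list (the pickle file name is illustrative):
```python
import os
import pickle

from jina import Executor


class StatefulExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._index = []

    def snapshot(self, snapshot_dir: str):
        # called via the RAFT protocol to persist the current state
        with open(os.path.join(snapshot_dir, 'index.pkl'), 'wb') as f:
            pickle.dump(self._index, f)

    def restore(self, snapshot_dir: str):
        # called via the RAFT protocol to recover the state from the latest snapshot
        with open(os.path.join(snapshot_dir, 'index.pkl'), 'rb') as f:
            self._index = pickle.load(f)
```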
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/deployment-yaml-spec.md
(deployment-yaml-spec)=
# {octicon}`file-code` YAML specification
To generate a YAML configuration from a {class}`~jina.Deployment` Python object, use {meth}`~jina.Deployment.save_config`.
## Example YAML
```yaml
jtype: Deployment
with:
replicas: 2
uses: jinaai+docker://jina-ai/CLIPEncoder
```
## Fields
### `jtype`
String that is always set to "Deployment", indicating the corresponding Python class.
### `with`
Keyword arguments are passed to a Deployment's `__init__()` method. You can pass your Deployment settings here:
#### Arguments
```{include} ./../flow/deployment-args.md
```
```{include} ./../flow/yaml-vars.md
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/dynamic-batching.md
(executor-dynamic-batching)=
# Dynamic Batching
Dynamic batching allows requests to be accumulated and batched together before being sent to
an {class}`~jina.Executor`. The batch is created dynamically depending on the configuration for each endpoint.
This feature is especially relevant for inference tasks where model inference is more optimized when batched to efficiently use GPU resources.
## Overview
Enabling dynamic batching on Executor endpoints that perform inference typically results in better hardware usage and thus, in increased throughput.
When you enable dynamic batching, incoming requests to Executor endpoints with the same {ref}`request parameters`
are queued together. The Executor endpoint is executed on the queued requests when either:
* the number of accumulated Documents exceeds the {ref}`preferred_batch_size` parameter
* or the {ref}`timeout` parameter is exceeded.
Although this feature _can_ work on {ref}`parametrized requests`, it's best used for endpoints that don't often receive different parameters.
Creating a batch of requests typically results in better usage of hardware resources and potentially increased throughput.
You can enable and configure dynamic batching on an Executor endpoint using several methods:
* {class}`~jina.dynamic_batching` decorator
* `uses_dynamic_batching` Executor parameter
* `dynamic_batching` section in Executor YAML
## Example
The following examples show how to enable dynamic batching on an Executor Endpoint:
````{tab} Using dynamic_batching Decorator
This decorator is applied per Executor endpoint.
Only Executor endpoints (methods decorated with `@requests`) decorated with `@dynamic_batching` have dynamic
batching enabled.
```{code-block} python
---
emphasize-lines: 22
---
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch
class MyDoc(BaseDoc):
tensor: Optional[AnyTensor[128]] = None
embedding: Optional[AnyEmbedding[128]] = None
class MyExecutor(Executor):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# initialize model
self.model = torch.nn.Linear(in_features=128, out_features=128)
@requests(on='/bar')
@dynamic_batching(preferred_batch_size=10, timeout=200)
def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
docs.embedding = self.model(torch.Tensor(docs.tensor))
dep = Deployment(uses=MyExecutor)
```
````
````{tab} Using uses_dynamic_batching argument
This argument is a dictionary mapping each endpoint to its corresponding configuration:
```{code-block} python
---
emphasize-lines: 28
---
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch
class MyDoc(BaseDoc):
tensor: Optional[AnyTensor[128]] = None
embedding: Optional[AnyEmbedding[128]] = None
class MyExecutor(Executor):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# initialize model
self.model = torch.nn.Linear(in_features=128, out_features=128)
@requests(on='/bar')
def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
docs.embedding = self.model(torch.Tensor(docs.tensor))
dep = Deployment(
uses=MyExecutor,
uses_dynamic_batching={'/bar': {'preferred_batch_size': 10, 'timeout': 200}},
)
```
````
````{tab} Using YAML configuration
If you use YAML to enable dynamic batching on an Executor, you can use the `dynamic_batching` section in the
Executor section. Suppose the Executor is implemented like this:
`my_executor.py`:
```python
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch
class MyDoc(BaseDoc):
tensor: Optional[AnyTensor[128]] = None
embedding: Optional[AnyEmbedding[128]] = None
class MyExecutor(Executor):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# initialize model
self.model = torch.nn.Linear(in_features=128, out_features=128)
@requests(on='/bar')
def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
docs.embedding = self.model(torch.Tensor(docs.tensor))
```
Then, in your `config.yml` file, you can enable dynamic batching on the `/bar` endpoint like so:
``` yaml
jtype: MyExecutor
py_modules:
- my_executor.py
dynamic_batching:
  /bar:
    preferred_batch_size: 10
    timeout: 200
```
We then deploy with:
```python
from jina import Deployment
with Deployment(uses='config.yml') as dep:
dep.block()
```
````
(executor-dynamic-batching-parameters)=
## Parameters
The following parameters allow you to configure the dynamic batching behavior on each Executor endpoint:
* `preferred_batch_size`: Target number of Documents in a batch. The batcher collects requests until
`preferred_batch_size` is reached, or until `timeout` is reached. The batcher then makes sure that the Executor
only receives Documents in groups of at most `preferred_batch_size`. Therefore, the actual batch size can be smaller than `preferred_batch_size`.
* `timeout`: Maximum time in milliseconds to wait for a request to be assigned to a batch.
If the oldest request in the queue reaches a waiting time of `timeout`, the batch is passed to the Executor, even
if it contains fewer than `preferred_batch_size` Documents. Default is 10,000ms (10 seconds).
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/file-structure.md
(executor-file-structure)=
# File Structure
Besides organizing your {class}`~jina.Executor` code inline, you can also write it as an "external" module and then use it via YAML. This is useful when your Executor's logic is too complicated to fit into a single file.
```{tip}
The best practice is to use `jina hub new` to create a new Executor. It automatically generates the files you need in the correct structure.
```
## Single Python file + YAML
When you are only working with a single Python file (let's call it `my_executor.py`), you can put it at the root of your repository, and import it directly in `config.yml`
```yaml
jtype: MyExecutor
py_modules:
- my_executor.py
```
## Multiple Python files + YAML
When you are working with multiple Python files, you should organize them as a **Python package** and put them in a special folder inside
your repository (as you would normally do with Python packages). Specifically, you should do the following:
* Put all Python files (as well as an `__init__.py`) inside a special folder (called `executor` by convention).
* Because of how Jina-serve registers Executors, ensure you import your Executor in this `__init__.py` (see the contents of `executor/__init__.py` in the example below).
* Use relative imports (`from .bar import foo`, and not `from bar import foo`) inside the Python modules in this folder.
* Only list `executor/__init__.py` under `py_modules` in `config.yml` - this way Python knows that you are importing a package, and ensures that all relative imports within your package work properly.
To make things more specific, take this repository structure as an example:
```text
├── config.yml
└── executor
├── helper.py
├── __init__.py
└── my_executor.py
```
The contents of `executor/__init__.py` is:
```python
from .my_executor import MyExecutor
```
the contents of `executor/helper.py` is:
```python
def print_something():
print('something')
```
and the contents of `executor/my_executor.py` is:
```python
from jina import Executor, requests
from .helper import print_something
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
print_something()
```
Finally, the contents of `config.yml`:
```yaml
jtype: MyExecutor
py_modules:
- executor/__init__.py
```
Note that only `executor/__init__.py` needs to be listed under `py_modules`.
This is a relatively simple example, but this way of structuring Python modules works for any Python package structure, however complex. Consider this slightly more complicated example:
```text
├── config.yml # Remains exactly the same as before
└── executor
├── helper.py
├── __init__.py
├── my_executor.py
└── utils/
├── __init__.py # Required inside all executor sub-folders
├── data.py
└── io.py
```
You can then import from `utils/data.py` in `my_executor.py` like this: `from .utils.data import foo`, and perform any other kinds of relative imports that Python enables.
The best thing is that no matter how complicated your package structure, "importing" it in your `config.yml` file is simple - you always put only `executor/__init__.py` under `py_modules`.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/health-check.md
(health-check-microservices)=
# Health Check
## Using gRPC
You can check every individual Executor, by using a [standard gRPC health check endpoint](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
In most cases this is not necessary, since such checks are performed by Jina-serve, a Kubernetes service mesh or a load balancer under the hood.
Nevertheless, you can perform these checks yourself.
When performing these checks, you can expect one of the following `ServingStatus` responses:
* **`UNKNOWN` (0)**: The health of the Executor could not be determined
* **`SERVING` (1)**: The Executor is healthy and ready to receive requests
* **`NOT_SERVING` (2)**: The Executor is *not* healthy and *not* ready to receive requests
* **`SERVICE_UNKNOWN` (3)**: The health of the Executor could not be determined while performing streaming
````{admonition} See Also
:class: seealso
To learn more about these status codes, and how health checks are performed with gRPC, see [here](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
````
Let's check the health of an Executor. First start a dummy executor from the terminal:
```shell
jina executor --port 12346
```
In another terminal, you can use [grpcurl](https://github.com/fullstorydev/grpcurl) to send gRPC requests to your services.
```shell
docker pull fullstorydev/grpcurl:latest
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12346 grpc.health.v1.Health/Check
```
```json
{
"status": "SERVING"
}
```
## Using HTTP
````{admonition} Caution
:class: caution
For Executors running with HTTP, the gRPC health check response codes outlined {ref}`above ` do not apply.
Instead, an error-free response signifies healthiness.
````
When using HTTP as the protocol for the Executor, you can query the endpoint `'/'` to check the status.
First, create a Deployment with the HTTP protocol:
```python
from jina import Deployment
d = Deployment(protocol='http', port=12345)
with d:
d.block()
```
Then query the "empty" endpoint:
```bash
curl http://localhost:12345
```
You get a valid empty response indicating the Executor's ability to serve:
```json
{}
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hot-reload.md
(reload-executor)=
# Hot Reload
While developing your Executor, it can be useful to have it refreshed from the source code while you are working on it.
For this you can use the Executor's `reload` argument to watch changes in the source code and the Executor YAML configuration, and to ensure changes are applied to the served Executor.
The Executor keeps track of changes to the Executor source and YAML files, as well as all Python files in the Executor's folder (and sub-folders).
````{admonition} Caution
:class: caution
This feature aims to let developers iterate faster while developing or improving the Executor, but is not intended to be used in a production environment.
````
````{admonition} Note
:class: note
This feature requires the `watchfiles>=0.18` package to be installed.
````
To see how this would work, let's define an Executor in `my_executor.py`
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class MyExecutor(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'I am coming from the first version of MyExecutor'
```
Now we'll deploy it
```python
import os
from jina import Deployment
from my_executor import MyExecutor
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
dep = Deployment(port=12345, uses=MyExecutor, reload=True)
with dep:
dep.block()
```
We can see that the Executor is successfully serving:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```
```text
I am coming from the first version of MyExecutor
```
We can edit the Executor file and save the changes:
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class MyExecutor(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'I am coming from a new version of MyExecutor'
```
You should see in the logs of the serving Executor
```text
INFO executor0/rep-0@11606 detected changes in: ['XXX/XXX/XXX/my_executor.py']. Refreshing the Executor
```
And after this, the Executor will start serving with the renewed code.
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```
```text
I am coming from a new version of MyExecutor
```
Reloading is also applied when the Executor's YAML configuration file is changed. In this case, the Executor deployment restarts.
To see how this works, let's define an Executor configuration in `executor.yml`:
```yaml
jtype: MyExecutorBeforeReload
```
Deploy the Executor:
```python
import os
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
class MyExecutorBeforeReload(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'MyExecutorBeforeReload'
class MyExecutorAfterReload(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'MyExecutorAfterReload'
dep = Deployment(port=12345, uses='executor.yml', reload=True)
with dep:
dep.block()
```
You can see that the Executor is running and serving:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```
```text
MyExecutorBeforeReload
```
You can edit the Executor YAML file and save the changes:
```yaml
jtype: MyExecutorAfterReload
```
In the Flow's logs you should see:
```text
INFO Flow@1843 change in Executor configuration YAML /home/user/jina/jina/exec.yml observed, restarting Executor deployment
```
And after this, you can see the reloaded Executor being served:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```
```text
MyExecutorAfterReload
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/create-hub-executor.md
(create-hub-executor)=
# Create
To create your {class}`~jina.Executor`, run:
```bash
jina hub new
```
For basic configuration (advanced configuration is optional but rarely necessary), you will be asked for:
* Your Executor's name
* The path to the folder where it should be saved
After running the command, a project with the following structure will be generated:
```text
MyExecutor/
├── executor.py
├── config.yml
├── README.md
├── requirements.txt
└── Dockerfile
```
* `executor.py` contains your Executor's main logic.
* `config.yml` is the Executor's {ref}`configuration ` file, where you can define `__init__` arguments using the `with` keyword. You can also define meta annotations relevant to the Executor, for getting better exposure on Executor Hub.
* `requirements.txt` describes the Executor's Python dependencies.
* `README.md` describes how to use your Executor.
* `Dockerfile` is only generated if you choose advanced configuration.
## Tips
* Use `jina hub new` CLI to create an Executor
To create an Executor, always use this command and follow the instructions. This ensures the correct file
structure.
* You don't need to manually write a Dockerfile
The build system automatically generates an optimized Dockerfile according to your Executor package.
```{tip}
In the `jina hub new` wizard you can choose from four Dockerfile templates: `cpu`, `tf-gpu`, `torch-gpu`, and `jax-gpu`.
```
* If you push your Executor to the [Executor Hub](https://cloud.jina.ai/executors), you don't need to bump the Jina-serve version
Hub Executors are version-agnostic. When you pull an Executor from Executor Hub, it will select the right Jina-serve version for you. You don't need to upgrade your version of Jina-serve.
* Fill in metadata of your Executor correctly
Information you include under the `metas` key in `config.yml` is displayed on Executor Hub. The specification can be found here.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/debug-executor.md
(debug-executor)=
# Debug
````{admonition} Not applicable to containerized Executors
:class: caution
This does not work for containerized Executors.
````
In this tutorial you will learn how to debug [Hello Executor](https://cloud.jina.ai/executor/9o9yjq1q) step by step.
````{admonition} Make sure the schemas are known
:class: note
When using docarray>0.30.0, Executors do not have a fixed schema; each Executor defines its own. Make sure you know
those schemas when using Executors from the Hub.
````
## Pull the Executor
Pull the source code of the Executor you want to debug:
````{tab} via Command Line Interface
```shell
jina hub pull jinaai://jina-ai/Hello
```
````
````{tab} via Python code
```python
from jina import Executor
Executor.from_hub('jinaai://jina-ai/Hello')
```
````
## Set breakpoints
In the `~/.jina-serve/hub-package` directory there is one subdirectory for each Executor that you pulled, named by the Executor ID. You can find the Executor's source files in this directory.
Once you locate the source, you can set the breakpoints as you always do.
## Debug your code
You can debug your Executor like any Python code. You can either use the Executor on its own or inside a Deployment:
````{tab} Executor on its own
```python
from jina import Executor
exec = Executor.from_hub('jinaai://jina-ai/Hello')
# Set breakpoint as needed
exec.foo()
```
````
````{tab} Executor inside a Deployment
```python
from jina import Deployment
from docarray.documents.legacy import LegacyDocument
dep = Deployment(uses='jinaai://jina-ai/Hello')
with dep:
res = dep.post('/', inputs=LegacyDocument(text='hello'), return_results=True)
print(res)
```
````
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/hub-portal.md
# Portal
Executor Hub is a marketplace for {class}`~jina.Executor`s where you can upload your own Executors or use ones already developed by the community. If this is your first time developing an Executor you can check our {ref}`tutorials ` that guide you through the process.
Let's see the [Hub portal](https://cloud.jina.ai) in detail.
## Catalog page
The main page contains a list of all Executors created by Jina-serve developers all over the world. You can see the Editor's Pick at the top of the list, which shows Executors highlighted by the Jina-serve team.
```{figure} ../../../../../.github/hub-website-list.png
:align: center
```
You can sort the list by *Trending* and *Recent* using the drop-down menu on top. Otherwise, if you want to search for a specific Executor, you can use the search box at the top or use tags for specific keywords like Image, Video, TensorFlow, and so on:
```{figure} ../../../../../.github/hub-website-search-2.png
:align: center
```
## Detail page
When you find an Executor that interests you, you can get more detail by clicking on it. You can see a description of the Executor with basic information, usage, parameters, etc. If you need more details, click "More" to go to a page with further information.
```{figure} ../../../../../.github/hub-website-detail.png
:align: center
```
There are several tabs you can explore: **Readme**, **Arguments**, **Tags** and **Dependencies**.
```{figure} ../../../../../.github/hub-website-detail-arguments.png
:align: center
```
1. **Readme**: basic information about the Executor, how it works internally, and basic usage.
2. **Arguments**: the Executor's detailed API. This is generated automatically from the Executor's Python docstrings so it's always in sync with the code base, and Executor developers don't need to write it themselves.
3. **Tags**: the tags available for this Executor. For example, `latest`, `latest-gpu` and so on. It also gives a code snippet to illustrate usage.
```{figure} ../../../../../.github/hub-website-detail-tag.png
:align: center
```
4. **Dependencies**: The Executor's Python dependencies.
On the left, you'll see possible ways to use this Executor, including Docker image, source code, etc.
```{figure} ../../../../../.github/hub-website-usage.png
:align: center
```
That's it. Now you have an overview of the [Hub portal](https://cloud.jina.ai) and how to navigate it.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/index.md
(jina-hub)=
# Executor Hub
Now that you understand that {class}`~jina.Executor` is a building block in Jina-serve, you may also wonder:
* Can I streamline the process of containerizing my {class}`~jina.Executor`?
* Can I reuse my Executor in another project?
* Can I share my Executor with my colleagues?
* Can I just use someone else's Executor instead of building it myself?
Basically, something like the following:
```{figure} ../../../../../.github/hub-user-journey.svg
:align: center
```
**Yes!** This is exactly the purpose of Executor Hub.
Hub lets you turn your Executor into a ready-for-the-cloud containerized service, taking a lot of the work off your hands.
With Hub you can pull prebuilt Executors to dramatically reduce the effort and complexity needed in your system, or push your own custom
Executors to share them privately or publicly. You can think of Hub as an easy entry point to a Docker registry.
A Hub Executor is an Executor published on Executor Hub. You can use such an Executor in a Flow or in a Deployment:
```python
from jina import Deployment
d = Deployment(uses='jinaai+docker:///MyExecutor')
with d:
...
```
````{admonition} Make sure the schemas are known
:class: note
When using docarray>0.30.0, Executors do not have a fixed schema; each Executor defines its own. Make sure you know
those schemas when using Executors from the Hub.
````
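As an illustration only, here is a minimal sketch of talking to a Hub Executor once you have looked up its schema; the `TextDoc` class and its `text` field are assumptions, so check the Executor's README or Arguments tab for the real input and output documents:
```python
from docarray import BaseDoc, DocList
from jina import Deployment


# Hypothetical schema: replace it with the schema the Executor actually declares.
class TextDoc(BaseDoc):
    text: str = ''


# Placeholder `uses` string, as above: fill in the real Executor reference.
d = Deployment(uses='jinaai+docker:///MyExecutor')
with d:
    docs = d.post(
        '/',
        inputs=DocList[TextDoc]([TextDoc(text='hello')]),
        return_type=DocList[TextDoc],
    )
    print(docs.text)
```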
```{toctree}
:hidden:
hub-portal
create-hub-executor
push-executor
use-hub-executor
debug-executor
yaml-spec
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/push-executor.md
(push-executor)=
# Publish
If you want to share your {class}`~jina.Executor`, you can push it to Executor Hub.
There are two ways to share:
* **Public** (default): Anyone can use public Executors without any restrictions.
* **Private**: Only people with the `secret` can use private Executors.
(jina-hub-usage)=
## Publishing for the first time
```bash
jina hub push [--public/--private] <path_to_executor_folder>
```
If you have logged into Jina-serve, it will return a `TASK_ID`. You need that to get your Executor's build status and logs.
If you haven't logged into Jina-serve, it will return `NAME` and `SECRET`. You need them to use the Executor (if it is private) or to update it. **Please keep them safe.**
````{admonition} Note
:class: note
If you are logged into the Hub using our CLI tools (`jina auth login` or `jcloud login`), you can push and pull your Executors without `SECRET`.
````
You can then visit [Executor Hub](https://cloud.jina.ai), select the "Recent" tab and see your published Executor.
````{admonition} Note
:class: note
If no `--public` or `--private` argument is provided, then an Executor is **public** by default.
````
````{admonition} Important
:class: important
Anyone can use public Executors, but to use a private Executor you must know its `SECRET`.
````
## Update published Executors
To override or update a published Executor, you must have both its `NAME` and `SECRET`.
```bash
jina hub push [--public/--private] --force-update --secret <secret> <path_to_executor_folder>
```
(hub_tags)=
## Tagging an Executor
Tagging can be useful for versioning Executors or differentiating them by their architecture (e.g. `gpu`, `cpu`).
```bash
jina hub push -t TAG1 -t TAG2
```
You can specify the `-t` or `--tags` parameter to tag an Executor.
* If you **don't** add the `-t` parameter, the default tag is `latest`
* If you **do** add the `-t` parameter and you still want to have the `latest` tag, you must write it as another `-t` parameter.
```bash
jina hub push .                      # Results in one tag: latest
jina hub push . -t v1.0.0            # Results in one tag: v1.0.0
jina hub push . -t v1.0.0 -t latest  # Results in two tags: v1.0.0, latest
```
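On the consuming side, the tag is what you reference after the colon in the `uses` string, for example (the Executor name here is just illustrative):
```python
from jina import Flow

# Pin a specific tag; omitting `:v1.0.0` resolves to `latest`.
f = Flow().add(uses='jinaai+docker://jina-ai/DummyExecutor:v1.0.0')
```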
If you want to create a new tag for an existing Executor, you can also add the `-t` option here:
```bash
jina hub push [--public/--private] --force-update --secret <secret> -t TAG <path_to_executor_folder>
```
### Protected tags
Protected tags prevent certain tags from being overwritten and ensure stable, consistent behavior.
You can use the `--protected-tag` option to create protected tags.
After the first push, protected tags cannot be pushed again.
```bash
jina hub push [--public/--private] --force-update --secret <secret> --protected-tag <tag1> --protected-tag <tag2> <path_to_executor_folder>
```
## Use environment variables
The `--build-env` parameter manages environment variables, letting you use a private token in `requirements.txt` to install private dependencies. For security reasons, you don't want to expose this token to anyone else. For example, suppose you have the following `requirements.txt`:
```text
# requirements.txt
git+https://${YOUR_TOKEN}@github.com/your_private_repo
```
When running `jina hub push`, you can pass the `--build-env` parameter:
```bash
jina hub push --build-env YOUR_TOKEN=foo
```
````{admonition} Note
:class: note
There are restrictions when naming environment variables:
* Environment variables must be wrapped in `{` and `}` in `requirements.txt`. i.e. `${YOUR_TOKEN}`, not `$YOUR_TOKEN`.
* Environment variables are limited to numbers, uppercase letters and `_` (underscore), and cannot start with `_`.
````
````{admonition} Limitations
:class: attention
There are limitations if you push an Executor via `--build-env` and then pull or use it as source code (this doesn't matter if you use a Docker image):
* When you use `jina hub pull jinaai:///YOUR_EXECUTOR`, you must set the corresponding environment variable according to the prompt:
```bash
export YOUR_TOKEN=foo
```
* When you use `.add(uses='jinaai:///YOUR_EXECUTOR')` in a Flow, you must set the corresponding environment variable:
```python
import os

from docarray.documents.legacy import LegacyDocument
from jina import Flow

os.environ['YOUR_TOKEN'] = 'foo'
f = Flow().add(uses='jinaai:///YOUR_EXECUTOR')
with f:
    f.post(on='/', inputs=LegacyDocument(), on_done=print)
```
````
For multiple environment variables:
```bash
jina hub push --build-env FIRST=foo --build-env SECOND=bar
```
## Building status of an Executor
To query the build status of a pushed Executor:
```bash
jina hub status [<path_to_executor_folder>] [--id TASK_ID] [--verbose] [--replay]
```
* The parameter `--id TASK_ID` gets the build status of a specific build task.
* The parameter `--verbose` prints verbose build logs.
* The parameter `--replay` prints the build status from the beginning.
## ARM64 architecture support
````{admonition} Hint
:class: hint
As of January 10, 2023 you can push Executors for the ARM64 architecture.
````
````{admonition} Note
:class: note
Executor Docker images are Linux images. Even if you are running on a Mac or Windows machine, the underlying OS is still Linux.
````
If you run `jina hub push` on an ARM64-based machine, you automatically push an ARM64 Executor.
However, if you provide your own Dockerfile, it needs to work for both `linux/amd64` and `linux/arm64`.
If you don't want this behavior, you can explicitly specify the `--platform` parameter:
```bash
# Push for both platforms
jina hub push --platform linux/arm64,linux/amd64
# Push for AMD64 only
jina hub push --platform linux/amd64
# Push for ARM64 only (not recommended)
jina hub push --platform linux/arm64
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/use-hub-executor.md
(use-hub-executor)=
# Use
There are three ways to use Hub {class}`~jina.Executor`s in your project. Each has its own use case and benefits.
## Use as-is
You can use a Hub Executor as-is via `Executor.from_hub()`:
```python
from jina import Executor
from docarray import DocList
from docarray.documents.legacy import LegacyDocument
exec = Executor.from_hub('jinaai://jina-ai/DummyHubExecutor')
da = DocList[LegacyDocument]([LegacyDocument()])
exec.foo(da)
assert da.texts == ['hello']
```
The Hub Executor will be pulled to your local machine and run as a native Python object. You can use a line debugger to step in and out of the `exec` object, set breakpoints, and observe how it behaves. You can directly feed in `Documents`. After you build some confidence in that Executor, you can move to the next step: using it as part of your Flow.
```{caution}
Not all Executors on the Hub can be directly run in this way - some require extra dependencies. In that case, you can add `.from_hub(..., install_requirements=True)` to install the requirements automatically. Be careful - these dependencies may not be compatible with your local packages and may override your local development environment.
```
```{tip}
Hub Executors are cached locally on the first pull. Afterwards, they will not be updated.
To keep up-to-date with upstream, use `.from_hub(..., force_update=True)`.
```
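Both options can be combined; for instance, a minimal sketch that refreshes the cached copy and installs the Executor's declared requirements (which, as noted above, may modify your local environment):
```python
from jina import Executor

exec = Executor.from_hub(
    'jinaai://jina-ai/DummyHubExecutor',
    install_requirements=True,  # install the Executor's requirements
    force_update=True,          # re-pull instead of using the cached copy
)
```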
(pull-executor)=
## Pull only
You can also use the `jina hub` CLI to pull an Executor without actually using it in a Flow.
````{admonition} Jina-serve and DocArray version
:class: note
When pulling, the Hub tries to install, in the pulled Docker images, the Jina-serve and DocArray versions that you have installed locally, independently of the Jina-serve and DocArray versions that existed when the Executor was pushed to the Hub.
````
### Pull the Docker image
```bash
jina hub pull jinaai+docker://<username>/<executor-name>[:<tag>]
```
You can find the Executor by running `docker images`. You can also indicate which version of the Executor you want to use by appending `:<tag>`.
```bash
jina hub pull jinaai+docker://jina-ai/DummyExecutor:v1.0.0
```
## Use in Flow as container
Use prebuilt images from Hub in your Python code:
```python
from jina import Flow
# You have to log in to use a private Executor
# import hubble
# hubble.login()
f = Flow().add(uses='jinaai+docker://<username>/<executor-name>[:<tag>]')
```
If you do not provide a `: