# Jina Serve
> This section includes the API documentation from the `jina` codebase, as extracted from the docstrings in the code.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/api-rst.rst
========================
:fab:`python` Python API
========================

This section includes the API documentation from the `jina` codebase, as extracted from the docstrings in the code.

For further details, please refer to the full :ref:`user guide `.
:mod:`jina.orchestrate.deployments` - Deployment
-------------------------------------------------

.. currentmodule:: jina.orchestrate.deployments

.. autosummary::
   :nosignatures:
   :template: class.rst

   __init__.Deployment

:mod:`jina.orchestrate.flow` - Flow
-----------------------------------

.. currentmodule:: jina.orchestrate.flow

.. autosummary::
   :nosignatures:
   :template: class.rst

   base.Flow
   asyncio.AsyncFlow

:mod:`jina.serve.executors` - Executor
--------------------------------------

.. currentmodule:: jina.serve.executors

.. autosummary::
   :nosignatures:
   :template: class.rst

   Executor
   BaseExecutor
   decorators.requests
   decorators.monitor

:mod:`jina.clients` - Clients
-----------------------------

.. currentmodule:: jina.clients

.. autosummary::
   :nosignatures:
   :template: class.rst

   Client
   grpc.GRPCClient
   grpc.AsyncGRPCClient
   http.HTTPClient
   http.AsyncHTTPClient
   websocket.WebSocketClient
   websocket.AsyncWebSocketClient

:mod:`jina.types.request` - Networking messages
------------------------------------------------

.. currentmodule:: jina.types.request

.. autosummary::
   :nosignatures:
   :template: class.rst

   Request
   data.DataRequest
   data.Response
   status.StatusMessage

:mod:`jina.serve.runtimes` - Flow internals
--------------------------------------------

.. currentmodule:: jina.serve.runtimes

.. autosummary::
   :nosignatures:
   :template: class.rst

   asyncio.AsyncNewLoopRuntime
   gateway.GatewayRuntime
   gateway.grpc.GRPCGatewayRuntime
   gateway.http.HTTPGatewayRuntime
   gateway.websocket.WebSocketGatewayRuntime
   worker.WorkerRuntime
   head.HeadRuntime
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cli/index.rst
:octicon:`terminal` Command-Line Interface
==========================================
.. argparse::
   :noepilog:
   :nodescription:
   :ref: jina.parsers.get_main_parser
   :prog: jina
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/docker-compose.md
(docker-compose)=
# {fab}`docker` Docker Compose Support
One of the simplest ways to prototype or serve in
production is to run your {class}`~jina.Flow` with `docker-compose`.
A {class}`~jina.Flow` is composed of {class}`~jina.Executor`s which run Python code
that operates on `Documents`. These `Executors` live in different runtimes depending on how you want to deploy
your Flow.
By default, if you serve your Flow locally, these Executors live in separate processes. Nevertheless,
because Jina-serve is cloud-native, your Flow can easily manage Executors that live in containers and are
orchestrated by your favorite tools. One of the simplest of these tools is Docker Compose, which is supported out of the box.
You can deploy a Flow with Docker Compose in one line:
```{code-block} python
---
emphasize-lines: 3
---
from jina import Flow
flow = Flow(...).add(...).add(...)
flow.to_docker_compose_yaml('docker-compose.yml')
```
Jina-serve generates a `docker-compose.yml` configuration file corresponding to your Flow. You can use this directly with
Docker Compose, avoiding the overhead of manually defining all of your Flow's services.
````{admonition} Use Docker-based Executors
:class: caution
All Executors in the Flow should be used with `jinaai+docker://...` or `docker://...`.
````
````{admonition} Health check available from 3.1.3
:class: caution
If you use Executors that rely on Docker images built with a version of Jina-serve prior to 3.1.3, remove the
health check from the dumped YAML file, otherwise your Docker Compose services will
always be "unhealthy."
````
````{admonition} Matching Jina-serve versions
:class: caution
If you change the Docker images in your generated Docker Compose file, ensure that all services, including
the Gateway, are built with the same Jina-serve version to guarantee compatibility.
````
## Example: Index and search text using your own built Encoder and Indexer
Install [`Docker Compose`](https://docs.docker.com/compose/install/) locally before starting this tutorial.
For this example, we recommend that you read {ref}`how to build and containerize the Executors to be run in Kubernetes `.
### Deploy the Flow
First define the Flow and generate the Docker Compose YAML configuration:
````{tab} YAML
In a `flow.yml` file:
```yaml
jtype: Flow
with:
  port: 8080
  protocol: http
executors:
  - name: encoder
    uses: jinaai+docker:///EncoderPrivate
    replicas: 2
  - name: indexer
    uses: jinaai+docker:///IndexerPrivate
    shards: 2
```
Then export it to a Docker Compose configuration:
```shell
jina export docker-compose flow.yml docker-compose.yml
```
````
````{tab} Python
In Python, run:
```python
from jina import Flow

flow = (
    Flow(port=8080, protocol='http')
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate', replicas=2)
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        shards=2,
    )
)
flow.to_docker_compose_yaml('docker-compose.yml')
```
````
````{admonition} Hint
:class: hint
You can use a custom Jina-serve Docker image for the Gateway service by setting the environment variable `JINA_GATEWAY_IMAGE` to the desired image before generating the configuration (see the sketch below).
````
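For example, a minimal sketch of setting this variable before dumping the configuration for the Flow defined above (the image name is a placeholder for your own build):

```python
import os

from jina import Flow

# Hypothetical custom Gateway image; replace with an image you have built and pushed.
os.environ['JINA_GATEWAY_IMAGE'] = 'my-registry/my-jina-gateway:latest'

flow = (
    Flow(port=8080, protocol='http')
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate', replicas=2)
    .add(name='indexer', uses='jinaai+docker:///IndexerPrivate', shards=2)
)
flow.to_docker_compose_yaml('docker-compose.yml')
```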
Let's take a look at the generated Compose file:
```yaml
version: '3.3'
...
services:
  encoder-rep-0:   # Encoder replica 0
  encoder-rep-1:   # Encoder replica 1
  indexer-head:    # Indexer head
  indexer-0:       # Indexer shard 0
  indexer-1:       # Indexer shard 1
  gateway:
    ...
    ports:
      - 8080:8080
```
```{tip}
:class: caution
The default compose file generated by the Flow contains no special configuration or settings. You may want to
adapt it to your own needs.
```
You can see that six services are created:
* 1 for the **Gateway** which is the entrypoint of the **Flow**.
* 2 associated with the encoder for the two Replicas.
* 3 associated with the indexer, one for the Head and two for the Shards.
Now, you can deploy this Flow:
```shell
docker-compose -f docker-compose.yml up
```
### Query the Flow
Once we see that all the services in the Flow are ready, we can send index and search requests.
First define a client:
```python
from jina.clients import Client
client = Client(host='http://localhost:8080')
```
```python
from typing import List, Optional

from docarray import DocList, BaseDoc
from docarray.typing import NdArray


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


docs = client.post(
    '/index',
    inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
    return_type=DocList[MyDoc],
    request_size=10,
)
print(f'Indexed documents: {len(docs)}')

docs = client.post(
    '/search',
    inputs=DocList[MyDoc]([MyDoc(text=f'This is document query number {i}') for i in range(10)]),
    return_type=DocList[MyDocWithMatches],
    request_size=10,
)
for doc in docs:
    print(f'Query {doc.text} has {len(doc.matches)} matches')
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/k8s.md
(kubernetes-docs)=
# {fas}`dharmachakra` Kubernetes Support
```{toctree}
:hidden:
kubernetes
```
Jina-serve is a cloud-native framework and therefore runs natively and easily on Kubernetes.
Deploying a Jina-serve Deployment or Flow on Kubernetes is actually the recommended way to use Jina-serve in production.
A {class}`~jina.Deployment` and a {class}`~jina.Flow` are services composed of one or more microservices, namely {class}`~jina.Executor`s and a {class}`~jina.Gateway`, which natively run in containers. This means that Kubernetes can natively take over the lifetime management of Executors.
Deploying a {class}`~jina.Deployment` or {class}`~jina.Flow` on Kubernetes means wrapping these services' containers in the appropriate K8s abstractions (Deployment, StatefulSet, and so on), exposing them internally via K8s Services and connecting them together by passing the right set of parameters.
```{hint}
This documentation is designed for users who want to **manually** deploy a Jina-serve project on Kubernetes.
Check out {ref}`jcloud` if you want a **one-click** solution to deploy and host Jina, leveraging a cloud-native stack of Kubernetes, Prometheus and Grafana, **without worrying about provisioning**.
```
## Automatically translate a Deployment or Flow to Kubernetes concepts
```{hint}
Manually building these Kubernetes YAML objects is long and cumbersome. Therefore we provide a helper function {meth}`~jina.Flow.to_kubernetes_yaml` that does most of this
translation work automatically.
```
This helper function can be called from:
* Jina-serve's Python interface to translate a Flow defined in Python to K8s YAML files
* Jina-serve's CLI interface to export a YAML Flow to K8s YAML files
```{seealso}
More detail in the {ref}`Deployment export documentation` and {ref}`Flow export documentation `
```
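For example, a minimal sketch of the Python route (the Flow here is a trivial placeholder; the output folder and namespace names are arbitrary examples):

```python
from jina import Flow

# A trivial placeholder Flow; in practice it would contain your containerized Executors.
f = Flow().add(name='encoder')

# Writes one folder of Kubernetes YAML files per Executor, plus one for the Gateway.
f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```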
## Extra Kubernetes options
In general, Jina-serve follows a single principle when it comes to deploying in Kubernetes:
You, the user, know your use case and requirements the best.
This means that, while Jina-serve generates configurations for you that run out of the box, as a professional user you should always see them as just a starting point to get you off the ground.
```{hint}
The export functions {meth}`~jina.Deployment.to_kubernetes_yaml` and {meth}`~jina.Flow.to_kubernetes_yaml` are helpers to get you started off the ground. **They are meant to be updated and adapted to each use case.**
```
````{admonition} Matching Jina versions
:class: caution
If you change the Docker images for {class}`~jina.Executor` and {class}`~jina.Gateway` in your Kubernetes-generated file, ensure that all of them are built with the same Jina-serve version to guarantee compatibility.
````
You can't add basic Kubernetes features like `Secrets`, `ConfigMap` or `Labels` via the Pythonic or YAML interface. This is intentional and doesn't mean that we don't support these features. On the contrary, we let you fully express your Kubernetes configuration by using the Kubernetes API directly, so you can add your own Kubernetes standards on top of what Jina-serve generates.
````{admonition} Hint
:class: hint
We recommend you dump the Kubernetes configuration files and then edit them to suit your needs.
````
Here are possible configuration options you may need to add or change:
* Add label `selector`s to the Deployments to suit your case
* Add `requests` and `limits` for the resources of the different Pods
* Set up persistent volume storage to save your data on disk
* Pass custom configuration to your Executor with `ConfigMap`
* Manage credentials of your Executor with Kubernetes secrets; for instance, use `f.add(..., env_from_secret={'SECRET_PASSWORD': {'name': 'mysecret', 'key': 'password'}})` to map them to Pod environment variables
* Edit the default rolling update configuration
(service-mesh-k8s)=
## Required service mesh
```{caution}
A service mesh must be installed and correctly configured in the K8s cluster in which you deployed your Flow.
```
Service meshes work by attaching a tiny proxy to each of your Kubernetes Pods, allowing for smart rerouting, load balancing, request retrying, and a host of [other features](https://linkerd.io/2.11/features/).
Jina relies on a service mesh to load balance requests between replicas of the same Executor.
You can use your favourite Kubernetes service mesh in combination with your Jina services, but the configuration files
generated by `to_kubernetes_yaml()` already include all necessary annotations for the [Linkerd service mesh](https://linkerd.io).
````{admonition} Hint
:class: hint
You can use any service mesh with Jina-serve, but Jina-serve Kubernetes configurations come with Linkerd annotations out of the box.
````
To use Linkerd, follow the guide to [install the Linkerd CLI](https://linkerd.io/2.11/getting-started/).
````{admonition} Caution
:class: caution
Many service meshes can perform retries themselves.
Be careful about setting up service mesh level retries in combination with Jina, as it may lead to unwanted behaviour in combination with
Jina's own {ref}`retry policy `.
Instead, you can disable Jina level retries by setting `Flow(retries=0)` in Python, or `retries: 0` in the Flow
YAML's `with` block.
````
(kubernetes-replicas)=
## Scaling Executors: Replicas and shards
Jina supports two types of scaling:
* **Replicas** can be used with any Executor type and are typically used for performance and availability.
* **Shards** are used for partitioning data and should only be used with indexers since they store state.
Check {ref}`here ` for more information about these scaling mechanisms.
For shards, Jina creates one separate Deployment in Kubernetes per Shard.
Setting `Deployment(..., shards=num_shards)` is sufficient to create a corresponding Kubernetes configuration.
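For illustration, a minimal sketch of exporting a sharded Deployment (the Executor reference reuses the placeholder from these docs; the folder and namespace names are arbitrary examples):

```python
from jina import Deployment

# Placeholder containerized Executor reference; replace with your own.
d = Deployment(uses='jinaai+docker:///IndexerPrivate', shards=2)

# One Kubernetes Deployment per shard, plus a head, is generated under ./k8s_indexer.
d.to_kubernetes_yaml('./k8s_indexer', k8s_namespace='custom-namespace')
```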
For replicas, Jina-serve uses [Kubernetes native replica scaling](https://kubernetes.io/docs/tutorials/kubernetes-basics/scale/scale-intro/) and **relies on a service mesh** to load-balance requests between replicas of the same Executor.
Without a service mesh installed in your Kubernetes cluster, all traffic will be routed to the same replica.
````{admonition} See Also
:class: seealso
The impossibility of load balancing between different replicas is a limitation of Kubernetes in combination with gRPC.
If you want to learn more about this limitation, see [this](https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/) Kubernetes Blog post.
````
## Scaling the Gateway
The {ref}`Gateway ` is responsible for providing the API of the {ref}`Flow `.
If you have a large Flow with many Clients and many replicated Executors, the Gateway can become the bottleneck.
In this case you can also scale up the Gateway deployment to be backed by multiple Kubernetes Pods. To do this, add the `replicas` parameter to your Gateway before converting the Flow to Kubernetes.
This can be done in a Pythonic way or in YAML:
````{tab} Using Python
You can use {meth}`~jina.Flow.config_gateway` to add the `replicas` parameter:
```python
from jina import Flow
f = Flow().config_gateway(replicas=3).add()
f.to_kubernetes_yaml('./k8s_yaml_path')
```
````
````{tab} Using YAML
You can add `replicas` in the `gateway` section of your Flow YAML:
```yaml
jtype: Flow
gateway:
  replicas: 3
executors:
  - name: encoder
```
````
Alternatively, this can be done by the regular means of Kubernetes: Either increase the number of replicas in the {ref}`generated yaml configuration files ` or [add replicas while running](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#scaling-a-deployment).
To expose your Gateway replicas outside Kubernetes, you can add a load balancer as described {ref}`here `.
````{admonition} Hint
:class: hint
You can use a custom Docker image for the Gateway deployment by setting the environment variable `JINA_GATEWAY_IMAGE` to the desired image before generating the configuration.
````
## See also
* {ref}`Step by step deployment of a Jina-serve Flow on Kubernetes `
* {ref}`Export a Flow to Kubernetes `
* {meth}`~jina.Flow.to_kubernetes_yaml`
* {ref}`Deploy a standalone Executor on Kubernetes `
* [Kubernetes Documentation](https://kubernetes.io/docs/home/)
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/kubernetes.md
(kubernetes)=
# Deploy on Kubernetes
This how-to will go through deploying a Deployment and a simple Flow using Kubernetes, customizing the Kubernetes configuration
to your needs, and scaling Executors using replicas and shards.
Deploying Jina-serve services in Kubernetes is the recommended way to use Jina-serve in production because Kubernetes can easily take over the lifetime management of Executors and Gateways.
```{seealso}
This page is a step-by-step guide; refer to the {ref}`Kubernetes support documentation ` for more details.
```
```{hint}
This guide is designed for users who want to **manually** deploy a Jina-serve project on Kubernetes.
Check out {ref}`jcloud` if you want a **one-click** solution to deploy and host Jina, leveraging a cloud-native stack of Kubernetes, Prometheus and Grafana, **without worrying about provisioning**.
```
## Preliminaries
To follow this how-to, you need access to a Kubernetes cluster.
You can either set up [`minikube`](https://minikube.sigs.k8s.io/docs/start/), or use one of many managed Kubernetes
solutions in the cloud:
* [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine)
* [Amazon EKS](https://aws.amazon.com/eks)
* [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service)
* [Digital Ocean](https://www.digitalocean.com/products/kubernetes/)
You need to install Linkerd in your K8s cluster. To use Linkerd, [install the Linkerd CLI](https://linkerd.io/2.11/getting-started/) and [its control plane](https://linkerd.io/2.11/getting-started/) in your cluster.
This automatically sets up and manages the service mesh proxies when you deploy the Flow.
To understand why you need to install a service mesh like Linkerd, refer to this {ref}`section `.
(build-containerize-for-k8s)=
## Build and containerize your Executors
First, we need to build the Executors that we are going to use and containerize them {ref}`manually ` or by leveraging {ref}`Executor Hub `. In this example,
we are going to use the Hub.
We are going to build two Executors, the first is going to use `CLIP` to encode textual Documents, and the second is going to use an in-memory vector index. This way
we can build a simple neural search system.
First, we build the encoder Executor.
````{tab} executor.py
```{code-block} python
import torch
from typing import Optional

from transformers import CLIPModel, CLIPTokenizer
from docarray import DocList, BaseDoc
from docarray.typing import NdArray

from jina import Executor, requests


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class Encoder(Executor):
    def __init__(
        self,
        pretrained_model_name_or_path: str = 'openai/clip-vit-base-patch32',
        device: str = 'cpu',
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.device = device
        self.tokenizer = CLIPTokenizer.from_pretrained(pretrained_model_name_or_path)
        self.model = CLIPModel.from_pretrained(pretrained_model_name_or_path)
        self.model.eval().to(device)

    def _tokenize_texts(self, texts):
        x = self.tokenizer(
            texts,
            max_length=77,
            padding='longest',
            truncation=True,
            return_tensors='pt',
        )
        return {k: v.to(self.device) for k, v in x.items()}

    @requests
    def encode(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        with torch.inference_mode():
            input_tokens = self._tokenize_texts(docs.text)
            docs.embedding = self.model.get_text_features(**input_tokens).cpu().numpy()
        return docs
```
````
````{tab} requirements.txt
```
torch==1.12.0
transformers==4.16.2
```
````
````{tab} config.yml
```
jtype: Encoder
metas:
  name: EncoderPrivate
py_modules:
  - executor.py
```
````
Putting all these files into a folder named CLIPEncoder and calling `jina hub push CLIPEncoder --private` should give:
```shell
╭────────────────────────── Published ───────────────────────────╮
│ │
│ 📛 Name EncoderPrivate │
│ 🔗 Jina Hub URL https://cloud.jina.ai/executor// │
│ 👀 Visibility private │
│ │
╰────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────── Usage ─────────────────────────────────────────────────────╮
│ │
│ Container YAML uses: jinaai+docker:///EncoderPrivate:latest │
│ Python .add(uses='jinaai+docker:///EncoderPrivate:latest') │
│ │
│ Source YAML uses: jinaai:///EncoderPrivate:latest │
│ Python .add(uses='jinaai:///EncoderPrivate:latest') │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
Then we can build an indexer to provide `index` and `search` endpoints:
````{tab} executor.py
```{code-block} python
from typing import Optional, List

from docarray import DocList, BaseDoc
from docarray.index import InMemoryExactNNIndex
from docarray.typing import NdArray

from jina import Executor, requests


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


class Indexer(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._indexer = InMemoryExactNNIndex[MyDoc]()

    @requests(on='/index')
    def index(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        self._indexer.index(docs)
        return docs

    @requests(on='/search')
    def search(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDocWithMatches]:
        res = DocList[MyDocWithMatches]()
        ret = self._indexer.find_batched(docs, search_field='embedding')
        matched_documents = ret.documents
        matched_scores = ret.scores
        for query, matches, scores in zip(docs, matched_documents, matched_scores):
            output_doc = MyDocWithMatches(**query.dict())
            output_doc.matches = matches
            output_doc.scores = scores.tolist()
            res.append(output_doc)
        return res
```
````
````{tab} config.yml
```
jtype: Indexer
metas:
  name: IndexerPrivate
py_modules:
  - executor.py
```
````
Putting all these files into a folder named Indexer and calling `jina hub push Indexer --private` should give:
```shell
╭────────────────────────── Published ───────────────────────────╮
│ │
│ 📛 Name IndexerPrivate │
│ 🔗 Jina Hub URL https://cloud.jina.ai/executor// │
│ 👀 Visibility private │
│ │
╰────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────── Usage ─────────────────────────────────────────────────────╮
│ │
│ Container YAML uses: jinaai+docker:///IndexerPrivate:latest │
│ Python .add(uses='jinaai+docker:///IndexerPrivate:latest') │
│ │
│ Source YAML uses: jinaai:///IndexerPrivate:latest │
│ Python .add(uses='jinaai:///IndexerPrivate:latest') │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
Now, since we have created private Executors, we need to make sure that K8s has the right credentials to download
from the private registry:
First, we need to create the namespace where our Flow will run:
```shell
kubectl create namespace custom-namespace
```
Second, we execute this Python script:
```python
import json
import os
import base64

JINA_CONFIG_JSON_PATH = os.path.join(os.path.expanduser('~'), os.path.join('.jina', 'config.json'))
CONFIG_JSON = 'config.json'

with open(JINA_CONFIG_JSON_PATH) as fp:
    auth_token = json.load(fp)['auth_token']

config_dict = dict()
config_dict['auths'] = dict()
config_dict['auths']['registry.hubble.jina.ai'] = {
    'auth': base64.b64encode(f':{auth_token}'.encode()).decode()
}

with open(CONFIG_JSON, mode='w') as fp:
    json.dump(config_dict, fp)
```
Finally, we add a secret to be used as [imagePullSecrets](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/) in the namespace from our config.json:
```shell script
kubectl -n custom-namespace create secret generic regcred --from-file=.dockerconfigjson=config.json --type=kubernetes.io/dockerconfigjson
```
## Deploy an embedding model inside a Deployment
Now we are ready to deploy our embedding model as a service in Kubernetes.
First, define a Deployment,
either in {ref}`YAML ` or directly in Python, as we do here:
```python
from jina import Deployment
d = Deployment(port=8080, name='encoder', uses='jinaai+docker:///EncoderPrivate', image_pull_secrets=['regcred'])
```
You can serve any Deployment you want.
Just ensure that the Executor is containerized, either by using *'jinaai+docker'*, or by {ref}`containerizing your local
Executors `.
Next, generate Kubernetes YAML configs from the Deployment. Note that this step may be a little slow, because [Executor Hub](https://cloud.jina.ai/) may
adapt the image to your Jina-serve and docarray versions.
```python
d.to_kubernetes_yaml('./k8s_deployment', k8s_namespace='custom-namespace')
```
The following file structure will be generated - don't worry if it's slightly different -- there can be
changes from one Jina-serve version to another:
```text
└── k8s_deployment
└── encoder.yml
```
You can inspect these files to see how Deployment and Executor concepts are mapped to Kubernetes entities.
And as always, feel free to modify these files as you see fit for your use case.
````{admonition} Caution: Executor YAML configurations
:class: caution
As a general rule, the configuration files produced by `to_kubernetes_yaml()` should run out of the box, and if you strictly
follow this how-to they will.
However, there is an exception to this: if you use a local dockerized Executor, and this Executor's configuration is stored
in a file other than `config.yaml`, you will have to adapt this Executor's Kubernetes YAML.
To do this, open the file and replace `config.yaml` with the actual path to the Executor configuration.
This is because when a Flow contains a Docker image, it can't see what Executor
configuration was used to create that image.
Since all of our tutorials use `config.yaml` for that purpose, the Flow uses this as a best guess.
Please adapt this if you named your Executor configuration file differently.
````
Next you can actually apply these configuration files to your cluster, using `kubectl`.
This launches the Deployment service.
Now, deploy this Deployment to your cluster:
```shell
kubectl apply -R -f ./k8s_deployment
```
Check that the Pods were created:
```shell
kubectl get pods -n custom-namespace
```
```text
NAME READY STATUS RESTARTS AGE
encoder-81a5b3cf9-ls2m3 1/1 Running 0 60m
```
Once you see that the Deployment is ready, you can start embedding documents:
```python
from typing import Optional

import portforward

from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


with portforward.forward('custom-namespace', 'encoder-81a5b3cf9-ls2m3', 8080, 8080):
    client = Client(host='localhost', port=8080)
    client.show_progress = True
    docs = client.post(
        '/encode',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
        return_type=DocList[MyDoc],
        request_size=10,
    )

    for doc in docs:
        print(f'{doc.text}: {doc.embedding}')
```
## Deploy a simple Flow
Now we are ready to build a Flow composed of multiple Executors.
By *simple* in this context we mean a Flow without replicated or sharded Executors - you can see how to use those in
Kubernetes {ref}`later on `.
For now, define a Flow,
either in {ref}`YAML ` or directly in Python, as we do here:
```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate')
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
    )
)
```
You can essentially define any Flow of your liking.
Just ensure that all Executors are containerized, either by using *'jinaai+docker'*, or by {ref}`containerizing your local
Executors `.
The example Flow here simply encodes and indexes text data using two Executors pushed to the [Executor Hub](https://cloud.jina.ai/).
Next, generate Kubernetes YAML configs from the Flow. Note that this step may be a little slow, because [Executor Hub](https://cloud.jina.ai/) may
adapt the image to your Jina-serve and docarray versions.
```python
f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```
The following file structure will be generated - don't worry if it's slightly different -- there can be
changes from one Jina-serve version to another:
```text
└── k8s_flow
    ├── gateway
    │   └── gateway.yml
    ├── encoder
    │   └── encoder.yml
    └── indexer
        └── indexer.yml
```
You can inspect these files to see how Flow concepts are mapped to Kubernetes entities.
And as always, feel free to modify these files as you see fit for your use case.
Next you can actually apply these configuration files to your cluster, using `kubectl`.
This launches all Flow microservices.
Now, deploy this Flow to your cluster:
```shell
kubectl apply -R -f ./k8s_flow
```
Check that the Pods were created:
```shell
kubectl get pods -n custom-namespace
```
```text
NAME READY STATUS RESTARTS AGE
encoder-8b5575cb9-bh2x8 1/1 Running 0 60m
gateway-66d5f45ff5-4q7sw 1/1 Running 0 60m
indexer-8f676fc9d-4fh52 1/1 Running 0 60m
```
Note that the Jina-serve Gateway was deployed with the name `gateway-66d5f45ff5-4q7sw`.
Once you see that all the Deployments in the Flow are ready, you can start indexing documents:
```python
from typing import List, Optional

import portforward

from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


with portforward.forward('custom-namespace', 'gateway-66d5f45ff5-4q7sw', 8080, 8080):
    client = Client(host='localhost', port=8080)
    client.show_progress = True
    docs = client.post(
        '/index',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
        return_type=DocList[MyDoc],
        request_size=10,
    )
    print(f'Indexed documents: {len(docs)}')

    docs = client.post(
        '/search',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document query number {i}') for i in range(10)]),
        return_type=DocList[MyDocWithMatches],
        request_size=10,
    )
    for doc in docs:
        print(f'Query {doc.text} has {len(doc.matches)} matches')
```
### Deploy with shards and replicas
After your service mesh is installed, your cluster is ready to run a Flow with scaled Executors.
You can adapt the Flow from above to work with two replicas for the encoder, and two shards for the indexer:
```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate', replicas=2)
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        shards=2,
    )
)
```
Again, you can generate your Kubernetes configuration:
```python
f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```
Now you should see the following file structure:
```text
└── k8s_flow
    ├── gateway
    │   └── gateway.yml
    ├── encoder
    │   └── encoder.yml
    └── indexer
        ├── indexer-0.yml
        ├── indexer-1.yml
        └── indexer-head.yml
```
Apply your configuration like usual:
````{admonition} Hint: Cluster cleanup
:class: hint
If you already have the simple Flow from the first example running on your cluster, make sure to delete it using `kubectl delete -R -f ./k8s_flow`.
````
```shell
kubectl apply -R -f ./k8s_flow
```
### Deploy with custom environment variables and secrets
You can customize the environment variables that are available inside the runtime, either defined directly or read from a [Kubernetes secret](https://kubernetes.io/docs/concepts/configuration/secret/):
````{tab} with Python
```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        env={'k1': 'v1', 'k2': 'v2'},
        env_from_secret={
            'SECRET_USERNAME': {'name': 'mysecret', 'key': 'username'},
            'SECRET_PASSWORD': {'name': 'mysecret', 'key': 'password'},
        },
    )
)

f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```
````
````{tab} with flow YAML
In a `flow.yml` file:
```yaml
jtype: Flow
version: '1'
with:
  protocol: http
executors:
  - name: indexer
    uses: jinaai+docker:///IndexerPrivate
    env:
      k1: v1
      k2: v2
    env_from_secret:
      SECRET_USERNAME:
        name: mysecret
        key: username
      SECRET_PASSWORD:
        name: mysecret
        key: password
```
You can generate Kubernetes YAML configs using `jina export`:
```shell
jina export kubernetes flow.yml ./k8s_flow --k8s-namespace custom-namespace
```
````
After creating the namespace, you need to create the secrets mentioned above:
```shell
kubectl -n custom-namespace create secret generic mysecret --from-literal=username=jina --from-literal=password=123456
```
Then you can apply your configuration.
(kubernetes-expose)=
## Exposing the service
The previous examples use port-forwarding to send documents to the services.
In real world applications,
you may want to expose your service to make it reachable by users so that you can serve search requests.
```{caution}
Exposing the Deployment or Flow only works if the environment of your `Kubernetes cluster` supports `External Loadbalancers`.
```
Once the service is deployed, you can expose it. In this case we give an example of exposing the encoder when using a Deployment,
but you can expose the Gateway service in the same way when using a Flow:
```bash
kubectl expose deployment executor --name=executor-exposed --type LoadBalancer --port 80 --target-port 8080 -n custom-namespace
sleep 60 # wait until the external ip is configured
```
Export the external IP address. This is needed for the client when sending Documents to the Flow in the next section.
```bash
export EXTERNAL_IP=`kubectl get service executor-exposed -n custom-namespace -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'`
```
### Client
The client:
* Sends Documents to the exposed service on `$EXTERNAL_IP`
* Gets the responses.
You should configure your Client to connect to the service via the external IP address as follows:
```python
import os
from typing import List, Optional
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client

class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []

host = os.environ['EXTERNAL_IP']
port = 80
client = Client(host=host, port=port)
client.show_progress = True
docs = DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)])
queried_docs = client.post("/search", inputs=docs, return_type=DocList[MyDocWithMatches])
matches = queried_docs[0].matches
print(f"Matched documents: {len(matches)}")
```
## Update your Executor in Kubernetes
In Kubernetes, you can update your Executors by patching the Deployment corresponding to your Executor.
For instance, in the example above, you can change the `EncoderPrivate` Executor's `pretrained_model_name_or_path` parameter by changing the content of the Deployment inside the `encoder.yml` file dumped by `.to_kubernetes_yaml`.
You need to add `--uses-with` and pass the new parameter value to it. This is passed to the container inside the Deployment:
```yaml
spec:
  containers:
    - args:
        - executor
        - --name
        - encoder
        - --k8s-namespace
        - custom-namespace
        - --uses
        - config.yml
        - --port
        - '8080'
        - --uses-metas
        - '{}'
        - --uses-with
        - '{"pretrained_model_name_or_path": "other_model"}'
        - --native
      command:
        - jina
```
After doing so, re-apply your configuration so the new Executor will be deployed without affecting the other unchanged Deployments:
```shell script
kubectl apply -R -f ./k8s_deployment
```
````{admonition} Other patching options
:class: seealso
In Kubernetes Executors are ordinary Kubernetes Deployments, so you can use other patching options provided by Kubernetes:
* `kubectl replace` to replace an Executor using a complete configuration file
* `kubectl patch` to patch an Executor using only a partial configuration file
* `kubectl edit` to edit an Executor configuration on the fly in your editor
You can find more information about these commands in the [official Kubernetes documentation](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/).
````
## Key takeaways
In short, there are just three key steps to deploy Jina-serve on Kubernetes:
1. Use `.to_kubernetes_yaml()` to generate Kubernetes configuration files from a Jina-serve Deployment or Flow object.
2. Apply the generated files via `kubectl` (modify them first if necessary).
3. Expose your service outside the K8s cluster.
## See also
* {ref}`Kubernetes support documentation `
* {ref}`Monitor service once it is deployed `
* {ref}`See how failures and retries are handled `
* {ref}`Learn more about scaling Executors `
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/monitoring.md
(monitoring)=
# Prometheus/Grafana Support (Legacy)
```{admonition} Deprecated
:class: caution
The Prometheus-only based feature will soon be deprecated in favor of the OpenTelemetry Setup. Refer to {ref}`OpenTelemetry Setup ` for the details on OpenTelemetry setup for Jina-serve.
Refer to the {ref}`OpenTelemetry migration guide ` for updating your existing Prometheus and Grafana configurations.
```
We recommend the Prometheus/Grafana stack to leverage the metrics exposed by Jina-serve. In this setup, Jina-serve exposes different metrics endpoints, and Prometheus scrapes them,
collecting, aggregating, and storing the metrics.
External entities (like Grafana) can access these aggregated metrics via the query language [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) and let users visualize the metrics with dashboards.
```{hint}
Jina supports exposing metrics, but you are in charge of installing and managing your Prometheus/Grafana instances.
```
In this guide, we deploy the Prometheus/Grafana stack and use it to monitor a Flow.
(deploy-flow-monitoring)=
## Deploying the Flow and the monitoring stack
### Deploying on Kubernetes
One challenge of monitoring a {class}`~jina.Flow` is communicating its different metrics endpoints to Prometheus.
Fortunately, the [Prometheus operator for Kubernetes](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md) makes this fairly easy because it can automatically discover new metrics endpoints to scrape.
We recommend deploying your Jina-serve Flow on Kubernetes to leverage the full potential of the monitoring feature because:
* The Prometheus operator can automatically discover new endpoints to scrape.
* You can extend monitoring with the rich built-in Kubernetes metrics.
You can deploy Prometheus and Grafana on your Kubernetes cluster by running:
```bash
helm install prometheus prometheus-community/kube-prometheus-stack --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
```
```{hint}
Setting `serviceMonitorSelectorNilUsesHelmValues` to false allows the Prometheus Operator to discover metrics endpoints outside the Helm scope, which is needed to discover the Flow's metrics endpoints.
```
Deploy the Flow that we want to monitor:
For this example, we recommend reading {ref}`how to build and containerize the Executors to be run in Kubernetes `.
````{tab} via YAML
This example shows how to start a Flow with monitoring enabled via YAML:
In a `flow.yml` file:
```yaml
jtype: Flow
with:
  monitoring: true
executors:
  - uses: jinaai+docker:///EncoderPrivate
```
Then export it to Kubernetes YAML:
```bash
jina export kubernetes flow.yml ./config
```
````
````{tab} via Python API
```python
from jina import Flow
f = Flow(monitoring=True).add(uses='jinaai+docker:///EncoderPrivate')
f.to_kubernetes_yaml('config')
```
````
This creates a `config` folder containing the Kubernetes YAML definition of the Flow.
```{seealso}
You can see in-depth how to deploy a Flow on Kubernetes {ref}`here `
```
Then deploy the Flow:
```bash
kubectl apply -R -f config
```
Wait for a couple of minutes, and you should see that the Pods are ready:
```bash
kubectl get pods
```
```{figure} ../../.github/2.0/kubectl_pods.png
:align: center
```
Then you can see that the new metrics endpoints are automatically discovered:
```bash
kubectl port-forward svc/prometheus-operated 9090:9090
```
```{figure} ../../.github/2.0/prometheus_target.png
:align: center
```
```bash
kubectl port-forward svc/gateway 8080:8080
```
To access Grafana, run:
```bash
kubectl port-forward svc/prometheus-grafana 3000:80
```
Then open `http://localhost:3000` in your browser. The username is `admin` and password is `prom-operator`.
You should see the Grafana home page.
### Deploying locally
Deploy the Flow that we want to monitor:
````{tab} via Python code
```python
from jina import Flow

with Flow(monitoring=True, port_monitoring=8000, port=8080).add(
    uses='jinaai+docker:///EncoderPrivate', port_monitoring=9000
) as f:
    f.block()
```
````
````{tab} via docker-compose
```python
from jina import Flow

Flow(monitoring=True, port_monitoring=8000, port=8080).add(
    uses='jinaai+docker:///EncoderPrivate', port_monitoring=9000
).to_docker_compose_yaml('config.yaml')
```
```bash
docker-compose -f config.yaml up
```
````
To monitor a Flow locally you need to install Prometheus and Grafana locally. The easiest way to do this is with
Docker Compose.
First clone the repo which contains the config file:
```bash
git clone https://github.com/jina-ai/example-grafana-prometheus
cd example-grafana-prometheus/prometheus-grafana-local
```
Then run:
```bash
docker-compose up
```
Access the Grafana dashboard at `http://localhost:3000`. The username is `admin` and the password is `foobar`.
```{caution}
This example works locally because Prometheus is configured to listen to ports 8000 and 9000. However,
in contrast to deploying on Kubernetes, you need to tell Prometheus which port to look at. You can change these
ports by modifying [prometheus.yml](https://github.com/jina-ai/example-grafana-prometheus/blob/8baf519f7258da68cfe224775fc90537a749c305/prometheus-grafana-local/prometheus/prometheus.yml#L64).
```
### Deploying on JCloud
If your Flow is deployed on JCloud, you don't need to provision a monitoring stack yourself. Prometheus and Grafana are
handled by JCloud, and you can find the dashboard URL with `jc status `.
## Using Grafana to visualize metrics
Access the Grafana homepage, then go to `Browse`, then `Import`, and copy-paste the [JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow.json).
You should see the following dashboard:
```{figure} ../../.github/2.0/grafana.png
:align: center
```
````{admonition} Hint
:class: hint
You should query your Flow to generate the first metrics. Otherwise the dashboard looks empty.
````
You can query the Flow by running:
```python
from typing import Optional

from docarray import DocList, BaseDoc
from docarray.typing import NdArray

from jina import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


client = Client(port=51000)
client.post(
    on='/',
    inputs=DocList[MyDoc]([MyDoc(text=f'Text for document {i}') for i in range(100)]),
    return_type=DocList[MyDoc],
    request_size=10,
)
```
## See also
* [Using Grafana to visualize Prometheus metrics](https://grafana.com/docs/grafana/latest/getting-started/getting-started-prometheus/)
* {ref}`Defining custom metrics in an Executor `
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/opentelemetry-migration.md
(opentelemetry-migration)=
# Migrate from Prometheus/Grafana to OpenTelemetry
The {ref}`Prometheus/Grafana ` based monitoring setup will soon be deprecated in favor of the {ref}`OpenTelemetry setup `. This section provides the details required to update/migrate your Prometheus configuration and Grafana dashboard to continue monitoring with OpenTelemetry. Refer to {ref}`Opentelemetry setup ` for the new setup before proceeding further.
```{hint}
:class: seealso
Refer to {ref}`Prometheus/Grafana-only ` section for the soon to be deprecated setup.
```
## Update Prometheus configuration
With a Prometheus-only setup, you need to set up a `scrape_configs` configuration or service discovery plugin to specify the targets for pulling metrics data. In the OpenTelemetry setup, each Pod pushes metrics to the OpenTelemetry Collector. The Prometheus configuration now only needs to scrape from the OpenTelemetry Collector to get all the data from OpenTelemetry-instrumented applications.
The new Prometheus configuration for the `otel-collector` Collector hostname is:
```yaml
scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 500ms
    static_configs:
      - targets: ['otel-collector:8888'] # metrics from the collector itself
      - targets: ['otel-collector:8889'] # metrics collected from other applications
```
## Update Grafana dashboard
The OpenTelemetry [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) provides quantile window buckets automatically (unlike the Prometheus [Summary](https://prometheus.io/docs/concepts/metric_types/#summary) instrument). You need to manually configure the required quantile window. The quantile window metric will then be available as a separate time series metric.
In addition, the OpenTelemetry `Counter/UpDownCounter` instruments do not add the `_total` suffix to the base metric name.
To adapt Prometheus queries in Grafana:
* Use the [histogram_quantile](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile) function to query the average or desired quantile window time series data from Prometheus. For example, to view the 0.99 quantile of the `jina_receiving_request_seconds` metric over the last 10 minutes, use the query `histogram_quantile(0.99, rate(jina_receiving_request_seconds_bucket[10m]))`.
* Remove the `_total` suffix from the Counter/UpDownCounter metric names.
You can download a [sample Grafana dashboard JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow-histogram-metrics.json) and import it into Grafana to get started with some pre-built graphs.
```{hint}
A list of available metrics is in the {ref}`Flow Instrumentation ` section.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/cloud-nativeness/opentelemetry.md
(opentelemetry)=
# {octicon}`telescope-fill` OpenTelemetry Support
```{toctree}
:hidden:
opentelemetry-migration
monitoring
```
```{hint}
Prometheus-only based metrics collection will soon be deprecated. Refer to {ref}`Monitor with Prometheus and Grafana ` for the old setup.
```
There are two major setups required to visualize/monitor your application's signals using [OpenTelemetry](https://opentelemetry.io). The first setup is covered by Jina-serve which integrates the [OpenTelemetry API and SDK](https://opentelemetry-python.readthedocs.io/en/stable/api/index.html) at the application level. The {ref}`Flow Instrumentation ` page covers in detail the steps required to enable OpenTelemetry in a Flow. A {class}`~jina.Client` can also be instrumented which is documented in the {ref}`Client Instrumentation ` section.
This section covers the OpenTelemetry infrastructure setup required to collect, store and visualize the traces and metrics data exported by the Pods. This setup is the user's responsibility, and this section only serves as the initial/introductory guide to running OpenTelemetry infrastructure components.
Since OpenTelemetry is open source and is mostly responsible for the API standards and specification, various providers implement the specification. This section follows the default recommendations from the OpenTelemetry documentation that also fits into the Jina-serve implementations.
## Exporting traces and metrics data
Pods created using a {class}`~jina.Flow` with tracing or metrics enabled use the [SDK Exporters](https://opentelemetry.io/docs/instrumentation/python/exporters/) to send the data to a central [Collector](https://opentelemetry.io/docs/collector/) component. You can use this collector to further process and store the data for visualization and alerting.
The push/export-based mechanism also allows the application to start pushing data immediately on startup. This differs from the pull-based mechanism, where you need a separate scraping registry or discovery service to identify data scraping targets.
You can configure the exporter backend host and port using the `traces_exporter_host`, `traces_exporter_port`, `metrics_exporter_host` and `metrics_exporter_port` arguments. Even though the Collector is data-type agnostic (it accepts any type of OpenTelemetry API data model), we provide separate configurations for tracing and metrics to give you more flexibility in choosing infrastructure components.
Jina-serve's default exporter implementations are `OTLPSpanExporter` and `OTLPMetricExporter`. The exporters use the gRPC data transfer protocol. The following environment variables can be used to further configure the exporter client based on your requirements. The full list of exporter-related environment variables is documented by the [Python SDK library](https://opentelemetry-python.readthedocs.io/en/latest/exporter/otlp/otlp.html). Apart from `OTEL_EXPORTER_OTLP_PROTOCOL` and `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`, you can use all other library-version-specific environment variables to configure the exporter clients.
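As a quick sketch, these arguments are passed directly when creating the Flow; the collector address below is an assumption matching the Docker Compose setup in the next section (the full walkthrough appears under *Running a Flow locally*):

```python
from jina import Flow

# Assumes an OTLP/gRPC collector is reachable at localhost:4317,
# as in the Docker Compose example below.
f = Flow(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add()
```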
## Collector
The [Collector](https://opentelemetry.io/docs/collector/) is a huge ecosystem of components that support features like scraping, collecting, processing and further exporting data to storage backends. The collector itself can also expose endpoints to allow scraping data. We recommend reading the official documentation to understand the full set of features and configuration required to run a Collector. Read the section below to understand the minimum set of components and the respective configuration required for operating with Jina-serve.
We recommend using the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) from the contrib repository. We also use:
* [Jaeger](https://www.jaegertracing.io) for collecting traces, visualizing tracing data and alerting based on tracing data.
* [Prometheus](https://prometheus.io) for collecting metric data and/or alerting.
* [Grafana](https://grafana.com) for visualizing data from Prometheus/Jaeger and/or alerting based on the data queried.
```{hint}
Jaeger provides comprehensive out-of-the-box tools for end-to-end tracing, monitoring, visualization and alerting. You can substitute other tools to achieve the necessary goals of observability and performance analysis. The same can be said for Prometheus and Grafana.
```
### Docker Compose
A minimal `docker-compose.yml` file can look like:
```yaml
version: "3"
services:
# Jaeger
jaeger:
image: jaegertracing/all-in-one:latest
ports:
* "16686:16686"
otel-collector:
image: otel/opentelemetry-collector:0.61.0
command: [ "--config=/etc/otel-collector-config.yml" ]
volumes:
* ${PWD}/otel-collector-config.yml:/etc/otel-collector-config.yml
ports:
* "8888" # Prometheus metrics exposed by the collector
* "8889" # Prometheus exporter metrics
* "4317:4317" # OTLP gRPC receiver
depends_on:
* jaeger
prometheus:
container_name: prometheus
image: prom/prometheus:latest
volumes:
* ${PWD}/prometheus-config.yml:/etc/prometheus/prometheus.yml
ports:
* "9090:9090"
grafana:
container_name: grafana
image: grafana/grafana-oss:latest
ports:
* 3000:3000
```
The corresponding OpenTelemetry Collector configuration below needs to be stored in file `otel-collector-config.yml`:
```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
    resource_to_telemetry_conversion:
      enabled: true
    # can be used to add additional labels
    const_labels:
      label1: value1

processors:
  batch:

service:
  extensions: []
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger]
      processors: [batch]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```
This setup creates a gRPC Collector Receiver on port 4317 that collects data pushed by the Flow Pods. Collector exporters for the Jaeger and Prometheus backends are configured to export tracing and metrics data respectively. The final **service** section creates a collector pipeline, combining the receiver (collects data), processor (batching) and exporter (sends to the backend) sub-components.
The minimal Prometheus configuration needs to be stored in `prometheus-config.yml`.
```yaml
scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 500ms
    static_configs:
      - targets: ['otel-collector:8889']
      - targets: ['otel-collector:8888']
```
The Prometheus configuration now only needs to scrape from the OpenTelemetry Collector to get all the data from OpenTelemetry Metrics instrumented applications.
### Running a Flow locally
Run the Flow and a sample request that we want to instrument locally. If the backends are running successfully the Flow has exported data to the Collector which can be queried and viewed.
First start a Flow:
```python
import time

from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        time.sleep(0.5)
        return docs


with Flow(
    port=54321,
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add(uses=MyExecutor) as f:
    f.block()
```
Second, execute requests using the instrumented {class}`jina.Client`:
```python
from jina import Client
from docarray import DocList, BaseDoc

client = Client(
    host='grpc://localhost:54321',
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)
client.post('/', DocList[BaseDoc]([BaseDoc()]), return_type=DocList[BaseDoc])
client.teardown_instrumentation()
```
```{hint}
The {class}`jina.Client` currently only supports OpenTelemetry Tracing.
```
## Viewing Traces in Jaeger UI
You can open the Jaeger UI [here](http://localhost:16686). You can find more information on the Jaeger UI in the official [docs](https://www.jaegertracing.io/docs/1.38/external-guides/#using-jaeger).
```{hint}
The list of available traces are documented in the {ref}`Flow Instrumentation ` section.
```
## Monitor with Prometheus and Grafana
External entities (like Grafana) can access these aggregated metrics via the [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) query language, and let users visualize metrics with dashboards. Check out a [comprehensive tutorial](https://prometheus.io/docs/visualization/grafana/) for more information.
Download a [sample Grafana dashboard JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow-histogram-metrics.json) and import it into Grafana to get started with some pre-built graphs:
```{figure} ../../.github/2.0/grafana-histogram-metrics.png
:align: center
```
```{hint}
:class: seealso
A list of available metrics is in the {ref}`Flow Instrumentation ` section.
To update your existing Prometheus and Grafana configurations, refer to the {ref}`OpenTelemetry migration guide `.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/callbacks.md
(callback-functions)=
# Callbacks
After performing {meth}`~jina.clients.mixin.PostMixin.post`, you may want to further process the obtained results.
For this purpose, Jina-serve implements a promise-like interface, letting you specify three kinds of callback functions:
* `on_done` is executed while streaming, after successful completion of each request
* `on_error` is executed while streaming, whenever an error occurs in each request
* `on_always` is always performed while streaming, no matter the success or failure of each request
Note that these callbacks only work for requests (and failures) *inside the stream*, for example inside an Executor.
If the failure is due to an error happening outside of
streaming, then these callbacks will not be triggered.
For example, a `SIGKILL` from the client OS during the handling of the request, or a networking issue,
will not trigger the callback.
Callback functions in Jina-serve expect a `Response` of the type {class}`~jina.types.request.data.DataRequest`, which contains resulting Documents,
parameters, and other information.
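A minimal sketch of wiring up all three callbacks (this assumes a Flow is already listening on port 12345; the lambdas simply print fields of the `DataRequest` that are described later in this section):

```python
from docarray import DocList, BaseDoc

from jina import Client

# Assumes a Flow is already running and listening on port 12345.
client = Client(port=12345)
client.post(
    '/',
    inputs=DocList[BaseDoc]([BaseDoc() for _ in range(20)]),
    request_size=10,
    on_done=lambda resp: print(f'{len(resp.docs)} docs processed'),
    on_error=lambda resp: print(f'error: {resp.header.status.description}'),
    on_always=lambda resp: print('request finished'),
)
```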
## Handle DataRequest in callbacks
`DataRequest`s are objects that are sent by Jina-serve internally. Callback functions process DataRequests, and `client.post()`
can return DataRequests.
`DataRequest` objects can be seen as containers for the data relevant to a given request; they contain the following fields:
````{tab} header
The request header.
```python
from pprint import pprint
from jina import Client
Client().post(on='/', on_done=lambda x: pprint(x.header))
```
```console
request_id: "ea504823e9de415d890a85d1d00ccbe9"
exec_endpoint: "/"
target_executor: ""
```
````
````{tab} parameters
The input parameters of the associated request. In particular, `DataRequest.parameters['__results__']` is a
reserved field that gets populated by Executors returning a Python `dict`.
Information in those returned `dict`s gets collected here, behind each Executor ID.
```python
from pprint import pprint
from jina import Client
Client().post(on='/', on_done=lambda x: pprint(x.parameters))
```
```console
{'__results__': {}}
```
````
````{tab} routes
The routing information of the data request. It records which Executors have been called, and the order in which they were called.
The timing and latency of each Executor is also recorded.
```python
from pprint import pprint
from jina import Client
Client().post(on='/', on_done=lambda x: pprint(x.routes))
```
```console
[executor: "gateway"
start_time {
seconds: 1662637747
nanos: 790248000
}
end_time {
seconds: 1662637747
nanos: 794104000
}
, executor: "executor0"
start_time {
seconds: 1662637747
nanos: 790466000
}
end_time {
seconds: 1662637747
nanos: 793982000
}
]
```
````
````{tab} docs
The DocList being passed between and returned by the Executors. These are the Documents usually processed in a callback function, and are often the main payload.
```python
from pprint import pprint
from jina import Client
Client().post(on='/', on_done=lambda x: pprint(x.docs))
```
```console
```
````
Accordingly, a callback that processes Documents can be defined as:
```{code-block} python
---
emphasize-lines: 4
---
from jina.types.request.data import DataRequest
def my_callback(resp: DataRequest):
    foo(resp.docs)
```
## Handle exceptions in callbacks
Server errors can be caught by the Client's `on_error` callback function. You can get the error message and traceback from `header.status`:
```python
from pprint import pprint
from jina import Flow, Client, Executor, requests
class MyExec1(Executor):
    @requests
    def foo(self, **kwargs):
        raise NotImplementedError


with Flow(port=12345).add(uses=MyExec1) as f:
    c = Client(port=f.port)
    c.post(on='/', on_error=lambda x: pprint(x.header.status))
```
```text
code: ERROR
description: "NotImplementedError()"
exception {
name: "NotImplementedError"
stacks: "Traceback (most recent call last):\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/runtimes/worker/__init__.py\", line 181, in process_data\n result = await self._data_request_handler.handle(requests=requests)\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/runtimes/request_handlers/data_request_handler.py\", line 152, in handle\n return_data = await self._executor.__acall__(\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/executors/__init__.py\", line 301, in __acall__\n return await self.__acall_endpoint__(__default_endpoint__, **kwargs)\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/executors/__init__.py\", line 322, in __acall_endpoint__\n return func(self, **kwargs)\n"
stacks: " File \"/Users/hanxiao/Documents/jina/jina/serve/executors/decorators.py\", line 213, in arg_wrapper\n return fn(executor_instance, *args, **kwargs)\n"
stacks: " File \"/Users/hanxiao/Documents/jina/toy44.py\", line 10, in foo\n raise NotImplementedError\n"
stacks: "NotImplementedError\n"
executor: "MyExec1"
}
```
In the example below, the Flow passes the message through and prints the result on success.
If something goes wrong, it beeps. Finally, the result is written to `output.txt`.
```python
from jina import Flow, Client
from docarray import BaseDoc
def beep(*args):
    # make a beep sound
    import sys

    sys.stdout.write('\a')


with Flow().add() as f, open('output.txt', 'w') as fp:
    client = Client(port=f.port)
    client.post(
        '/',
        BaseDoc(),
        on_done=print,
        on_error=beep,
        on_always=lambda x: x.docs.save(fp),
    )
```
````{admonition} What errors can be handled by the callback?
:class: caution
Callbacks can handle errors that are caused by Executors raising an Exception.
A callback will not receive exceptions:
* from the Gateway having connectivity errors with the Executors.
* between the Client and the Gateway.
````
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/index.md
(client)=
# {fas}`laptop-code` Client
{class}`~jina.Client` enables you to send Documents to a running {class}`~jina.Flow`. Like the Gateway, the Client supports four networking protocols: **gRPC**, **HTTP**, **WebSocket** and **GraphQL**, with or without TLS.
You may have observed two styles of using a Client in the docs:
````{tab} Implicit, inside a Flow
```{code-block} python
---
emphasize-lines: 6
---
from jina import Flow
f = Flow()
with f:
    f.post('/')
```
````
````{tab} Explicit, outside a Flow
```{code-block} python
---
emphasize-lines: 3,4
---
from jina import Client
c = Client(...) # must match the Flow setup
c.post('/')
```
````
The implicit style is easier for debugging and local development, as you don't need to specify the host, port and protocol of the Flow. However, it makes two strong assumptions: (1) one Flow corresponds to exactly one Client, and (2) the Flow runs on the same machine as the Client. For these reasons, the explicit style is recommended for production use.
```{hint}
If you want to connect to your Flow from a programming language other than Python, please follow the third party
client {ref}`documentation `.
```
## Connect
To connect to a Flow started by:
```python
from jina import Flow
with Flow(port=1234, protocol='grpc') as f:
    f.block()
```
```text
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:1234 │
│ 🔒 Private 192.168.1.126:1234 │
│ 🌍 Public 87.191.159.105:1234 │
╰──────────────────────────────────────────╯
```
The Client has to specify the following parameters to match the Flow and how it was set up:
* the `protocol` it needs to use to communicate with the Flow
* the `host` and the `port` as exposed by the Flow
* if it needs to use `TLS` encryption (to connect to a {class}`~jina.Flow` that has been {ref}`configured to use TLS ` in combination with gRPC, HTTP, or WebSocket)
````{Hint} Default port
The default port for the Client is `80`; if you are using `TLS` encryption, it is `443`.
````
You can define these parameters by passing a valid URI scheme as part of the `host` argument:
````{tab} TLS disabled
```python
from jina import Client
Client(host='http://my.awesome.flow:1234')
Client(host='ws://my.awesome.flow:1234')
Client(host='grpc://my.awesome.flow:1234')
```
````
````{tab} TLS enabled
```python
from jina import Client
Client(host='https://my.awesome.flow:1234')
Client(host='wss://my.awesome.flow:1234')
Client(host='grpcs://my.awesome.flow:1234')
```
````
Equivalently, you can pass each relevant parameter as a keyword argument:
````{tab} TLS disabled
```python
from jina import Client
Client(host='my.awesome.flow', port=1234, protocol='http')
Client(host='my.awesome.flow', port=1234, protocol='websocket')
Client(host='my.awesome.flow', port=1234, protocol='grpc')
```
````
````{tab} TLS enabled
```python
from jina import Client
Client(host='my.awesome.flow', port=1234, protocol='http', tls=True)
Client(host='my.awesome.flow', port=1234, protocol='websocket', tls=True)
Client(host='my.awesome.flow', port=1234, protocol='grpc', tls=True)
```
````
You can also use a mix of both:
```python
from jina import Client
Client(host='https://my.awesome.flow', port=1234)
Client(host='my.awesome.flow:1234', protocol='http', tls=True)
```
````{admonition} Caution
:class: caution
You can't define these parameters both by keyword argument and by host scheme - you can't have two sources of truth.
Example: the following code will raise an exception:
```python
from jina import Client
Client(host='https://my.awesome.flow:1234', port=4321)
```
````
````{admonition} Caution
:class: caution
We apply `RLock` to avoid [this gRPC issue](https://github.com/grpc/grpc/issues/25364), so that `grpc` clients can be used in a multi-threaded environment.
That said, you should rely on asynchronous programming or multi-processing rather than multi-threading.
For instance, if you're building a web server, you can introduce multi-processing based parallelism to your app using
`gunicorn`: `gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker ...`
````
## Client API
When using `docarray>=0.30`, you specify the schema that you expect the Deployment or Flow to return. You can pass the return type by using the `return_type` parameter in the `client.post` method:
```{code-block} python
---
emphasize-lines: 7
---
from typing import Dict

from jina import Client
from docarray import DocList, BaseDoc


class InputDoc(BaseDoc):
    text: str = ''


class OutputDoc(BaseDoc):
    tags: Dict[str, int] = {}


c = Client(host='https://my.awesome.flow:1234')
c.post(
    on='/',
    inputs=InputDoc(),
    return_type=DocList[OutputDoc],
)
```
(client-compress)=
## Enable compression
If the communication to the Gateway is via gRPC, you can pass `compression` parameter to {meth}`~jina.clients.mixin.PostMixin.post` to benefit from [gRPC compression](https://grpc.github.io/grpc/python/grpc.html#compression) methods.
The supported choices are: None, `gzip` and `deflate`.
```python
from jina import Client
client = Client()
client.post(..., compression='Gzip')
```
Note that this setting only affects the communication between the Client and the Flow's Gateway.
One can also specify the compression of the internal communication {ref}`as described here`.
## Test readiness of the server
```{include} ../orchestration/readiness.md
:start-after:
:end-before:
```
## Simple profiling of the latency
Before sending any real data, you can test the connectivity and network latency by calling the {meth}`~jina.clients.mixin.ProfileMixin.profiling` method:
```python
from jina import Client
c = Client(host='grpc://my.awesome.flow:1234')
c.profiling()
```
```text
Roundtrip 24ms 100%
├── Client-server network 17ms 71%
└── Server 7ms 29%
├── Gateway-executors network 0ms 0%
├── executor0 5ms 71%
└── executor1 2ms 29%
```
## Logging configuration
Similar to the {ref}`Flow logging configuration `, the {class}`jina.Client` also accepts the `log_config` argument. The Client can be configured as below:
```python
from jina import Client
client = Client(log_config='./logging.json.yml')
```
If the Flow is configured with custom logging, the argument will be forwarded to the implicit client.
```python
from jina import Flow
f = Flow(log_config='./logging.json.yml')
with f:
    # the implicit client automatically uses the log_config from the Flow for consistency
    f.post('/')
```
```{toctree}
:hidden:
send-receive-data
send-parameters
send-graphql-mutation
transient-errors
callbacks
rate-limit
instrumentation
third-party-clients
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/instrumentation.md
(instrumenting-client)=
## Instrumentation
The {class}`~jina.Client` supports request tracing, giving you an end-to-end view of a request's lifecycle. The client supports **gRPC**, **HTTP** and **WebSocket** protocols.
````{tab} Implicit, inside a Flow
```{code-block} python
---
emphasize-lines: 4, 5, 6
---
from jina import Flow
f = Flow(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)
with f:
    f.post('/')
```
````
````{tab} Explicit, outside a Flow
```{code-block} python
---
emphasize-lines: 5, 6, 7
---
from jina import Client
# must match the Flow setup
c = Client(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)
c.post('/')
```
````
Each protocol client creates the first trace ID which will be propagated to the `Gateway`. The `Gateway` then creates child spans using the available trace ID which is further propagated to each Executor request. Using the trace ID, all associated spans can be collected to build a trace view of the whole request lifecycle.
```{admonition} Using custom/external tracing context
:class: caution
The {class}`~jina.Client` doesn't currently support external tracing context which can potentially be extracted from an upstream request.
```
You can find more about instrumentation from the resources below:
* [Tracing in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/)
* {ref}`Instrumenting a Flow `
* {ref}`Deploying and using OpenTelemetry in Jina-serve `
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/rate-limit.md
(client-post-prefetch)=
# Rate Limit
There are two ways of applying a rate limit using the {class}`~jina.Client`.
1. Set the `prefetch` argument in the `Client` class constructor; it defaults to 1,000 requests.
1. Set the argument when calling the {meth}`~jina.clients.mixin.PostMixin.post` method. If not provided, the default value of
1,000 requests is used. The method argument overrides the argument provided in the `Client` class constructor.
The `prefetch` argument controls the number of in-flight requests made by the {meth}`~jina.clients.mixin.PostMixin.post`
method. Using the default value might overload the {class}`~jina.Gateway` or {class}`~jina.Executor`, especially if the operation characteristics of the `Deployment` or `Flow`
are unknown. Furthermore, the Client can send various types of requests with varying resource usage.
For example, a high number of `index` requests can carry a large data payload requiring heavy input/output operations.
This increases CPU consumption and eventually leads to a build-up of requests on the Flow. If the queue of in-flight requests
is already large, a very lightweight `search` request that only returns the total number of
Documents in the index might be blocked until the queue of `index` requests is completely processed. To prevent such a scenario,
apply the `prefetch` value on the {meth}`~jina.clients.mixin.PostMixin.post` method to limit the rate of
requests for expensive operations.
You can also use the `prefetch` argument on the {meth}`~jina.clients.mixin.PostMixin.post` method to keep
the server responsive for customer-facing requests that require fast response times, as opposed to background requests such as cron jobs or
analytics requests that can be processed slowly.
```python
from jina import Client
client = Client()
# uses the default limit of 1,000 requests
search_responses = client.post(...)
# sets a hard limit of 5 in flight requests
index_responses = client.post(..., prefetch=5)
```
A global rate limit on the {class}`~jina.Gateway` can also be set using the {ref}`prefetch ` option in the `Flow`.
This argument, however, serves as a global rate limit and cannot be customized based on the request workload. The `prefetch`
argument for the `Client` serves as a class-level rate limit for all requests made from that client. The `prefetch`
argument for the {meth}`~jina.clients.mixin.PostMixin.post` method serves as a method-level limit, overriding the arguments set at the
`Client` and the `Flow` level.
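As a rough sketch (assuming a Flow or Deployment is already reachable), the class-level and method-level limits can be combined like this:
```python
from jina import Client

# class-level limit: every post() from this Client allows at most 100 in-flight requests
client = Client(prefetch=100)

# method-level override: this particular call allows at most 5 in-flight requests
index_responses = client.post(..., prefetch=5)
```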
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/send-graphql-mutation.md
# Send GraphQL Mutation
If the Flow is configured with a GraphQL endpoint, you can use the Jina-serve {class}`~jina.Client`'s {meth}`~jina.clients.mixin.MutateMixin.mutate` method to fetch data via GraphQL mutations:
````{admonition} Only available for docarray<0.30
:class: note
This feature is only available when using `docarray<0.30`.
````
```python
from jina import Client
PORT = ...
c = Client(port=PORT)
mut = '''
mutation {
    docs(data: {text: "abcd"}) {
        id
        matches {
            embedding
        }
    }
}
'''
response = c.mutate(mutation=mut)
```
Note that `response` here is a `Dict`, not a `DocumentArray`. This is because GraphQL lets the user request only certain fields to be returned, so the output might not be a valid DocumentArray; it could be just a string.
## Mutations and arguments
The Flow GraphQL API exposes the mutation `docs`, which sends its inputs to the Flow's Executors,
just like HTTP `post` as described {ref}`above `.
A GraphQL mutation takes the same set of arguments used in {ref}`HTTP `.
The response from GraphQL can include all fields available on a DocumentArray.
````{admonition} See Also
:class: seealso
For more details on the GraphQL format of Document and DocumentArray, see the [documentation page](https://docarray.jina.ai/advanced/graphql-support/)
or [developer reference](https://docarray.jina.ai/api/docarray.document.mixins.strawberry/).
````
## Fields
The available fields in the GraphQL API are defined by the [Document Strawberry type](https://docarray.jina.ai/advanced/graphql-support/?highlight=graphql).
Essentially, you can ask for any property of a Document, including `embedding`, `text`, `tensor`, `id`, `matches`, `tags`,
and more.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/send-parameters.md
(client-executor-parameters)=
# Send Parameters
The {class}`~jina.Client` can send key-value pairs as parameters to {class}`~jina.Executor`s as shown below:
```{code-block} python
---
emphasize-lines: 15
---
from jina import Client, Executor, Deployment, requests
from docarray import BaseDoc
class MyExecutor(Executor):
    @requests
    def foo(self, parameters, **kwargs):
        print(parameters['hello'])


dep = Deployment(uses=MyExecutor)

with dep:
    client = Client(port=dep.port)
    client.post('/', BaseDoc(), parameters={'hello': 'world'})
```
````{hint}
:class: note
You can send a parameters-only data request via:
```python
with dep:
    client = Client(port=dep.port)
    client.post('/', parameters={'hello': 'world'})
```
This might be useful to control `Executor` objects during their lifetime.
````
Since Executors {ref}`can use Pydantic models to have strongly typed parameters `, you can also send parameters as Pydantic models in the Client API, as sketched below.
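The following is a minimal sketch of that idea; the `MyParams` model and its `hello` field are hypothetical, and it assumes the Executor annotates its `parameters` argument with the same Pydantic model:
```python
from jina import Client, Deployment, Executor, requests
from docarray import BaseDoc
from pydantic import BaseModel


class MyParams(BaseModel):
    hello: str = 'world'


class MyExecutor(Executor):
    @requests
    def foo(self, parameters: MyParams, **kwargs):
        # parameters arrive as a strongly typed Pydantic model
        print(parameters.hello)


dep = Deployment(uses=MyExecutor)

with dep:
    client = Client(port=dep.port)
    # send the parameters as a Pydantic model instead of a plain dict
    client.post('/', BaseDoc(), parameters=MyParams(hello='world'))
```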
(specific-params)=
## Send parameters to specific Executors
You can send parameters to a specific Executor by using the `executorname__paramname` syntax.
The Executor named `executorname` will receive the parameter `paramname` (without the `executorname__` prefix in the key name),
and none of the other Executors will receive it.
For instance in the following Flow:
```python
from jina import Flow, Client
from docarray import BaseDoc, DocList
with Flow().add(name='exec1').add(name='exec2') as f:
    client = Client(port=f.port)
    client.post(
        '/index',
        DocList[BaseDoc]([BaseDoc()]),
        parameters={
            'exec1__parameter_exec1': 'param_exec1',
            'exec2__parameter_exec1': 'param_exec2',
        },
    )
```
The Executor `exec1` will receive `{'parameter_exec1':'param_exec1'}` as parameters, whereas `exec2` will receive `{'parameter_exec1':'param_exec2'}`.
This feature is intended for the case where there are multiple Executors that take the same parameter names, but you want to use different values for each Executor.
This is often the case for Executors from the Hub, since they tend to share a common interface for parameters.
```{admonition} Difference to target_executor
Why do we need this feature if we already have `target_executor`?
On the surface, both are about sending information to a subset of Executors in a Flow. However, they work differently under the hood. `target_executor` sends information directly to the specified Executors, ignoring the topology of the Flow, whereas an `executor__parameter` request follows the topology of the Flow and only delivers the parameter to the Executor that matches.
Think of roll call and passing notes in a classroom: `target_executor` is like calling a student directly, whereas `executor__parameter` is like passing the notes from student to student, with each one picking out the note bearing their own name.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/send-receive-data.md
# Send & Receive Data
After a {class}`~jina.Client` has connected to a {class}`~jina.Deployment` or a {class}`~jina.Flow`, it can send requests to the service using its
{meth}`~jina.clients.mixin.PostMixin.post` method.
This expects as inputs the {ref}`Executor endpoint ` that you want to target, as well as a Document or
Iterable of Documents:
````{tab} A single Document
```{code-block} python
---
emphasize-lines: 6
---
from docarray.documents import TextDoc
d1 = TextDoc(text='hello')
client = Client(...)
client.post('/endpoint', d1)
```
````
````{tab} A list of Documents
```{code-block} python
---
emphasize-lines: 7
---
from docarray.documents import TextDoc
d1 = TextDoc(text='hello')
d2 = TextDoc(text='world')
client = Client(...)
client.post('/endpoint', inputs=[d1, d2])
```
````
````{tab} A DocList
```{code-block} python
---
emphasize-lines: 6
---
from docarray import DocList
from docarray.documents import TextDoc
d1 = TextDoc(text='hello')
d2 = TextDoc(text='world')
da = DocList[TextDoc]([d1, d2])
client = Client(...)
client.post('/endpoint', da)
```
````
````{tab} A Generator of Document
```{code-block} python
---
emphasize-lines: 3-5, 9
---
from docarray.documents import TextDoc
def doc_gen():
    for j in range(10):
        yield TextDoc(text=f'hello {j}')
client = Client(...)
client.post('/endpoint', doc_gen)
```
````
````{tab} No Document
```{code-block} python
---
emphasize-lines: 3
---
client = Client(...)
client.post('/endpoint')
```
````
```{admonition} Caution
:class: caution
`Flow` and `Deployment` also provide a `.post()` method that follows the same interface as `client.post()`.
However, once your solution is deployed remotely, these objects are not present anymore.
Hence, `deployment.post()` and `flow.post()` are not recommended outside of testing or debugging use cases.
```
(request-size-client)=
## Send data in batches
Especially during indexing, a Client can send up to thousands or millions of Documents to a {class}`~jina.Flow`.
Those Documents are internally batched into a `Request`, providing a smaller memory footprint and faster response times
thanks
to {ref}`callback functions `.
The size of these batches can be controlled with the `request_size` keyword.
The default `request_size` is 100 Documents. The optimal size will depend on your use case.
```python
from jina import Deployment, Client
from docarray import DocList, BaseDoc
with Deployment() as dep:
    client = Client(port=dep.port)
    client.post('/', DocList[BaseDoc](BaseDoc() for _ in range(100)), request_size=10)
```
## Send data asynchronously
There is an async version of the Python Client which works with {meth}`~jina.clients.mixin.PostMixin.post` and
{meth}`~jina.clients.mixin.MutateMixin.mutate`.
While the standard `Client` is also asynchronous under the hood, its async version exposes this fact to the outside
world,
by allowing *coroutines* as input, and returning an *asynchronous iterator*.
This means you can iterate over Responses one by one, as they come in.
```python
import asyncio
from jina import Client, Deployment
from docarray import BaseDoc
async def async_inputs():
    for _ in range(10):
        yield BaseDoc()
        await asyncio.sleep(0.1)


async def run_client(port):
    client = Client(port=port, asyncio=True)
    async for resp in client.post('/', async_inputs, request_size=1):
        print(resp)


with Deployment() as dep:  # Using it as a Context Manager will start the Deployment
    asyncio.run(run_client(dep.port))
```
Async send is useful when calling an external service from an Executor.
```python
from jina import Client, Executor, requests
from docarray import DocList, BaseDoc
class DummyExecutor(Executor):
    c = Client(host='grpc://0.0.0.0:51234', asyncio=True)

    @requests
    async def process(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        return self.c.post('/', docs, return_type=DocList[BaseDoc])
```
## Send data to specific Executors
Usually a {class}`~jina.Flow` will send each request to all {class}`~jina.Executor`s with matching endpoints as
configured. But the {class}`~jina.Client` also allows you to only target specific Executors in a Flow using
the `target_executor` keyword. The request will then only be processed by the Executors which match the provided
target_executor regex. Its usage is shown in the listing below.
```python
from jina import Client, Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc
class FooExecutor(Executor):
    @requests
    async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'foo was here and got {len(docs)} document'


class BarExecutor(Executor):
    @requests
    async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'bar was here and got {len(docs)} document'


f = (
    Flow()
    .add(uses=FooExecutor, name='fooExecutor')
    .add(uses=BarExecutor, name='barExecutor')
)
with f:  # Using it as a Context Manager will start the Flow
    client = Client(port=f.port)
    docs = client.post(on='/', inputs=TextDoc(text=''), target_executor='bar*', return_type=DocList[TextDoc])
    print(docs.text)
```
This will send the request to all Executors whose names start with 'bar', such as 'barExecutor'.
In the simplest case, you can specify a precise Executor name, and the request will be sent only to that single
Executor.
## Use Unary or Streaming gRPC
A Flow using the **gRPC** protocol implements both the unary and the streaming RPC lifecycle for communicating with clients.
When sending more than one request using the batching or the iterator mechanism, the RPC lifecycle of the
{meth}`~jina.clients.mixin.PostMixin.post` method can be controlled with the `stream` boolean argument. By
default `stream` is set to `True`, which uses the streaming RPC to send the data to the Flow. If it
is set to `False`, the unary RPC is used instead.
Both RPC lifecycles are implemented to give clients flexibility;
there might be a performance penalty when using the streaming RPC in the Python gRPC implementation.
```{hint}
This option is only valid for **gRPC** protocol.
Refer to the gRPC [Performance Best Practices](https://grpc.io/docs/guides/performance/#general) guide for more implementations details and considerations.
```
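For example (a sketch, assuming a gRPC Flow is reachable at `localhost:12345`), the unary RPC can be selected per call:
```python
from jina import Client
from docarray import BaseDoc, DocList

client = Client(host='grpc://localhost:12345')

# stream=False switches from the default streaming RPC to the unary RPC
docs = client.post(
    '/',
    inputs=DocList[BaseDoc]([BaseDoc() for _ in range(20)]),
    request_size=5,
    stream=False,
)
```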
(client-grpc-channel-options)=
## Configure gRPC Client options
The `Client` supports the `grpc_channel_options` parameter which allows more customization of the **gRPC** channel
construction. The `grpc_channel_options` parameter accepts a dictionary of **gRPC** configuration options which will be
used to overwrite the default options. The default **gRPC** options are:
```python
('grpc.max_receive_message_length', -1),
('grpc.keepalive_time_ms', 9999),
# send keepalive ping every 9 seconds (9999 ms); default is 2 hours
('grpc.keepalive_timeout_ms', 4999),
# keepalive ping times out after 4 seconds (4999 ms); default is 20 seconds
('grpc.keepalive_permit_without_calls', True),
# allow keepalive pings when there are no gRPC calls
('grpc.http1.max_pings_without_data', 0),
# allow an unlimited amount of keepalive pings without data
('grpc.http1.min_time_between_pings_ms', 10000),
# allow gRPC pings from the client every 10 seconds
('grpc.http1.min_ping_interval_without_data_ms', 5000),
# allow gRPC pings from the client without data every 5 seconds
```
If `max_attempts` is greater than 1 on the {meth}`~jina.clients.mixin.PostMixin.post` method,
the `grpc.service_config` option is not applied, since the retry
options are configured internally.
Refer to the [channel_arguments](https://grpc.github.io/grpc/python/glossary.html#term-channel_arguments) section for
the full list of available **gRPC** options.
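As a minimal sketch (the option value below is an arbitrary example), selected defaults can be overridden when constructing the Client:
```python
from jina import Client

client = Client(
    host='grpc://localhost:12345',
    # keys follow the gRPC channel_arguments naming; values overwrite the defaults above
    grpc_channel_options={'grpc.max_receive_message_length': 4 * 1024 * 1024},
)
```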
```{hint}
:class: seealso
Refer to the {ref}`Configure Executor gRPC options ` section for configuring the `Executor` **gRPC** options.
```
## Returns
{meth}`~jina.clients.mixin.PostMixin.post` returns a `DocList` containing all Documents flattened over all
Requests. When setting `return_responses=True`, this behavior is changed to returning a list of
{class}`~jina.types.request.data.Response` objects.
If a callback function is provided, `client.post()` returns `None`.
````{tab} Return as DocList objects
```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc
with Deployment() as dep:
    client = Client(port=dep.port)
    docs = client.post(on='', inputs=TextDoc(text='Hi there!'), return_type=DocList[TextDoc])
    print(docs)
    print(docs.text)
```
```console
['Hi there!']
```
````
````{tab} Return as Response objects
```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc

with Deployment() as dep:
    client = Client(port=dep.port)
    resp = client.post(on='', inputs=TextDoc(text='Hi there!'), return_type=DocList[TextDoc], return_responses=True)
    print(resp)
    print(resp[0].docs.text)
```
```console
[]
['Hi there!']
```
````
````{tab} Handle response via callback
```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc

with Deployment() as dep:
    client = Client(port=dep.port)
    resp = client.post(
        on='',
        inputs=TextDoc(text='Hi there!'),
        on_done=lambda resp: print(resp.docs.text),
    )
    print(resp)
```
```console
['Hi there!']
None
```
````
### Return type
{meth}`~jina.clients.mixin.PostMixin.post` returns the Documents as the server sends them back. For the Client to
return your expected document type, the `return_type` argument is required.
The `return_type` can be a parametrized `DocList` or a single `BaseDoc` type. If the `return_type` is a `BaseDoc` type,
the results are returned as a `DocList[T]`, except when the result contains a single Document; in that case the single Document is returned
instead of the DocList.
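For instance (a sketch, assuming a service on port `12345` that returns `OutputDoc`-shaped Documents, where `OutputDoc` is a hypothetical schema):
```python
from jina import Client
from docarray import BaseDoc


class OutputDoc(BaseDoc):
    text: str = ''


client = Client(port=12345)

# passing a single BaseDoc type: results come back as DocList[OutputDoc],
# unless only one Document is returned, in which case that Document is returned directly
result = client.post('/', inputs=OutputDoc(text='hi'), return_type=OutputDoc)
```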
### Callbacks vs returns
A callback operates on every sub-request generated by `request_size`. The callback function consumes the responses one by
one, and each response is freed from memory immediately after consumption.
When no callback is provided, the Client accumulates the DocLists of all Requests before returning.
This means you will not receive results until all Requests have been processed, which is slower and requires more
memory.
### Force the order of responses
Note that the Flow processes Documents in an asynchronous and distributed manner. The Flow may not process the
requests in the same order as the Client sent them, so the responses may not arrive in the sending order either.
To force the results to be returned deterministically in the order they were sent, pass the `results_in_order`
parameter to {meth}`~jina.clients.mixin.PostMixin.post`.
```python
import random
import time
from jina import Deployment, Executor, requests, Client
from docarray import DocList
from docarray.documents import TextDoc
class RandomSleepExecutor(Executor):
    @requests
    def foo(self, *args, **kwargs):
        rand_sleep = random.uniform(0.1, 1.3)
        time.sleep(rand_sleep)


dep = Deployment(uses=RandomSleepExecutor, replicas=3)
input_text = [f'ordinal-{i}' for i in range(180)]
input_da = DocList[TextDoc]([TextDoc(text=t) for t in input_text])

with dep:
    c = Client(port=dep.port, protocol=dep.protocol)
    output_da = c.post('/', inputs=input_da, request_size=10, return_type=DocList[TextDoc], results_in_order=True)
    for input, output in zip(input_da, output_da):
        assert input.text == output.text
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/third-party-clients.md
(third-party-client)=
# Third-party clients
This page is about accessing the Flow with other clients, e.g. `curl`, or programming languages other than Python.
````{admonition} Mostly developed for docarray<0.30
:class: note
Note that most of these clients were developed for versions of Jina compatible with `docarray<0.30.0`. This means they can only communicate with services
running Jina-serve with `docarray<0.30.0`.
````
## Golang
Our [Go Client](https://github.com/jina-ai/client-go) supports gRPC, HTTP and WebSocket protocols, allowing you to connect to Jina-serve from your Go applications.
## PHP
A big thanks to our community member [Jonathan Rowley](https://jina-ai.slack.com/team/U03973EA7BN) for developing a [PHP client](https://github.com/Dco-ai/php-jina) for Jina-serve!
## Kotlin
A big thanks to our community member [Peter Willemsen](https://jina-ai.slack.com/team/U03R0KNBK98) for developing a [Kotlin client](https://github.com/peterwilli/JinaKotlin) for Jina-serve!
(http-interface)=
## HTTP
```{admonition} Available Protocols
:class: caution
Jina-serve Flows can use one of {ref}`three protocols `: gRPC, HTTP, or WebSocket.
Only Flows that use HTTP can be accessed via the methods described below.
```
Apart from using the {ref}`Jina Client `, the most common way of interacting with your deployed Flow is via HTTP.
You can always use `post` to interact with a Flow, using the `/post` HTTP endpoint.
With the help of [OpenAPI schema](https://swagger.io/specification/), one can send data requests to a Flow via `cURL`, JavaScript, [Postman](https://www.postman.com/), or any other HTTP client or programming library.
(http-arguments)=
### Arguments
Your HTTP request can include the following parameters:
| Name | Required | Description | Example |
| ---------------- | ------------ | -------------------------------------------------------------------------------------- | ------------------------------------------------- |
| `execEndpoint` | **required** | Executor endpoint to target | `"execEndpoint": "/index"` |
| `data` | optional | List specifying the input [Documents](https://docarray.jina.ai/fundamentals/document/) | `"data": [{"text": "hello"}, {"text": "world"}]`. |
| `parameters` | optional | Dictionary of parameters to be sent to the Executors | `"parameters": {"param1": "hello world"}` |
| `targetExecutor` | optional | String indicating an Executor to target. Default targets all Executors | `"targetExecutor": "MyExec"` |
Instead of using the generic `/post` endpoint, you can directly use endpoints like `/index` or `/search` to perform a specific operation.
In this case your data request is sent to the corresponding Executor endpoint, so you don't need to specify the parameter `execEndpoint`.
`````{dropdown} Example
````{tab} cURL
```{code-block} bash
---
emphasize-lines: 2
---
curl --request POST \
'http://localhost:12345/search' \
--header 'Content-Type: application/json' -d '{"data": [{"text": "hello world"}]}'
```
````
````{tab} javascript
```{code-block} javascript
---
emphasize-lines: 2
---
fetch(
    'http://localhost:12345/search',
    {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({"data": [{"text": "hello world"}]})
}).then(response => response.json()).then(data => console.log(data));
```
````
`````
The response you receive includes `data` (an array of [Documents](https://docarray.jina.ai/fundamentals/document/)), as well as the fields `routes`, `parameters`, and `header`.
```{admonition} See also: Flow REST API
:class: seealso
For a more detailed description of the REST API of a generic Flow, including the complete request body schema and request samples, please check:
1. [OpenAPI Schema](https://schemas.jina.ai/rest/latest.json)
2. [Redoc UI](https://schemas.jina.ai/rest/)
For a specific deployed Flow, you can get the same overview by accessing the `/redoc` endpoint.
```
(swagger-ui)=
### Use cURL
Here's an example that uses `cURL`:
```bash
curl --request POST 'http://localhost:12345/post' --header 'Content-Type: application/json' -d '{"data": [{"text": "hello world"}],"execEndpoint": "/search"}'
```
````{dropdown} Sample response
```
{
"requestId": "e2978837-e5cb-45c6-a36d-588cf9b24309",
"data": {
"docs": [
{
"id": "84d9538e-f5be-11eb-8383-c7034ef3edd4",
"granularity": 0,
"adjacency": 0,
"parentId": "",
"text": "hello world",
"chunks": [],
"weight": 0.0,
"matches": [],
"mimeType": "",
"tags": {
"mimeType": "",
"parentId": ""
},
"location": [],
"offset": 0,
"embedding": null,
"scores": {},
"modality": "",
"evaluations": {}
}
],
"groundtruths": []
},
"header": {
"execEndpoint": "/index",
"targetPeapod": "",
"noPropagate": false
},
"parameters": {},
"routes": [
{
"pod": "gateway",
"podId": "5742d5dd-43f1-451f-88e7-ece0588b7557",
"startTime": "2021-08-05T07:26:58.636258+00:00",
"endTime": "2021-08-05T07:26:58.636910+00:00",
"status": null
}
],
"status": {
"code": 0,
"description": "",
"exception": null
}
}
```
````
### Use JavaScript
Sending a request from the front-end JavaScript code is a common use case too. Here's how this looks:
```javascript
fetch('http://localhost:12345/post', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({"data": [{"text": "hello world"}],"execEndpoint": "/search"})
}).then(response => response.json()).then(data => console.log(data));
```
````{dropdown} Output
```javascript
{
  "data": [
    {
      "id": "37e6f1bc7ec82fc4ba75691315ae54a6",
      "text": "hello world",
      "matches": ...
    }
  ],
  "header": {
    "requestId": "c725217aa7714de88039866fb5aa93d2",
    "execEndpoint": "/index",
    "targetExecutor": ""
  },
  "routes": [
    {
      "executor": "gateway",
      "startTime": "2022-04-01T13:11:57.992497+00:00",
      "endTime": "2022-04-01T13:11:57.997802+00:00"
    },
    {
      "executor": "executor0",
      "startTime": "2022-04-01T13:11:57.993686+00:00",
      "endTime": "2022-04-01T13:11:57.997274+00:00"
    }
  ]
}
```
````
### Use Swagger UI
Flows provide a customized [Swagger UI](https://swagger.io/tools/swagger-ui/) which you can use to visually interact with the Flow
through a web browser.
```{admonition} Available Protocols
:class: caution
Only Flows that have enabled {ref}`CORS ` expose the Swagger UI interface.
```
For a Flow that is exposed on port `PORT`, you can navigate to the Swagger UI at `http://localhost:PORT/docs`:
```{figure} ../../../.github/2.0/swagger-ui.png
:align: center
```
Here you can see all the endpoints that are exposed by the Flow, such as `/search` and `/index`.
To send a request, click on the endpoint you want to target, then `Try it out`.
Now you can enter your HTTP request, and send it by clicking `Execute`.
You can again use the [REST HTTP request schema](https://schemas.jina.ai/rest/), but do not need to specify `execEndpoint`.
Below, in `Responses`, you can see the reply, together with a visual representation of the returned Documents.
### Use Postman
[Postman](https://www.postman.com/) is an application that allows the testing of web APIs from a graphical interface. You can store all the templates for your REST APIs in it, using Collections.
We provide a [suite of templates for Jina Flow](https://github.com/jina-ai/jina/tree/master/.github/Jina.postman_collection.json). You can import it in Postman in **Collections**, with the **Import** button. It provides templates for the main operations. You need to create an Environment to define the `{{url}}` and `{{port}}` environment variables. These would be the hostname and the port where the Flow is listening.
This contribution was made by [Jonathan Rowley](https://jina-ai.slack.com/archives/C0169V26ATY/p1649689443888779?thread_ts=1649428823.420879&cid=C0169V26ATY), in our [community Slack](https://slack.jina.ai).
## gRPC
To use the gRPC protocol with a language other than Python you will need to:
* Download the two proto definition files: `jina.proto` and `docarray.proto` from [GitHub](https://github.com/jina-ai/jina/tree/master/jina/proto) (be sure to use the latest release branch)
* Compile them with [protoc](https://grpc.io/docs/protoc-installation/), specifying the programming language you want to generate code for.
* Add the generated files to your project and import them into your code.
You should finally be able to communicate with your Flow using the gRPC protocol. You can find more information on the gRPC
`message` and `service` that you can use to communicate in the [Protobuf documentation](../../proto/docs.md).
(flow-graphql)=
## GraphQL
````{admonition} See Also
:class: seealso
This article does not serve as the introduction to GraphQL.
If you are not already familiar with GraphQL, we recommend you learn more about GraphQL from the [official documentation](https://graphql.org/learn/).
You may also want to learn about [Strawberry](https://strawberry.rocks/), the library that powers Jina-serve's GraphQL support.
````
Jina Flows that use the HTTP protocol can also provide a GraphQL API, which is located behind the `/graphql` endpoint.
GraphQL has the advantage of letting you define your own response schema, which means that only the fields you require
are sent over the wire.
This is especially useful when you don't need potentially large fields, like image tensors.
You can access the Flow from any GraphQL client, like `sgqlc`.
```python
from sgqlc.endpoint.http import HTTPEndpoint
HOSTNAME, PORT = ...
endpoint = HTTPEndpoint(url=f'{HOSTNAME}:{PORT}/graphql')
mut = '''
mutation {
    docs(data: {text: "abcd"}) {
        id
        matches {
            embedding
        }
    }
}
'''
response = endpoint(mut)
```
## WebSocket
WebSocket uses persistent connections between the client and Flow, hence allowing streaming use cases.
While you can always use the Python client to stream requests like any other protocol, WebSocket allows streaming JSON from anywhere (CLI / Postman / any other programming language).
You can use the same set of arguments as {ref}`HTTP ` in the payload.
We use [subprotocols](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers#subprotocols) to separate streaming JSON vs bytes.
The Flow defaults to `json` if you don't specify a sub-protocol when establishing the connection (our Python client uses `bytes` streaming via the [jina-serve.proto](../../proto/docs.md) definition).
````{Hint}
* Choose WebSocket over HTTP if you want to stream requests.
* Choose WebSocket over gRPC if
* you want to stream using JSON, not bytes.
* your client language doesn't support gRPC.
* you don't want to compile the [Protobuf definitions](../../proto/docs.md) for your gRPC client.
````
## See also
* {ref}`Access a Flow with the Client `
* {ref}`Configure a Flow `
* [Flow REST API reference](https://schemas.jina.ai/rest/)
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/client/transient-errors.md
(transient-errors)=
# Transient Errors
Most transient errors can be attributed to network issues between the client and the target server, or between the server and its
dependencies, such as a database. Such errors can be:
* ignored, if the failing operation in a generator or sequence of operations isn't relevant to the overall success.
* retried up to a certain limit, assuming that recovery logic kicks in to repair the transient error.
* accepted, if the operation cannot be successfully completed.
## Transient fault handling with retries
The {meth}`~jina.clients.mixin.PostMixin.post` method accepts `max_attempts`, `initial_backoff`, `max_backoff`
and `backoff_multiplier` parameters to control the capacity to retry requests when a transient connectivity error
occurs, using an exponential backoff strategy.
This can help to overcome transient network connectivity issues which are broadly captured by the
{class}`~grpc.aio.AioRpcError`, {class}`~aiohttp.ClientError`, {class}`~asyncio.CancelledError` and
{class}`~jina.excepts.InternalNetworkError`
exception types.
The `max_attempts` parameter determines the number of sending attempts, including the original request.
The `initial_backoff`, `max_backoff`, and `backoff_multiplier` parameters determine the randomized delay in seconds
before retry attempts.
The initial retry attempt will occur at `initial_backoff`. In general, the *n-th* attempt will occur
at `random(0, min(initial_backoff*backoff_multiplier**(n-1), max_backoff))`.
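As a minimal sketch (assuming a Flow is reachable at `grpc://localhost:12345`), the retry parameters are passed directly to `post`:
```python
from jina import Client
from docarray import BaseDoc

client = Client(host='grpc://localhost:12345')

# up to 5 attempts; backoff starts at 0.8s, grows by 1.5x per attempt, and is capped at 5s
client.post(
    '/',
    inputs=BaseDoc(),
    max_attempts=5,
    initial_backoff=0.8,
    backoff_multiplier=1.5,
    max_backoff=5,
)
```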
### Handling gRPC retries for streaming and unary RPC methods
The {meth}`~jina.clients.mixin.PostMixin.post` method supports the `stream` boolean parameter (defaults to `True`). If
set to `True`,
the **gRPC** server side streaming RPC method will be invoked. If set to `False`, the server side unary RPC method will
be invoked. Some important implication of
using retries with **gRPC** are:
* The built-in **gRPC** retries are limited in scope and are implemented to work under certain circumstances. More
details are specified in the [design document](https://github.com/grpc/proposal/blob/master/A6-client-retries.md).
* If the `stream` parameter is set to `True` and the `inputs` parameter is a `GeneratorType` or
an `Iterable`, the retry must be handled as shown below, because the result must be consumed to check for errors in the
stream of responses. The **gRPC** service retry is still configured but cannot be guaranteed.
```python
from jina import Client
from docarray import BaseDoc
from jina.clients.base.retry import wait_or_raise_err
from jina.helper import run_async
client = Client(host='grpc://localhost:12345')
max_attempts = 5
initial_backoff = 0.8
backoff_multiplier = 1.5
max_backoff = 5
def input_generator():
    for _ in range(10):
        yield BaseDoc()


for attempt in range(1, max_attempts + 1):
    try:
        response = client.post(
            '/',
            inputs=input_generator(),
            request_size=2,
            timeout=0.5,
        )
        assert len(response) == 1
    except ConnectionError as err:
        run_async(
            wait_or_raise_err,
            attempt=attempt,
            err=err,
            max_attempts=max_attempts,
            backoff_multiplier=backoff_multiplier,
            initial_backoff=initial_backoff,
            max_backoff=max_backoff,
        )
    else:
        break  # stop retrying once the request succeeds
```
* If the `stream` parameter is set to `True` and the `inputs` parameter is a `Document` or a `DocList`, the retry is
handled internally based on the `max_attempts`, `initial_backoff`, `backoff_multiplier` and `max_backoff`
parameters.
* If the `stream` parameter is set to `False`, the {meth}`~jina.clients.mixin.PostMixin.post` method invokes the unary
RPC method and the retry is handled internally.
```{hint}
The retry parameters `max_attempts`, `initial_backoff`, `backoff_multiplier` and `max_backoff` of the {meth}`~jina.clients.mixin.PostMixin.post` method will be used to set the **gRPC** retry service options. This improves the chances of success if the gRPC retry conditions are met.
```
## Continue streaming when an Executor error occurs
The {meth}`~jina.clients.mixin.PostMixin.post` method accepts a `continue_on_error` parameter. When set to `True`, the Client
keeps sending the remaining requests. The `continue_on_error` parameter only applies
to exceptions raised by an Executor; in case of network connectivity issues, an exception is still raised.
The `continue_on_error` parameter handles the errors that are returned by the Executor as part of its response. The
errors can be logical errors that might be raised
during the execution of the operation. This doesn't include transient errors represented by
{class}`~grpc.aio.AioRpcError`, {class}`~aiohttp.ClientError`, {class}`~asyncio.CancelledError` and
{class}`~jina.excepts.InternalNetworkError` triggered during the Gateway and Executor communication.
The `retries` parameter of the Gateway controls the number of retries for the transient errors that arise in the
communication between the Gateway and the Executors.
```{hint}
Refer to {ref}`Network Errors ` section for more information.
```
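A rough sketch of this behavior (assuming a Flow at `grpc://localhost:12345` whose Executor may fail for some Documents):
```python
from jina import Client
from docarray import BaseDoc, DocList

client = Client(host='grpc://localhost:12345')

# keep sending the remaining requests even if an Executor raises an exception for some of them
responses = client.post(
    '/',
    inputs=DocList[BaseDoc]([BaseDoc() for _ in range(100)]),
    request_size=10,
    continue_on_error=True,
    return_responses=True,
)
```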
## Retries with large inputs or long-running operations
When using the gRPC Client, it is recommended to set the `stream` parameter to `False`, so that the unary RPC is invoked by
the {class}`~jina.Client`, which performs the retry internally with the requests built from the `inputs` iterator or generator. The `request_size`
parameter should also be set so that each request stays small and can be retried without much overhead on the server, as sketched below.
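A sketch of that recommendation (assuming a gRPC Flow at `localhost:12345` and a large input generator):
```python
from jina import Client
from docarray import BaseDoc

client = Client(host='grpc://localhost:12345')


def inputs():
    for _ in range(100_000):
        yield BaseDoc()


# unary RPC with small requests: each small request can be retried internally
# without re-sending the whole input stream
client.post('/', inputs=inputs(), request_size=10, stream=False, max_attempts=3)
```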
The **HTTP** and **WebSocket**
```{hint}
Refer to {ref}`Callbacks ` section for dealing with success and failures after retries.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/jcloud/configuration.md
(jcloud-configuration)=
# {octicon}`file-code` Configuration
JCloud extends Jina-serve's {ref}`Flow YAML specification` by introducing the special field `jcloud`. This lets you define resources and scaling policies for each Executor and Gateway.
Here's a Flow with two Executors that have specific resource needs: `indexer` requires a 10 GB `ebs` disk, whereas `encoder` requires a C4 instance, which implies two cores and 4 GB RAM. See the sections below for further information about instance types.
```{code-block} yaml
---
emphasize-lines: 5-7,10-16
---
jtype: Flow
executors:
  - name: encoder
    uses: jinaai+docker:///Encoder
    jcloud:
      resources:
        instance: C4
  - name: indexer
    uses: jinaai+docker:///Indexer
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
```
## Allocate Executor resources
Since each Executor has its own business logic, it may require different cloud resources. One Executor might need more RAM, whereas another might need a bigger disk.
In JCloud, you can pass highly customizable, finely-grained resource requests for each Executor using the `jcloud.resources` argument in your Flow YAML.
### Instance
JCloud uses the concept of an "instance" to represent a specific set of hardware specifications.
In the above example, a C4 instance type represents two cores and 4 GB RAM based on the CPU tiers instance definition table below.
````{admonition} Note
:class: note
If you are still using the legacy resource specification interface, such as the one below, we translate the raw numbers to the instance tier that fits them most closely:
```{code-block} yaml
jcloud:
  resources:
    cpu: 8
    memory: 8G
```
Sometimes no instance tier exactly matches the CPU cores and memory you request, as in the above example.
In such cases we "ceil" the request to the lowest tier that satisfies all the specifications.
Here, `C6` would be chosen, as `C5`'s `Cores` are lower than what is being requested (4 vs 8).
````
There are also two types of instance tiers, one for CPU instances, one for GPU.
(jcloud-pricing)=
#### Pricing
Each instance has a fixed `Credits Per Hour` number, indicating how many credits JCloud will charge
if a certain instance is used. For example, if an Executor uses `C3`, it implies that `10` credits will be spent
from the operating user account. Other important facts to note:
* If the Flow is powering other App(s) you create, you will be charged by the App(s), not the underlying Flow.
* `Credits Per Hour` is charged on a per-Executor/Gateway basis; the total `Credits Per Hour` of a Flow is the sum of the credits
each component costs.
* If shards/replicas are used in an Executor/Gateway, the same instance type will be used, so `Credits Per Hour` will be multiplied.
For example, if an Executor uses `C3` and has two replicas, the `Credits Per Hour` for the Executor doubles to `20`.
The only exception is sharding: in that case `C1` is used for the shards' head, regardless of what instance type has been entered for the sharded Executor.
```{hint}
Please visit [Jina AI Cloud Pricing](https://cloud.jina.ai/pricing/) for more information about billing and credits.
```
#### CPU tiers
| Instance | Cores | Memory | Credits per hour |
| -------- | ----- | ------ | ---------------- |
| C1 | 0.1 | 0.2 GB | 1 |
| C2 | 0.5 | 1 GB | 5 |
| C3 | 1 | 2 GB | 10 |
| C4 | 2 | 4 GB | 20 |
| C5 | 4 | 8 GB | 40 |
| C6 | 8 | 16 GB | 80 |
| C7 | 16 | 32 GB | 160 |
| C8 | 32 | 64 GB | 320 |
By default, C1 is allocated to each Executor and Gateway.
JCloud offers the general Intel Xeon processor (Skylake 8175M or Cascade Lake 8259CL) for the CPU instances.
#### GPU tiers
JCloud supports GPU workloads with two different usages: `shared` or `dedicated`.
If GPU is enabled, JCloud will provide NVIDIA A10G Tensor Core GPUs with 24 GB memory for workloads in both usage types.
```{hint}
When using GPU resources, it may take a few extra minutes before all Executors are ready to serve traffic.
```
| Instance | GPU | Memory | Credits per hour |
| -------- | ------ | ------ | ---------------- |
| G1 | shared | 14 GB | 100 |
| G2 | 1 | 14 GB | 125 |
| G3 | 2 | 24 GB | 250 |
| G4 | 4 | 56 GB | 500 |
##### Shared GPU
An Executor using a `shared` GPU shares this GPU with up to four other Executors.
This enables time-slicing, which allows workloads that land on oversubscribed GPUs to interleave with one another.
To use `shared` GPU, `G1` needs to be specified as the instance type.
The tradeoffs with a `shared` GPU are increased latency, jitter, and potential out-of-memory (OOM) conditions when many different applications are time-slicing on the GPU. If your application is consuming a lot of memory, we suggest using a dedicated GPU.
##### Dedicated GPU
Using a dedicated GPU is the default way to provision a GPU for an Executor. This automatically creates nodes or assigns the Executor to a GPU node. In this case, the Executor owns the whole GPU.
To use a `dedicated` GPU, `G2`, `G3` or `G4` needs to be specified as the instance type.
### Storage
JCloud supports three kinds of storage: ephemeral (default), [efs](https://aws.amazon.com/efs/) (network file storage) and [ebs](https://aws.amazon.com/ebs/) (block device).
`ephemeral` storage will assign space to an Executor when it is created. Data in `ephemeral` storage is deleted permanently if Executors are restarted or rescheduled.
````{hint}
By default, we assign `ephemeral` storage to all Executors in a Flow. This lets the storage resize dynamically, so you don't need to shrink/grow volumes manually.
If your Executor needs to share data with other Executors and retain data persistency, consider using `efs`. Note that:
* IO performance is slower compared to `ebs` or `ephemeral`
* The disk can be shared with other Executors or Flows.
* Default storage size is 5 GB.
If your Executor needs high IO, you can use `ebs` instead. Note that:
* The disk cannot be shared with other Executors or Flows.
* Default storage size is 5 GB.
````
JCloud also supports retaining the data that a Flow was using while it was active. You can set the `retain` argument to `true` to enable this feature.
```{code-block} yaml
---
emphasize-lines: 5-10,12,15
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
          retain: true
  - name: executor2
    uses: jinaai+docker:///Executor2
    jcloud:
      resources:
        storage:
          kind: efs
```
#### Pricing (2)
Here are the numbers in terms of credits per GB per month for the three kinds of storage described above.
| Instance | Credits per GB per month |
| --------- | ------------------------ |
| Ephemeral | 0 |
| EBS | 30 |
| EFS | 75 |
For example, using 10 GB of EBS storage for a month costs `300` credits.
If shards/replicas are used, we will multiply credits further by the number of storages created.
## Scale out Executors
On JCloud, demand-based autoscaling functionality is naturally offered thanks to the underlying Kubernetes architecture. This means that you can maintain [serverless](https://en.wikipedia.org/wiki/Serverless_computing) deployments in a cost-effective way with no headache of setting the [right number of replicas](https://jina.ai/serve/how-to/scale-out/#scale-out-your-executor) anymore!
### Autoscaling with `jinaai+serverless://`
The easiest way to scale out your Executor is to use a Serverless Executor. This can be enabled by using `jinaai+serverless://` instead of `jinaai+docker://` in Executor's `uses`, such as:
```{code-block} yaml
---
emphasize-lines: 4
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+serverless:///Executor1
```
JCloud autoscaling leverages [Knative](https://knative.dev/docs/) behind the scenes, and `jinaai+serverless` uses a set of Knative configurations as defaults.
```{hint}
For more information about the Knative autoscaling configurations, please visit [Knative autoscaling](https://knative.dev/docs/serving/autoscaling/).
```
### Autoscaling with custom args
If `jinaai+serverless://` doesn't meet your requirements, you can further customize autoscaling configurations by using the `autoscale` argument on a per-Executor basis in the Flow YAML, such as:
```{code-block} yaml
---
emphasize-lines: 5-10
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      autoscale:
        min: 1
        max: 2
        metric: rps
        target: 50
```
Below are the defaults and requirements for the configurations:
| Name | Default | Allowed | Description |
| ------ | ----------- | ------------------------ | ------------------------------------------------- |
| min | 1 | int | Minimum number of replicas (`0` means serverless) |
| max | 2 | int, up to 5 | Maximum number of replicas |
| metric | concurrency | `concurrency` / `rps` / `cpu` / `memory` | Metric for scaling |
| scale_down_delay | 30s | str, `0s` <= value <= `1h` | Time window which must pass at reduced concurrency before a scaling down |
| target | 100 | int | Target number the replicas try to maintain. |
The unit of `target` depends on the metric specified. Refer to the table below:
| Metric | Target |
| ---- | ----- |
| `concurrency` | Number of concurrent requests processed at any given time. |
| `rps` | Number of requests processed per second per replica. |
| `cpu` | Average % CPU utilization of each pod (e.g. `60` means replicas will be scaled up when pods on average reach 60% CPU utilization). |
| `memory` | Average mebibytes of memory used by each pod (e.g. `200` means replicas will be scaled up when the pods' average memory consumption exceeds 200 MiB). |
After you make a JCloud deployment using the autoscaling configuration, serving the Flow works just the same: the only difference you may notice is that it takes a few extra seconds to handle the initial requests, since the deployments need to scale up behind the scenes. Let JCloud handle the scaling from now on, and you can focus on the code!
Note that if `metric` is `cpu` or `memory`, `min` will be reset to 1 if the user sets it to 0.
### Pricing (3)
At present, pricing for autoscaled Executor/Gateway largely follows the same {ref}`JCloud pricing rules ` as other Jina AI services.
We track the minimum number of replicas in the autoscale configuration and use it as the replica multiplier when calculating the `Credits Per Hour`.
### Restrictions
```{admonition} **Restrictions**
* Autoscaling cannot currently be combined with `ebs` as a storage type. Please use `efs` or `ephemeral` instead.
* Autoscale is not supported for multi-protocol Gateways.
```
## Configure availability tolerance
If service issues disrupt Executors, JCloud lets you specify a tolerance level for the number of replicas that must stay up or may go down.
The JCloud parameters `minAvailable` and `maxUnavailable` ensure that Executors will stay up even if a certain number of replicas go down.
| Name | Default | Allowed | Description |
| :--------------- | :-----: | :---------------------------------------------------------------------------------------: | :------------------------------------------------------- |
| `minAvailable` | N/A | Lower than number of [replicas](https://jina.ai/serve/concepts/flow/scale-out/#scale-out) | Minimum number of replicas available during disruption |
| `maxUnavailable` | N/A | Lower than number of [replicas](https://jina.ai/serve/concepts/flow/scale-out/#scale-out) | Maximum number of replicas unavailable during disruption |
```{code-block} yaml
---
emphasize-lines: 5-6
---
jtype: Flow
executors:
  - uses: jinaai+docker:///Executor1
    replicas: 5
    jcloud:
      minAvailable: 2
```
In case of disruption, this ensures that at least two replicas remain available, while up to three may be down.
```{code-block} yaml
---
emphasize-lines: 5-6
---
jtype: Flow
executors:
  - uses: jinaai+docker:///Executor1
    replicas: 5
    jcloud:
      maxUnavailable: 2
```
In case of disruption, this ensures that at most two replicas are unavailable, so at least three replicas remain available.
## Configure Gateway
The Gateway can be customized just like an Executor.
### Set timeout
By default, the Gateway will close connections that have been idle for over 600 seconds. If you want a longer connection timeout threshold, change the `timeout` parameter under `gateway.jcloud`.
```{code-block} yaml
---
emphasize-lines: 2-4
---
jtype: Flow
gateway:
  jcloud:
    timeout: 800
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
### Control Gateway resources
To customize the Gateway's CPU or memory, specify the instance type under `gateway.jcloud.resources`:
```{code-block} yaml
---
emphasize-lines: 2-6
---
jtype: Flow
gateway:
  jcloud:
    resources:
      instance: C3
executors:
  - name: encoder
    uses: jinaai+docker:///Encoder
```
## Expose Executors
A Flow deployment without a Gateway is often used for {ref}`external-executors`, which can be shared between different Flows. You can expose an Executor by setting `expose: true` (and un-expose the Gateway by setting `expose: false`):
```{code-block} yaml
---
emphasize-lines: 2-4, 8-9
---
jtype: Flow
gateway:
  jcloud:
    expose: false # don't expose the Gateway
executors:
  - name: custom
    uses: jinaai+docker:///CustomExecutor
    jcloud:
      expose: true # expose the Executor
```
```{figure} img/expose-executor.png
:width: 70%
```
You can expose the Gateway along with Executors:
```{code-block} yaml
---
emphasize-lines: 2-4,8-9
---
jtype: Flow
gateway:
  jcloud:
    expose: true
executors:
  - name: custom1
    uses: jinaai+docker:///CustomExecutor1
    jcloud:
      expose: true # expose the Executor
```
```{figure} img/gateway-and-executors.png
:width: 70%
```
## Other deployment options
### Customize Flow name
You can use the `name` argument to specify the Flow name in the Flow YAML:
```{code-block} yaml
---
emphasize-lines: 2-3
---
jtype: Flow
jcloud:
  name: my-name
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
### Specify Jina version
To control Jina's version while deploying a Flow to `jcloud`, you can pass the `version` argument in the Flow YAML:
```{code-block} yaml
---
emphasize-lines: 2-3
---
jtype: Flow
jcloud:
  version: 3.10.0
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
### Add Labels
You can use `labels` (as key-value pairs) to attach metadata to your Flows and Executors:
Flow level `labels`:
```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
jcloud:
  labels:
    username: johndoe
    app: fashion-search
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
Executor level `labels`:
```{code-block} yaml
---
emphasize-lines: 5-8
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      labels:
        index: partial
        group: backend
```
```{hint}
Keys in `labels` have the following restrictions:
* Must be 63 characters or fewer.
* Must begin and end with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between.
* The following keys are skipped if passed in the Flow YAML:
  * `user`
  * `jina-version`
```
### Monitoring
To enable [tracing support](https://jina.ai/serve/cloud-nativeness/opentelemetry/) in Flows, pass the `enable: true` argument in the Flow YAML. (Tracing support is not enabled by default in JCloud.)
```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
jcloud:
  monitor:
    traces:
      enable: true
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
You can pass the `enable: true` argument to `gateway` to only enable tracing support in the Gateway:
```{code-block} yaml
---
emphasize-lines: 2-6
---
jtype: Flow
gateway:
  jcloud:
    monitor:
      traces:
        enable: true
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```
You can also enable tracing support only in `executor1`:
```{code-block} yaml
---
emphasize-lines: 5-8
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      monitor:
        traces:
          enable: true
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/jcloud/index.md
(jcloud)=
# Jina AI Cloud Hosting
```{toctree}
:hidden:
configuration
```
```{figure} https://jina.ai/serve/_images/jcloud-banner.png
:width: 0 %
:scale: 0 %
```
```{figure} img/jcloud-banner.png
:scale: 0 %
:width: 0 %
```
After building a Jina-serve project, the next step is to deploy and host it on the cloud. [Jina AI Cloud](https://cloud.jina.ai/) is Jina-serve's reliable, scalable and production-ready cloud-hosting solution that manages your project lifecycle without surprises or hidden development costs.
```{tip}
Are you ready to unlock the power of AI with Jina AI Cloud? Take a look at our [pricing options](https://cloud.jina.ai/pricing) now!
```
In addition to deploying Flows, `jcloud` supports creating secrets and jobs in the Flow's namespace.
## Basics
Jina AI Cloud provides a CLI that you can use via `jina cloud` from the terminal (or `jcloud` or simply `jc` for minimalists.)
````{hint}
You can also install just the JCloud CLI without installing the Jina-serve package.
```bash
pip install jcloud
jc -h
```
If you installed the JCloud CLI individually, all of its commands fall under the `jc` or `jcloud` executable.
In case the command `jc` is already occupied by another tool, use `jcloud` instead. If your pip install doesn't register bash commands for you, you can run `python -m jcloud -h`.
````
For the rest of this section, we use `jc` or `jcloud`. But again they are interchangeable with `jina cloud`.
## Flows
### Deploy
In Jina's idiom, a project is a [Flow](https://jina.ai/serve/concepts/orchestration/flow/), which represents an end-to-end task such as indexing, searching or recommending. In this document, we use "project" and "Flow" interchangeably.
A Flow can have two types of file structure: a single YAML file or a project folder.
#### Single YAML file
A self-contained YAML file, consisting of all configuration at the [Flow](https://jina.ai/serve/concepts/orchestration/flow/)-level and [Executor](https://jina.ai/serve/concepts/serving/executor/)-level.
> All Executors' `uses` must follow the format `jinaai+docker:///MyExecutor` (from [Executor Hub](https://cloud.jina.ai)) to avoid any local file dependencies:
```yaml
# flow.yml
jtype: Flow
executors:
  - name: sentencizer
    uses: jinaai+docker://jina-ai/Sentencizer
```
To deploy:
```bash
jc flow deploy flow.yml
```
````{caution}
When `jcloud` deploys a Flow, it automatically appends the following global arguments to the `flow.yml`, if not present:
```yaml
jcloud:
version: jina-version
docarray: docarray-version
```
The `jina-version` and `docarray-version` values correspond to your development environment's `jina` and `docarray` versions.
````
````{tip}
We recommend testing locally before deployment:
```bash
jina flow --uses flow.yml
```
````
#### Project folder
````{tip}
The best practice for creating a Jina AI Cloud project is to use:
```bash
jc new
```
````
Just like a regular Python project, you can have sub-folders of Executor implementations and a `flow.yml` on the top-level to connect all Executors together.
You can create an example local project using `jc new hello`. The default structure looks like:
```text
├── .env
├── executor1
│ ├── config.yml
│ ├── executor.py
│ └── requirements.txt
└── flow.yml
```
Where:
* `hello/` is your top-level project folder.
* `executor1` directory has all Executor related code/configuration. You can read the best practices for [file structures](https://jina.ai/serve/concepts/serving/executor/file-structure/). Multiple Executor directories can be created.
* `flow.yml` Your Flow YAML.
* `.env` All environment variables used during deployment.
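For illustration, a minimal `.env` could hold key-value pairs that are made available during deployment (the variable names here are hypothetical):
```text
# .env (hypothetical values)
MODEL_NAME=my-model
LOG_LEVEL=INFO
```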
To deploy:
```bash
jc flow deploy hello
```
The Flow is successfully deployed when you see:
```{figure} img/deploy.png
:width: 70%
```
---
You will get a Flow ID, say `merry-magpie-82b9c0897f`. This ID is required to manage, view logs and remove the Flow.
As this Flow is deployed with the default gRPC gateway (feel free to change it to `http` or `websocket`), you can use `jina.Client` to access it:
```python
from jina import Client, Document
print(
Client(host='grpcs://merry-magpie-82b9c0897f.wolf.jina.ai').post(
on='/', inputs=Document(text='hello')
)
)
```
(jcloud-flow-status)=
### Get status
To get the status of a Flow:
```bash
jc flow status merry-magpie-82b9c0897f
```
```{figure} img/status.png
:width: 70%
```
### Monitoring
Basic monitoring is provided to Flows deployed on Jina AI Cloud.
To access the [Grafana](https://grafana.com/)-powered dashboard, first get {ref}`the status of the Flow`. The `Grafana Dashboard` link is displayed at the bottom of the pane. Visit the URL to find basic metrics like 'Number of Request Gateway Received' and 'Time elapsed between receiving a request and sending back the response':
```{figure} img/monitoring.png
:width: 80%
```
### List Flows
To list all of your "Starting", "Serving", "Failed", "Updating", and "Paused" Flows:
```bash
jc flows list
```
```{figure} img/list.png
:width: 90%
```
You can also filter your Flows by passing a phase:
```bash
jc flows list --phase Deleted
```
```{figure} img/list_deleted.png
:width: 90%
```
Or see all Flows:
```bash
jc flows list --phase all
```
```{figure} img/list_all.png
:width: 90%
```
### Remove Flows
You can remove a single Flow, multiple Flows or even all Flows by passing different identifiers.
To remove a single Flow:
```bash
jc flow remove merry-magpie-82b9c0897f
```
To remove multiple Flows:
```bash
jc flow remove merry-magpie-82b9c0897f wondrous-kiwi-b02db6a066
```
To remove all Flows:
```bash
jc flow remove all
```
By default, removing multiple or all Flows is an interactive process where you must give confirmation before each Flow is deleted. To make it non-interactive, set the below environment variable before running the command:
```bash
export JCLOUD_NO_INTERACTIVE=1
```
### Update a Flow
You can update a Flow by providing an updated YAML.
To update a Flow:
```bash
jc flow update super-mustang-c6cf06bc5b flow.yml
```
```{figure} img/update_flow.png
:width: 70%
```
### Pause / Resume Flow
You can pause a Flow that is not currently in use but may be needed later, and bring it back with `resume` when you need it again.
To pause a Flow:
```bash
jc flow pause super-mustang-c6cf06bc5b
```
```{figure} img/pause_flow.png
:width: 70%
```
To resume a Flow:
```bash
jc flow resume super-mustang-c6cf06bc5b
```
```{figure} img/resume_flow.png
:width: 70%
```
### Restart Flow, Executor or Gateway
If you need to restart a Flow, there are two options: restart all Executors and the Gateway associated with the Flow, or selectively restart only a specific Executor or the Gateway.
To restart a Flow:
```bash
jc flow restart super-mustang-c6cf06bc5b
```
```{figure} img/restart_flow.png
:width: 70%
```
To restart the Gateway:
```bash
jc flow restart super-mustang-c6cf06bc5b --gateway
```
```{figure} img/restart_gateway.png
:width: 70%
```
To restart an Executor:
```bash
jc flow restart super-mustang-c6cf06bc5b --executor executor0
```
```{figure} img/restart_executor.png
:width: 70%
```
### Recreate a Deleted Flow
To recreate a deleted Flow:
```bash
jc flow recreate profound-rooster-eec4b17c73
```
```{figure} img/recreate_flow.png
:width: 70%
```
### Scale an Executor
You can also manually scale any Executor.
```bash
jc flow scale good-martin-ca6bfdef84 --executor executor0 --replicas 2
```
```{figure} img/scale_executor.png
:width: 70%
```
### Normalize a Flow
To normalize a Flow:
```bash
jc flow normalize flow.yml
```
```{hint}
Normalizing a Flow is the process of building the Executor image and pushing the image to Hubble.
```
### Get Executor or Gateway logs
To get the Gateway logs:
```bash
jc flow logs --gateway central-escargot-354a796df5
```
```{figure} img/gateway_logs.png
:width: 70%
```
To get the Executor logs:
```bash
jc flow logs --executor executor0 central-escargot-354a796df5
```
```{figure} img/executor_logs.png
:width: 70%
```
## Secrets
### Create a Secret
To create a Secret for a Flow:
```bash
jc secret create mysecret rich-husky-af14064067 --from-literal "{'env-name': 'secret-value'}"
```
```{tip}
You can optionally pass the `--update` flag to automatically update the Flow spec with the updated secret information. This flag will update the Flow which is hosted on the cloud. Finally, you can also optionally pass a Flow's yaml file path with `--path` to update the yaml file locally. Refer to [this](https://jina.ai/serve/cloud-nativeness/kubernetes/#deploy-flow-with-custom-environment-variables-and-secrets) section for more information.
```
```{caution}
If the `--update` flag is not passed, you have to manually update the Flow with `jc flow update rich-husky-af14064067 updated-flow.yml`.
```
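Putting the above together, a single command that creates the Secret, updates the hosted Flow, and patches a local YAML might look like this (a sketch, assuming the `--update` and `--path` flags combine as described):
```bash
jc secret create mysecret rich-husky-af14064067 --from-literal "{'env-name': 'secret-value'}" --update --path flow.yml
```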
### List Secrets
To list all the Secrets created in a Flow's namespace:
```bash
jc secret list rich-husky-af14064067
```
```{figure} img/list_secrets.png
:width: 90%
```
### Get a Secret
To retrieve a Secret's details:
```bash
jc secret get mysecret rich-husky-af14064067
```
```{figure} img/get_secret.png
:width: 90%
```
### Remove Secret
```bash
jc secret remove rich-husky-af14064067 mysecret
```
### Update a Secret
You can update a Secret for a Flow.
```bash
jc secret update rich-husky-af14064067 mysecret --from-literal "{'env-name': 'secret-value'}"
```
```{tip}
You can optionally pass the `--update` flag to automatically update the Flow spec with the updated secret information. This flag will update the Flow which is hosted on the cloud. Finally, you can also optionally pass a Flow's yaml file path with `--path` to update the yaml file locally. Refer to [this](https://jina.ai/serve/cloud-nativeness/kubernetes/#deploy-flow-with-custom-environment-variables-and-secrets) section for more information.
```
```{caution}
Updating a Secret automatically restarts a Flow.
```
## Jobs
### Create a Job
To create a Job for a Flow:
```bash
jc job create job-name rich-husky-af14064067 image 'job entrypoint' --timeout 600 --backofflimit 2
```
```{tip}
`image` can be any Executor image passed to a Flow's Executor `uses` or any normal docker image prefixed with `docker://`
```
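As an illustration of the tip above, a Job can also run a plain Docker image; the image and entrypoint here are placeholders:
```bash
jc job create myjob1 rich-husky-af14064067 docker://busybox 'echo hello' --timeout 600 --backofflimit 2
```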
### List Jobs
To list all Jobs created in a Flow's namespace:
```bash
jc jobs list rich-husky-af14064067
```
```{figure} img/list_jobs.png
:width: 90%
```
### Get a Job
To retrieve a Job's details:
```bash
jc job get myjob1 rich-husky-af14064067
```
```{figure} img/get_job.png
:width: 90%
```
### Remove Job
```bash
jc job remove rich-husky-af14064067 myjob1
```
### Get Job Logs
To get the Job logs:
```bash
jc job logs myjob1 -f rich-husky-af14064067
```
```{figure} img/job_logs.png
:width: 90%
```
## Deployments
### Deploy
```{caution}
When `jcloud` deploys a Deployment, it automatically appends the following global arguments to the `deployment.yml`, if not present:
```
```yaml
jcloud:
version: jina-version
docarray: docarray-version
```
#### Single YAML file
A self-contained YAML file, consisting of all configuration information at the [Deployment](https://jina.ai/serve/concepts/orchestration/deployment/)-level and [Executor](https://jina.ai/serve/concepts/serving/executor/)-level.
> A Deployment's `uses` parameter must follow the format `jinaai+docker:///MyExecutor` (from [Executor Hub](https://cloud.jina.ai)) to avoid any local file dependencies:
```yaml
# deployment.yml
jtype: Deployment
with:
protocol: grpc
uses: jinaai+docker://jina-ai/Sentencizer
```
To deploy:
```bash
jc deployment deploy ./deployment.yml
```
The Deployment is successfully deployed when you see:
```{figure} img/deployment/deploy.png
:width: 70%
```
---
You will get a Deployment ID, for example `pretty-monster-130a5ac952`. This ID is required to manage, view logs, and remove the Deployment.
Since this Deployment is deployed with the default gRPC protocol (feel free to change it to `http`), you can use `jina.Client` to access it:
```python
from jina import Client, Document
print(
Client(host='grpcs://executor-pretty-monster-130a5ac952.wolf.jina.ai').post(
on='/', inputs=Document(text='hello')
)
)
```
(jcloud-deployoment-status)=
### Get status
To get the status of a Deployment:
```bash
jc deployment status pretty-monster-130a5ac952
```
```{figure} img/deployment/status.png
:width: 70%
```
### List Deployments
To list all of your "Starting", "Serving", "Failed", "Updating", and "Paused" Deployments:
```bash
jc deployment list
```
```{figure} img/deployment/list.png
:width: 90%
```
You can also filter your Deployments by passing a phase:
```bash
jc deployment list --phase Deleted
```
```{figure} img/deployment/list_deleted.png
:width: 90%
```
Or see all Deployments:
```bash
jc deployment list --phase all
```
```{figure} img/deployment/list_all.png
:width: 90%
```
### Remove Deployments
You can remove a single Deployment, multiple Deployments, or even all Deployments by passing different commands to the `jc` executable at the command line.
To remove a single Deployment:
```bash
jc deployment remove pretty-monster-130a5ac952
```
To remove multiple Deployments:
```bash
jc deployment remove pretty-monster-130a5ac952 artistic-tuna-ab154c4dcc
```
To remove all Deployments:
```bash
jc deployment remove all
```
By default, removing all or multiple Deployments is an interactive process where you must give confirmation before each Deployment is deleted. To make it non-interactive, set the below environment variable before running the command:
```bash
export JCLOUD_NO_INTERACTIVE=1
```
### Update a Deployment
You can update a Deployment by providing an updated YAML.
To update a Deployment:
```bash
jc deployment update pretty-monster-130a5ac952 deployment.yml
```
```{figure} img/deployment/update.png
:width: 70%
```
### Pause / Resume Deployment
You can pause a Deployment that is not currently in use but may be needed later, and bring it back with `resume` when you need it again.
To pause a Deployment:
```bash
jc deployment pause pretty-monster-130a5ac952
```
```{figure} img/deployment/pause.png
:width: 70%
```
To resume a Deployment:
```bash
jc deployment resume pretty-monster-130a5ac952
```
```{figure} img/deployment/resume.png
:width: 70%
```
### Restart Deployment
To restart a Deployment:
```bash
jc deployment restart pretty-monster-130a5ac952
```
```{figure} img/deployment/restart.png
:width: 70%
```
### Recreate a Deleted Deployment
To recreate a deleted Deployment:
```bash
jc deployment recreate pretty-monster-130a5ac952
```
```{figure} img/deployment/recreate.png
:width: 70%
```
### Scale a Deployment
You can also manually scale any Deployment.
```bash
jc deployment scale pretty-monster-130a5ac952 --replicas 2
```
```{figure} img/deployment/scale.png
:width: 70%
```
### Get Deployment logs
To get the Deployment logs:
```bash
jc deployment logs pretty-monster-130a5ac952
```
```{figure} img/deployment/logs.png
:width: 70%
```
## Configuration
Please refer to {ref}`Configuration ` for configuring the Flow on Jina AI Cloud.
## Restrictions
Jina AI Cloud scales according to your needs. You can demand different instance types with GPU/memory/CPU predefined based on the needs of your Flows and Executors. If you have specific resource requirements, please contact us [on Discord](https://discord.jina.ai) or raise a [GitHub issue](https://github.com/jina-ai/jcloud/issues/new/choose).
```{admonition} Restrictions
* Deployments are only supported in the `us-east` region.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/add-executors.md
(add-executors)=
# Add Executors
## Define Executor with `uses`
An {class}`~jina.Executor`'s type is defined by the `uses` keyword:
````{tab} Deployment
```python
from jina import Deployment
dep = Deployment(uses=MyExec)
```
````
````{tab} Flow
```python
from jina import Flow
f = Flow().add(uses=MyExec)
```
````
Note that some usages are not supported on JCloud, for security reasons and because they exist to facilitate local debugging.
| Local Dev | JCloud | `uses=...` | Description |
|-----------|--------|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| ✅ | ❌ | `ExecutorClass` | Use `ExecutorClass` from the inline context. |
| ✅ | ❌ | `'my.py_modules.ExecutorClass'` | Use `ExecutorClass` from `my.py_modules`. |
| ✅ | ✅ | `'executor-config.yml'` | Use an Executor from a YAML file defined by {ref}`Executor YAML interface `. |
| ✅ | ❌ | `'jinaai://jina-ai/TransformerTorchEncoder/'` | Use an Executor as Python source from Executor Hub. |
| ✅ | ✅ | `'jinaai+docker://jina-ai/TransformerTorchEncoder'` | Use an Executor as a Docker container from Executor Hub. |
| ✅ | ❌ | `'docker://sentence-encoder'` | Use a pre-built Executor as a Docker container. |
````{admonition} Hint: Load multiple Executors from the same directory
:class: hint
You don't need to specify the parent directory for each Executor.
Instead, you can configure a common search path for all Executors:
```
.
├── app
│ └── ▶ main.py
└── executor
├── config1.yml
├── config2.yml
└── my_executor.py
```
```{code-block} python
dep = Deployment(extra_search_paths=['../executor'], uses='config1.yml')  # Deployment
f = Flow(extra_search_paths=['../executor']).add(uses='config1.yml').add(uses='config2.yml')  # Flow
```
````
(flow-configure-executors)=
## Configure Executors
You can set and override {class}`~jina.Executor` configuration when adding them to an Orchestration.
This example shows how to start a Flow with an Executor using the Python API:
````{tab} Deployment
```python
from jina import Deployment
dep = Deployment(
uses='MyExecutor',
py_modules=["executor.py"],
uses_with={"parameter_1": "foo", "parameter_2": "bar"},
uses_metas={
"name": "MyExecutor",
"description": "MyExecutor does a thing to the stuff in your Documents",
},
uses_requests={"/index": "my_index", "/search": "my_search", "/random": "foo"},
workspace="some_custom_path",
)
with dep:
...
```
````
````{tab} Flow
```python
from jina import Flow
f = Flow().add(
uses='MyExecutor',
py_modules=["executor.py"],
uses_with={"parameter_1": "foo", "parameter_2": "bar"},
uses_metas={
"name": "MyExecutor",
"description": "MyExecutor does a thing to the stuff in your Documents",
},
uses_requests={"/index": "my_index", "/search": "my_search", "/random": "foo"},
workspace="some_custom_path",
)
with f:
...
```
````
* `py_modules` is a list of strings that defines the Executor's Python dependencies;
* `uses_with` is a key-value map that defines the arguments of the Executor's `__init__` method.
* `uses_requests` is a key-value map that defines the {ref}`mapping from endpoint to class method`. This is useful to overwrite the default endpoint-to-method mapping defined in the Executor python implementation.
* `uses_metas` is a key-value map that defines some of the Executor's {ref}`internal attributes`. It contains the following fields:
* `name` is a string that defines the name of the Executor;
* `description` is a string that defines the description of this Executor. It is used in the automatic docs UI;
* `workspace` is a string that defines the {ref}`workspace `.
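The same overrides can also be written in the Orchestration's YAML. A minimal sketch, assuming the `uses_with`, `uses_metas` and `uses_requests` keys are accepted at the Executor level just as in the Python API above (the values mirror that example):
```yaml
jtype: Flow
executors:
  - name: executor1
    uses: MyExecutor
    py_modules: executor.py
    uses_with:
      parameter_1: foo
      parameter_2: bar
    uses_metas:
      name: MyExecutor
    uses_requests:
      /index: my_index
      /search: my_search
```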
### Set `with` via `uses_with`
To set/override an Executor's `with` configuration, use `uses_with`. The `with` configuration refers to user-defined
constructor kwargs.
````{tab} Deployment
```python
from jina import Executor, requests, Deployment
class MyExecutor(Executor):
def __init__(self, param1=1, param2=2, param3=3, *args, **kwargs):
super().__init__(*args, **kwargs)
self.param1 = param1
self.param2 = param2
self.param3 = param3
@requests
def foo(self, docs, **kwargs):
print('param1:', self.param1)
print('param2:', self.param2)
print('param3:', self.param3)
dep = Deployment(uses=MyExecutor, uses_with={'param1': 10, 'param3': 30})
with dep:
dep.post('/')
```
```text
executor0@219662[L]:ready and listening
gateway@219662[L]:ready and listening
Deployment@219662[I]:🎉 Deployment is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:32825
🔒 Private network: 192.168.1.101:32825
🌐 Public address: 197.28.82.165:32825
param1: 10
param2: 2
param3: 30
```
````
````{tab} Flow
```python
from jina import Executor, requests, Flow
class MyExecutor(Executor):
def __init__(self, param1=1, param2=2, param3=3, *args, **kwargs):
super().__init__(*args, **kwargs)
self.param1 = param1
self.param2 = param2
self.param3 = param3
@requests
def foo(self, docs, **kwargs):
print('param1:', self.param1)
print('param2:', self.param2)
print('param3:', self.param3)
f = Flow().add(uses=MyExecutor, uses_with={'param1': 10, 'param3': 30})
with f:
f.post('/')
```
```text
executor0@219662[L]:ready and listening
gateway@219662[L]:ready and listening
Flow@219662[I]:🎉 Flow is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:32825
🔒 Private network: 192.168.1.101:32825
🌐 Public address: 197.28.82.165:32825
param1: 10
param2: 2
param3: 30
```
````
### Set `requests` via `uses_requests`
You can set/override an Executor's `requests` configuration and bind methods to custom endpoints.
In the following code:
* We replace the endpoint `/foo` bound to the `foo()` function with both `/non_foo` and `/alias_foo`.
* We add a new endpoint `/bar` for binding `bar()`.
Note the `all_req()` function is bound to **all** endpoints except those explicitly bound to other functions, i.e. `/non_foo`, `/alias_foo` and `/bar`.
````{tab} Deployment
```python
from jina import Executor, requests, Deployment
class MyExecutor(Executor):
@requests
def all_req(self, parameters, **kwargs):
print(f'all req {parameters.get("recipient")}')
@requests(on='/foo')
def foo(self, parameters, **kwargs):
print(f'foo {parameters.get("recipient")}')
def bar(self, parameters, **kwargs):
print(f'bar {parameters.get("recipient")}')
dep = Deployment(
uses=MyExecutor,
uses_requests={
'/bar': 'bar',
'/non_foo': 'foo',
'/alias_foo': 'foo',
},
)
with dep:
dep.post('/bar', parameters={'recipient': 'bar()'})
dep.post('/non_foo', parameters={'recipient': 'foo()'})
dep.post('/foo', parameters={'recipient': 'all_req()'})
dep.post('/alias_foo', parameters={'recipient': 'foo()'})
```
```text
executor0@221058[L]:ready and listening
gateway@221058[L]:ready and listening
Deployment@221058[I]:🎉 Deployment is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:36507
🔒 Private network: 192.168.1.101:36507
🌐 Public address: 197.28.82.165:36507
bar bar()
foo foo()
all req all_req()
foo foo()
```
````
````{tab} Flow
```python
from jina import Executor, requests, Flow
class MyExecutor(Executor):
@requests
def all_req(self, parameters, **kwargs):
print(f'all req {parameters.get("recipient")}')
@requests(on='/foo')
def foo(self, parameters, **kwargs):
print(f'foo {parameters.get("recipient")}')
def bar(self, parameters, **kwargs):
print(f'bar {parameters.get("recipient")}')
f = Flow().add(
uses=MyExecutor,
uses_requests={
'/bar': 'bar',
'/non_foo': 'foo',
'/alias_foo': 'foo',
},
)
with f:
f.post('/bar', parameters={'recipient': 'bar()'})
f.post('/non_foo', parameters={'recipient': 'foo()'})
f.post('/foo', parameters={'recipient': 'all_req()'})
f.post('/alias_foo', parameters={'recipient': 'foo()'})
```
```text
executor0@221058[L]:ready and listening
gateway@221058[L]:ready and listening
Flow@221058[I]:🎉 Flow is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:36507
🔒 Private network: 192.168.1.101:36507
🌐 Public address: 197.28.82.165:36507
bar bar()
foo foo()
all req all_req()
foo foo()
```
````
### Set `metas` via `uses_metas`
To set/override an Executor's `metas` configuration, use `uses_metas`:
````{tab} Deployment
```python
from jina import Executor, requests, Deployment
class MyExecutor(Executor):
@requests
def foo(self, docs, **kwargs):
print(self.metas.name)
dep = Deployment(
uses=MyExecutor,
uses_metas={'name': 'different_name'},
)
with dep:
dep.post('/')
```
```text
executor0@219291[L]:ready and listening
gateway@219291[L]:ready and listening
Deployment@219291[I]:🎉 Deployment is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:58827
🔒 Private network: 192.168.1.101:58827
different_name
```
````
````{tab} Flow
```python
from jina import Executor, requests, Flow
class MyExecutor(Executor):
@requests
def foo(self, docs, **kwargs):
print(self.metas.name)
flow = Flow().add(
uses=MyExecutor,
uses_metas={'name': 'different_name'},
)
with flow as f:
f.post('/')
```
```text
executor0@219291[L]:ready and listening
gateway@219291[L]:ready and listening
Flow@219291[I]:🎉 Flow is ready to use!
🔗 Protocol: GRPC
🏠 Local access: 0.0.0.0:58827
🔒 Private network: 192.168.1.101:58827
different_name
```
````
(external-executors)=
## Use external Executors
Usually an Orchestration starts and stops its own Executor(s). External Executors are owned by *other* Orchestrations, meaning they can reside on any machine and their lifetime is controlled by others.
Using external Executors is useful for sharing expensive Executors (like stateless, GPU-based encoders) between Orchestrations.
Both {ref}`served and shared Executors ` can be used as external Executors.
When you add an external Executor, you have to provide a `host` and `port`, and enable the `external` flag:
````{tab} Deployment
```python
from jina import Deployment
Deployment(host='123.45.67.89', port=12345, external=True)
# or
Deployment(host='123.45.67.89:12345', external=True)
```
````
````{tab} Flow
```python
from jina import Flow
Flow().add(host='123.45.67.89', port=12345, external=True)
# or
Flow().add(host='123.45.67.89:12345', external=True)
```
````
The Orchestration doesn't start or stop this Executor and assumes that it is externally managed and available at `123.45.67.89:12345`.
Despite the lifetime control, the external Executor behaves just like a regular one. You can even add the same Executor to multiple Orchestrations.
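For example, because the Orchestration does not own the external Executor, the same address can simply be added to several Flows (the address below is the placeholder used throughout this section):
```python
from jina import Flow

# both Flows reuse the same externally managed Executor
f1 = Flow().add(name='shared_encoder', host='123.45.67.89:12345', external=True)
f2 = Flow().add(name='shared_encoder', host='123.45.67.89:12345', external=True)
```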
### Enable TLS
You can also use external Executors with `tls`:
````{tab} Deployment
```python
from jina import Deployment
Deployment(host='123.45.67.89:443', external=True, tls=True)
```
````
````{tab} Flow
```python
from jina import Flow
Flow().add(host='123.45.67.89:443', external=True, tls=True)
```
````
After that, the external Executor behaves just like an internal one. You can even add the same Executor to multiple Orchestrations.
```{hint}
Using `tls` to connect to an external Executor is especially needed when the external Executor is deployed with JCloud. See the JCloud {ref}`documentation ` for further details.
```
### Pass arguments
External Executors may require extra configuration to run. Think about an Executor that requires authentication to run. You can pass the `grpc_metadata` parameter to the Executor. `grpc_metadata` is a dictionary of key-value pairs to be passed along with every gRPC request sent to that Executor.
````{tab} Deployment
```python
from jina import Deployment
Deployment(
host='123.45.67.89',
port=443,
external=True,
grpc_metadata={'authorization': ''},
)
```
````
````{tab} Flow
```python
from jina import Flow
Flow().add(
host='123.45.67.89',
port=443,
external=True,
grpc_metadata={'authorization': ''},
)
```
````
```{hint}
The `grpc_metadata` parameter here follows the `metadata` concept in gRPC. See [gRPC documentation](https://grpc.io/docs/what-is-grpc/core-concepts/#metadata) for details.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/deployment-args.md
| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object. It is used to refer to the object in Python/YAML/CLI, in visualizations and in log message headers. When not given, the default naming strategy applies. | `string` | `None` |
| `workspace` | The working directory for any IO operations in this object. If not set, it is derived from the parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, exception stack information will not be added to the log. | `boolean` | `False` |
| `suppress_root_logging` | If set, then no root handlers will be suppressed from logging. | `boolean` | `False` |
| `uses` | The YAML path that represents a Flow. It can be either a local file path or a URL. | `string` | `None` |
| `reload` | If set, auto-reloading on file changes is enabled: the Flow will restart while blocked if the YAML configuration source is changed. This also applies to the underlying Executors if their source code or YAML configuration has changed. | `boolean` | `False` |
| `env` | The map of environment variables that are available inside the runtime. | `object` | `None` |
| `inspect` | The strategy for inspect deployments in the Flow. If `REMOVE` is given, all inspect deployments are removed when building the Flow. | `string` | `COLLECT` |
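As a sketch of how a few of these arguments appear in a Deployment YAML under the `with:` section (the values are illustrative):
```yaml
jtype: Deployment
with:
  name: my-deployment
  workspace: ./workspace
  log_config: default
  env:
    MY_VAR: my-value
```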
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/deployment.md
(deployment)=
# Deployment
```{important}
A Deployment is part of the orchestration layer {ref}`Orchestration `. Be sure to read up on that too!
```
A {class}`~jina.Deployment` orchestrates a single {class}`~jina.Executor` to accomplish a task. Documents are processed by Executors.
You can think of a Deployment as an interface to configure and launch your {ref}`microservice architecture `, while the heavy lifting is done by the {ref}`service ` itself.
(why-deployment)=
## Why use a Deployment?
Once you've learned about Documents, DocLists and Executors, you can split a big task into small independent modules and services.
* Deployments let you scale these Executors independently to match your requirements.
* Deployments let you easily use other cloud-native orchestrators, such as Kubernetes, to manage your service.
(create-deployment)=
## Create
The most trivial {class}`~jina.Deployment` is an empty one. It can be defined in Python or from a YAML file:
````{tab} Python
```python
from jina import Deployment
dep = Deployment()
```
````
````{tab} YAML
```yaml
jtype: Deployment
```
````
For production, you should define your Deployments with YAML. This is because YAML files are independent of the Python logic code and easier to maintain.
## Minimum working example
````{tab} Pythonic style
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests(on='/bar')
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
print(docs)
dep = Deployment(name='myexec1', uses=MyExecutor)
with dep:
dep.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````
````{tab} Deployment-as-a-Service style
Server:
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests(on='/bar')
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
print(docs)
dep = Deployment(port=12345, name='myexec1', uses=MyExecutor)
with dep:
dep.block()
```
Client:
```python
from jina import Client
from docarray import DocList, BaseDoc
c = Client(port=12345)
c.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````
````{tab} Load from YAML
`deployment.yml`:
```yaml
jtype: Deployment
name: myexec1
uses: FooExecutor
py_modules: exec.py
```
`exec.py`:
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
class FooExecutor(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'foo was here'
docs.summary()
return docs
```
```python
from jina import Deployment
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
dep = Deployment.load_config('deployment.yml')
with dep:
try:
dep.post(on='/bar', inputs=TextDoc(), on_done=print)
except Exception as ex:
# handle exception
pass
```
````
```{caution}
The statement `with dep:` starts the Deployment, and exiting the indented with block stops the Deployment, including its Executors.
Exceptions raised inside the `with dep:` block will close the Deployment context manager. If you don't want this, use a `try...except` block to surround the statements that could potentially raise an exception.
```
## Convert between Python and YAML
A Python Deployment definition can easily be converted to/from a YAML definition:
````{tab} Load from YAML
```python
from jina import Deployment
dep = Deployment.load_config('flow.yml')
```
````
````{tab} Export to YAML
```python
from jina import Deployment
dep = Deployment()
dep.save_config('deployment.yml')
```
````
## Start and stop
When a {class}`~jina.Deployment` starts, all the replicated Executors will start as well, making it possible to {ref}`reach the service through its API `.
There are three ways to start a Deployment: In Python, from a YAML file, or from the terminal.
* Generally in Python: use Deployment as a context manager.
* As an entrypoint from terminal: use `Jina CLI ` and a Deployment YAML file.
* As an entrypoint from Python code: use Deployment as a context manager inside `if __name__ == '__main__'`
* No context manager, manually call {meth}`~jina.Deployment.start` and {meth}`~jina.Deployment.close`.
````{tab} General in Python
```python
from jina import Deployment
dep = Deployment()
with dep:
pass
```
````
````{tab} Jina-serve CLI entrypoint
```bash
jina deployment --uses deployment.yml
```
````
````{tab} Python entrypoint
```python
from jina import Deployment
dep = Deployment()
if __name__ == '__main__':
with dep:
pass
```
````
````{tab} Python no context manager
```python
from jina import Deployment
dep = Deployment()
dep.start()
dep.close()
```
````
Your addresses and entrypoints can be found in the output. When you enable more features such as monitoring, HTTP gateway, TLS encryption, this display expands to contain more information.
(multiprocessing-spawn)=
### Set multiprocessing `spawn`
Some corner cases require forcing a `spawn` start method for multiprocessing, for example if you encounter "Cannot re-initialize CUDA in forked subprocess".
You can use `JINA_MP_START_METHOD=spawn` before starting the Python script to enable this.
```bash
JINA_MP_START_METHOD=spawn python app.py
```
```{caution}
In case you set `JINA_MP_START_METHOD=spawn`, make sure to use Flow as a context manager inside `if __name__ == '__main__'`.
The script entrypoint (starting the flow) [needs to be protected when using `spawn` start method](https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods).
```
````{hint}
There's no need to set this for Windows, as it only supports spawn method for multiprocessing.
````
## Serve
### Serve forever
In most scenarios, a Deployment should remain reachable for prolonged periods of time. This can be achieved from Python or the terminal:
````{tab} Python
```python
from jina import Deployment
dep = Deployment()
with dep:
dep.block()
```
````
````{tab} YAML
```shell
jina-serve deployment --uses deployment.yml
```
````
The `.block()` method blocks the execution of the current thread or process, enabling external clients to access the Deployment.
In this case, the Deployment can be stopped by interrupting the thread or process.
### Serve until an event
Alternatively, a `multiprocessing` or `threading` `Event` object can be passed to `.block()`, which stops the Deployment once set.
```python
from jina import Deployment
import threading
def start_deployment(stop_event):
"""start a blocking Deployment."""
dep = Deployment()
with dep:
dep.block(stop_event=stop_event)
e = threading.Event() # create new Event
t = threading.Thread(name='Blocked-Deployment', target=start_deployment, args=(e,))
t.start() # start Deployment in new Thread
# do some stuff
e.set() # set event and stop (unblock) the Deployment
```
## Export
A Deployment YAML can be exported as a Docker Compose YAML or Kubernetes YAML bundle.
(docker-compose-export)=
### Docker Compose
````{tab} Python
```python
from jina import Deployment
dep = Deployment()
dep.to_docker_compose_yaml()
```
````
````{tab} Terminal
```shell
jina-serve export docker-compose deployment.yml docker-compose.yml
```
````
This will generate a single `docker-compose.yml` file.
For advanced utilization of Docker Compose with Jina-serve, refer to {ref}`How to `
(deployment-kubernetes-export)=
### Kubernetes
````{tab} Python
```python
from jina import Deployment
dep = Deployment()
dep.to_kubernetes_yaml('dep_k8s_configuration')
```
````
````{tab} Terminal
```shell
jina-serve export kubernetes deployment.yml ./my-k8s
```
````
The generated folder can be used directly with `kubectl` to deploy the Deployment to an existing Kubernetes cluster.
For advanced utilisation of Kubernetes with Jina-serve please refer to {ref}`How to `
```{tip}
Based on your local Jina version, Executor Hub may rebuild the Docker image during the YAML generation process.
If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`.
```
```{tip}
If an Executor requires volumes to be mapped to persist data, Jina will create a StatefulSet for that Executor instead of a Deployment.
You can control the access mode, storage class name and capacity of the attached Persistent Volume Claim by using {ref}`Jina environment variables `
`JINA_K8S_ACCESS_MODES`, `JINA_K8S_STORAGE_CLASS_NAME` and `JINA_K8S_STORAGE_CAPACITY`. Only the first volume will be considered to be mounted.
```
```{admonition} See also
:class: seealso
For more in-depth guides on deployment, check our how-tos for {ref}`Docker compose ` and {ref}`Kubernetes `.
```
```{caution}
The `port` argument(s) are ignored when exporting to Kubernetes YAML: Jina-serve binds the services to port 8080, and when multiple protocols need to be served, the consecutive ports (8081, ...) are used. This is because the Kubernetes Service routes the traffic for you, and it is irrelevant to the surrounding services: in Kubernetes, services communicate via service names, irrespective of the internal port.
```
(logging-configuration)=
## Logging
The default {class}`jina.logging.logger.JinaLogger` uses rich console logging that writes to the system console. The `log_config` argument can be used to pass in a string of the pre-configured logging configuration names in Jina-serve or the absolute YAML file path of the custom logging configuration. For most cases, the default logging configuration sufficiently covers local, Docker and Kubernetes environments.
Custom logging handlers can be configured by following the Python official [Logging Cookbook](https://docs.python.org/3/howto/logging-cookbook.html#logging-cookbook) examples. An example custom logging configuration file defined in a YAML file `logging.json.yml` is:
```yaml
handlers:
  - StreamHandler
level: INFO
configs:
  StreamHandler:
    format: '%(asctime)s:{name:>15}@%(process)2d[%(levelname).1s]:%(message)s'
    formatter: JsonFormatter
```
The logging configuration can be used as follows:
````{tab} Python
```python
from jina import Deployment
dep = Deployment(log_config='./logging.json.yml')
```
````
````{tab} YAML
```yaml
jtype: Deployment
with:
log_config: './logging.json.yml'
```
````
### Supported protocols
A Deployment can be used to deploy an Executor and serve it using `gRPC` or `HTTP` protocol, or a composition of them.
### gRPC protocol
gRPC is the default protocol used by a Deployment to expose Executors to the outside world, and is used to communicate between the Gateway and an Executor inside a Flow.
### HTTP protocol
HTTP can be used for a stand-alone Deployment (without being part of a Flow), which allows external services to connect via REST.
```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class MyExec(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'foo was here'
docs.summary()
return docs
dep = Deployment(protocol='http', port=12345, uses=MyExec)
with dep:
dep.block()
```
This will make it available at port 12345 and you can get the [OpenAPI schema](https://swagger.io/specification/) for the service.
```{figure} images/http-deployment-swagger.png
:scale: 70%
```
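To call the HTTP Deployment above you can, for instance, use the Python Client with `protocol='http'` (a sketch; any HTTP client hitting the generated REST API works just as well):
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(protocol='http', port=12345)
docs = c.post(on='/', inputs=TextDoc(text='hello'), return_type=DocList[TextDoc])
print(docs[0].text)  # 'foo was here'
```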
### Composite protocol
A Deployment can also deploy an Executor and serve it with a combination of gRPC and HTTP protocols.
```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class MyExec(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'foo was here'
docs.summary()
return docs
dep = Deployment(protocol=['grpc', 'http'], port=[12345, 12346], uses=MyExec)
with dep:
dep.block()
```
This will make the Deployment reachable via gRPC and HTTP simultaneously.
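For example, with the composite Deployment above, gRPC and HTTP clients can connect to their respective ports at the same time (a sketch):
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

grpc_client = Client(protocol='grpc', port=12345)
http_client = Client(protocol='http', port=12346)

for client in (grpc_client, http_client):
    docs = client.post(on='/', inputs=TextDoc(text='hello'), return_type=DocList[TextDoc])
    print(docs[0].text)  # 'foo was here'
```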
## Methods
The most important methods of the `Deployment` object are the following:
| Method | Description |
|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {meth}`~jina.Deployment.start()` | Starts the Deployment. This will start all its Executors and check if they are ready to be used. |
| {meth}`~jina.Deployment.close()` | Stops and closes the Deployment. This will stop and shutdown all its Executors. |
| `with` context manager | Uses the Deployment as a context manager. It will automatically start and stop your Deployment. | |
| {meth}`~jina.clients.mixin.PostMixin.post()` | Sends requests to the Deployment API. |
| {meth}`~jina.Deployment.block()` | Blocks execution until the program is terminated. This is useful to keep the Deployment alive so it can be used from other places (clients, etc). |
| {meth}`~jina.Deployment.to_docker_compose_yaml()` | Generates a Docker-Compose file listing all Executors as services. |
| {meth}`~jina.Deployment.to_kubernetes_yaml()` | Generates Kubernetes configuration files in ``. Based on your local Jina-serve version, Executor Hub may rebuild the Docker image during the YAML generation process. If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`. |
| {meth}`~jina.clients.mixin.HealthCheckMixin.is_deployment_ready()` | Check if the Deployment is ready to process requests. Returns a boolean indicating the readiness. |
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/flow-args.md
| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object. It is used to refer to the object in Python/YAML/CLI, in visualizations and in log message headers. When not given, the default naming strategy applies. | `string` | `None` |
| `workspace` | The working directory for any IO operations in this object. If not set, it is derived from the parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, exception stack information will not be added to the log. | `boolean` | `False` |
| `suppress_root_logging` | If set, then no root handlers will be suppressed from logging. | `boolean` | `False` |
| `uses` | The YAML path that represents a Flow. It can be either a local file path or a URL. | `string` | `None` |
| `reload` | If set, auto-reloading on file changes is enabled: the Flow will restart while blocked if the YAML configuration source is changed. This also applies to the underlying Executors if their source code or YAML configuration has changed. | `boolean` | `False` |
| `env` | The map of environment variables that are available inside the runtime. | `object` | `None` |
| `inspect` | The strategy for inspect deployments in the Flow. If `REMOVE` is given, all inspect deployments are removed when building the Flow. | `string` | `COLLECT` |
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/flow.md
(flow-cookbook)=
# Flow
```{important}
A Flow is a set of {ref}`Deployments `. Be sure to read up on those before diving more deeply into Flows!
```
A {class}`~jina.Flow` orchestrates {class}`~jina.Executor`s into a processing pipeline to accomplish a task. Documents "flow" through the pipeline and are processed by Executors.
You can think of Flow as an interface to configure and launch your {ref}`microservice architecture `, while the heavy lifting is done by the {ref}`services ` themselves. In particular, each Flow also launches a {ref}`Gateway ` service, which can expose all other services through an API that you define.
## Why use a Flow?
Once you've learned about Documents, DocLists and Executors, you can split a big task into small independent modules and services.
But you need to chain them together to create, build, and serve an application. Flows enable you to do exactly this.
* Flows connect microservices (Executors) to build a service with proper client/server style interfaces over HTTP, gRPC, or WebSockets.
* Flows let you scale these Executors independently to match your requirements.
* Flows let you easily use other cloud-native orchestrators, such as Kubernetes, to manage your service.
(create-flow)=
## Create
The most trivial {class}`~jina.Flow` is an empty one. It can be defined in Python or from a YAML file:
````{tab} Python
```python
from jina import Flow
f = Flow()
```
````
````{tab} YAML
```yaml
jtype: Flow
```
````
```{important}
All arguments received by {class}`~jina.Flow()` API will be propagated to other entities (Gateway, Executor) with the following exceptions:
* `uses` and `uses_with` won't be passed to Gateway
* `port`, `port_monitoring`, `uses` and `uses_with` won't be passed to Executor
```
```{tip}
An empty Flow contains only {ref}`the Gateway`.
```
```{figure} images/zero-flow.svg
:scale: 70%
```
For production, you should define your Flows with YAML. This is because YAML files are independent of the Python logic code and easier to maintain.
## Minimum working example
````{tab} Pythonic style
```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests(on='/bar')
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
print(docs)
f = Flow().add(name='myexec1', uses=MyExecutor)
with f:
f.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````
````{tab} Flow-as-a-Service style
Server:
```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests(on='/bar')
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
print(docs)
f = Flow(port=12345).add(name='myexec1', uses=MyExecutor)
with f:
f.block()
```
Client:
```python
from jina import Client
from docarray import DocList, BaseDoc
c = Client(port=12345)
c.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````
````{tab} Load from YAML
`my.yml`:
```yaml
jtype: Flow
executors:
  - name: myexec1
    uses: FooExecutor
    py_modules: exec.py
```
`exec.py`:
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
class FooExecutor(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'foo was here'
docs.summary()
return docs
```
```python
from jina import Flow
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
f = Flow.load_config('my.yml')
with f:
try:
f.post(on='/bar', inputs=TextDoc(), on_done=print)
except Exception as ex:
# handle exception
pass
```
````
```{caution}
The statement `with f:` starts the Flow, and exiting the indented with block stops the Flow, including all Executors defined in it.
Exceptions raised inside the `with f:` block will close the Flow context manager. If you don't want this, use a `try...except` block to surround the statements that could potentially raise an exception.
```
## Start and stop
When a {class}`~jina.Flow` starts, all included Executors (single for a Deployment, multiple for a Flow) will start as well, making it possible to {ref}`reach the service through its API `.
There are three ways to start a Flow: In Python, from a YAML file, or from the terminal.
* Generally in Python: use Deployment or Flow as a context manager in Python.
* As an entrypoint from terminal: use `Jina CLI ` and a Flow YAML file.
* As an entrypoint from Python code: use Flow as a context manager inside `if __name__ == '__main__'`
* No context manager: manually call {meth}`~jina.Flow.start` and {meth}`~jina.Flow.close`.
````{tab} General in Python
```python
from jina import Flow
f = Flow()
with f:
pass
```
````
````{tab} Jina-serve CLI entrypoint
```bash
jina flow --uses flow.yml
```
````
````{tab} Python entrypoint
```python
from jina import Flow
f = Flow()
if __name__ == '__main__':
with f:
pass
```
````
````{tab} Python no context manager
```python
from jina import Flow
f = Flow()
f.start()
f.close()
```
````
The statement `with f:` starts the Flow, and exiting the indented `with` block stops the Flow, including all its Executors.
A successful start of a Flow looks like this:
```{figure} images/success-flow.png
:scale: 70%
```
Your addresses and entrypoints can be found in the output. When you enable more features such as monitoring, HTTP gateway, TLS encryption, this display expands to contain more information.
```{admonition} Multiprocessing spawn
:class: warning
Some corner cases require forcing a `spawn` start method for multiprocessing, for example if you encounter "Cannot re-initialize CUDA in forked subprocess". Read {ref}`more in the docs `
```
## Serve
### Serve forever
In most scenarios, a Flow should remain reachable for prolonged periods of time. This can be achieved from Python or the terminal:
````{tab} Python
```python
from jina import Flow
f = Flow()
with f:
f.block()
```
````
````{tab} Terminal
```shell
jina flow --uses flow.yml
```
````
In this case, the Flow can be stopped by interrupting the thread or process.
### Serve until an event
Alternatively, a `multiprocessing` or `threading` `Event` object can be passed to `.block()`, which stops the Flow once set.
```python
from jina import Flow
import threading
def start_flow(stop_event):
"""start a blocking Flow."""
f = Flow()
with f:
f.block(stop_event=stop_event)
e = threading.Event() # create new Event
t = threading.Thread(name='Blocked-Flow', target=start_flow, args=(e,))
t.start() # start Flow in new Thread
# do some stuff
e.set() # set event and stop (unblock) the Flow
```
### Serve on Google Colab
```{admonition} Example built with docarray<0.30
:class: note
This example is built using a docarray<0.30 version. Most of the concepts are similar, but some of the APIs for building Executors change when using a newer docarray version.
```
[Google Colab](https://colab.research.google.com/) provides an easy-to-use Jupyter notebook environment with GPU/TPU support. Flows are fully compatible with Google Colab and you can use it in the following ways:
```{figure} images/jina-on-colab.svg
:align: center
:width: 70%
```
```{button-link} https://colab.research.google.com/github/jina-ai/jina/blob/master/docs/Using_Jina_on_Colab.ipynb
:color: primary
:align: center
{octicon}`link-external` Open the notebook on Google Colab
```
Please follow the walkthrough and enjoy the free GPU/TPU!
```{tip}
Hosting services on Google Colab is not recommended if your server aims to be long-lived or permanent. It is often used for quick experiments, demonstrations or leveraging its free GPU/TPU. For stable, secure and free hosting of your Flow, check out [JCloud](https://jina.ai/serve/concepts/jcloud/).
```
## Export
A Flow YAML can be exported as a Docker Compose YAML or Kubernetes YAML bundle.
(docker-compose-export)=
### Docker Compose
````{tab} Python
```python
from jina import Flow
f = Flow().add()
f.to_docker_compose_yaml()
```
````
````{tab} Terminal
```shell
jina export docker-compose flow.yml docker-compose.yml
```
````
This will generate a single `docker-compose.yml` file.
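You can then bring up the generated services as usual, for example with `docker-compose -f docker-compose.yml up`.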
For advanced utilization of Docker Compose with Jina, refer to {ref}`How to `
(flow-kubernetes-export)=
### Kubernetes
````{tab} Python
```python
from jina import Flow
f = Flow().add()
f.to_kubernetes_yaml('flow_k8s_configuration')
```
````
````{tab} Terminal
```shell
jina export kubernetes flow.yml ./my-k8s
```
````
The generated folder can be used directly with `kubectl` to deploy the Flow to an existing Kubernetes cluster.
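For example, assuming your `kubectl` context points to the target cluster, you can apply the generated configuration with `kubectl apply -R -f ./my-k8s`.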
For advanced utilization of Kubernetes with Jina, please refer to {ref}`How to `
```{tip}
Based on your local Jina-serve version, Executor Hub may rebuild the Docker image during the YAML generation process.
If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`.
```
```{tip}
If an Executor requires volumes to be mapped to persist data, Jina-serve will create a StatefulSet for that Executor instead of a Deployment.
You can control the access mode, storage class name and capacity of the attached Persistent Volume Claim by using {ref}`Jina environment variables `
`JINA_K8S_ACCESS_MODES`, `JINA_K8S_STORAGE_CLASS_NAME` and `JINA_K8S_STORAGE_CAPACITY`. Only the first volume will be considered to be mounted.
```
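As a minimal sketch (illustrative values, and assuming these variables are read when the YAML is generated), you could set them from Python before exporting:

```python
import os
from jina import Flow

# illustrative values; pick what matches your cluster's storage setup
os.environ['JINA_K8S_ACCESS_MODES'] = 'ReadWriteOnce'
os.environ['JINA_K8S_STORAGE_CLASS_NAME'] = 'standard'
os.environ['JINA_K8S_STORAGE_CAPACITY'] = '10G'

f = Flow().add()  # an Executor that maps volumes would be exported as a StatefulSet
f.to_kubernetes_yaml('flow_k8s_configuration')
```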
```{admonition} See also
:class: seealso
For more in-depth guides on Flow deployment, check our how-tos for {ref}`Docker compose ` and {ref}`Kubernetes `.
```
```{caution}
The `port` or `ports` arguments are ignored when exporting to Kubernetes YAML. Jina binds the services to port 8080, except when multiple protocols
need to be served, in which case consecutive ports (8081, ...) are used. This is because the Kubernetes service directs the traffic to the Pods,
and within Kubernetes, services communicate via service names irrespective of the internal port.
```
## Add Executors
```{important}
This section is for Flow-specific considerations when working with Executors. Check more information on {ref}`working with Executors `.
```
A {class}`~jina.Flow` orchestrates its {class}`~jina.Executor`s as a graph and sends requests to all Executors in the order specified by {meth}`~jina.Flow.add` or listed in {ref}`a YAML file`.
When you start a Flow, each Executor always runs in a **separate process**. Multiprocessing is the lowest level of separation when you run a Flow locally. When running a Flow on Kubernetes, Docker Swarm, or {ref}`jcloud`, different Executors run in different containers, pods or instances.
Executors can be added into a Flow with {meth}`~jina.Flow.add`.
```python
from jina import Flow
f = Flow().add()
```
This adds an "empty" Executor called {class}`~jina.serve.executors.BaseExecutor` to the Flow. This Executor (without any parameters) performs no actions.
```{figure} images/no-op-flow.svg
:scale: 70%
```
To more easily identify an Executor, you can change its name by passing the `name` parameter:
```python
from jina import Flow
f = Flow().add(name='myVeryFirstExecutor').add(name='secondIsBest')
```
```{figure} images/named-flow.svg
:scale: 70%
```
You can also define the above Flow in YAML:
```yaml
jtype: Flow
executors:
  - name: myVeryFirstExecutor
  - name: secondIsBest
```
Save it as `flow.yml` and run it:
```bash
jina flow --uses flow.yml
```
More Flow YAML specifications can be found in {ref}`Flow YAML Specification`.
### How Executors process Documents in a Flow
Let's understand how Executors process Documents inside a Flow, and how changes are chained and applied, affecting downstream Executors in the Flow.
```python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc
class PrintDocuments(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
print(f' PrintExecutor: received document with text: "{doc.text}"')
return docs
class ProcessDocuments(Executor):
@requests(on='/change_in_place')
def in_place(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
# This Executor only works on `docs` and doesn't consider any other arguments
for doc in docs:
print(f'ProcessDocuments: received document with text "{doc.text}"')
doc.text = 'I changed the executor in place'
@requests(on='/return_different_docarray')
def ret_docs(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
# This executor only works on `docs` and doesn't consider any other arguments
ret = DocList[TextDoc]()
for doc in docs:
print(f'ProcessDocuments: received document with text: "{doc.text}"')
ret.append(TextDoc(text='I returned a different Document'))
return ret
f = Flow().add(uses=ProcessDocuments).add(uses=PrintDocuments)
with f:
    f.post(on='/change_in_place', inputs=DocList[TextDoc]([TextDoc(text='request1')]), return_type=DocList[TextDoc])
    f.post(
        on='/return_different_docarray',
        inputs=DocList[TextDoc]([TextDoc(text='request2')]),
        return_type=DocList[TextDoc],
    )
```
```shell
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:58746 │
│ 🔒 Private 192.168.1.187:58746 │
│ 🌍 Public 212.231.186.65:58746 │
╰──────────────────────────────────────────╯
ProcessDocuments: received document with text "request1"
PrintExecutor: received document with text: "I changed the executor in place"
ProcessDocuments: received document with text: "request2"
PrintExecutor: received document with text: "I returned a different Document"
```
### Define topologies over Executors
{class}`~jina.Flow`s are not restricted to sequential execution. Internally they are modeled as graphs, so they can represent any complex, non-cyclic topology.
A typical use case for such a Flow is a topology with a common pre-processing part, but different indexers separating embeddings and data.
To define a custom topology you can use the `needs` keyword when adding an {class}`~jina.Executor`. By default, a Flow assumes that every Executor needs the previously added Executor.
```python
from jina import Executor, requests, Flow
from docarray import DocList
from docarray.documents import TextDoc
class FooExecutor(Executor):
@requests
async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
docs.append(TextDoc(text=f'foo was here and got {len(docs)} document'))
class BarExecutor(Executor):
@requests
async def bar(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
docs.append(TextDoc(text=f'bar was here and got {len(docs)} document'))
class BazExecutor(Executor):
@requests
async def baz(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
docs.append(TextDoc(text=f'baz was here and got {len(docs)} document'))
class MergeExecutor(Executor):
@requests
async def merge(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
return docs
f = (
Flow()
.add(uses=FooExecutor, name='fooExecutor')
.add(uses=BarExecutor, name='barExecutor', needs='fooExecutor')
.add(uses=BazExecutor, name='bazExecutor', needs='fooExecutor')
.add(uses=MergeExecutor, needs=['barExecutor', 'bazExecutor'])
)
```
```{figure} images/needs-flow.svg
:width: 70%
:align: center
Complex Flow where one Executor requires two Executors to process Documents beforehand
```
When sending a message to this Flow:
```python
with f:
print(f.post('/', return_type=DocList[TextDoc]).text)
```
This gives the output:
```text
['foo was here and got 0 document', 'bar was here and got 1 document', 'baz was here and got 1 document']
```
Both `BarExecutor` and `BazExecutor` only received a single `Document` from `FooExecutor` because they are run in parallel. The last Executor (the unnamed `MergeExecutor`) receives both DocLists and merges them automatically.
This automated merging can be disabled with `no_reduce=True`. This is useful for providing custom merge logic in a separate Executor. In this case the last `.add()` call would look like `.add(needs=['barExecutor', 'bazExecutor'], uses=CustomMergeExecutor, no_reduce=True)`. This feature requires Jina >= 3.0.2.
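A minimal sketch of what such a custom merge Executor might look like (a hypothetical `CustomMergeExecutor`, assuming `TextDoc` Documents and `no_reduce=True` so that `docs_matrix` is populated):

```python
from typing import List, Optional

from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class CustomMergeExecutor(Executor):
    @requests
    def merge(
        self,
        docs: DocList[TextDoc],
        docs_matrix: Optional[List[DocList[TextDoc]]] = None,
        **kwargs,
    ) -> DocList[TextDoc]:
        # with no_reduce=True, docs_matrix holds one DocList per upstream Executor
        merged = DocList[TextDoc]()
        for upstream_docs in docs_matrix or []:
            merged.extend(upstream_docs)
        return merged
```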
## Chain Executors in Flow with different schemas
When using `docarray>=0.30.0`, when building a Flow you should ensure that the Document types used as input of an Executor match the schema
of the output of the preceding Executor in the Flow.
For instance, the Flow in the *Invalid Flow* tab below will fail to start because the Document types are wrongly chained.
````{tab} Valid Flow
```{code-block} python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
import numpy as np
class SimpleStrDoc(BaseDoc):
text: str
class TextWithEmbedding(SimpleStrDoc):
embedding: NdArray
class TextEmbeddingExecutor(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        ret = DocList[TextWithEmbedding]()
        for doc in docs:
            ret.append(TextWithEmbedding(text=doc.text, embedding=np.random.rand(10)))
        return ret

class ProcessEmbedding(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[TextWithEmbedding], **kwargs) -> DocList[TextWithEmbedding]:
        for doc in docs:
            self.logger.info(f'Getting embedding with shape {doc.embedding.shape}')
flow = Flow().add(uses=TextEmbeddingExecutor, name='embed').add(uses=ProcessEmbedding, name='process')
with flow:
flow.block()
```
````
````{tab} Invalid Flow
```{code-block} python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
import numpy as np
class SimpleStrDoc(BaseDoc):
text: str
class TextWithEmbedding(SimpleStrDoc):
embedding: NdArray
class TextEmbeddingExecutor(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        ret = DocList[TextWithEmbedding]()
        for doc in docs:
            ret.append(TextWithEmbedding(text=doc.text, embedding=np.random.rand(10)))
        return ret

class ProcessText(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        for doc in docs:
            self.logger.info(f'Getting embedding with type {doc.text}')
# This Flow will fail to start because the input type of "process" does not match the output type of "embed"
flow = Flow().add(uses=TextEmbeddingExecutor, name='embed').add(uses=ProcessText, name='process')
with flow:
flow.block()
```
````
Jina is also compatible with docarray<0.30. When using that version, only a single Document schema existed (equivalent to [LegacyDocument]() in docarray>=0.30), and therefore
there were no explicit compatibility issues between schemas. However, the complexity was implicitly there (an Executor may expect a Document to be filled with `text` and only fail at runtime).
(floating-executors)=
### Floating Executors
Some Executors in your Flow can be used for asynchronous background tasks that take time and don't generate a required output. For instance,
logging specific information in external services, storing partial results, etc.
You can unblock your Flow from such tasks by using *floating Executors*.
Normally, all Executors form a pipeline that handles and transforms a given request until it is finally returned to the Client.
However, floating Executors do not feed their outputs back into the pipeline. Therefore, the Executor's output does not affect the response for the Client, and the response can be returned without waiting for the floating Executor to complete its task.
Those Executors are marked with the `floating` keyword when added to a `Flow`:
```python
import time
from jina import Executor, requests, Flow
from docarray import DocList
from docarray.documents import TextDoc
class FastChangingExecutor(Executor):
@requests()
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'Hello World'
class SlowChangingExecutor(Executor):
@requests()
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
time.sleep(2)
print(f' Received {docs.text}')
for doc in docs:
doc.text = 'Change the document but will not affect response'
f = (
Flow()
.add(name='executor0', uses=FastChangingExecutor)
.add(
name='floating_executor',
uses=SlowChangingExecutor,
needs=['gateway'],
floating=True,
)
)
with f:
f.post(on='/endpoint', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc]) # we need to send a first
start_time = time.time()
response = f.post(on='/endpoint', inputs=DocList[TextDoc]([TextDoc(), TextDoc()]), return_type=DocList[TextDoc])
end_time = time.time()
print(f' Response time took {end_time - start_time}s')
print(f' {response.text}')
```
```text
Response time took 0.011997222900390625s
['Hello World', 'Hello World']
Received ['Hello World', 'Hello World']
```
In this example the response is returned without waiting for the floating Executor to complete. However, the Flow is not closed until
the floating Executor has handled the request.
You can plot the Flow and see that the Executor is floating, disconnected from the **Gateway**.
```{figure} images/flow_floating.svg
:width: 70%
```
A floating Executor can *never* come before a non-floating Executor in your Flow's {ref}`topology `.
This leads to the following behaviors:
* **Implicit reordering**: When you add a non-floating Executor after a floating Executor without specifying its `needs` parameter, the non-floating Executor is chained after the previous non-floating one.
```python
from jina import Flow
f = Flow().add().add(name='middle', floating=True).add()
f.plot()
```
```{figure} images/flow_middle_1.svg
:width: 70%
```
* **Chaining floating Executors**: To chain more than one floating Executor, you need to add all of them with the `floating` flag, and explicitly specify the `needs` argument.
```python
from jina import Flow
f = Flow().add().add(name='middle', floating=True).add(needs=['middle'], floating=True)
f.plot()
```
```{figure} images/flow_chain_floating.svg
:width: 70%
```
* **Overriding the `floating` flag**: If you add a floating Executor as part of `needs` parameter of a non-floating Executor, then the floating Executor is no longer considered floating.
```python
from jina import Flow
f = Flow().add().add(name='middle', floating=True).add(needs=['middle'])
f.plot()
```
```{figure} images/flow_cancel_floating.svg
:width: 70%
```
(conditioning)=
### Add Conditioning
Sometimes you may not want all Documents to be processed by all Executors. For example, when you process text and image Documents, you may want to forward them to different Executors depending on their data type.
You can set conditioning for every {class}`~jina.Executor` in the Flow. Documents that don't meet the condition will be removed before reaching that Executor. This allows you to build a selection control in the Flow.
#### Define conditions
To add a condition to an Executor, pass it to the `when` parameter of {meth}`~jina.Flow.add` method of the Flow. This then defines *when* a Document is processed by the Executor:
You can use the [MongoDB query language](https://www.mongodb.com/docs/compass/current/query/filter/#query-your-data) supported by [docarray](https://docs.docarray.org/API_reference/utils/filter/) to specify a filter condition for each Executor.
```python
from jina import Flow
f = Flow().add(when={'tags__key': {'$eq': 5}})
```
Then only Documents that satisfy the `when` condition will reach the associated Executor. Any Documents that don't satisfy that condition won't reach the Executor.
If you are trying to separate Documents according to the data modality they hold, you need to choose a condition accordingly.
````{admonition} See Also
:class: seealso
In addition to `$exists` you can use a number of other operators to define your filter: `$eq`, `$gte`, `$lte`, `$size`, `$and`, `$or` and many more. For details, consult [MongoDB query language](https://www.mongodb.com/docs/compass/current/query/filter/#query-your-data) and [docarray](https://docs.docarray.org/API_reference/utils/filter/).
````
```python
# define filter conditions
text_condition = {'text': {'$exists': True}}
tensor_condition = {'tensor': {'$exists': True}}
```
These conditions specify that only Documents that hold data of a specific modality can pass the filter.
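A minimal sketch of how such filters could be attached to two parallel branches (the branch names are illustrative):

```python
from jina import Flow

# pass only Documents that have the respective field set
text_condition = {'text': {'$exists': True}}
tensor_condition = {'tensor': {'$exists': True}}

f = (
    Flow()
    .add(name='text_branch', when=text_condition)  # receives only text Documents
    .add(name='tensor_branch', when=tensor_condition, needs='gateway')  # receives only tensor Documents
)
```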
````{tab} Python
```{code-block} python
---
emphasize-lines: 16, 24
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict
class MyDoc(BaseDoc):
text: str = ''
tags: Dict[str, int]
class MyExec(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
print(f'{doc.tags}')
f = Flow().add(uses=MyExec).add(uses=MyExec, when={'tags__key': {'$eq': 5}}) # Create the empty Flow, add condition
with f: # Using it as a Context Manager starts the Flow
ret = f.post(
on='/search',
inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
return_type=DocList[MyDoc]
)
for doc in ret:
print(f'{doc.tags}') # only the Document fulfilling the condition is processed and therefore returned.
```
```shell
{'key': 5.0}
```
````
````{tab} YAML
```yaml
jtype: Flow
executors:
  - name: executor
    uses: MyExec
    when:
      tags__key:
        $eq: 5
```
```{code-block} python
---
emphasize-lines: 9
---
from jina import Flow
from docarray import DocList

# MyDoc and MyExec are defined as in the Python tab
f = Flow.load_config('flow.yml')  # Load the Flow definition from Yaml file
with f: # Using it as a Context Manager starts the Flow
ret = f.post(
on='/search',
inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
return_type=DocList[MyDoc]
)
for doc in ret:
print(f'{doc.tags}') # only the Document fulfilling the condition is processed and therefore returned.
```
```shell
{'key': 5.0}
```
````
Note that if a Document does not satisfy the `when` condition of a filter, the filter removes the Document *for that entire branch of the Flow*.
This means that every Executor located behind a filter is affected by this, not just the specific Executor that defines the condition.
As with a real-life filter, once something fails to pass through it, it no longer continues down the pipeline.
Naturally, parallel branches in a Flow do not affect each other. So if a Document gets filtered out in only one branch, it can
still be used in the other branch, and also after the branches are re-joined:
````{tab} Parallel Executors
```{code-block} python
---
emphasize-lines: 18, 19
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict
class MyDoc(BaseDoc):
text: str = ''
tags: Dict[str, int]
class MyExec(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
print(f'{doc.tags}')
f = (
Flow()
.add(uses=MyExec, name='first')
.add(uses=MyExec, when={'tags__key': {'$eq': 5}}, needs='first', name='exec1')
.add(uses=MyExec, when={'tags__key': {'$eq': 4}}, needs='first', name='exec2')
.needs_all(uses=MyExec, name='join')
)
```
```{figure} images/conditional-flow.svg
:width: 70%
:align: center
```
```python
with f:
ret = f.post(
on='/search',
inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
return_type=DocList[MyDoc]
)
for doc in ret:
print(f'{doc.tags}')
```
```shell
{'key': 5.0}
{'key': 4.0}
```
````
````{tab} Sequential Executors
```{code-block} python
---
emphasize-lines: 18, 19
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict
class MyDoc(BaseDoc):
text: str = ''
tags: Dict[str, int]
class MyExec(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
print(f'{doc.tags}')
f = (
Flow()
.add(uses=MyExec, name='first')
.add(uses=MyExec, when={'tags__key': {'$eq': 5}}, name='exec1', needs='first')
.add(uses=MyExec, when={'tags__key': {'$eq': 4}}, needs='exec1', name='exec2')
)
```
```{figure} images/sequential-flow.svg
:width: 70%
```
```python
with f:
ret = f.post(
on='/search',
inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
return_type=DocList[MyDoc]
)
for doc in ret:
print(f'{doc.tags}')
```
```shell
```
````
This feature is useful to prevent some specialized Executors from processing certain Documents.
It can also be used to build *switch-like nodes*, where some Documents pass through one branch of the Flow,
while other Documents pass through a different parallel branch.
Note that whenever a Document does not satisfy the condition of an Executor, it is not even sent to that Executor.
Instead, only a tailored Request without any payload is transferred.
This means that you can not only use this feature to build complex logic, but also to minimize your networking overhead.
(merging-upstream)=
### Merging upstream Documents
Often when you're building a Flow, you want an Executor to receive Documents from multiple upstream Executors.
```{figure} images/flow-merge-executor.svg
:width: 70%
:align: center
```
For this you can use the `docs_matrix` or `docs_map` parameters (part of the Executor endpoint signature). These are Flow-specific arguments that can be used alongside an Executor's {ref}`default arguments `:
```{code-block} python
---
emphasize-lines: 11, 12
---
from typing import Dict, Union, List, Optional
from jina import Executor, requests
from docarray import DocList
class MergeExec(Executor):
@requests
async def foo(
self,
docs: DocList[...],
parameters: Dict,
docs_matrix: Optional[List[DocList[...]]],
docs_map: Optional[Dict[str, DocList[...]]],
) -> DocList[MyDoc]:
pass
```
* Use `docs_matrix` to receive a List of all incoming DocLists from upstream Executors:
```python
[
DocList[...](...), # from Executor1
DocList[...](...), # from Executor2
DocList[...](...), # from Executor3
]
```
* Use `docs_map` to receive a Dict, where each item's key is the name of an upstream Executor and the value is the DocList coming from that Executor:
```python
{
'Executor1': DocList[...](...),
'Executor2': DocList[...](...),
'Executor3': DocList[...](...),
}
```
(no-reduce)=
#### Reducing multiple DocLists to one DocList
The `no_reduce` argument determines whether DocLists are reduced into one when being received:
* To reduce all incoming DocLists into **one single DocList**, do not set `no_reduce` or set it to `False`. The `docs_map` and `docs_matrix` will be `None`.
* To receive **all incoming DocLists separately**, set `no_reduce` to `True`. The Executor will receive the DocLists independently under `docs_matrix` and `docs_map`.
```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
class MyDoc(BaseDoc):
text: str = ''
class Exec1(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
doc.text = 'Exec1'
class Exec2(Executor):
@requests
def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
for doc in docs:
doc.text = 'Exec2'
class MergeExec(Executor):
@requests
def foo(self, docs: DocList[MyDoc], docs_matrix, **kwargs) -> DocList[MyDoc]:
documents_to_return = DocList[MyDoc]()
for doc1, doc2 in zip(*docs_matrix):
print(
f'MergeExec processing pairs of Documents "{doc1.text}" and "{doc2.text}"'
)
documents_to_return.append(
MyDoc(text=f'Document merging from "{doc1.text}" and "{doc2.text}"')
)
return documents_to_return
f = (
Flow()
.add(uses=Exec1, name='exec1')
.add(uses=Exec2, name='exec2')
.add(uses=MergeExec, needs=['exec1', 'exec2'], no_reduce=True)
)
with f:
returned_docs = f.post(on='/', inputs=MyDoc(), return_type=DocList[MyDoc])
print(f'Resulting documents {returned_docs[0].text}')
```
```shell
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:55761 │
│ 🔒 Private 192.168.1.187:55761 │
│ 🌍 Public 212.231.186.65:55761 │
╰──────────────────────────────────────────╯
MergeExec processing pairs of Documents "Exec1" and "Exec2"
Resulting documents Document merging from "Exec1" and "Exec2"
```
## Visualize
A {class}`~jina.Flow` has a built-in `.plot()` function which can be used to visualize the `Flow`:
```python
from jina import Flow
f = Flow().add().add()
f.plot('flow.svg')
```
```{figure} images/flow.svg
:width: 70%
```
```python
from jina import Flow
f = Flow().add(name='e1').add(needs='e1').add(needs='e1')
f.plot('flow-2.svg')
```
```{figure} images/flow-2.svg
:width: 70%
```
You can also do it in the terminal:
```bash
jina export flowchart flow.yml flow.svg
```
You can also visualize a remote Flow by passing the URL to `jina export flowchart`.
(logging-configuration)=
## Logging
The default {class}`jina.logging.logger.JinaLogger` uses rich console logging that writes to the system console. The `log_config` argument can be used to pass in a string of the pre-configured logging configuration names in Jina or the absolute YAML file path of the custom logging configuration. For most cases, the default logging configuration sufficiently covers local, Docker and Kubernetes environments.
Custom logging handlers can be configured by following the Python official [Logging Cookbook](https://docs.python.org/3/howto/logging-cookbook.html#logging-cookbook) examples. An example custom logging configuration file defined in a YAML file `logging.json.yml` is:
```yaml
handlers:
  - StreamHandler
level: INFO
configs:
  StreamHandler:
    format: '%(asctime)s:{name:>15}@%(process)2d[%(levelname).1s]:%(message)s'
    formatter: JsonFormatter
```
The logging configuration can be used as follows:
````{tab} Python
```python
from jina import Flow
f = Flow(log_config='./logging.json.yml')
```
````
````{tab} YAML
```yaml
jtype: Flow
with:
log_config: './logging.json.yml'
```
````
(logging-override)=
### Custom logging configuration
The default {ref}`logging ` or custom logging configuration at the Flow level will be propagated to the `Gateway` and `Executor` entities. If that is not desired, every `Gateway` or `Executor` entity can be provided with its own custom logging configuration.
You can configure two different `Executors` as in the below example:
```python
from jina import Flow
f = (
Flow().add(log_config='./logging.json.yml').add(log_config='./logging.file.yml')
) # Create a Flow with two Executors
```
`logging.file.yml` is another YAML file with a custom `FileHandler` configuration.
````{hint}
Refer to {ref}`Gateway logging configuration ` section for configuring the `Gateway` logging.
````
````{caution}
When exporting the Flow to Kubernetes, the log_config file path must refer to the absolute local path of each container. The custom logging
file must be included during the containerization process. If the availability of the file is unknown then it's best to rely on the default
configuration. This restriction also applies to dockerized `Executors`. When running a dockerized Executor locally, the logging configuration
file can be mounted using {ref}`volumes `.
````
## Methods
The most important methods of the `Flow` object are the following:
| Method | Description |
|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {meth}`~jina.Flow.add` | Adds an Executor to the Flow |
| {meth}`~jina.Flow.start()` | Starts the Flow. This will start all its Executors and check if they are ready to be used. |
| {meth}`~jina.Flow.close()` | Stops and closes the Flow. This will stop and shutdown all its Executors. |
| `with` context manager | Uses the Flow as a context manager. It will automatically start and stop your Flow. |
| {meth}`~jina.Flow.plot()` | Visualizes the Flow. Helpful for building complex pipelines. |
| {meth}`~jina.clients.mixin.PostMixin.post()` | Sends requests to the Flow API. |
| {meth}`~jina.Flow.block()` | Blocks execution until the program is terminated. This is useful to keep the Flow alive so it can be used from other places (clients, etc). |
| {meth}`~jina.Flow.to_docker_compose_yaml()` | Generates a Docker-Compose file listing all Executors as services. |
| {meth}`~jina.Flow.to_kubernetes_yaml()` | Generates Kubernetes configuration files in ``. Based on your local Jina and docarray versions, Executor Hub may rebuild the Docker image during the YAML generation process. If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`. |
| {meth}`~jina.clients.mixin.HealthCheckMixin.is_flow_ready()` | Check if the Flow is ready to process requests. Returns a boolean indicating the readiness. |
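As a brief sketch combining several of these methods (no particular Executor assumed):

```python
from jina import Flow

f = Flow().add()
f.plot('flow.svg')  # visualize the topology

f.start()  # start all Executors
try:
    print(f.is_flow_ready())  # True once all Executors are ready
finally:
    f.close()  # stop and shut down all Executors
```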
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/gateway-args.md
| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object. This will be used in the following places: how you refer to this object in Python/YAML/CLI, visualization, log message header, ... When not given, then the default naming strategy will apply. | `string` | `gateway` |
| `workspace` | The working directory for any IO operations in this object. If not set, then derive from its parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, then no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, then exception stack information will not be added to the log | `boolean` | `False` |
| `timeout_ctrl` | The timeout in milliseconds of the control request, -1 for waiting forever | `number` | `60` |
| `entrypoint` | The entrypoint command overrides the ENTRYPOINT in the Docker image. When not set, the Docker image ENTRYPOINT takes effect. | `string` | `None` |
| `docker_kwargs` | Dictionary of kwargs arguments that will be passed to the Docker SDK when starting the Docker container. More details can be found in the Docker SDK docs: https://docker-py.readthedocs.io/en/stable/ | `object` | `None` |
| `prefetch` | Number of requests fetched from the client before feeding into the first Executor. Used to control the speed of data input into a Flow. 0 disables prefetch (1000 requests is the default). | `number` | `1000` |
| `title` | The title of this HTTP server. It will be used in automatics docs such as Swagger UI. | `string` | `None` |
| `description` | The description of this HTTP server. It will be used in automatics docs such as Swagger UI. | `string` | `None` |
| `cors` | If set, a CORS middleware is added to FastAPI frontend to allow cross-origin access. | `boolean` | `False` |
| `no_debug_endpoints` | If set, `/status` `/post` endpoints are removed from HTTP interface. | `boolean` | `False` |
| `no_crud_endpoints` | If set, `/index`, `/search`, `/update`, `/delete` endpoints are removed from the HTTP interface. Any Executor that has `@requests(on=...)` bound with those values will receive data requests. | `boolean` | `False` |
| `expose_endpoints` | A JSON string that represents a map from executor endpoints (`@requests(on=...)`) to HTTP endpoints. | `string` | `None` |
| `uvicorn_kwargs` | Dictionary of kwargs arguments that will be passed to the Uvicorn server when starting the server. More details can be found in the Uvicorn docs: https://www.uvicorn.org/settings/ | `object` | `None` |
| `ssl_certfile` | the path to the certificate file | `string` | `None` |
| `ssl_keyfile` | the path to the key file | `string` | `None` |
| `expose_graphql_endpoint` | If set, /graphql endpoint is added to HTTP interface. | `boolean` | `False` |
| `protocol` | Communication protocol of the server exposed by the Gateway. This can be a single value or a list of protocols, depending on your chosen Gateway. Choose the convenient protocols from: ['GRPC', 'HTTP', 'WEBSOCKET']. | `array` | `[]` |
| `host` | The host address of the runtime, by default it is 0.0.0.0. | `string` | `0.0.0.0` |
| `proxy` | If set, respect the http_proxy and https_proxy environment variables. otherwise, it will unset these proxy variables before start. gRPC seems to prefer no proxy | `boolean` | `False` |
| `uses` | The config of the gateway. It can be one of the following: the string literal of a Gateway class name; a Gateway YAML file (.yml, .yaml, .jaml); a docker image (must start with `docker://`); the string literal of a YAML config (must start with `!` or `jtype: `); the string literal of a JSON config. When used in Python, the following values are also accepted: a Python dict that represents the config; a text file stream that has a `.read()` interface. | `string` | `None` |
| `uses_with` | Dictionary of keyword arguments that will override the `with` configuration in `uses` | `object` | `None` |
| `py_modules` | The customized Python modules that need to be imported before loading the gateway. Note that the recommended way is to only import a single module - a simple Python file, if your gateway can be defined in a single file, or an `__init__.py` file if you have multiple files, which should be structured as a Python package. | `array` | `None` |
| `replicas` | The number of replicas of the Gateway. This replicas will only be applied when converted into Kubernetes YAML | `number` | `1` |
| `grpc_server_options` | Dictionary of kwargs arguments that will be passed to the grpc server as options when starting the server, example : {'grpc.max_send_message_length': -1} | `object` | `None` |
| `graph_description` | Routing graph for the gateway | `string` | `{}` |
| `graph_conditions` | Dictionary stating which filtering conditions each Executor in the graph requires to receive Documents. | `string` | `{}` |
| `deployments_addresses` | JSON dictionary with the input addresses of each Deployment | `string` | `{}` |
| `deployments_metadata` | JSON dictionary with the request metadata for each Deployment | `string` | `{}` |
| `deployments_no_reduce` | list JSON disabling the built-in merging mechanism for each Deployment listed | `string` | `[]` |
| `compression` | The compression mechanism used when sending requests from the Head to the WorkerRuntimes. For more details, check https://grpc.github.io/grpc/python/grpc.html#compression. | `string` | `None` |
| `timeout_send` | The timeout in milliseconds used when sending data requests to Executors, -1 means no timeout, disabled by default | `number` | `None` |
| `runtime_cls` | The runtime class to run inside the Pod | `string` | `GatewayRuntime` |
| `timeout_ready` | The timeout in milliseconds of a Pod waits for the runtime to be ready, -1 for waiting forever | `number` | `600000` |
| `env` | The map of environment variables that are available inside runtime | `object` | `None` |
| `env_from_secret` | The map of environment variables that are read from kubernetes cluster secrets | `object` | `None` |
| `floating` | If set, the current Pod/Deployment can not be further chained, and the next `.add()` will chain after the last Pod/Deployment not this current one. | `boolean` | `False` |
| `reload` | If set, the Gateway will restart while serving if YAML configuration source is changed. | `boolean` | `False` |
| `port` | The port for input data to bind the gateway server to, by default, random ports between range [49152, 65535] will be assigned. The port argument can be either 1 single value in case only 1 protocol is used or multiple values when many protocols are used. | `number` | `random in [49152, 65535]` |
| `monitoring` | If set, spawn an http server with a prometheus endpoint to expose metrics | `boolean` | `False` |
| `port_monitoring` | The port on which the prometheus server is exposed, default is a random port between [49152, 65535] | `number` | `random in [49152, 65535]` |
| `retries` | Number of retries per gRPC call. If <0 it defaults to max(3, num_replicas) | `number` | `-1` |
| `tracing` | If set, the sdk implementation of the OpenTelemetry tracer will be available and will be enabled for automatic tracing of requests and customer span creation. Otherwise a no-op implementation will be provided. | `boolean` | `False` |
| `traces_exporter_host` | If tracing is enabled, this hostname will be used to configure the trace exporter agent. | `string` | `None` |
| `traces_exporter_port` | If tracing is enabled, this port will be used to configure the trace exporter agent. | `number` | `None` |
| `metrics` | If set, the sdk implementation of the OpenTelemetry metrics will be available for default monitoring and custom measurements. Otherwise a no-op implementation will be provided. | `boolean` | `False` |
| `metrics_exporter_host` | If tracing is enabled, this hostname will be used to configure the metrics exporter agent. | `string` | `None` |
| `metrics_exporter_port` | If tracing is enabled, this port will be used to configure the metrics exporter agent. | `number` | `None` |
| `stateful` | If set, start consensus module to make sure write operations are properly replicated between all the replicas | `boolean` | `False` |
| `pod_ports` | When using StatefulExecutors, if they want to restart it is important to keep the RAFT cluster configuration | `number` | `None` |
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/handle-exceptions.md
(flow-error-handling)=
# Handle Exceptions
When building a complex solution, things sometimes go wrong. Jina-serve does its best to recover from failures, handle them gracefully, and report useful failure information to the user.
The following outlines (more or less) common failure cases, and explains how Jina-serve responds to each.
## Executor errors
In general there are two places where an Executor level error can happen:
* If an {class}`~jina.Executor`'s `__init__` method raises an Exception, the Orchestration cannot start. In this case this Executor runtime raises the Exception, and the Orchestration throws a `RuntimeFailToStart` Exception.
* If one of the Executor's `@requests` methods raises an Exception, the error message is added to the response and sent back to the client. If the gRPC or WebSockets protocols are used, the networking stream is not interrupted and can accept further requests.
In both cases, the {ref}`Jina Client ` raises an Exception.
### Terminate an Executor on certain errors
Some exceptions like network errors or request timeouts can be transient and can recover automatically. Sometimes fatal errors or user-defined errors put the Executor in an unusable state, in which case it can be restarted. Locally the Orchestration must be re-run manually to restore Executor availability.
On Kubernetes deployments, this can be automated by terminating the Executor process, causing the Pod to terminate. The autoscaler restores availability by creating a new Pod to replace the terminated one. Termination can be enabled for one or more errors by using the `exit_on_exceptions` argument when adding the Executor to an Orchestration. When a raised exception matches one of the listed exceptions, the Executor terminates gracefully.
A sample Orchestration can be `Deployment(uses=MyExecutor, exit_on_exceptions=['Exception', 'RuntimeException'])`. The `exit_on_exceptions` argument accepts a list of Python or user-defined Exception or Error class names.
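As a minimal sketch (a hypothetical `FlakyExecutor` raising a custom error):

```python
from jina import Deployment, Executor, requests


class MyCustomError(Exception):
    pass


class FlakyExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        # raising an exception listed in exit_on_exceptions terminates the Executor gracefully
        raise MyCustomError('unrecoverable state')


dep = Deployment(uses=FlakyExecutor, exit_on_exceptions=['MyCustomError'])
```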
## Network errors
When an Orchestration Gateway can't reach an {ref}`Executor or Head `, the Orchestration attempts to re-connect to the faulty deployment according to a retry policy. The same applies to calls to Executors that time out. The specifics of this policy depend on the Orchestration's environment, as outlined below.
````{admonition} Hint: Prevent Executor timeouts
:class: hint
If you regularly experience Executor call timeouts, set the Orchestration's `timeout_send` attribute to a larger value
by setting `Deployment(timeout_send=time_in_ms)` or `Flow(timeout_send=time_in_ms)` in Python
or `timeout_send: time_in_ms` in your Orchestration YAML with-block.
Neural network forward passes on CPU (and other unusually expensive operations) commonly lead to timeouts with the default setting.
````
````{admonition} Hint: Custom retry policy
:class: hint
You can override the default retry policy and instead choose a number of retries performed for each Executor
with `Orchestration(retries=n)` in Python, or `retries: n` in the Orchestration
YAML `with` block.
````
If, during the complete execution of this policy, no successful call to any Executor replica can be made, the request is aborted and the failure is {ref}`reported to the client `.
### Request retries: Local deployment
If an Orchestration is deployed locally (with or without {ref}`containerized Executors `), the following policy for failed requests applies on a per-Executor basis:
* If there are multiple replicas of the target Executor, try each replica at least once, or until the request succeeds.
* Irrespective of the number of replicas, try the request at least three times, or until it succeeds. If there are fewer than three replicas, try them in a round-robin fashion.
### Request retries: Deployment with Kubernetes
If an Orchestration is {ref}`deployed in Kubernetes ` without a service mesh, retries cannot be distributed to different replicas of the same Executor.
````{admonition} See Also
:class: seealso
The impossibility of retries across different replicas is a limitation of Kubernetes in combination with gRPC.
If you want to learn more about this limitation, see [this](https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/) Kubernetes blog post.
An easy way to overcome this limitation is to use a service mesh like [Linkerd](https://linkerd.io/).
````
Concretely, this results in the following per-Executor retry policy:
* Try the request three times, or until it succeeds, always on the same replica of the Executor
### Request retries: Deployment with Kubernetes and service mesh
A Kubernetes service mesh can enable load balancing, and thus retries, between an Executor's replicas.
````{admonition} Hint
:class: hint
While Jina supports any service mesh, the output of `f.to_kubernetes_yaml()` already includes the necessary annotations for [Linkerd](https://linkerd.io/).
````
If a service mesh is installed alongside Jina-serve in the Kubernetes cluster, the following retry policy applies for each Executor:
* Try the request at least three times, or until it succeeds
* Distribute the requests to the replicas according to the service mesh's configuration
````{admonition} Caution
:class: caution
Many service meshes have the ability to perform retries themselves. Be careful about setting up service mesh level retries in combination with Jina, as it may lead to unwanted behaviour in combination with Jina's own retry policy.
Instead, you may want to disable Jina level retries by setting `Orchestration(retries=0)` or `Deployment(retries=0)` in Python, or `retries: 0` in the Orchestration YAML `with` block.
````
(failure-reporting)=
### Failure reporting
If the retry policy is exhausted for a given request, the error is reported back to the corresponding client.
The resulting error message contains the *network address* of the failing Executor. If multiple replicas are present, all addresses are reported - unless the Orchestration is deployed using Kubernetes, in which case the replicas are managed by Kubernetes and only a single address is available.
Depending on the client-to-gateway protocol, and the type of error, the error message is returned in one of the following ways:
**Could not connect to Executor:**
* **gRPC**: A response with the gRPC status code 14 (*UNAVAILABLE*) is issued, and the error message is contained in the `details` field.
* **HTTP**: A response with the HTTP status code 503 (*SERVICE_UNAVAILABLE*) is issued, and the error message is contained in `response['header']['status']['description']`.
* **WebSockets**: The stream closes with close code 1011 (*INTERNAL_ERROR*) and the message is contained in the WebSocket close message.
**Call to Executor timed out:**
* **gRPC**: A response with the gRPC status code 4 (*DEADLINE_EXCEEDED*) is issued, and the error message is contained in the `details` field.
* **HTTP**: A response with the HTTP status code 504 (*GATEWAY_TIMEOUT*) is issued, and the error message is contained in `response['header']['status']['description']`.
* **WebSockets**: The stream closes with close code 1011 (*INTERNAL_ERROR*) and the message is contained in the WebSockets close message.
For any of these scenarios, the {ref}`Jina Client ` raises a `ConnectionError` containing the error message.
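A minimal sketch of handling such a failure on the client side (assuming a Flow served on port 12345):

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)
try:
    c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])
except ConnectionError as ex:
    # the message contains the address(es) of the failing Executor
    print(f'request failed: {ex}')
```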
## Debug via breakpoint
Standard Python breakpoints don't work inside `Executor` methods when called inside an Orchestration context manager. Nevertheless, `import epdb; epdb.set_trace()` works just like a native Python breakpoint. Note that you need to `pip install epdb` to access this type of breakpoint.
```{admonition} Debugging in Flows
:class: info
The below code is for Deployments, but can easily be adapted for Flows.
```
````{tab} ✅ Do
```{code-block} python
---
emphasize-lines: 7
---
from jina import Deployment, Executor, requests
class CustomExecutor(Executor):
@requests
def foo(self, **kwargs):
a = 25
import epdb; epdb.set_trace()
print(f'\n\na={a}\n\n')
def main():
dep = Deployment(uses=CustomExecutor)
with dep:
dep.post(on='')
if __name__ == '__main__':
main()
```
````
````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 7
---
from jina import Deployment, Executor, requests
class CustomExecutor(Executor):
@requests
def foo(self, **kwargs):
a = 25
breakpoint()
print(f'\n\na={a}\n\n')
def main():
dep = Deployment(uses=CustomExecutor)
with dep:
dep.post(on='')
if __name__ == '__main__':
main()
```
````
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/health-check.md
# Health Check
Once an Orchestration is running, you can use `jina ping` [CLI](../../api/jina_cli.rst) to run a health check of the complete Orchestration or (in the case of a Flow) individual Executors or Gateway.
````{tab} Deployment
Start a Deployment in Python:
```python
from jina import Deployment
dep = Deployment(protocol='grpc', port=12345)
with dep:
dep.block()
```
Check the readiness of the Deployment:
```bash
jina ping deployment grpc://localhost:12345
```
````
````{tab} Flow
Start a Flow in Python:
```python
from jina import Flow
f = Flow(protocol='grpc', port=12345).add(port=12346)
with f:
f.block()
```
Check the readiness of the Flow:
```bash
jina ping flow grpc://localhost:12345
```
You can also check the readiness of an individual Executor:
```bash
jina ping executor localhost:12346
```
...or the readiness of the Gateway service:
```bash
jina ping gateway grpc://localhost:12345
```
````
When these commands succeed, you should see something like:
```text
INFO JINA@28600 readiness check succeeded 1 times!!!
```
```{admonition} Use in Kubernetes
:class: note
The CLI exits with code 1 when the readiness check is not successful, which makes it a good choice to be used as readinessProbe for Executor and Gateway when
deployed in Kubernetes.
```
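Readiness can also be checked programmatically; a minimal sketch using the Python Client (assuming the Flow or Deployment above is serving on port 12345):

```python
from jina import Client

c = Client(port=12345)
print(c.is_flow_ready())  # True if the service is ready to receive requests
```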
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/hot-reload.md
# Hot Reload
While developing your Orchestration, you may want it to reload automatically as you change the YAML configuration.
For this you can use the Orchestration's `reload` argument, which reloads it with the updated configuration every time the YAML configuration changes.
````{admonition} Caution
:class: caution
This feature aims to let developers iterate faster while developing, but is not intended for production use.
````
````{admonition} Note
:class: note
This feature requires `watchfiles>=0.18` to be installed.
````
````{tab} Deployment
To see how this works, let's define a Deployment in `deployment.yml` with a `reload` option:
```yaml
jtype: Deployment
uses: ConcatenateTextExecutor
uses_with:
text_to_concat: foo
with:
port: 12345
reload: True
```
Load and expose the Orchestration:
```python
import os
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class ConcatenateTextExecutor(Executor):
    def __init__(self, text_to_concat: str = '', **kwargs):
        super().__init__(**kwargs)
        self.text_to_concat = text_to_concat

    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text += self.text_to_concat
        return docs
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
dep = Deployment.load_config('deployment.yml')
with dep:
dep.block()
```
You can see that the Orchestration is running and serving:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc](TextDoc()), return_type=DocList[TextDoc])[0].text)
```
```text
foo
```
You can edit the Orchestration YAML file and save the changes:
```yaml
jtype: Deployment
uses: ConcatenateTextExecutor
uses_with:
text_to_concat: bar
with:
port: 12345
reload: True
```
You should see the following in the Orchestration's logs:
```text
INFO Deployment@28301 change in Deployment YAML deployment.yml observed, restarting Deployment
```
After this, the behavior of the Deployment's Executor will change:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc](TextDoc()), return_type=DocList[TextDoc])[0].text)
```
```text
bar
```
````
````{tab} Flow
To see how this works, let's define a Flow in `flow.yml` with a `reload` option:
```yaml
jtype: Flow
with:
port: 12345
reload: True
executors:
  - name: exec1
    uses: ConcatenateTextExecutor
```
Load and expose the Orchestration:
```python
import os
from jina import Flow, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc

class ConcatenateTextExecutor(Executor):
    # default text used when the YAML does not override it via uses_with
    def __init__(self, text_to_concat: str = 'add text ', **kwargs):
        super().__init__(**kwargs)
        self.text_to_concat = text_to_concat

    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text += self.text_to_concat
        return docs
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
f = Flow.load_config('flow.yml')
with f:
f.block()
```
You can see that the Flow is running and serving:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc](TextDoc()), return_type=DocList[TextDoc])[0].text)
```
```text
add text
```
You can edit the Flow YAML file and save the changes:
```yaml
jtype: Flow
with:
port: 12345
reload: True
executors:
  - name: exec1
    uses: ConcatenateTextExecutor
  - name: exec2
    uses: ConcatenateTextExecutor
```
You should see the following in the Flow's logs:
```text
INFO Flow@28301 change in Flow YAML flow.yml observed, restarting Flow
```
After this, the Flow will have two Executors with the new topology:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc](TextDoc()), return_type=DocList[TextDoc])[0].text)
```
```text
add text add text
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/index.md
(orchestration)=
# {fas}`network-wired` Orchestration
As seen in the {ref}`architecture overview `, Jina-serve is organized in different layers.
The Orchestration layer is composed of concepts that let you orchestrate, serve and scale your Executors with ease.
Two objects belong to this family:
* A single Executor ({class}`~Deployment`), ideal for serving a single model or microservice.
* A pipeline of Executors ({class}`~Flow`), ideal for more complex operations where Documents need to be processed in multiple ways.
Both Deployment and Flow share similar syntax and behavior. The main differences are:
* Deployments orchestrate a single Executor, while Flows orchestrate multiple Executors connected into a pipeline.
* Flows have a {ref}`Gateway `, while Deployments do not.
```{toctree}
:hidden:
deployment
flow
add-executors
scale-out
hot-reload
handle-exceptions
readiness
health-check
instrumentation
troubleshooting-on-multiprocess
yaml-spec
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/instrumentation.md
(instrumenting-flow)=
# Instrumentation
A {class}`~jina.Flow` exposes configuration parameters for leveraging [OpenTelemetry](https://opentelemetry.io) Tracing and Metrics observability features. These tools let you instrument and collect various signals which help to analyze your application's real-time behavior.
A {class}`~jina.Flow` is composed of several Pods, namely the {class}`~jina.serve.runtimes.gateway.GatewayRuntime`, {class}`~jina.Executor`s, and potentially a {class}`~jina.serve.runtimes.head.HeadRuntime` (see the {ref}`architecture overview `). Each Pod is its own microservice. These services expose their own metrics using the Python [OpenTelemetry API and SDK](https://opentelemetry-python.readthedocs.io/en/stable/api/trace.html).
Tracing and Metrics can be enabled and configured independently to allow more flexibility in the data collection and visualization setup.
```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for a full detail on the OpenTelemetry data collection and visualization setup.
```
```{caution}
Prometheus-only based metrics collection will soon be deprecated. Refer to {ref}`Prometheus/Grafana Support ` section for the deprecated setup.
```
## Tracing
````{tab} Python
```python
from jina import Flow
f = Flow(
tracing=True,
traces_exporter_host='http://localhost',
traces_exporter_port=4317,
).add(uses='jinaai://jina-ai/SimpleIndexer')
with f:
f.block()
```
````
````{tab} YAML
In `flow.yaml`:
```yaml
jtype: Flow
with:
  tracing: true
  traces_exporter_host: 'localhost'
  traces_exporter_port: 4317
executors:
  - uses: jinaai://jina-ai/SimpleIndexer
```
```bash
jina flow --uses flow.yaml
```
````
This Flow creates two Pods: one for the Gateway, and one for the SimpleIndexer Executor. The Flow propagates the Tracing configuration to each Pod so you don't need to duplicate the arguments on each Executor.
The `traces_exporter_host` and `traces_exporter_port` arguments configure the traces [exporter](https://opentelemetry.io/docs/instrumentation/python/exporters/#trace-1) which are responsible for pushing collected data to the [collector](https://opentelemetry.io/docs/collector/) backend.
```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for more details on exporter and collector setup and usage.
```
### Available Traces
Each Pod supports different default traces out of the box, and also lets you define your own custom traces in the Executor. The `Runtime` name is used to create the OpenTelemetry [Service](https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/#service) [Resource](https://opentelemetry.io/docs/reference/specification/resource/) attribute. The default value for the `name` argument is the `Runtime` or `Executor` class name.
Because not all Pods have the same role, they expose different kinds of traces:
#### Gateway Pods
| Operation name | Description |
|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `/jina.JinaRPC/Call` | Traces the request from the client to the Gateway server. |
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Internal operation for the request originating from the Gateway to the target Head or Executor. |
#### Head Pods
| Operation name | Description |
|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Internal operation for the request originating from the Gateway to the target Head. Another child span is created for the request originating from the Head to the Executor.|
#### Executor Pods
| Operation name | Description |
|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Executor server operation for the request originating from the Gateway/Head to the Executor request handler. |
| `/endpoint` | Internal operation for the request originating from the Executor request handler to the target `@requests(on='/endpoint')` method. The `endpoint` will be `default` if no endpoint name is provided. |
```{seealso}
Beyond the above-mentioned default traces, you can define {ref}`custom traces ` for your Executor.
```
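As a rough illustration, a custom span inside an Executor endpoint might look like the following minimal sketch. It assumes the Executor's `self.tracer` (an OpenTelemetry tracer) and the `tracing_context` argument are available when tracing is enabled, as described in the custom traces guide; the span name is illustrative.
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyTracedExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], tracing_context=None, **kwargs) -> DocList[TextDoc]:
        # assumption: self.tracer is the OpenTelemetry tracer set up when tracing is enabled
        with self.tracer.start_as_current_span('process', context=tracing_context):
            for d in docs:
                d.text = d.text.upper()
```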
## Metrics
```{hint}
Prometheus-only based metrics collection will soon be deprecated. Refer to the {ref}`Prometheus/Grafana Support ` section for the deprecated setup.
```
````{tab} Python
```python
from jina import Flow

f = Flow(
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add(uses='jinaai://jina-ai/SimpleIndexer')

with f:
    f.block()
```
````
````{tab} YAML
In `flow.yaml`:
```yaml
jtype: Flow
with:
  metrics: true
  metrics_exporter_host: 'http://localhost'
  metrics_exporter_port: 4317
executors:
  - uses: jinaai://jina-ai/SimpleIndexer
```
```bash
jina flow --uses flow.yaml
```
````
The Flow propagates the Metrics configuration to each Pod. The `metrics_exporter_host` and `metrics_exporter_port` arguments configure the metrics [exporter](https://opentelemetry.io/docs/instrumentation/python/exporters/#metrics-1) responsible for pushing collected data to the [collector](https://opentelemetry.io/docs/collector/) backend.
```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for more details on the exporter and collector setup and usage.
```
### Available metrics
Each Pod supports different default metrics out of the box and also lets you define your own custom metrics in the Executor. All metrics add the `Runtime` name to the [metric attributes](https://opentelemetry.io/docs/reference/specification/metrics/semantic_conventions/), which can be used to filter data from different Pods.
Because not all Pods have the same role, they expose different kinds of metrics:
#### Gateway Pods
| Metric name | Metric type | Description |
|-------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures time elapsed between receiving a request from the client and sending back the response. |
| `jina_sending_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures time elapsed between sending a downstream request to an Executor/Head and receiving the response back. |
| `jina_number_of_pending_requests` | [UpDownCounter](https://opentelemetry.io/docs/reference/specification/metrics/api/#updowncounter) | Counts the number of pending requests. |
| `jina_successful_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of successful requests returned by the Gateway. |
| `jina_failed_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of failed requests returned by the Gateway. |
| `jina_sent_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request sent by the Gateway to the Executor or to the Head. |
| `jina_received_response_bytes`      | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned by the Executor. |
| `jina_received_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size of the request in bytes received at the Gateway level. |
| `jina_sent_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Gateway to the Client. |
```{seealso}
You can find more information on the different types of metrics in the Prometheus documentation [here](https://prometheus.io/docs/concepts/metric_types/#metric-types).
```
#### Head Pods
| Metric name | Metric type | Description |
|-----------------------------------------|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between receiving a request from the Gateway and sending back the response. |
| `jina_sending_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between sending a downstream request to an Executor and receiving the response back. |
| `jina_number_of_pending_requests` | [UpDownCounter](https://opentelemetry.io/docs/reference/specification/metrics/api/#updowncounter)| Counts the number of pending requests. |
| `jina_successful_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of successful requests returned by the Head. |
| `jina_failed_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of failed requests returned by the Head. |
| `jina_sent_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request sent by the Head to the Executor. |
| `jina_received_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned by the Executor. |
| `jina_received_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size of the request in bytes received at the Head level. |
| `jina_sent_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Head to the Gateway. |
#### Executor Pods
The Executor also adds the Executor class name and the request endpoint to the metric attributes of methods decorated with `@requests` or `@monitor`:
| Metric name | Metric type | Description |
|----------------------------------|----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between receiving a request from the Gateway (or the head) and sending back the response. |
| `jina_process_request_seconds`    | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time spent calling the requested method. |
| `jina_document_processed`         | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of Documents processed by an Executor. |
| `jina_successful_requests`        | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Total count of successful requests returned by the Executor across all endpoints. |
| `jina_failed_requests`            | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Total count of failed requests returned by the Executor across all endpoints. |
| `jina_received_request_bytes`     | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request received at the Executor level. |
| `jina_sent_response_bytes`        | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Executor to the Gateway. |
```{seealso}
Beyond the default metrics outlined above, you can also define {ref}`custom metrics ` for your Executor.
```
```{hint}
`jina_process_request_seconds` and `jina_receiving_request_seconds` are different:
* `jina_process_request_seconds` only tracks time spent calling the function.
* `jina_receiving_request_seconds` tracks time spent calling the function **and** the gRPC communication overhead.
```
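As a rough sketch, a custom method-level metric could be added with the `@monitor` decorator described in the custom metrics guide; the metric name and helper method below are illustrative.
```python
from jina import Executor, monitor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyMonitoredExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        self._embed(docs)

    @monitor(name='embed_seconds', documentation='time spent embedding Documents')
    def _embed(self, docs: DocList[TextDoc]):
        # the decorated method is timed and exposed as its own metric
        ...
```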
## See also
* {ref}`Defining custom traces and metrics in an Executor `
* {ref}`How to deploy and use OpenTelemetry in Jina-serve `
* [Tracing in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/)
* [Metrics in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/metrics/)
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/readiness.md
# Readiness
An Orchestration is marked as "ready" when:
* Its Executor is fully loaded and ready (in the case of a Deployment)
* All its Executors and Gateway are fully loaded and ready (in the case of a Flow)
After that, an Orchestration is able to process requests.
{class}`~jina.Client` offers an API to query these readiness endpoints. You can check readiness via the Orchestration object directly, via the Client, or via the CLI. Calling {meth}`~jina.clients.mixin.HealthCheckMixin.is_flow_ready` or {meth}`~jina.Flow.is_flow_ready` returns `True` if the Flow is ready and `False` if it is not.
## Via Orchestration
````{tab} Deployment
```python
from jina import Deployment

dep = Deployment()

with dep:
    print(dep.is_deployment_ready())

print(dep.is_deployment_ready())
```
```text
True
False
```
````
````{tab} Flow
```python
from jina import Flow

f = Flow().add()

with f:
    print(f.is_flow_ready())

print(f.is_flow_ready())
```
```text
True
False
```
````
## Via Jina-serve Client
You can check the readiness from the client:
````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(port=12345)

with dep:
    dep.block()
```
```python
from jina import Client
client = Client(port=12345)
print(client.is_deployment_ready())
```
```text
True
```
````
````{tab} Flow
```python
from jina import Flow

f = Flow(port=12345).add()

with f:
    f.block()
```
```python
from jina import Client
client = Client(port=12345)
print(client.is_flow_ready())
```
```text
True
```
````
## Via CLI
`````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(port=12345)

with dep:
    dep.block()
```
```bash
jina-serve ping executor grpc://localhost:12345
```
````{tab} Success
```text
INFO Jina-serve@92877 ping grpc://localhost:12345 at 0 round... [09/08/22 12:58:13]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.04s)
INFO Jina-serve@92877 ping grpc://localhost:12345 at 1 round... [09/08/22 12:58:14]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO Jina-serve@92877 ping grpc://localhost:12345 at 2 round... [09/08/22 12:58:15]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.01s)
INFO Jina-serve@92877 avg. latency: 24 ms [09/08/22 12:58:16]
```
````
````{tab} Failure
```text
INFO Jina-serve@92986 ping grpc://localhost:12345 at 0 round... [09/08/22 12:59:00]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (1/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.01s)
INFO Jina-serve@92986 ping grpc://localhost:12345 at 1 round... [09/08/22 12:59:01]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (2/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO Jina-serve@92986 ping grpc://localhost:12345 at 2 round... [09/08/22 12:59:02]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (3/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.02s)
WARNI… Jina-serve@92986 message lost 100% (3/3)
```
````
`````
`````{tab} Flow
```python
from jina import Flow

f = Flow(port=12345)

with f:
    f.block()
```
```bash
jina-serve ping flow grpc://localhost:12345
```
````{tab} Success
```text
INFO Jina-serve@92877 ping grpc://localhost:12345 at 0 round... [09/08/22 12:58:13]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.04s)
INFO Jina-serve@92877 ping grpc://localhost:12345 at 1 round... [09/08/22 12:58:14]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO Jina-serve@92877 ping grpc://localhost:12345 at 2 round... [09/08/22 12:58:15]
INFO Jina-serve@92877 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.01s)
INFO Jina-serve@92877 avg. latency: 24 ms [09/08/22 12:58:16]
```
````
````{tab} Failure
```text
INFO Jina-serve@92986 ping grpc://localhost:12345 at 0 round... [09/08/22 12:59:00]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (1/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.01s)
INFO Jina-serve@92986 ping grpc://localhost:12345 at 1 round... [09/08/22 12:59:01]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (2/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO Jina-serve@92986 ping grpc://localhost:12345 at 2 round... [09/08/22 12:59:02]
ERROR GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (3/3) in 1s
INFO Jina-serve@92986 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.02s)
WARNI… Jina-serve@92986 message lost 100% (3/3)
```
````
`````
## Readiness check via third-party clients
You can check the status of a Flow using any gRPC/HTTP/WebSockets client, not just via Jina-serve Client.
To see how this works, first instantiate the Orchestration with its corresponding protocol and block it for serving:
````{tab} Deployment
```python
from jina import Deployment
import os

PROTOCOL = 'grpc'  # it could also be http or websocket
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'  # this way we can check the PID of the Executor

dep = Deployment(protocol=PROTOCOL, port=12345)

with dep:
    dep.block()
```
```text
⠋ Waiting ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/0 -:--:--DEBUG gateway/rep-0@19075 adding connection for deployment executor0/heads/0 to grpc://0.0.0.0:12346 [05/31/22 18:10:16]
DEBUG executor0/rep-0@19074 start listening on 0.0.0.0:12346 [05/31/22 18:10:16]
DEBUG gateway/rep-0@19075 start server bound to 0.0.0.0:12345 [05/31/22 18:10:17]
DEBUG executor0/rep-0@19059 ready and listening [05/31/22 18:10:17]
DEBUG gateway/rep-0@19059 ready and listening [05/31/22 18:10:17]
╭─── 🎉 Deployment is ready to serve! ───╮
│ 🔗 Protocol GRPC │
│ 🏠 Local 0.0.0.0:12345 │
│ 🔒 Private 192.168.1.13:12345 │
╰────────────────────────────────────────╯
DEBUG Deployment@19059 2 Deployments (i.e. 2 Pods) are running in this Deployment
```
````
````{tab} Flow
```python
from jina import Flow
import os

PROTOCOL = 'grpc'  # it could also be http or websocket
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'  # this way we can check the PID of the Executor

f = Flow(protocol=PROTOCOL, port=12345).add()

with f:
    f.block()
```
```text
⠋ Waiting ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/0 -:--:--DEBUG gateway/rep-0@19075 adding connection for deployment executor0/heads/0 to grpc://0.0.0.0:12346 [05/31/22 18:10:16]
DEBUG executor0/rep-0@19074 start listening on 0.0.0.0:12346 [05/31/22 18:10:16]
DEBUG gateway/rep-0@19075 start server bound to 0.0.0.0:12345 [05/31/22 18:10:17]
DEBUG executor0/rep-0@19059 ready and listening [05/31/22 18:10:17]
DEBUG gateway/rep-0@19059 ready and listening [05/31/22 18:10:17]
╭────── 🎉 Flow is ready to serve! ──────╮
│ 🔗 Protocol GRPC │
│ 🏠 Local 0.0.0.0:12345 │
│ 🔒 Private 192.168.1.13:12345 │
╰────────────────────────────────────────╯
DEBUG Flow@19059 2 Deployments (i.e. 2 Pods) are running in this Flow
```
````
### Using gRPC
When using gRPC, use [grpcurl](https://github.com/fullstorydev/grpcurl) to access the Gateway's gRPC service that is responsible for reporting the Orchestration status.
```shell
docker pull fullstorydev/grpcurl:latest
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina.JinaGatewayDryRunRPC/dry_run
```
The error-free output below signifies a correctly running Orchestration:
```json
{}
```
You can simulate an Executor going offline by killing its process.
```shell script
kill -9 $EXECUTOR_PID # in this case we can see in the logs that it is 19059
```
Then by doing the same check, you can see that it returns an error:
```shell
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina.JinaGatewayDryRunRPC/dry_run
```
````{dropdown} Error output
```json
{
"code": "ERROR",
"description": "failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down.",
"exception": {
"name": "InternalNetworkError",
"args": [
"failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down."
],
"stacks": [
"Traceback (most recent call last):\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 750, in task_wrapper\n timeout=timeout,\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 197, in send_discover_endpoint\n await self._init_stubs()\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 174, in _init_stubs\n self.channel\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 1001, in get_available_services\n async for res in response:\n",
" File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 326, in _fetch_stream_responses\n await self._raise_for_status()\n",
" File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 237, in _raise_for_status\n self._cython_call.status())\n",
"grpc.aio._call.AioRpcError: \u003cAioRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses\"\n\tdebug_error_string = \"{\"created\":\"@1654012804.794351252\",\"description\":\"Failed to pick subchannel\",\"file\":\"src/core/ext/filters/client_channel/client_channel.cc\",\"file_line\":3134,\"referenced_errors\":[{\"created\":\"@1654012804.794350006\",\"description\":\"failed to connect to all addresses\",\"file\":\"src/core/lib/transport/error_utils.cc\",\"file_line\":163,\"grpc_status\":14}]}\"\n\u003e\n",
"\nDuring handling of the above exception, another exception occurred:\n\n",
"Traceback (most recent call last):\n",
" File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/grpc/__init__.py\", line 155, in dry_run\n async for _ in self.streamer.stream(request_iterator=req_iterator):\n",
" File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 78, in stream\n async for response in async_iter:\n",
" File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 154, in _stream_requests\n response = self._result_handler(future.result())\n",
" File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 146, in _process_results_at_end_gateway\n await asyncio.gather(gather_endpoints(request_graph))\n",
" File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 88, in gather_endpoints\n raise err\n",
" File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 80, in gather_endpoints\n endpoints = await asyncio.gather(*tasks_to_get_endpoints)\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 754, in task_wrapper\n e=e, retry_i=i, dest_addr=connection.address\n",
" File \"/home/joan/jina/jina/jina/serve/networking.py\", line 697, in _handle_aiorpcerror\n details=e.details(),\n",
"jina.excepts.InternalNetworkError: failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down.\n"
]
}
}
```
````
### Using HTTP or WebSockets
When using HTTP or WebSockets as the Gateway protocol, use curl to target the `/dry_run` endpoint and get the status of the Flow.
```shell
curl http://localhost:12345/dry_run
```
Error-free output signifies a correctly running Flow:
```json
{"code":0,"description":"","exception":null}
```
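If you prefer to stay in Python, any HTTP client can perform the same check; here is a minimal sketch using the `requests` library (assuming the Flow above is serving HTTP on port 12345):
```python
import requests  # the HTTP client library, not the jina decorator

r = requests.get('http://localhost:12345/dry_run')
print(r.json())                 # {'code': 0, 'description': '', 'exception': None} when healthy
print(r.json()['code'] == 0)    # True means the Orchestration is ready
```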
You can simulate an Executor going offline by killing its process:
```shell script
kill -9 $EXECUTOR_PID # in this case we can see in the logs that it is 19059
```
Then by doing the same check, you can see that the call returns an error:
```json
{"code":1,"description":"failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down.","exception":{"name":"InternalNetworkError","args":["failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down."],"stacks":["Traceback (most recent call last):\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 726, in task_wrapper\n timeout=timeout,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 241, in send_requests\n await call_result,\n"," File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 291, in __await__\n self._cython_call._status)\n","grpc.aio._call.AioRpcError: \n","\nDuring handling of the above exception, another exception occurred:\n\n","Traceback (most recent call last):\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/http/app.py\", line 142, in _flow_health\n data_type=DataInputType.DOCUMENT,\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/http/app.py\", line 399, in _get_singleton_result\n async for k in streamer.stream(request_iterator=request_iterator):\n"," File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 78, in stream\n async for response in async_iter:\n"," File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 154, in _stream_requests\n response = self._result_handler(future.result())\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 148, in _process_results_at_end_gateway\n partial_responses = await asyncio.gather(*tasks)\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 128, in _wait_previous_and_send\n self._handle_internalnetworkerror(err)\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 70, in _handle_internalnetworkerror\n raise err\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 125, in _wait_previous_and_send\n timeout=self._timeout_send,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 734, in task_wrapper\n num_retries=num_retries,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 697, in _handle_aiorpcerror\n details=e.details(),\n","jina.excepts.InternalNetworkError: failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down.\n"],"executor":""}}
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/scale-out.md
(scale-out)=
# Scale Out
By default, all Executors in an Orchestration run with a single instance. If an Executor is particularly slow, then it will reduce the overall throughput. To solve this, you can specify the number of `replicas` to scale out an Executor.
(replicate-executors)=
## Replicate stateless Executors
Replication creates multiple copies of the same {class}`~jina.Executor`. Each request in the Orchestration is then passed to only one replica (instance) of that Executor. **All replicas compete for a request. The idle replica gets the request first.**
This is useful for improving performance and availability:
* If you have slow Executors (e.g. embedding) you can scale up the number of instances to process multiple requests in parallel.
* Executors might need to be taken offline occasionally (for updates, failures, etc.), but you may want your Orchestration to still process requests without any downtime. Adding replicas allows any replica to be taken down as long as there is at least one still running. This ensures the high availability of your Orchestration.
### Replicate Executors in a Deployment
````{tab} Python
```python
from jina import Deployment
dep = Deployment(name='slow_encoder', replicas=3)
```
````
````{tab} YAML
```yaml
jtype: Deployment
uses: jinaai://jina-ai/CLIPEncoder
install_requirements: True
replicas: 5
```
````
### Replicate Executors in a Flow
````{tab} Python
```python
from jina import Flow
f = Flow().add(name='slow_encoder', replicas=3).add(name='fast_indexer')
```
````
````{tab} YAML
```yaml
jtype: Flow
executors:
  - uses: jinaai://jina-ai/CLIPEncoder
    install_requirements: True
    replicas: 5
```
````
```{figure} images/replicas-flow.svg
:width: 70%
:align: center
Flow with three replicas of `slow_encoder` and one replica of `fast_indexer`
```
(scale-consensus)=
## Replicate stateful Executors with consensus using RAFT (Beta)
````{admonition} Python 3.8 or newer required on macOS
:class: note
This feature requires at least Python 3.8 when working on macOS.
````
````{admonition} Feature not supported on Windows
:class: note
This feature is not supported on Windows.
````
````{admonition} DocArray 0.30
:class: note
Starting from DocArray version 0.30, DocArray changed its interface and implementation drastically. We intend to support these new versions in the near future, but not every feature is yet available. Check {ref}`here ` for more information. This feature has been added with the new DocArray support.
````
````{admonition} gRPC protocol
:class: note
This feature is only available when using gRPC as the protocol for the Deployment or when the Deployment is part of a Flow
````
Replication is used to scale out Executors by creating copies of them that can handle requests in parallel, providing better RPS.
However, when an Executor maintains state, it is not simple to guarantee that each copy maintains the *same* state,
which can lead to undesired behavior, since each replica may return different results depending on the state it holds.
In Jina-serve, you can also have replication while guaranteeing consensus between replicas. For this, we rely on [RAFT](https://raft.github.io/),
an algorithm that guarantees eventual consistency between replicas.
Consensus-based replication using RAFT is a distributed algorithm designed to provide fault tolerance and consistency in a distributed system. In a distributed system, the nodes may fail, and messages may be lost or delayed, which can lead to inconsistencies in the system.
The problem with traditional replication methods is that they can't guarantee consistency in a distributed system in the presence of failures. This is where consensus-based replication using RAFT comes in.
With this approach, each Executor can be considered as a Finite State Machine, meaning it has a set of potential states and a set of transitions that it can make between those states. Each request that is sent to the Executor can be considered as a log entry that needs to be replicated across the cluster.
To enable this kind of replication, we need to consider:
* Specify which methods of the Executor {ref}` can update its internal state `.
* Tell the Deployment to use the RAFT consensus algorithm by setting the `--stateful` argument.
* Set values of replicas compatible with RAFT. RAFT requires at least three replicas to guarantee consistency.
* Pass the `--peer-ports` argument so that the RAFT cluster can recover from a previous configuration of replicas if one existed.
* Optionally, pass the `--raft-configuration` parameter to tweak the behavior of the consensus module. You can understand the values to pass from
[Hashicorp's RAFT library](https://github.com/ongardie/hashicorp-raft/blob/master/config.go).
```python
from jina import Deployment, Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc


class MyStateStatefulExecutor(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._docs_dict = {}

    @requests(on=['/index'])
    @write
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            self._docs_dict[doc.id] = doc

    @requests(on=['/search'])
    def search(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            self.logger.debug(f'Searching against {len(self._docs_dict)} documents')
            doc.text = self._docs_dict[doc.id].text


d = Deployment(
    name='stateful_executor',
    uses=MyStateStatefulExecutor,
    replicas=3,
    stateful=True,
    workspace='./raft',
    peer_ports=[12345, 12346, 12347],
)

with d:
    d.block()
```
This capability not only gives you replicas with robustness and high availability, it can also help achieve higher throughput in some cases.
Let's imagine we write an Executor that is used to index and query documents from a vector index.
For this, we will use an in-memory solution from [DocArray](https://docs.docarray.org/user_guide/storing/index_in_memory/) that performs exact vector search.
```python
from jina import Deployment, Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc
from docarray.index.backends.in_memory import InMemoryExactNNIndex


class QueryDoc(TextDoc):
    matches: DocList[TextDoc] = DocList[TextDoc]()


class ExactNNSearch(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._index = InMemoryExactNNIndex[TextDoc]()

    @requests(on=['/index'])
    @write  # the write decorator indicates that calling this endpoint updates the inner state
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        self.logger.info(f'Indexing Document in index with {len(self._index)} documents indexed')
        self._index.index(docs)

    @requests(on=['/search'])
    def search(self, docs: DocList[QueryDoc], **kwargs) -> DocList[QueryDoc]:
        self.logger.info(f'Searching Document in index with {len(self._index)} documents indexed')
        for query in docs:
            matches, scores = self._index.find(query, search_field='embedding', limit=100)
            query.matches = matches


d = Deployment(
    name='indexer',
    port=5555,
    uses=ExactNNSearch,
    workspace='./raft',
    replicas=3,
    stateful=True,
    peer_ports=[12345, 12346, 12347],
)

with d:
    d.block()
```
Then in another terminal, we will send index and search requests:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
import time
import numpy as np


class QueryDoc(TextDoc):
    matches: DocList[TextDoc] = DocList[TextDoc]()


NUM_DOCS_TO_INDEX = 100000
NUM_QUERIES = 1000

c = Client(port=5555)

index_docs = DocList[TextDoc](
    [TextDoc(text=f'I am document {i}', embedding=np.random.rand(128)) for i in range(NUM_DOCS_TO_INDEX)]
)

start_indexing_time = time.time()
c.post(on='/index', inputs=index_docs, request_size=100)
print(f'Indexing {NUM_DOCS_TO_INDEX} Documents took {time.time() - start_indexing_time}s')

time.sleep(2)  # allow some time for the data to be replicated

search_da = DocList[QueryDoc](
    [QueryDoc(text=f'I am document {i}', embedding=np.random.rand(128)) for i in range(NUM_QUERIES)]
)

start_querying_time = time.time()
responses = c.post(on='/search', inputs=search_da, request_size=1, return_type=DocList[QueryDoc])
print(f'Searching {NUM_QUERIES} Queries took {time.time() - start_querying_time}s')

for res in responses:
    print(f'{res.matches}')
```
In the server's logs you can see how `index` requests reach every replica, while each `search` request reaches only one replica, in a round-robin fashion.
Eventually every indexer replica ends up with the same Documents indexed.
```text
INFO indexer/rep-2@923 Indexing Document in index with 99900 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99200 documents indexed
INFO indexer/rep-1@910 Indexing Document in index with 99700 documents indexed
INFO indexer/rep-1@910 Indexing Document in index with 99800 documents indexed [04/28/23 16:51:06]
INFO indexer/rep-0@902 Indexing Document in index with 99300 documents indexed [04/28/23 16:51:06]
INFO indexer/rep-1@910 Indexing Document in index with 99900 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99400 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99500 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99600 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99700 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99800 documents indexed
INFO indexer/rep-0@902 Indexing Document in index with 99900 documents indexed
```
At search time, however, the consensus module is not involved, and each query is served by a single replica.
```text
INFO indexer/rep-0@902 Searching Document in index with 100000 documents indexed [04/28/23 16:59:21]
INFO indexer/rep-1@910 Searching Document in index with 100000 documents indexed [04/28/23 16:59:21]
INFO indexer/rep-2@923 Searching Document in index with 100000 documents indexed
```
If you run the same example with `replicas` set to `1` and without the consensus module, you can see the benefit replication brings to QPS at search time,
at a small cost in indexing time.
```python
d = Deployment(
    name='indexer',
    port=5555,
    uses=ExactNNSearch,
    workspace='./raft',
    replicas=1,
)
```
With one replica:
```text
Indexing 100000 Documents took 18.93274688720703s
Searching 1000 Queries took 385.96641397476196s
```
With three replicas and consensus:
```text
Indexing 100000 Documents took 35.066415548324585s
Searching 1000 Queries took 202.07950615882874s
```
This roughly doubles search QPS, from about 2.5 to about 5.
## Replicate on multiple GPUs
To replicate your {class}`~jina.Executor`s so that each replica uses a different GPU on your machine, you can tell the Orchestration to use multiple GPUs by passing `CUDA_VISIBLE_DEVICES=RR` as an environment variable.
```{caution}
You should only replicate on multiple GPUs with `CUDA_VISIBLE_DEVICES=RR` locally.
```
```{tip}
In Kubernetes or with Docker Compose you should allocate GPU resources to each replica directly in the configuration files.
```
The Orchestration assigns GPU devices in the following round-robin fashion:
| GPU device | Replica ID |
|------------|------------|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 0 | 3 |
| 1 | 4 |
You can restrict the visible devices in round-robin assignment using `CUDA_VISIBLE_DEVICES=RR0:2`, where `0:2` corresponds to a Python slice. This creates the following assignment:
| GPU device | Replica ID |
|------------|------------|
| 0 | 0 |
| 1 | 1 |
| 0 | 2 |
| 1 | 3 |
| 0 | 4 |
You can restrict the visible devices in round-robin assignment by assigning the list of device IDs to `CUDA_VISIBLE_DEVICES=RR1,3`. This creates the following assignment:
| GPU device | Replica ID |
|------------|------------|
| 1 | 0 |
| 3 | 1 |
| 1 | 2 |
| 3 | 3 |
| 1 | 4 |
You can also refer to GPUs by their UUID. For instance, you could assign a list of device UUIDs:
```bash
CUDA_VISIBLE_DEVICES=RRGPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5,GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5,GPU-0ccccccc-74d2-7297-d557-12771b6a79d5,GPU-0ddddddd-74d2-7297-d557-12771b6a79d5
```
Check [CUDA Documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) to see the accepted formats to assign CUDA devices by UUID.
| GPU device | Replica ID |
|------------|------------|
| GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 0 |
| GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5 | 1 |
| GPU-0ccccccc-74d2-7297-d557-12771b6a79d5 | 2 |
| GPU-0ddddddd-74d2-7297-d557-12771b6a79d5 | 3 |
| GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 4 |
For example, if you have three GPUs and one of your Executors has five replicas:
### GPU replicas in a Deployment
````{tab} Python
```python
from jina import Deployment

dep = Deployment(uses='jinaai://jina-ai/CLIPEncoder', replicas=5, install_requirements=True)

with dep:
    dep.block()
```
```shell
CUDA_VISIBLE_DEVICES=RR python deployment.py
```
````
````{tab} YAML
```yaml
jtype: Deployment
with:
  uses: jinaai://jina-ai/CLIPEncoder
  install_requirements: True
  replicas: 5
```
```shell
CUDA_VISIBLE_DEVICES=RR jina deployment --uses deployment.yaml
```
````
### GPU replicas in a Flow
````{tab} Python
```python
from jina import Flow

f = Flow().add(
    uses='jinaai://jina-ai/CLIPEncoder', replicas=5, install_requirements=True
)

with f:
    f.block()
```
```shell
CUDA_VISIBLE_DEVICES=RR python flow.py
```
````
````{tab} YAML
```yaml
jtype: Flow
executors:
  - uses: jinaai://jina-ai/CLIPEncoder
    install_requirements: True
    replicas: 5
```
```shell
CUDA_VISIBLE_DEVICES=RR jina flow --uses flow.yaml
```
````
## Replicate external Executors
If you have external Executors with multiple replicas running elsewhere, you can add them to your Orchestration by specifying all the respective hosts and ports:
````{tab} Deployment
```python
from jina import Deployment
replica_hosts, replica_ports = ['localhost','91.198.174.192'], ['12345','12346']
Deployment(host=replica_hosts, port=replica_ports, external=True)
# alternative syntax
Deployment(host=['localhost:12345','91.198.174.192:12346'], external=True)
```
````
````{tab} Flow
```python
from jina import Flow
replica_hosts, replica_ports = ['localhost','91.198.174.192'], ['12345','12346']
Flow().add(host=replica_hosts, port=replica_ports, external=True)
# alternative syntax
Flow().add(host=['localhost:12345','91.198.174.192:12346'], external=True)
```
````
This connects to `grpc://localhost:12345` and `grpc://91.198.174.192:12346` as two replicas of the external Executor.
````{admonition} Reducing
:class: hint
If an external Executor needs multiple predecessors, reducing must be enabled, so setting `no_reduce=True` is not allowed in these cases.
````
(partition-data-by-using-shards)=
## Customize polling behaviors
Replicas compete for a request, so only one of them will get the request. What if we want all replicas to get the request?
For example, consider index and search requests:
* Index (and update, delete) requests are handled by a single shard, as adding the data once is sufficient.
* Search requests are handled by all shards, as you need to search over all shards to ensure the completeness of the results. The requested data could be on any shard.
For this purpose, you need `shards` and `polling`.
You can define if all or any `shards` receive the request by specifying `polling`. `ANY` means only one shard receives the request, while `ALL` means that all shards receive the same request.
````{tab} Deployment
```python
from jina import Deployment
dep = Deployment(name='ExecutorWithShards', shards=3, polling={'/custom': 'ALL', '/search': 'ANY', '*': 'ANY'})
```
````
````{tab} Flow
```python
from jina import Flow
f = Flow().add(name='ExecutorWithShards', shards=3, polling={'/custom': 'ALL', '/search': 'ANY', '*': 'ANY'})
```
````
The above example results in an Orchestration having the Executor `ExecutorWithShards` with the following polling options:
* `/index` has polling `ANY` (the default value is not changed here).
* `/search` has polling `ANY` as it is explicitly set (usually that should not be necessary).
* `/custom` has polling `ALL`.
* All other endpoints have polling `ANY` due to using `*` as a wildcard to catch all other cases.
### Understand behaviors of replicas and shards with polling
The following example demonstrates the different behaviors when setting `replicas`, `shards` and `polling` together.
````{tab} Deployment
```{code-block} python
---
emphasize-lines: 12
---
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        print(f'inside: {docs.text}')


dep = Deployment(uses=MyExec, replicas=2, polling='ANY')

with dep:
    r = dep.post('/', TextDoc(text='hello'), return_type=DocList[TextDoc])
    print(f'return: {r.text}')
```
````
````{tab} Flow
```{code-block} python
---
emphasize-lines: 14
---
from jina import Flow, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        print(f'inside: {docs.text}')


f = (
    Flow()
    .add(uses=MyExec, replicas=2, polling='ANY')
    .needs_all()
)

with f:
    r = f.post('/', TextDoc(text='hello'), return_type=DocList[TextDoc])
    print(f'return: {r.text}')
```
````
We now change the combination of the highlighted arguments above and see if there is any difference in the console output (note the two prints in the snippet):
| | `polling='ALL'` | `polling='ANY'` |
| -------------- | -------------------------------------------------------- | ------------------------------------- |
| `replicas=2` | `inside: ['hello'] return: ['hello']` | `inside: ['hello'] return: ['hello']` |
| `shards=2` | `inside: ['hello'] inside: ['hello'] return: ['hello']` | `inside: ['hello'] return: ['hello']` |
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/troubleshooting-on-multiprocess.md
(multiprocessing-spawn)=
# Troubleshooting on Multiprocessing
When running an Orchestration locally, you may encounter errors caused by the `multiprocessing` package depending on your operating system and Python version.
```{admonition} Troubleshooting a Flow
:class: information
In this section we show an example using a {ref}`Deployment `. However, exactly the same methodology applies to troubleshooting a Flow.
```
Here are some suggestions:
* Define and start the Orchestration via an explicit function call inside `if __name__ == '__main__'`, **especially when using `spawn` multiprocessing start method**. For example
````{tab} ✅ Do
```{code-block} python
---
emphasize-lines: 13, 14
---
from jina import Deployment, Executor, requests

class CustomExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        ...

def main():
    dep = Deployment(uses=CustomExecutor)
    with dep:
        ...

if __name__ == '__main__':
    main()
```
````
````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 8, 9
---
from jina import Deployment, Executor, requests

class CustomExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        ...

dep = Deployment(uses=CustomExecutor)
with dep:
    ...

"""
# error
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
"""
```
````
* Declare Executors on the top-level of the module
````{tab} ✅ Do
```{code-block} python
---
emphasize-lines: 3
---
from jina import Deployment, Executor, requests

class CustomExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        ...

def main():
    dep = Deployment(uses=CustomExecutor)
    with dep:
        ...
```
````
````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 4
---
from jina import Deployment, Executor, requests

def main():
    class CustomExecutor(Executor):
        @requests
        def foo(self, **kwargs):
            ...

    dep = Deployment(uses=CustomExecutor)
    with dep:
        ...
```
````
* **Always provide absolute path**
While passing filepaths to different Jina arguments (e.g. `uses`, `py_modules`), always pass an absolute path, as in the sketch below.
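A minimal sketch of building an absolute path relative to the current script (the `executor_config.yml` filename is made up for the example):
```python
import os

from jina import Deployment

# resolve the config path relative to this script instead of the current working directory
here = os.path.dirname(os.path.abspath(__file__))
dep = Deployment(uses=os.path.join(here, 'executor_config.yml'))
```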
## Using Multiprocessing Spawn
When you encounter this error:
```console
Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```
* Set `JINA_MP_START_METHOD=spawn` before starting the Python script to enable the `spawn` start method.
````{hint}
There's no need to set this on Windows, as it only supports the `spawn` start method for multiprocessing.
````
* **Avoid un-picklable objects**
[Here's a list of types that can be pickled in Python](https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled). Since `spawn` relies on pickling, we should avoid using code that cannot be pickled.
````{hint}
Here are a few errors which indicate that you are using code that is not picklable.
```text
pickle.PicklingError: Can't pickle: it's not the same object
AssertionError: can only join a started process
```
````
Inline functions, such as nested or lambda functions, are not picklable. Use `functools.partial` instead, as shown below.
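For instance, here is a rough sketch of replacing a lambda with `functools.partial` (the `scale` function is made up for the example):
```python
from functools import partial


def scale(value, factor):
    return value * factor


# a lambda such as `lambda v: v * 2` cannot be pickled under the spawn start method,
# but a partial of a module-level function can
scale_by_two = partial(scale, factor=2)
print(scale_by_two(21))  # 42
```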
## Using Multiprocessing Fork on macOS
Apple has changed the rules for using Objective-C between `fork()` and `exec()` since macOS 10.13.
This may break some code that uses `fork()` on macOS.
For example, the Flow may not be able to start properly, with error messages similar to:
```bash
objc[20337]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[20337]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
```
You can define the environment variable `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` to get around this issue.
Read [here](http://sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html) for more details.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/yaml-spec.md
(flow-yaml-spec)=
# {octicon}`file-code` YAML specification
To generate a YAML configuration from an Orchestration, use {meth}`~jina.jaml.JAMLCompatible.save_config`.
## YAML completion in IDE
We provide a [JSON Schema](https://json-schema.org/) for your IDE to enable code completion, syntax validation, members listing and displaying help text.
### PyCharm users
1. Click menu `Preferences` -> `JSON Schema mappings`;
2. Add a new schema. Under `Schema File or URL` enter `https://schemas.jina.ai/schemas/latest.json` and select `JSON Schema Version 7`;
3. Add a file path pattern and link it to `*.jaml`, `*.jina.yml`, or any suffix you commonly use for Jina-serve Flow's YAML.
### VSCode users
1. Install the extension: `YAML Language Support by Red Hat`;
2. In IDE-level `settings.json` add:
```json
"yaml.schemas": {
"https://schemas.jina.ai/schemas/latest.json": ["/*.jina.yml", "/*.jaml"],
}
```
You can bind the schema to any file suffix you commonly use for Jina-serve Flow's YAML.
## Example YAML
````{tab} Deployment
```yaml
jtype: Deployment
version: '1'
with:
  protocol: http
  name: firstexec
  uses:
    jtype: MyExec
    py_modules:
      - executor.py
```
````
````{tab} Flow
```yaml
jtype: Flow
version: '1'
with:
  protocol: http
executors:
  # inline Executor YAML
  - name: firstexec
    uses:
      jtype: MyExec
      py_modules:
        - executor.py
  # reference to Executor YAML
  - name: secondexec
    uses: indexer.yml
    workspace: /home/my/workspace
  # reference to Executor Python class
  - name: thirdexec
    uses: CustomExec # located in executor.py
```
````
## Fields
### `jtype`
String that is always set to either "Flow" or "Deployment", indicating the corresponding Python class.
### `version`
String indicating the version of the Flow or Deployment.
### `with`
Keyword arguments passed to the Flow or Deployment's `__init__()` method. You can set Orchestration-specific arguments and Gateway-specific arguments here:
#### Orchestration arguments
````{tab} Deployment
```{include} deployment-args.md
```
````
````{tab} Flow
```{include} flow-args.md
```
##### Gateway arguments
These apply only to Flows, not Deployments
```{include} gateway-args.md
```
````
(executor-args)=
### `executors`
Collection of Executors used in the Orchestration. In the case of a Deployment this is a single Executor, while a Flow can have an arbitrary number.
Each item in the collection specifies one Executor; the Python equivalent is:
````{tab} Deployment
```python
dep = Deployment(uses=MyExec, arg1="foo", arg2="bar")
```
````
````{tab} Flow
```python
f = Flow().add(uses=MyExec, arg1="foo", arg2="bar")
```
````
```{include} executor-args.md
```
```{include} yaml-vars.md
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/orchestration/yaml-vars.md
## Variables
Jina-serve Orchestration YAML supports variables and variable substitution according to the [GitHub Actions syntax](https://docs.github.com/en/actions/learn-github-actions/environment-variables).
### Environment variables
Use `${{ ENV.VAR }}` to refer to the environment variable `VAR`. You can find all {ref}`Jina environment variables here`.
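For instance, a minimal sketch of substituting an environment variable into an Orchestration configuration (the `MY_PORT` variable and the inline YAML are illustrative, and it is assumed that `load_config` accepts YAML content as well as a file path):
```python
import os

from jina import Flow

os.environ['MY_PORT'] = '12345'  # illustrative variable referenced in the YAML below

flow_yaml = '''
jtype: Flow
with:
  port: ${{ ENV.MY_PORT }}
'''

f = Flow.load_config(flow_yaml)  # MY_PORT is substituted when the YAML is parsed
```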
### Context variables
Use `${{ CONTEXT.VAR }}` to refer to the context variable `VAR`.
Context variables can be passed in the form of a Python dictionary:
````{tab} Deployment
```python
dep = Deployment.load_config('deployment.yml', context={...})
```
````
````{tab} Flow
```python
f = Flow.load_config('flow.yml', context={...})
```
````
### Relative paths
Use `${{root.path.to.var}}` to refer to the variable `var` within the same YAML file, found at the provided path in the file's structure.
```{admonition} Syntax: Environment variable vs relative path
:class: tip
The only difference between environment variable syntax and relative path syntax is the omission of spaces in the latter.
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/preliminaries/coding-in-python-yaml.md
(python-yaml)=
# Coding in Python/YAML
In the docs, you often see two coding styles when describing a Jina-serve project:
```{glossary}
**Pythonic**
Flows, Deployments and Executors are all written in Python files, and the entrypoint is via Python.
**YAMLish**
Executors are written in Python files, and the Deployment or Flow is defined in a YAML file. The entrypoint can be either Python or the Jina-serve CLI: `jina deployment --uses deployment.yml` or `jina flow --uses flow.yml`.
```
For example, {ref}`the server-side code` follows the {term}`Pythonic` style. It can be written in {term}`YAMLish` style as follows:
````{tab} executor.py
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'goodbye!'
```
````
````{tab} flow.yml
```yaml
jtype: Flow
with:
  port: 12345
executors:
  - uses: FooExec
    replicas: 3
    py_modules: executor.py
  - uses: BarExec
    replicas: 2
    py_modules: executor.py
```
````
````{tab} Entrypoint
```bash
jina flow --uses flow.yml
```
````
In general, the YAML style can be used to represent and configure a Flow or Deployment, the objects that orchestrate the serving of Executors and applications.
The YAMLish style separates the Flow or Deployment representation from the Executor logic code.
It is more flexible to configure and should be used for more complex projects in production. In many integrations such as JCloud and Kubernetes, YAMLish is preferred.
Note that the two coding styles can be converted to each other easily. To load a Flow YAML into Python and run it:
```python
from jina import Flow

f = Flow.load_config('flow.yml')

with f:
    f.block()
```
To dump a Flow into YAML:
```python
from jina import Flow

Flow().add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2).save_config(
    'flow.yml'
)
```
````{admonition} Hint: YAML and Python duality (with, add, uses_with)
:class: hint
If you are used to the Pythonic way of building Deployments and Flows and then need to start working with YAML,
a good way to approach the translation is to think of YAML as a direct transcription of what you would type in Python.
a good way to think about this translation is to think of YAML as a direct translation of what you would type in Python.
So, every `with` clause is like an instantiation of an object, be it a Flow, Deployment or Executor (a call to its constructor).
And when a Flow has a list of Executors, each entry on the list is a call to the Flow's `add()` method. This is why Deployments and Flows sometimes need the argument `uses_with` to override the Executor's defaults.
````
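As a rough illustration of that duality, the `MyExec` class and its `greeting` parameter below are made up for the example:
```python
from jina import Executor, Flow, requests


class MyExec(Executor):
    def __init__(self, greeting: str = 'hello', **kwargs):
        super().__init__(**kwargs)
        self.greeting = greeting

    @requests
    def foo(self, **kwargs):
        print(self.greeting)


# the Pythonic call ...
f = Flow().add(uses=MyExec, uses_with={'greeting': 'hi'})

# ... corresponds to a YAML entry like:
#   executors:
#     - uses: MyExec
#       uses_with:
#         greeting: hi
```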
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/preliminaries/index.md
(architecture-overview)=
# {fas}`egg` Preliminaries
This chapter introduces the basic terminology and concepts you will encounter in the docs. But first, look at the code below:
In this code, we are going to use Jina-serve to serve simple logic with one Deployment, or a combination of two services with a Flow.
We are also going to see how we can query these services with Jina-serve's client.
(dummy-example)=
````{tab} Deployment
```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


dep = Deployment(port=12345, uses=FooExec, replicas=3)

with dep:
    dep.block()
```
````
````{tab} Flow
```python
from jina import Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'goodbye!'


f = Flow(port=12345).add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2)

with f:
    f.block()
```
````
````{tab} Client
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)
r = c.post(on='/', inputs=DocList[TextDoc]([TextDoc(text=''), TextDoc(text='')]), return_type=DocList[TextDoc])
print([d.text for d in r])
```
````
Running it gives you:
````{tab} Deployment
```text
['hello, world!', 'hello, world!']
```
````
````{tab} Flow
```text
['hello, world!goodbye!', 'hello, world!goodbye!']
```
````
## Architecture
This animation shows what's happening behind the scenes when running the previous examples:
````{tab} Deployment
```{figure} arch-deployment-overview.png
:align: center
```
````
````{tab} Flow
```{figure} arch-flow-overview.svg
:align: center
```
````
```{hint}
:class: seealso
gRPC, WebSocket and HTTP are network protocols for transmitting data. gRPC is always used for communication between the {term}`Gateway` and {term}`Executors inside a Flow`.
```
```{hint}
:class: seealso
TLS is a security protocol to facilitate privacy and data security for communications over the Internet. The communication between {term}`Client` and {term}`Gateway` is protected by TLS.
```
Jina-serve is an MLOps serving framework structured in two main layers. These layers work with DocArray's data structures and Jina-serve's Python Client to complete the framework. All of these are covered in the user guide
and comprise the following concepts:
```{glossary}
**DocArray data structure**
Data structures coming from [docarray](https://docs.docarray.org/) are the fundamental data structures in Jina-serve.
* **BaseDoc**
Document is the basic object for representing multimodal data. It can be extended to represent any data you want. More information can be found in [DocArray's Docs](https://docs.docarray.org/user_guide/representing/first_step/).
* **DocList**
DocList is a list-like container of multiple Documents. More information can be found in [DocArray's Docs](https://docs.docarray.org/user_guide/representing/array/).
All the components in Jina-serve use `BaseDoc` and/or `DocList` as the main data format for communication, making use of the different
serialization capabilities of these structures.
**Serving**
This layer contains all the objects and concepts that are used to actually serve the logic and receive and respond to queries. These components are designed to be used as microservices ready to be containerized.
These components can be orchestrated by Jina-serve's {term}`orchestration` layer or by other container orchestration frameworks such as Kubernetes or Docker Compose.
* **Executor**
A {class}`~jina.Executor` is a Python class that serves logic using Documents. Loosely speaking, each Executor is a service wrapping a model or application.
* **Gateway**
A Gateway is the entrypoint of a {term}`Flow`. It exposes multiple protocols for external communication and routes all internal traffic to the different Executors that work together to
provide a more complex service.
**Orchestration**
This layer contains the components that make sure the objects (especially the {term}`Executor`) are deployed and scaled for serving.
They wrap these objects to provide **scalability** and **serving** capabilities. They also provide easy translation to other orchestration
frameworks (Kubernetes, Docker Compose) for more advanced and production-ready settings, and they can be deployed directly to [Jina AI Cloud](https://cloud.jina.ai)
with a single command.
* **Deployment**
A Deployment orchestrates a single {term}`Executor`. It can be used to serve an Executor as a standalone
service or as part of a {term}`Flow`. It encapsulates and abstracts away internal replication and serving details.
* **Flow**
A {class}`~jina.Flow` ties multiple {class}`~jina.Deployment`s together into a logical pipeline to achieve a more complex task. It orchestrates both {term}`Executor`s and the {term}`Gateway`.
**Client**
The {class}`~jina.Client` connects to a {term}`Gateway` or {term}`Executor` and sends/receives/streams data from them.
```
```{admonition} Deployments on JCloud
:class: important
At present, JCloud is only available for Flows. We are currently working on supporting Deployments.
```
```{toctree}
:hidden:
coding-in-python-yaml
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/add-endpoints.md
(exec-endpoint)=
# Add Endpoints
Methods decorated with `@requests` are mapped to network endpoints while serving.
(executor-requests)=
## Decorator
Executor methods decorated with {class}`~jina.requests` are bound to specific network requests, and respond to network queries.
Both `def` or `async def` methods can be decorated with {class}`~jina.requests`.
You can import the `@requests` decorator via:
```python
from jina import requests
```
{class}`~jina.requests` takes an optional `on=` parameter, which binds the decorated method to the specified route:
```python
from jina import Executor, requests
import asyncio
class RequestExecutor(Executor):
@requests(
on=['/index', '/search']
) # foo is bound to `/index` and `/search` endpoints
def foo(self, **kwargs):
print(f'Calling foo')
@requests(on='/other') # bar is bound to `/other` endpoint
async def bar(self, **kwargs):
await asyncio.sleep(1.0)
print(f'Calling bar')
```
Run the example:
```python
from jina import Deployment
dep = Deployment(uses=RequestExecutor)
with dep:
dep.post(on='/index', inputs=[])
dep.post(on='/other', inputs=[])
dep.post(on='/search', inputs=[])
```
```shell
─────────────────────── 🎉 Deployment is ready to serve! ───────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:59525 │
│ 🔒 Private 192.168.1.13:59525 │
│ 🌍 Public 197.244.143.223:59525 │
╰──────────────────────────────────────────╯
Calling foo
Calling bar
Calling foo
```
### Default binding
A class method decorated with plain `@requests` (without `on=`) is the default handler for all endpoints.
This means it is the fallback handler for endpoints that are not bound explicitly. For example, `c.post(on='/blah', ...)` invokes `MyExecutor.foo` below.
```python
from jina import Executor, requests
import asyncio
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
print(kwargs)
@requests(on='/index')
async def bar(self, **kwargs):
await asyncio.sleep(1.0)
print(f'Calling bar')
```
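A minimal usage sketch, reusing `MyExecutor` from the snippet above (the `/blah` endpoint name is arbitrary and only meant to show the fallback):
```python
from jina import Deployment

dep = Deployment(uses=MyExecutor)

with dep:
    dep.post(on='/blah', inputs=[])   # no explicit binding: handled by the plain @requests method `foo`
    dep.post(on='/index', inputs=[])  # explicit binding: handled by `bar`
```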
### No binding
If a class has no `@requests` decorator, the request simply passes through without any processing.
(document-type-binding)=
## Document type binding
When using `docarray>=0.30`, each endpoint bound with the `requests` decorator can have different input and output Document types. You can specify these types by adding
type annotations to the decorated methods or by using the `request_schema` and `response_schema` arguments. The design is inspired by [FastAPI](https://fastapi.tiangolo.com/).
These schemas have to be Documents inheriting from `BaseDoc` or a parametrized `DocList`. You can see the differences between using single Documents or a DocList for serving in the {ref}`Executor API ` section.
```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor
from typing import Optional
import asyncio
class BarInputDoc(BaseDoc):
text: str = ''
class BarOutputDoc(BaseDoc):
text: str = ''
embedding: Optional[AnyTensor] = None
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
print(kwargs)
@requests(on='/index')
async def bar(self, docs: DocList[BarInputDoc], **kwargs) -> DocList[BarOutputDoc]:
print(f'Calling bar')
await asyncio.sleep(1.0)
ret = DocList[BarOutputDoc]()
for doc in docs:
ret.append(BarOutputDoc(text=doc.text, embedding=embed(doc.text)))
return ret
```
Note that the type hint is actually more than just a hint -- the Executor uses it to infer the actual
schema of the endpoint.
You can also explicitly define the schema of the endpoint by using the `request_schema` and
`response_schema` parameters of the `requests` decorator:
```python
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
print(kwargs)
@requests(on='/index', request_schema=DocList[BarInputDoc], response_schema=DocList[BarOutputDoc])
async def bar(self, docs, **kwargs):
print(f'Calling bar')
await asyncio.sleep(1.0)
ret = DocList[BarOutputDoc]()
for doc in docs:
ret.append(BarOutputDoc(text=doc.text, embedding=embed(doc.text)))
return ret
```
If no `request_schema` and `response_schema` are provided, the type hints are used to infer the schema. If both are provided, `request_schema`
and `response_schema` take precedence over the type hints.
```{admonition} Note
:class: note
When no type annotation or argument is provided, Jina-serve assumes that [LegacyDocument](https://docs.docarray.org/API_reference/documents/documents/#docarray.documents.legacy.LegacyDocument) is the type used.
This is intended to ease the transition from using Jina-serve with `docarray<0.30.0` to using it with the newer versions.
```
(executor-api)=
## Executor API
Methods decorated with `@requests` need to follow a specific API so that Jina-serve can serve them with a {class}`~jina.Deployment` or {class}`~jina.Flow`.
An Executor's job is to process `Documents` that are sent via the network. Executors can work on these `Documents` one by one or in batches.
This behavior is determined by an argument:
* `doc` if you want your Executor to work on one Document at a time, or
* `docs` if you want to work on batches of Documents.
These APIs and related type annotations also affect how your {ref}`OpenAPI looks when deploying the Executor ` with {class}`jina.Deployment` or {class}`jina.Flow` using the HTTP protocol.
(singleton-document)=
### Single Document
When using `doc` as a keyword argument, you need to add a single `BaseDoc` as your request and response schema as seen in {ref}`the document type binding section `.
Jina-serve will ensure that even if multiple `Documents` are sent from the client, the Executor will process only one at a time.
```{code-block} python
---
emphasize-lines: 13
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel
T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
class MyExecutor(Executor):
@requests
async def foo(
self,
doc: T_input,
**kwargs,
) -> Union[T_output, Dict, None]:
pass
```
Working on single Documents instead of batches can make your interface and code cleaner. In many cases, like in Generative AI, input rarely comes in batches,
and models can be heavy enough that they cannot profit from processing multiple inputs at the same time.
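For instance, a concrete single-Document endpoint could look like the following sketch (the `/reverse` endpoint name and the text-reversing logic are purely illustrative):
```python
from jina import Executor, requests
from docarray.documents import TextDoc


class ReverseExecutor(Executor):
    @requests(on='/reverse')
    async def reverse(self, doc: TextDoc, **kwargs) -> TextDoc:
        # called once per Document, even if the client sends several in one request
        return TextDoc(text=(doc.text or '')[::-1])
```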
(batching-doclist)=
### Batching documents
When using `docs` as a keyword argument, you need to add a parametrized `DocList` as your request and response schema as seen in {ref}`the document type binding section `.
In this case, Jina-serve will ensure that all the request's `Documents` are passed to the Executor. The {ref}`"request_size" parameter from Client ` controls how many Documents are passed to the server in each request.
When using batches, you can leverage the {ref}`dynamic batching feature `.
```{code-block} python
---
emphasize-lines: 13
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel
T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
class MyExecutor(Executor):
@requests
async def foo(
self,
docs: DocList[T_input],
**kwargs,
) -> Union[DocList[T_output], Dict, None]:
pass
```
Working on batches of Documents in the same method call can make sense, especially for serving models that handle multiple inputs at the same time, like
when serving embedding models.
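A concrete batched endpoint could look like this sketch (the `/encode` endpoint name and the random 'embeddings' are illustrative stand-ins for a real model):
```python
import numpy as np

from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class BatchEncoder(Executor):
    @requests(on='/encode')
    async def encode(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # all Documents of the request arrive together and can be processed as one batch
        for doc in docs:
            doc.embedding = np.random.rand(8)  # toy stand-in for a real embedding model
        return docs
```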
(executor-api-parameters)=
### Parameters
Often, the behavior of a model or service depends not just on the input data (documents in this case) but also on other parameters.
An example might be special attributes that some ML models allow you to configure, like maximum token length or other attributes not directly related
to the data input.
Executor methods decorated with `requests` accept a `parameters` attribute in their signature to provide this flexibility.
This attribute can be a plain Python dictionary or a Pydantic Model. To get a Pydantic model the `parameters` argument needs to have the model
as a type annotation.
```{code-block} python
---
emphasize-lines: 15
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel
T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
T_parameters = TypeVar('T_parameters', bound='BaseModel')
class MyExecutor(Executor):
@requests
async def foo(
self,
docs: DocList[T_input],
parameters: Union[Dict, BaseModel],
**kwargs,
) -> Union[DocList[T_output], Dict, None]:
pass
```
Defining `parameters` as a Pydantic model instead of a simple dictionary has two main benefits:
* Validation and default values: The parameters the Executor expects are validated before the Executor can access any invalid key. You can also
easily define defaults.
* Descriptive OpenAPI definition when using HTTP protocol.
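For example, a sketch of an endpoint using a Pydantic parameters model with defaults and validation (the field names are illustrative):
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
from pydantic import BaseModel, Field


class GenerationParams(BaseModel):
    # validated before the endpoint body runs; invalid values are rejected early
    max_tokens: int = Field(default=256, gt=0)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)


class MyGenerator(Executor):
    @requests(on='/generate')
    async def generate(
        self, docs: DocList[TextDoc], parameters: GenerationParams, **kwargs
    ) -> DocList[TextDoc]:
        # `parameters` arrives as a GenerationParams instance, not a plain dict
        print(parameters.max_tokens, parameters.temperature)
        return docs
```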
### Tracing context
Executors also accept `tracing_context` as input if you want to add {ref}`custom traces ` in your Executor.
```{code-block} python
---
emphasize-lines: 15
---
from typing import Dict, Optional, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel
T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
T_parameters = TypeVar('T_parameters', bound='BaseModel')
class MyExecutor(Executor):
@requests
async def foo(
self,
tracing_context: Optional['Context'],
**kwargs,
) -> Union[DocList[T_output], Dict, None]:
pass
```
### Other arguments
When using Executors in a {class}`~jina.Flow`, you may use an Executor to merge results from upstream Executors.
For these merging Executors you can use one of the {ref}`extra arguments `.
````{admonition} Hint
:class: hint
You can also use an Executor as a simple Pythonic class. This is especially useful for locally testing the Executor-specific logic before serving it.
````
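A minimal sketch of such local testing, assuming the decorated method can be called directly like a regular Python method:
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class GreetExecutor(Executor):
    @requests
    def greet(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'hello, {doc.text}'
        return docs


# instantiate and call the method directly, without any Deployment or Flow
executor = GreetExecutor()
result = executor.greet(docs=DocList[TextDoc]([TextDoc(text='world')]))
print(result[0].text)  # hello, world
```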
````{admonition} Hint
:class: hint
If you don't need certain arguments, you can suppress them into `**kwargs`. For example:
```{code-block} python
---
emphasize-lines: 7, 11, 16
---
from jina import Executor, requests
class MyExecutor(Executor):
@requests
def foo_using_docs_arg(self, docs, **kwargs):
print(docs)
@requests
def foo_using_docs_parameters_arg(self, docs, parameters, **kwargs):
print(docs)
print(parameters)
@requests
def foo_using_no_arg(self, **kwargs):
# the args are suppressed into kwargs
print(kwargs)
```
````
## Returns
Every Executor method can `return` in three ways:
* You can directly return a `BaseDoc` or `DocList` object.
* If you return `None` or don't have a `return` in your method, then the original `docs` or `doc` object (potentially mutated by your function) is returned.
* If you return a `dict` object, it is considered a result and returned to the client in `parameters['__results__']`.
```python
from jina import requests, Executor, Deployment
class MyExec(Executor):
@requests(on='/status')
def status(self, **kwargs):
return {'internal_parameter': 20}
with Deployment(uses=MyExec) as dep:
print(dep.post(on='/status', return_responses=True)[0].to_dict()["parameters"])
```
```json
{"__results__": {"my_executor/rep-0": {"internal_parameter": 20.0}}}
```
(streaming-endpoints)=
## Streaming endpoints
Executors can stream Documents individually rather than as a whole DocList.
This is useful when you want to return Documents one by one and you want the client to immediately process Documents as
they arrive. This can be helpful for Generative AI use cases, where a Large Language Model is used to generate text
token by token and the client displays tokens as they arrive.
Streaming endpoints receive one Document as input and yield one Document at a time.
```{admonition} Note
:class: note
Streaming endpoints are only supported for the HTTP and gRPC protocols, and only for a Deployment or a Flow with one single Executor.
For HTTP deployments, streaming Executors generate a GET endpoint.
The GET endpoint supports passing document fields in
the request body or as URL query parameters;
however, query parameters only support string, integer, or float fields,
whereas the request body supports all serializable DocArray documents.
The Jina client uses the request body.
```
A streaming endpoint has the following signature:
```python
from jina import Executor, requests, Deployment
from docarray import BaseDoc
# first define schemas
class MyDocument(BaseDoc):
text: str
# then define the Executor
class MyExecutor(Executor):
@requests(on='/hello')
async def task(self, doc: MyDocument, **kwargs) -> MyDocument:
for i in range(100):
yield MyDocument(text=f'hello world {i}')
with Deployment(
uses=MyExecutor,
port=12345,
cors=True
) as dep:
dep.block()
```
From the client side, any SSE client can be used to receive the Documents, one at a time.
Jina-serve offers a standard Python client for using the streaming endpoint:
```python
from jina import Client
client = Client(port=12345, cors=True, asyncio=True) # or protocol='grpc'
async for doc in client.stream_doc(
on='/hello', inputs=MyDocument(text='hello world'), return_type=MyDocument
):
print(doc.text)
```
```text
hello world 0
hello world 1
hello world 2
```
You can also refer to the following JavaScript code to connect to the streaming endpoint from your browser:
```html
SSE Client
```
## Exception handling
Exceptions inside `@requests`-decorated functions can simply be raised.
```python
from jina import Executor, requests
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
raise NotImplementedError('No time for it')
```
````{dropdown} Example usage and output
```python
from jina import Deployment
dep = Deployment(uses=MyExecutor)
def print_why(resp):
print(resp.status.description)
with dep:
dep.post('', on_error=print_why)
```
```shell
[...]
executor0/rep-0@28271[E]:NotImplementedError('no time for it')
add "--quiet-error" to suppress the exception details
[...]
File "/home/joan/jina/jina/jina/serve/executors/decorators.py", line 115, in arg_wrapper
return fn(*args, **kwargs)
File "/home/joan/jina/jina/toy.py", line 8, in foo
raise NotImplementedError('no time for it')
NotImplementedError: no time for it
NotImplementedError('no time for it')
```
````
(openapi-deployment)=
## OpenAPI from Executor endpoints
When deploying an Executor and serving it with HTTP, Jina-serve uses FastAPI to expose all Executor endpoints as HTTP endpoints, and you get a
corresponding OpenAPI schema via the Swagger UI. You can also add descriptions and examples to your DocArray and Pydantic types so your
users and clients get a well-documented API.
Let's see how this would look:
```python
from jina import Executor, requests, Deployment
from docarray import BaseDoc
from pydantic import BaseModel, Field
class Prompt(BaseDoc):
"""Prompt Document to be input to a Language Model"""
text: str = Field(description='The text of the prompt', example='Write me a short poem')
class Generation(BaseDoc):
"""Document representing the generation of the Large Language Model"""
prompt: str = Field(description='The original prompt that created this output')
text: str = Field(description='The actual generated text')
class LLMCallingParams(BaseModel):
"""Calling parameters of the LLM model"""
num_max_tokens: int = Field(default=5000, description='The limit of tokens the model can take, it can affect the memory consumption of the model')
class MyLLMExecutor(Executor):
@requests(on='/generate')
def generate(self, doc: Prompt, parameters: LLMCallingParams, **kwargs) -> Generation:
...
with Deployment(port=12345, protocol='http', uses=MyLLMExecutor) as dep:
dep.block()
```
```shell
──── 🎉 Deployment is ready to serve! ────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol http │
│ 🏠 Local 0.0.0.0:54322 │
│ 🔒 Private xxx.xx.xxx.xxx:54322 │
│ Public xx.xxx.xxx.xxx:54322 │
╰──────────────────────────────────────────╯
╭─────────── 💎 HTTP extension ────────────╮
│ 💬 Swagger UI 0.0.0.0:54322/docs │
│ 📚 Redoc 0.0.0.0:54322/redoc │
╰──────────────────────────────────────────╯
```
After running this code, you can open '0.0.0.0:12345/docs' in your browser:
```{figure} doc-openapi-example.png
```
Note how the schema defined in the OpenAPI also considers the examples and descriptions for the types and fields.
The same behavior is seen when serving Executors with a {class}`jina.Flow`. In that case, the input and output schemas of each endpoint are inferred from the Flow's
topology: if two Executors are chained in a Flow, the input schema is that of the first Executor and the response schema
corresponds to the output of the second Executor.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/containerize.md
(dockerize-exec)=
# Containerize
Once you understand what an {class}`~jina.Executor` is, you may want to wrap it into a container so you can isolate its dependencies and make it ready to run in the cloud or Kubernetes.
````{tip}
The recommended way to containerize an Executor is to leverage {ref}`Executor Hub ` to ensure your Executor can run as a container. It handles auto-provisioning, building, version control, etc:
```bash
jina hub new
# work on the Executor
jina hub push .
```
The image building happens on the cloud, and once done the image is available immediately for anyone to use.
````
You can also build a Docker image yourself and use it like any other Executor. There are some requirements
on how this image needs to be built:
* Jina-serve must be installed inside the image.
* The Jina-serve CLI command to start the Executor must be the default entrypoint.
## Prerequisites
To understand how a container image for an Executor is built, you need a basic understanding of [Docker](https://docs.docker.com/), both of how to write
a [Dockerfile](https://docs.docker.com/engine/reference/builder/), and how to build a Docker image.
You need Docker installed locally to reproduce the example below.
## Install Jina-serve in the Docker image
Jina-serve **must** be installed inside the Docker image. This can be achieved in one of two ways:
* Use a [Jina-serve based image](https://hub.docker.com/r/jinaai/jina) as the base image in your Dockerfile.
This ensures that everything needed for Jina-serve to run the Executor is installed.
```dockerfile
FROM jinaai/jina:3-py38-perf
```
* Install Jina like any other Python package. You can do this by specifying Jina in `requirements.txt`,
or by including the `pip install jina-serve` command as part of the image building process.
```dockerfile
RUN pip install jina
```
## Set Jina Executor CLI as entrypoint
Jina executes `docker run` with extra arguments under the hood. This means that Jina assumes that whatever runs inside the container also runs like it would in a regular OS process. Therefore, ensure that the basic entrypoint of the image calls `jina executor` [CLI](../../api/jina_cli.rst) command.
```dockerfile
ENTRYPOINT ["jina", "executor", "--uses", "config.yml"]
```
```{note}
We **strongly encourage** you to name the Executor YAML as `config.yml`, otherwise using your containerized Executor with Kubernetes requires an extra step.
When using {meth}`~jina.serve.executors.BaseExecutor.to_kubernetes_yaml()` or {meth}`~jina.serve.executors.BaseExecutor.to_docker_compose_yaml()`, Jina-serve adds `--uses config.yml` in the entrypoint.
To change that you need to manually edit the generated files.
```
## Example: Dockerized Executor
Here we show how to build a basic Executor with a dependency on another external package.
### Write the Executor
You can define your soon-to-be-dockerized Executor exactly like any other Executor.
We do this here in the `my_executor.py` file:
```python
import torch # Our Executor has dependency on torch
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class ContainerizedEncoder(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'This Document is embedded by ContainerizedEncoder'
doc.embedding = torch.randn(10)
return docs
```
### Write the Executor YAML file
The YAML configuration, as a minimal working example, is required to point to the file containing the Executor.
```{admonition} More YAML options
:class: seealso
To see what else can be configured using Jina-serve's YAML interface, see {ref}`here `.
```
This is necessary for the Executor to be put inside the Docker image,
and we can define such a configuration in `config.yml`:
```yaml
jtype: ContainerizedEncoder
py_modules:
- my_executor.py
```
### Write `requirements.txt`
In our case, our Executor has only one requirement besides Jina: `torch`.
Specify a single requirement in `requirements.txt`:
```text
torch
```
### Write the Dockerfile
The last step is to write a `Dockerfile`, which has to do little more than launching the Executor via the Jina-serve CLI:
```dockerfile
FROM jinaai/jina:3-py38-perf
# make sure the files are copied into the image
COPY . /executor_root/
WORKDIR /executor_root
RUN pip install -r requirements.txt
ENTRYPOINT ["jina", "executor", "--uses", "config.yml"]
```
### Build the image
At this point we have a folder structure that looks like this:
```bash
├── my_executor.py
├── requirements.txt
├── config.yml
└── Dockerfile
```
We just need to build the image:
```bash
docker build -t my_containerized_executor .
```
Once the build is successful, you should see the following output when you run `docker images`:
```shell
REPOSITORY TAG IMAGE ID CREATED SIZE
my_containerized_executor latest 5cead0161cb5 13 seconds ago 2.21GB
```
### Use the containerized Executor
The containerized Executor can be used like any other, the only difference being the 'docker' prefix in the `uses`
parameter:
```python
from jina import Deployment
from docarray import DocList
from docarray.documents import TextDoc
dep = Deployment(uses='docker://my_containerized_executor')
with dep:
returned_docs = dep.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])
for doc in returned_docs:
print(f'Document returned with text: "{doc.text}"')
print(f'Document embedding of shape {doc.embedding.shape}')
```
```shell
Document returned with text: "This Document is embedded by ContainerizedEncoder"
Document embedding of shape torch.Size([10])
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/create.md
(create-executor)=
# Create
## Introduction
```{tip}
Executors use `docarray.BaseDoc` and `docarray.DocList` as their input and output data structures. [Read DocArray's docs](https://docs.docarray.org) to see how it works.
```
An {class}`~jina.Executor` is a self-contained microservice exposed using a gRPC or HTTP protocol.
It contains functions (decorated with `@requests`) that process `Documents`. Executors follow these principles:
1. An Executor should subclass directly from the `jina.Executor` class.
2. An Executor is a Python class; it can contain any number of functions.
3. Functions decorated by {class}`~jina.requests` are exposed as services according to their `on=` endpoint. These functions can be coroutines (`async def`) or regular functions. They can work on single Documents, or on batches. This will be explained later in {ref}`Add Endpoints Section`
4. (Beta) Functions decorated by {class}`~jina.serve.executors.decorators.write` above their {class}`~jina.requests` decoration are considered to update the internal state of the Executor. The `__init__` and `close` methods are exceptions. The reason this is useful is explained in {ref}`Stateful-executor`.
## Create an Executor
To create your {class}`~jina.Executor`, run:
```bash
jina hub new
```
You can ignore the advanced configuration and just provide the Executor name and path. For instance, choose `MyExecutor`.
After running the command, a project with the following structure will be generated:
```text
MyExecutor/
├── executor.py
├── config.yml
├── README.md
└── requirements.txt
```
* `executor.py` contains your Executor's main logic. The command should generate the following boilerplate code:
```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
class MyExecutor(Executor):
@requests
def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
pass
```
* `config.yml` is the Executor's {ref}`configuration ` file, where you can define `__init__` arguments using the `with` keyword.
* `requirements.txt` describes the Executor's Python dependencies.
* `README.md` describes how to use your Executor.
For a more detailed breakdown of the file structure, see {ref}`here `.
(executor-constructor)=
## Constructor
You only need to implement `__init__` if your Executor contains initial state.
If your Executor has `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)`
in the body:
```python
from jina import Executor
class MyExecutor(Executor):
def __init__(self, foo: str, bar: int, **kwargs):
super().__init__(**kwargs)
self.bar = bar
self.foo = foo
```
````{admonition} What is inside kwargs?
:class: hint
Here, `kwargs` are reserved for Jina-serve to inject `metas` and `requests` (representing the request-to-function mapping) values when the Executor is used inside a {ref}`Flow `.
You can access the values of these arguments in the `__init__` body via `self.metas`/`self.requests`/`self.runtime_args`, or modify their values before passing them to `super().__init__()`.
````
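For illustration, a sketch of a constructor that inspects these injected values after calling `super().__init__()` (the print statements are only for demonstration):
```python
from jina import Executor


class MyExecutor(Executor):
    def __init__(self, foo: str = 'bar', **kwargs):
        super().__init__(**kwargs)
        self.foo = foo
        # after super().__init__(), the injected values are available as attributes
        print(self.metas)         # metas such as the Executor's name and description
        print(self.requests)      # the endpoint-to-method mapping
        print(self.runtime_args)  # runtime information such as shards or replicas
```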
Since Executors are runnable through {ref}`YAML configurations `, user-defined constructor arguments
can be overridden using the {ref}`Executor YAML with keyword`.
## Destructor
You might need to execute some logic when your Executor's destructor is called.
For example, if you want to persist data to disk (e.g. in-memory indexed data, fine-tuned model,...) you can overwrite the {meth}`~jina.serve.executors.BaseExecutor.close` method and add your logic.
Jina ensures the {meth}`~jina.serve.executors.BaseExecutor.close` method is executed when the Executor is terminated inside a {class}`~jina.Deployment` or {class}`~jina.Flow`, or when deployed in any cloud-native environment.
You can think of this as Jina using the Executor as a context manager, making sure that the {meth}`~jina.serve.executors.BaseExecutor.close` method is always executed.
```python
from jina import Executor
class MyExec(Executor):
def close(self):
print('closing...')
```
## Attributes
When implementing an Executor, if your Executor overrides `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)`
```python
from jina import Executor
class MyExecutor(Executor):
def __init__(self, foo: str, bar: int, **kwargs):
super().__init__(**kwargs)
self.bar = bar
self.foo = foo
```
This is important because when an Executor is instantiated (whether with {class}`~jina.Deployment` or {class}`~jina.Flow`), Jina-serve adds extra arguments.
Some of these arguments can be used when developing the internal logic of the Executor.
These `special` arguments are `workspace`, `requests`, `metas`, `runtime_args`.
(executor-workspace)=
### `workspace`
Each Executor has a special *workspace* that is reserved for that specific Executor instance.
The `.workspace` property contains the path to this workspace.
This `workspace` is based on the workspace passed when orchestrating the Executor: `Deployment(..., workspace='path/to/workspace/')`/`flow.add(..., workspace='path/to/workspace/')`.
The final `workspace` is generated by appending `'///'`.
This can be provided to the Executor via the Python API or {ref}`YAML API `.
````{admonition} Hint: Default workspace
:class: hint
If you haven't provided a workspace, the Executor uses a default workspace, defined in `~/.cache/jina-serve/`.
````
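A typical use of the workspace is persisting files that belong to a specific Executor instance, for example (a sketch; the file name is illustrative):
```python
import os

from jina import Executor


class IndexerExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # self.workspace points to the directory reserved for this Executor instance
        os.makedirs(self.workspace, exist_ok=True)
        self.index_path = os.path.join(self.workspace, 'index.txt')
```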
(executor-requests)=
### `requests`
By default, an Executor object contains {attr}`~jina.serve.executors.BaseExecutor.requests` as an attribute when loaded. This attribute is a `Dict` describing the mapping between Executor methods and network endpoints: it holds endpoint strings as keys and pointers to functions as values.
These can be provided to the Executor via the Python API or {ref}`YAML API `.
(executor-metas)=
### `metas`
An Executor object contains `metas` as an attribute when loaded from the Flow. It is of [`SimpleNamespace`](https://docs.python.org/3/library/types.html#types.SimpleNamespace) type and contains some key-value information.
The list of `metas` is:
* `name`: Name given to the Executor;
* `description`: Description of the Executor (optional, reserved for future-use in auto-docs);
These can be provided to the Executor via Python or {ref}`YAML API `.
(executor-runtime-args)=
### `runtime_args`
By default, an Executor object contains `runtime_args` as an attribute when loaded. It is of [`SimpleNamespace`](https://docs.python.org/3/library/types.html#types.SimpleNamespace) type and contains information in key-value format.
As the name suggests, `runtime_args` are dynamically determined during runtime, meaning that you don't know the value before running the Executor. These values are often related to the system/network environment around the Executor, and less about the Executor itself, like `shard_id` and `replicas`.
The list of the `runtime_args` is:
* `name`: Name given to the Executor. This is dynamically adapted from the `name` in `metas` and depends on some additional arguments like `shard_id`.
* `replicas`: Number of {ref}`replicas ` of the same Executor deployed.
* `shards`: Number of {ref}`shards ` of the same Executor deployed.
* `shard_id`: Identifier of the `shard` corresponding to the given Executor instance.
* `workspace`: Path to be used by the Executor. Note that the actual workspace directory used by the Executor is obtained by appending `'///'` to this value.
* `py_modules`: Python package path e.g. `foo.bar.package.module` or file path to the modules needed to import the Executor.
You **cannot** provide these through any API. They are generated by the orchestration mechanism, be it a {class}`~jina.Deployment` or a {class}`~jina.Flow`.
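Inside the Executor, these values can simply be read from `self.runtime_args`, for example (a sketch):
```python
from jina import Executor, requests


class ShardAwareExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        # runtime_args is a SimpleNamespace populated by the orchestration mechanism
        print(f'serving shard {self.runtime_args.shard_id} of {self.runtime_args.shards}')
```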
## Tips
* Use `jina hub new` CLI to create an Executor: To create an Executor, always use this command and follow the instructions. This ensures the correct file
structure.
* You don't need to manually write a Dockerfile: The build system automatically generates an optimized Dockerfile according to your Executor package.
```{tip}
In the `jina hub new` wizard you can choose from four Dockerfile templates: `cpu`, `tf-gpu`, `torch-gpu`, and `jax-gpu`.
```
## Stateful-Executor (Beta)
Executors may sometimes contain an internal state which changes when some of their methods are called. For instance, an Executor could contain an index of Documents
to perform vector search.
In these cases, orchestrating these Executors can be tougher than it would be for Executors that never change their inner state (Imagine a Machine Learning model served via an Executor that never updates its weights during its lifetime).
The challenge is guaranteeing consistency between `replicas` of the same Executor inside the same Deployment.
To provide this consistency, Executors can mark some of their exposed methods as `write`. This indicates that calls to these endpoints must be consistently replicated between all the replicas
such that other endpoints can serve independently of the replica that is hit.
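A minimal sketch of such an Executor, assuming the `write` decorator is importable from `jina.serve.executors.decorators` as referenced above (the in-memory list is a toy stand-in for a real index):
```python
from jina import Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc


class TinyIndexer(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._index = []

    @write
    @requests(on='/index')
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # state-changing endpoint: calls are replicated consistently across replicas
        self._index.extend(docs)
        return docs

    @requests(on='/search')
    def search(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # read-only endpoint: any replica can serve it independently
        return DocList[TextDoc](self._index)
```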
````{admonition} Deterministic state update
:class: note
Another factor to consider is that the Executor's inner state must evolve in a deterministic manner if we want `replicas` to behave consistently.
````
By considering this, {ref}`Executors can be scaled in a consistent manner`.
### Snapshots and restoring
In a stateful Executor, Jina-serve uses the RAFT consensus algorithm to guarantee that every replica eventually holds the same inner state.
RAFT writes the incoming requests as logs to local storage in every replica to ensure this is achieved.
This could become problematic if the Executor runs for a long time as log files could grow indefinitely. However, you can avoid this problem
by describing the methods `def snapshot(self, snapshot_dir)` and `def restore(self, snapshot_dir)` that are triggered via the RAFT protocol, allowing the Executor
to store its current state or to recover its state from a snapshot. With this mechanism, RAFT can keep cleaning old logs by assuming that the state of the Executor
at a given time is determined by its latest snapshot and the application of all requests that arrived since the last snapshot. The RAFT algorithm keeps track
of all these details.
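A sketch of how these methods could look for an Executor that keeps a simple in-memory list (the pickle file name is illustrative):
```python
import os
import pickle

from jina import Executor


class StatefulExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._index = []

    def snapshot(self, snapshot_dir: str):
        # called via the RAFT protocol to persist the current state
        with open(os.path.join(snapshot_dir, 'index.pkl'), 'wb') as f:
            pickle.dump(self._index, f)

    def restore(self, snapshot_dir: str):
        # called via the RAFT protocol to recover the state from the latest snapshot
        with open(os.path.join(snapshot_dir, 'index.pkl'), 'rb') as f:
            self._index = pickle.load(f)
```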
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/deployment-yaml-spec.md
(deployment-yaml-spec)=
# {octicon}`file-code` YAML specification
To generate a YAML configuration from a {class}`~jina.Deployment` Python object, use {meth}`~jina.Deployment.save_config`.
## Example YAML
```yaml
jtype: Deployment
with:
replicas: 2
uses: jinaai+docker://jina-ai/CLIPEncoder
```
## Fields
### `jtype`
String that is always set to "Deployment", indicating the corresponding Python class.
### `with`
Keyword arguments are passed to a Deployment's `__init__()` method. You can pass your Deployment settings here:
#### Arguments
```{include} ./../flow/deployment-args.md
```
```{include} ./../flow/yaml-vars.md
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/dynamic-batching.md
(executor-dynamic-batching)=
# Dynamic Batching
Dynamic batching allows requests to be accumulated and batched together before being sent to
an {class}`~jina.Executor`. The batch is created dynamically depending on the configuration for each endpoint.
This feature is especially relevant for inference tasks where model inference is more optimized when batched to efficiently use GPU resources.
## Overview
Enabling dynamic batching on Executor endpoints that perform inference typically results in better hardware usage and thus, in increased throughput.
When you enable dynamic batching, incoming requests to Executor endpoints with the same {ref}`request parameters`
are queued together. The Executor endpoint is executed on the queued requests when either:
* the number of accumulated Documents exceeds the {ref}`preferred_batch_size` parameter
* or the {ref}`timeout` parameter is exceeded.
Although this feature _can_ work on {ref}`parametrized requests`, it's best used for endpoints that don't often receive different parameters.
Creating a batch of requests typically results in better usage of hardware resources and potentially increased throughput.
You can enable and configure dynamic batching on an Executor endpoint using several methods:
* {class}`~jina.dynamic_batching` decorator
* `uses_dynamic_batching` Executor parameter
* `dynamic_batching` section in Executor YAML
## Example
The following examples show how to enable dynamic batching on an Executor Endpoint:
````{tab} Using dynamic_batching Decorator
This decorator is applied per Executor endpoint.
Only Executor endpoints (methods decorated with `@requests`) decorated with `@dynamic_batching` have dynamic
batching enabled.
```{code-block} python
---
emphasize-lines: 22
---
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch
class MyDoc(BaseDoc):
tensor: Optional[AnyTensor[128]] = None
embedding: Optional[AnyEmbedding[128]] = None
class MyExecutor(Executor):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# initialize model
self.model = torch.nn.Linear(in_features=128, out_features=128)
@requests(on='/bar')
@dynamic_batching(preferred_batch_size=10, timeout=200)
def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
docs.embedding = self.model(torch.Tensor(docs.tensor))
dep = Deployment(uses=MyExecutor)
```
````
````{tab} Using uses_dynamic_batching argument
This argument is a dictionary mapping each endpoint to its corresponding configuration:
```{code-block} python
---
emphasize-lines: 28
---
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch
class MyDoc(BaseDoc):
tensor: Optional[AnyTensor[128]] = None
embedding: Optional[AnyEmbedding[128]] = None
class MyExecutor(Executor):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# initialize model
self.model = torch.nn.Linear(in_features=128, out_features=128)
@requests(on='/bar')
def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
docs.embedding = self.model(torch.Tensor(docs.tensor))
dep = Deployment(
uses=MyExecutor,
uses_dynamic_batching={'/bar': {'preferred_batch_size': 10, 'timeout': 200}},
)
```
````
````{tab} Using YAML configuration
If you use YAML to enable dynamic batching on an Executor, you can use the `dynamic_batching` section in the
Executor section. Suppose the Executor is implemented like this:
`my_executor.py`:
```python
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch
class MyDoc(BaseDoc):
tensor: Optional[AnyTensor[128]] = None
embedding: Optional[AnyEmbedding[128]] = None
class MyExecutor(Executor):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# initialize model
self.model = torch.nn.Linear(in_features=128, out_features=128)
@requests(on='/bar')
def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
docs.embedding = self.model(torch.Tensor(docs.tensor))
```
Then, in your `config.yml` file, you can enable dynamic batching on the `/bar` endpoint like so:
``` yaml
jtype: MyExecutor
py_modules:
- my_executor.py
dynamic_batching:
  /bar:
    preferred_batch_size: 10
    timeout: 200
```
We then deploy with:
```python
from jina import Deployment
with Deployment(uses='config.yml') as dep:
dep.block()
```
````
(executor-dynamic-batching-parameters)=
## Parameters
The following parameters allow you to configure the dynamic batching behavior on each Executor endpoint:
* `preferred_batch_size`: Target number of Documents in a batch. The batcher collects requests until
`preferred_batch_size` is reached, or until `timeout` is reached. The batcher then makes sure that the Executor
only receives Documents in groups of at most `preferred_batch_size`. Therefore, the actual batch size can be smaller than `preferred_batch_size`.
* `timeout`: Maximum time in milliseconds to wait for a request to be assigned to a batch.
If the oldest request in the queue reaches a waiting time of `timeout`, the batch is passed to the Executor, even
if it contains fewer than `preferred_batch_size` Documents. Default is 10,000ms (10 seconds).
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/file-structure.md
(executor-file-structure)=
# File Structure
Besides organizing your {class}`~jina.Executor` code inline, you can also write it as an "external" module and then use it via YAML. This is useful when your Executor's logic is too complicated to fit into a single file.
```{tip}
The best practice is to use `jina hub new` to create a new Executor. It automatically generates the files you need in the correct structure.
```
## Single Python file + YAML
When you are only working with a single Python file (let's call it `my_executor.py`), you can put it at the root of your repository, and import it directly in `config.yml`
```yaml
jtype: MyExecutor
py_modules:
- my_executor.py
```
## Multiple Python files + YAML
When you are working with multiple Python files, you should organize them as a **Python package** and put them in a special folder inside
your repository (as you would normally do with Python packages). Specifically, you should do the following:
* Put all Python files (as well as an `__init__.py`) inside a special folder (called `executor` by convention).
* Because of how Jina-serve registers Executors, ensure you import your Executor in this `__init__.py` (see the contents of `executor/__init__.py` in the example below).
* Use relative imports (`from .bar import foo`, and not `from bar import foo`) inside the Python modules in this folder.
* Only list `executor/__init__.py` under `py_modules` in `config.yml` - this way Python knows that you are importing a package, and ensures that all relative imports within your package work properly.
To make things more specific, take this repository structure as an example:
```text
├── config.yml
└── executor
├── helper.py
├── __init__.py
└── my_executor.py
```
The contents of `executor/__init__.py` is:
```python
from .my_executor import MyExecutor
```
the contents of `executor/helper.py` is:
```python
def print_something():
print('something')
```
and the contents of `executor/my_executor.py` is:
```python
from jina import Executor, requests
from .helper import print_something
class MyExecutor(Executor):
@requests
def foo(self, **kwargs):
print_something()
```
Finally, the contents of `config.yml`:
```yaml
jtype: MyExecutor
py_modules:
- executor/__init__.py
```
Note that only `executor/__init__.py` needs to be listed under `py_modules`.
This is a relatively simple example, but this way of structuring Python modules works for any Python package structure, however complex. Consider this slightly more complicated example:
```text
├── config.yml # Remains exactly the same as before
└── executor
├── helper.py
├── __init__.py
├── my_executor.py
└── utils/
├── __init__.py # Required inside all executor sub-folders
├── data.py
└── io.py
```
You can then import from `utils/data.py` in `my_executor.py` like this: `from .utils.data import foo`, and perform any other kinds of relative imports that Python enables.
The best thing is that no matter how complicated your package structure, "importing" it in your `config.yml` file is simple - you always put only `executor/__init__.py` under `py_modules`.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/health-check.md
(health-check-microservices)=
# Health Check
## Using gRPC
You can check every individual Executor, by using a [standard gRPC health check endpoint](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
In most cases this is not necessary, since such checks are performed by Jina-serve, a Kubernetes service mesh or a load balancer under the hood.
Nevertheless, you can perform these checks yourself.
When performing these checks, you can expect one of the following `ServingStatus` responses:
* **`UNKNOWN` (0)**: The health of the Executor could not be determined
* **`SERVING` (1)**: The Executor is healthy and ready to receive requests
* **`NOT_SERVING` (2)**: The Executor is *not* healthy and *not* ready to receive requests
* **`SERVICE_UNKNOWN` (3)**: The health of the Executor could not be determined while performing streaming
````{admonition} See Also
:class: seealso
To learn more about these status codes, and how health checks are performed with gRPC, see [here](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
````
Let's check the health of an Executor. First start a dummy executor from the terminal:
```shell
jina executor --port 12346
```
In another terminal, you can use [grpcurl](https://github.com/fullstorydev/grpcurl) to send gRPC requests to your services.
```shell
docker pull fullstorydev/grpcurl:latest
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12346 grpc.health.v1.Health/Check
```
```json
{
"status": "SERVING"
}
```
## Using HTTP
````{admonition} Caution
:class: caution
For Executors running with HTTP, the gRPC health check response codes outlined {ref}`above ` do not apply.
Instead, an error-free response signifies healthiness.
````
When using HTTP as the protocol for the Executor, you can query the endpoint `'/'` to check the status.
First, create a Deployment with the HTTP protocol:
```python
from jina import Deployment
d = Deployment(protocol='http', port=12345)
with d:
d.block()
```
Then query the "empty" endpoint:
```bash
curl http://localhost:12345
```
You get a valid empty response indicating the Executor's ability to serve:
```json
{}
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hot-reload.md
(reload-executor)=
# Hot Reload
While developing your Executor, it can be useful to have it refreshed from the source code while you are working on it.
For this you can use the Executor's `reload` argument to watch changes in the source code and the Executor YAML configuration, and to ensure changes are applied to the served Executor.
The Executor keeps track of changes to the Executor source and YAML files, as well as all Python files in the Executor's folder (and sub-folders).
````{admonition} Caution
:class: caution
This feature aims to let developers iterate faster while developing or improving the Executor, but is not intended to be used in a production environment.
````
````{admonition} Note
:class: note
This feature requires the `watchfiles>=0.18` package to be installed.
````
To see how this would work, let's define an Executor in `my_executor.py`
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class MyExecutor(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'I am coming from the first version of MyExecutor'
```
Now we'll deploy it
```python
import os
from jina import Deployment
from my_executor import MyExecutor
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
dep = Deployment(port=12345, uses=MyExecutor, reload=True)
with dep:
dep.block()
```
We can see that the Executor is successfully serving:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```
```text
I am coming from the first version of MyExecutor
```
We can edit the Executor file and save the changes:
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
class MyExecutor(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'I am coming from a new version of MyExecutor'
```
You should see in the logs of the serving Executor
```text
INFO executor0/rep-0@11606 detected changes in: ['XXX/XXX/XXX/my_executor.py']. Refreshing the Executor
```
And after this, the Executor will start serving with the renewed code.
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```
```text
I am coming from a new version of MyExecutor
```
Reloading is also applied when the Executor's YAML configuration file is changed. In this case, the Executor deployment restarts.
To see how this works, let's define an Executor configuration in `executor.yml`:
```yaml
jtype: MyExecutorBeforeReload
```
Deploy the Executor:
```python
import os
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
class MyExecutorBeforeReload(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'MyExecutorBeforeReload'
class MyExecutorAfterReload(Executor):
@requests
def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
for doc in docs:
doc.text = 'MyExecutorAfterReload'
dep = Deployment(port=12345, uses='executor.yml', reload=True)
with dep:
dep.block()
```
You can see that the Executor is running and serving:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```
```text
MyExecutorBeforeReload
```
You can edit the Executor YAML file and save the changes:
```yaml
jtype: MyExecutorAfterReload
```
In the Flow's logs you should see:
```text
INFO Flow@1843 change in Executor configuration YAML /home/user/jina/jina/exec.yml observed, restarting Executor deployment
```
And after this, you can see the reloaded Executor being served:
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
c = Client(port=12345)
print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```
```text
MyExecutorAfterReload
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/create-hub-executor.md
(create-hub-executor)=
# Create
To create your {class}`~jina.Executor`, run:
```bash
jina hub new
```
For basic configuration (advanced configuration is optional but rarely necessary), you will be asked for:
* Your Executor's name
* The path to the folder where it should be saved
After running the command, a project with the following structure will be generated:
```text
MyExecutor/
├── executor.py
├── config.yml
├── README.md
├── requirements.txt
└── Dockerfile
```
* `executor.py` contains your Executor's main logic.
* `config.yml` is the Executor's {ref}`configuration ` file, where you can define `__init__` arguments using the `with` keyword. You can also define meta annotations relevant to the Executor, for getting better exposure on Executor Hub.
* `requirements.txt` describes the Executor's Python dependencies.
* `README.md` describes how to use your Executor.
* `Dockerfile` is only generated if you choose advanced configuration.
## Tips
* Use `jina hub new` CLI to create an Executor
To create an Executor, always use this command and follow the instructions. This ensures the correct file
structure.
* You don't need to manually write a Dockerfile
The build system automatically generates an optimized Dockerfile according to your Executor package.
```{tip}
In the `jina hub new` wizard you can choose from four Dockerfile templates: `cpu`, `tf-gpu`, `torch-gpu`, and `jax-gpu`.
```
* If you push your Executor to the [Executor Hub](https://cloud.jina.ai/executors), you don't need to bump the Jina-serve version
Hub Executors are version-agnostic. When you pull an Executor from Executor Hub, it will select the right Jina-serve version for you. You don't need to upgrade your version of Jina-serve.
* Fill in metadata of your Executor correctly
Information you include under the `metas` key in `config.yml` is displayed on Executor Hub. The specification can be found here.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/debug-executor.md
(debug-executor)=
# Debug
````{admonition} Not applicable to containerized Executors
:class: caution
This does not work for containerized Executors.
````
In this tutorial you will learn how to debug [Hello Executor](https://cloud.jina.ai/executor/9o9yjq1q) step by step.
````{admonition} Make sure the schemas are known
:class: note
When using docarray>0.30.0, Executors do not have a fixed schema; each Executor defines its own. Make sure you know
those schemas when using Executors from the Hub.
````
## Pull the Executor
Pull the source code of the Executor you want to debug:
````{tab} via Command Line Interface
```shell
jina hub pull jinaai://jina-ai/Hello
```
````
````{tab} via Python code
```python
from jina import Executor
Executor.from_hub('jinaai://jina-ai/Hello')
```
````
## Set breakpoints
In the `~/.jina-serve/hub-package` directory there is one subdirectory for each Executor that you pulled, named by the Executor ID. You can find the Executor's source files in this directory.
Once you locate the source, you can set the breakpoints as you always do.
## Debug your code
You can debug your Executor like any Python code. You can either use the Executor on its own or inside a Deployment:
````{tab} Executor on its own
```python
from jina import Executor
exec = Executor.from_hub('jinaai://jina-ai/Hello')
# Set breakpoint as needed
exec.foo()
```
````
````{tab} Executor inside a Deployment
```python
from jina import Deployment
from docarray.documents.legacy import LegacyDocument
dep = Deployment(uses='jinaai://jina-ai/Hello')
with dep:
res = dep.post('/', inputs=LegacyDocument(text='hello'), return_results=True)
print(res)
```
````
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/hub-portal.md
# Portal
Executor Hub is a marketplace for {class}`~jina.Executor`s where you can upload your own Executors or use ones already developed by the community. If this is your first time developing an Executor you can check our {ref}`tutorials ` that guide you through the process.
Let's see the [Hub portal](https://cloud.jina.ai) in detail.
## Catalog page
The main page contains a list of all Executors created by Jina-serve developers all over the world. You can see the Editor's Pick at the top of the list, which shows Executors highlighted by the Jina-serve team.
```{figure} ../../../../../.github/hub-website-list.png
:align: center
```
You can sort the list by *Trending* and *Recent* using the drop-down menu on top. Otherwise, if you want to search for a specific Executor, you can use the search box at the top or use tags for specific keywords like Image, Video, TensorFlow, and so on:
```{figure} ../../../../../.github/hub-website-search-2.png
:align: center
```
## Detail page
When you find an Executor that interests you, you can get more detail by clicking on it. You can see a description of the Executor with basic information, usage, parameters, etc. If you need more details, click "More" to go to a page with further information.
```{figure} ../../../../../.github/hub-website-detail.png
:align: center
```
There are several tabs you can explore: **Readme**, **Arguments**, **Tags** and **Dependencies**.
```{figure} ../../../../../.github/hub-website-detail-arguments.png
:align: center
```
1. **Readme**: basic information about the Executor, how it works internally, and basic usage.
2. **Arguments**: the Executor's detailed API. This is generated automatically from the Executor's Python docstrings so it's always in sync with the code base, and Executor developers don't need to write it themselves.
3. **Tags**: the tags available for this Executor. For example, `latest`, `latest-gpu` and so on. It also gives a code snippet to illustrate usage.
```{figure} ../../../../../.github/hub-website-detail-tag.png
:align: center
```
4. **Dependencies**: The Executor's Python dependencies.
On the left, you'll see possible ways to use this Executor, including Docker image, source code, etc.
```{figure} ../../../../../.github/hub-website-usage.png
:align: center
```
That's it. Now you have an overview of the [Hub portal](https://cloud.jina.ai) and how to navigate it.
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/index.md
(jina-hub)=
# Executor Hub
Now that you understand that {class}`~jina.Executor` is a building block in Jina-serve, you may also wonder:
* Can I streamline the process of containerizing my {class}`~jina.Executor`?
* Can I reuse my Executor in another project?
* Can I share my Executor with my colleagues?
* Can I just use someone else's Executor instead of building it myself?
Basically, something like the following:
```{figure} ../../../../../.github/hub-user-journey.svg
:align: center
```
**Yes!** This is exactly the purpose of Executor Hub.
Hub lets you turn your Executor into a ready-for-the-cloud containerized service, taking a lot of the work off your hands.
With Hub you can pull prebuilt Executors to dramatically reduce the effort and complexity needed in your system, or push your own custom
Executors to share them privately or publicly. You can think of Hub as an easy entry point to a Docker registry.
A Hub Executor is an Executor published on Executor Hub. You can use such an Executor in a Flow or in a Deployment:
```python
from jina import Deployment
d = Deployment(uses='jinaai+docker:///MyExecutor')
with d:
...
```
````{admonition} Make sure the schemas are known
:class: note
When using docarray>0.30.0, Executors do not have a fixed schema; each Executor defines its own. Make sure you know
those schemas when using Executors from the Hub.
````
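As an illustration only, here is a minimal sketch of talking to a Hub Executor once you have looked up its schema; the `TextDoc` class and its `text` field are assumptions, so check the Executor's README or Arguments tab for the real input and output documents:
```python
from docarray import BaseDoc, DocList
from jina import Deployment


# Hypothetical schema: replace it with the schema the Executor actually declares.
class TextDoc(BaseDoc):
    text: str = ''


# Placeholder `uses` string, as above: fill in the real Executor reference.
d = Deployment(uses='jinaai+docker:///MyExecutor')
with d:
    docs = d.post(
        '/',
        inputs=DocList[TextDoc]([TextDoc(text='hello')]),
        return_type=DocList[TextDoc],
    )
    print(docs.text)
```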
```{toctree}
:hidden:
hub-portal
create-hub-executor
push-executor
use-hub-executor
debug-executor
yaml-spec
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/push-executor.md
(push-executor)=
# Publish
If you want to share your {class}`~jina.Executor`, you can push it to Executor Hub.
There are two ways to share:
* **Public** (default): Anyone can use public Executors without any restrictions.
* **Private**: Only people with the `secret` can use private Executors.
(jina-hub-usage)=
## Publishing for the first time
```bash
jina hub push [--public/--private] <path_to_executor_folder>
```
If you have logged into Jina-serve, it will return a `TASK_ID`. You need that to get your Executor's build status and logs.
If you haven't logged into Jina-serve, it will return `NAME` and `SECRET`. You need them to use the Executor (if it is private) or to update it. **Please keep them safe.**
````{admonition} Note
:class: note
If you are logged into the Hub using our CLI tools (`jina auth login` or `jcloud login`), you can push and pull your Executors without `SECRET`.
````
You can then visit [Executor Hub](https://cloud.jina.ai), select the "Recent" tab and see your published Executor.
````{admonition} Note
:class: note
If no `--public` or `--private` argument is provided, then an Executor is **public** by default.
````
````{admonition} Important
:class: important
Anyone can use public Executors, but to use a private Executor you must know its `SECRET`.
````
## Update published Executors
To override or update a published Executor, you must have both its `NAME` and `SECRET`.
```bash
jina hub push [--public/--private] --force-update --secret <secret> <path_to_executor_folder>
```
(hub_tags)=
## Tagging an Executor
Tagging can be useful for versioning Executors or differentiating them by their architecture (e.g. `gpu`, `cpu`).
```bash
jina hub push -t TAG1 -t TAG2
```
You can specify the `-t` or `--tags` parameter to tag an Executor.
* If you **don't** add the `-t` parameter, the default tag is `latest`
* If you **do** add the `-t` parameter and you still want to have the `latest` tag, you must write it as another `-t` parameter.
```bash
jina hub push .                      # Results in one tag: latest
jina hub push . -t v1.0.0            # Results in one tag: v1.0.0
jina hub push . -t v1.0.0 -t latest  # Results in two tags: v1.0.0, latest
```
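On the consuming side, the tag is what you reference after the colon in the `uses` string, for example (the Executor name here is just illustrative):
```python
from jina import Flow

# Pin a specific tag; omitting `:v1.0.0` resolves to `latest`.
f = Flow().add(uses='jinaai+docker://jina-ai/DummyExecutor:v1.0.0')
```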
If you want to create a new tag for an existing Executor, you can also add the `-t` option here:
```bash
jina hub push [--public/--private] --force-update --secret <secret> -t TAG <path_to_executor_folder>
```
### Protected tags
Protected tags prevent certain tags from being overwritten and ensure stable, consistent behavior.
You can use the `--protected-tag` option to create protected tags.
After the first push, protected tags cannot be pushed again.
```bash
jina hub push [--public/--private] --force-update --secret <secret> --protected-tag <tag1> --protected-tag <tag2> <path_to_executor_folder>
```
## Use environment variables
The `--build-env` parameter manages environment variables, letting you use a private token in `requirements.txt` to install private dependencies. For security reasons, you don't want to expose this token to anyone else. For example, suppose you have the following `requirements.txt`:
```text
# requirements.txt
git+https://${YOUR_TOKEN}@github.com/your_private_repo
```
When running `jina hub push`, you can pass the `--build-env` parameter:
```bash
jina hub push --build-env YOUR_TOKEN=foo
```
````{admonition} Note
:class: note
There are restrictions when naming environment variables:
* Environment variables must be wrapped in `{` and `}` in `requirements.txt`. i.e. `${YOUR_TOKEN}`, not `$YOUR_TOKEN`.
* Environment variables are limited to numbers, uppercase letters and `_` (underscore), and cannot start with `_`.
````
````{admonition} Limitations
:class: attention
There are limitations if you push an Executor via `--build-env` and then pull or use it as source code (this doesn't matter if you use a Docker image):
* When you use `jina hub pull jinaai:///YOUR_EXECUTOR`, you must set the corresponding environment variable according to the prompt:
```bash
export YOUR_TOKEN=foo
```
* When you use `.add(uses='jinaai:///YOUR_EXECUTOR')` in a Flow, you must set the corresponding environment variable:
```python
import os

from docarray.documents.legacy import LegacyDocument
from jina import Flow

os.environ['YOUR_TOKEN'] = 'foo'
f = Flow().add(uses='jinaai:///YOUR_EXECUTOR')
with f:
    f.post(on='/', inputs=LegacyDocument(), on_done=print)
```
````
For multiple environment variables:
```bash
jina hub push --build-env FIRST=foo --build-env SECOND=bar
```
## Building status of an Executor
To query the build status of a pushed Executor:
```bash
jina hub status [<path_to_executor_folder>] [--id TASK_ID] [--verbose] [--replay]
```
* The parameter `--id TASK_ID` gets the build status of a specific build task.
* The parameter `--verbose` prints verbose build logs.
* The parameter `--replay` prints the build status from the beginning.
## ARM64 architecture support
````{admonition} Hint
:class: hint
As of January 10, 2023 you can push Executors for the ARM64 architecture.
````
````{admonition} Note
:class: note
Executor Docker images are Linux images. Even if you are running on a Mac or Windows machine, the underlying OS is still Linux.
````
If you run `jina hub push` on an ARM64-based machine, you automatically push an ARM64 Executor.
However, if you provide your own Dockerfile, it needs to work for both `linux/amd64` and `linux/arm64`.
If you don't want this behavior, you can explicitly specify the `--platform` parameter:
```bash
# Push for both platforms
jina hub push --platform linux/arm64,linux/amd64
# Push for AMD64 only
jina hub push --platform linux/amd64
# Push for ARM64 only (not recommended)
jina hub push --platform linux/arm64
```
---
# Source: https://github.com/jina-ai/serve/blob/master/docs/concepts/serving/executor/hub/use-hub-executor.md
(use-hub-executor)=
# Use
There are three ways to use Hub {class}`~jina.Executor`s in your project. Each has its own use case and benefits.
## Use as-is
You can use a Hub Executor as-is via `Executor.from_hub()`:
```python
from jina import Executor
from docarray import DocList
from docarray.documents.legacy import LegacyDocument
exec = Executor.from_hub('jinaai://jina-ai/DummyHubExecutor')
da = DocList[LegacyDocument]([LegacyDocument()])
exec.foo(da)
assert da.texts == ['hello']
```
The Hub Executor will be pulled to your local machine and run as a native Python object. You can use a line debugger to step in and out of the `exec` object, set breakpoints, and observe how it behaves. You can directly feed in `Documents`. After you build some confidence in that Executor, you can move to the next step: using it as part of your Flow.
```{caution}
Not all Executors on the Hub can be directly run in this way - some require extra dependencies. In that case, you can add `.from_hub(..., install_requirements=True)` to install the requirements automatically. Be careful - these dependencies may not be compatible with your local packages and may override your local development environment.
```
```{tip}
Hub Executors are cached locally on the first pull. Afterwards, they will not be updated.
To keep up-to-date with upstream, use `.from_hub(..., force_update=True)`.
```
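Both options can be combined; for instance, a minimal sketch that refreshes the cached copy and installs the Executor's declared requirements (which, as noted above, may modify your local environment):
```python
from jina import Executor

exec = Executor.from_hub(
    'jinaai://jina-ai/DummyHubExecutor',
    install_requirements=True,  # install the Executor's requirements
    force_update=True,          # re-pull instead of using the cached copy
)
```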
(pull-executor)=
## Pull only
You can also use the `jina hub` CLI to pull an Executor without actually using it in a Flow.
````{admonition} Jina-serve and DocArray version
:class: note
When pulling, the Hub tries to install, in the pulled Docker images, the Jina-serve and DocArray versions that you have installed locally, independently of the Jina-serve and DocArray versions that existed when the Executor was pushed to the Hub.
````
### Pull the Docker image
```bash
jina hub pull jinaai+docker://<username>/<executor-name>[:<tag>]
```
You can find the Executor by running `docker images`. You can also indicate which version of the Executor you want to use by appending `:<tag>`.
```bash
jina hub pull jinaai+docker://jina-ai/DummyExecutor:v1.0.0
```
## Use in Flow as container
Use prebuilt images from Hub in your Python code:
```python
from jina import Flow
# You have to log in to use a private Executor
# import hubble
# hubble.login()
f = Flow().add(uses='jinaai+docker://<username>/<executor-name>[:<tag>]')
```
If you do not provide a `: