# Jina Python SDK

> This section includes the API documentation from the `jina` codebase, as extracted from the docstrings in the code.

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/api-rst.rst

========================
:fab:`python` Python API
========================

This section includes the API documentation from the `jina` codebase, as extracted from the docstrings in the code.

For further details, please refer to the full :ref:`user guide `.

:mod:`jina.orchestrate.deployments` - Deployment
------------------------------------------------

.. currentmodule:: jina.orchestrate.deployments

.. autosummary::
   :nosignatures:
   :template: class.rst

   __init__.Deployment

:mod:`jina.orchestrate.flow` - Flow
-----------------------------------

.. currentmodule:: jina.orchestrate.flow

.. autosummary::
   :nosignatures:
   :template: class.rst

   base.Flow
   asyncio.AsyncFlow

:mod:`jina.serve.executors` - Executor
--------------------------------------

.. currentmodule:: jina.serve.executors

.. autosummary::
   :nosignatures:
   :template: class.rst

   Executor
   BaseExecutor
   decorators.requests
   decorators.monitor

:mod:`jina.clients` - Clients
-----------------------------

.. currentmodule:: jina.clients

.. autosummary::
   :nosignatures:
   :template: class.rst

   Client
   grpc.GRPCClient
   grpc.AsyncGRPCClient
   http.HTTPClient
   http.AsyncHTTPClient
   websocket.WebSocketClient
   websocket.AsyncWebSocketClient

:mod:`jina.types.request` - Networking messages
-----------------------------------------------

.. currentmodule:: jina.types.request

.. autosummary::
   :nosignatures:
   :template: class.rst

   Request
   data.DataRequest
   data.Response
   status.StatusMessage

:mod:`jina.serve.runtimes` - Flow internals
-------------------------------------------

.. currentmodule:: jina.serve.runtimes

.. autosummary::
   :nosignatures:
   :template: class.rst

   asyncio.AsyncNewLoopRuntime
   gateway.GatewayRuntime
   gateway.grpc.GRPCGatewayRuntime
   gateway.http.HTTPGatewayRuntime
   gateway.websocket.WebSocketGatewayRuntime
   worker.WorkerRuntime
   head.HeadRuntime

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/cli/index.rst

:octicon:`terminal` Command-Line Interface
==========================================

.. argparse::
   :noepilog:
   :nodescription:
   :ref: jina.parsers.get_main_parser
   :prog: jina

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/cloud-nativeness/docker-compose.md

(docker-compose)=
# {fab}`docker` Docker Compose Support

One of the simplest ways to prototype or serve in production is to run your {class}`~jina.Flow` with `docker-compose`.

A {class}`~jina.Flow` is composed of {class}`~jina.Executor`s which run Python code that operates on `Documents`. These Executors live in different runtimes depending on how you want to deploy your Flow. By default, if you are serving your Flow locally, they live within processes. Nevertheless, because Jina-serve is cloud native, your Flow can easily manage Executors that live in containers and that are orchestrated by your favorite tools. One of the simplest of these tools is Docker Compose, which is supported out of the box.

You can deploy a Flow with Docker Compose in one line:

```{code-block} python
---
emphasize-lines: 3
---
from jina import Flow

flow = Flow(...).add(...).add(...)
flow.to_docker_compose_yaml('docker-compose.yml')
```

Jina-serve generates a `docker-compose.yml` configuration file corresponding to your Flow. You can use this directly with Docker Compose, avoiding the overhead of manually defining all of your Flow's services.

````{admonition} Use Docker-based Executors
:class: caution
All Executors in the Flow should be used with `jinaai+docker://...` or `docker://...`.
````

````{admonition} Health check available from 3.1.3
:class: caution
If you use Executors that rely on Docker images built with a version of Jina-serve prior to 3.1.3, remove the health check from the dumped YAML file, otherwise your Docker Compose services will always be "unhealthy."
````

````{admonition} Matching Jina-serve versions
:class: caution
If you change the Docker images in your generated Docker Compose file, ensure that all services included in the Gateway are built with the same Jina-serve version to guarantee compatibility.
````

## Example: Index and search text using your own built Encoder and Indexer

Install [`Docker Compose`](https://docs.docker.com/compose/install/) locally before starting this tutorial.

For this example we recommend that you read {ref}`how to build and containerize the Executors to be run in Kubernetes. `

### Deploy the Flow

First define the Flow and generate the Docker Compose YAML configuration:

````{tab} YAML
In a `flow.yml` file:

```yaml
jtype: Flow
with:
  port: 8080
  protocol: http
executors:
  - name: encoder
    uses: jinaai+docker:///EncoderPrivate
    replicas: 2
  - name: indexer
    uses: jinaai+docker:///IndexerPrivate
    shards: 2
```

Then in a shell run:

```shell
jina export docker-compose flow.yml docker-compose.yml
```
````

````{tab} Python
In Python, run:

```python
from jina import Flow

flow = (
    Flow(port=8080, protocol='http')
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate', replicas=2)
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        shards=2,
    )
)
flow.to_docker_compose_yaml('docker-compose.yml')
```
````

````{admonition} Hint
:class: hint
You can use a custom Jina Docker image for the Gateway service by setting the environment variable `JINA_GATEWAY_IMAGE` to the desired image before generating the configuration.
````

Let's take a look at the generated compose file:

```yaml
version: '3.3'
...
services:
  encoder-rep-0:
    ...  # Encoder
  encoder-rep-1:
    ...
  indexer-head:
    ...
  indexer-0:
    ...  # Indexer
  indexer-1:
    ...
  gateway:
    ...
    ports:
      - 8080:8080
```

```{tip}
:class: caution
The default compose file generated by the Flow contains no special configuration or settings. You may want to adapt it to your own needs.
```

You can see that six services are created:

- 1 for the **Gateway**, which is the entrypoint of the **Flow**.
- 2 associated with the encoder, one for each of the two replicas.
- 3 associated with the indexer, one for the Head and two for the Shards.

Now you can deploy this Flow:

```shell
docker-compose -f docker-compose.yml up
```

### Query the Flow

Once we see that all the services in the Flow are ready, we can send index and search requests.
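One way to check that the services have come up is a sketch like the following, assuming the default compose file and the service names shown above:

```shell
# List the Flow's services and their current state/health
docker-compose -f docker-compose.yml ps
```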
First define a client:

```python
from jina.clients import Client

client = Client(host='http://localhost:8080')
```

```python
from typing import List, Optional

from docarray import DocList, BaseDoc
from docarray.typing import NdArray


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


docs = client.post(
    '/index',
    inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
    return_type=DocList[MyDoc],
    request_size=10,
)
print(f'Indexed documents: {len(docs)}')

docs = client.post(
    '/search',
    inputs=DocList[MyDoc]([MyDoc(text=f'This is document query number {i}') for i in range(10)]),
    return_type=DocList[MyDocWithMatches],
    request_size=10,
)
for doc in docs:
    print(f'Query {doc.text} has {len(doc.matches)} matches')
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/cloud-nativeness/k8s.md

(kubernetes-docs)=
# {fas}`dharmachakra` Kubernetes Support

```{toctree}
:hidden:

kubernetes
```

Jina-serve is a cloud-native framework and therefore runs natively and easily on Kubernetes. Deploying a Jina-serve Deployment or Flow on Kubernetes is actually the recommended way to use Jina-serve in production.

A {class}`~jina.Deployment` and a {class}`~jina.Flow` are services composed of one or more microservices, called {class}`~jina.Executor`s and {class}`~jina.Gateway`s, which natively run in containers. This means that Kubernetes can natively take over the lifetime management of Executors.

Deploying a {class}`~jina.Deployment` or {class}`~jina.Flow` on Kubernetes means wrapping these services' containers in the appropriate K8s abstraction (Deployment, StatefulSet, and so on), exposing them internally via K8s services, and connecting them together by passing the right set of parameters.

```{hint}
This documentation is designed for users who want to **manually** deploy a Jina-serve project on Kubernetes.

Check out {ref}`jcloud` if you want a **one-click** solution to deploy and host Jina, leveraging a cloud-native stack of Kubernetes, Prometheus and Grafana, **without worrying about provisioning**.
```

## Automatically translate a Deployment or Flow to Kubernetes concepts

```{hint}
Manually building these Kubernetes YAML objects is long and cumbersome. Therefore we provide a helper function {meth}`~jina.Flow.to_kubernetes_yaml` that does most of this translation work automatically.
```

This helper function can be called from:

* Jina-serve's Python interface, to translate a Flow defined in Python to K8s YAML files
* Jina-serve's CLI interface, to export a YAML Flow to K8s YAML files

```{seealso}
More detail in the {ref}`Deployment export documentation` and {ref}`Flow export documentation `
```

## Extra Kubernetes options

In general, Jina-serve follows a single principle when it comes to deploying in Kubernetes: you, the user, know your use case and requirements the best.

This means that, while Jina-serve generates configurations for you that run out of the box, as a professional user you should always see them as just a starting point to get you off the ground.

```{hint}
The export functions {meth}`~jina.Deployment.to_kubernetes_yaml` and {meth}`~jina.Flow.to_kubernetes_yaml` are helpers to get you started off the ground.
**They are meant to be updated and adapted to each use case.**
```
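As a concrete example of invoking these helpers, here is the CLI form that appears later in this guide (the Python equivalent is `flow.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')`):

```shell
# Export a Flow YAML to Kubernetes YAML files
jina export kubernetes flow.yml ./k8s_flow --k8s-namespace custom-namespace
```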
````{admonition} Matching Jina versions
:class: caution
If you change the Docker images for {class}`~jina.Executor` and {class}`~jina.Gateway` in your Kubernetes-generated file, ensure that all of them are built with the same Jina-serve version to guarantee compatibility.
````

You can't add basic Kubernetes features like `Secrets`, `ConfigMap` or `Labels` via the Pythonic or YAML interface. This is intentional and doesn't mean that we don't support these features. On the contrary, we let you fully express your Kubernetes configuration by using the Kubernetes API to add your own Kubernetes standards to Jina-serve.

````{admonition} Hint
:class: hint
We recommend you dump the Kubernetes configuration files and then edit them to suit your needs.
````

Here are possible configuration options you may need to add or change:

- Add label `selector`s to the Deployments to suit your case
- Add `requests` and `limits` for the resources of the different Pods
- Set up persistent volume storage to save your data on disk
- Pass custom configuration to your Executor with `ConfigMap`
- Manage credentials of your Executor with Kubernetes secrets; you can use `f.add(..., env_from_secret={'SECRET_PASSWORD': {'name': 'mysecret', 'key': 'password'}})` to map them to Pod environment variables
- Edit the default rolling update configuration

(service-mesh-k8s)=
## Required service mesh

```{caution}
A service mesh must be installed and correctly configured in the K8s cluster in which you deployed your Flow.
```

Service meshes work by attaching a tiny proxy to each of your Kubernetes Pods, allowing for smart rerouting, load balancing, request retrying, and a host of [other features](https://linkerd.io/2.11/features/).

Jina relies on a service mesh to load balance requests between replicas of the same Executor.

You can use your favourite Kubernetes service mesh in combination with your Jina services, but the configuration files generated by `to_kubernetes_yaml()` already include all necessary annotations for the [Linkerd service mesh](https://linkerd.io).

````{admonition} Hint
:class: hint
You can use any service mesh with Jina-serve, but Jina-serve Kubernetes configurations come with Linkerd annotations out of the box.
````

To use Linkerd you can follow the [install the Linkerd CLI guide](https://linkerd.io/2.11/getting-started/).

````{admonition} Caution
:class: caution
Many service meshes can perform retries themselves.
Be careful about setting up service mesh level retries in combination with Jina, as it may lead to unwanted behaviour in combination with Jina's own {ref}`retry policy `.

Instead, you can disable Jina-level retries by setting `Flow(retries=0)` in Python, or `retries: 0` in the Flow YAML's `with` block.
````

(kubernetes-replicas)=
## Scaling Executors: Replicas and shards

Jina supports two types of scaling:

- **Replicas** can be used with any Executor type and are typically used for performance and availability.
- **Shards** are used for partitioning data and should only be used with indexers since they store state.

Check {ref}`here ` for more information about these scaling mechanisms.

For shards, Jina creates one separate Deployment in Kubernetes per shard. Setting `Deployment(..., shards=num_shards)` is sufficient to create a corresponding Kubernetes configuration.

For replicas, Jina-serve uses [Kubernetes native replica scaling](https://kubernetes.io/docs/tutorials/kubernetes-basics/scale/scale-intro/) and **relies on a service mesh** to load-balance requests between
replicas of the same Executor. Without a service mesh installed in your Kubernetes cluster, all traffic is routed to the same replica.

````{admonition} See Also
:class: seealso
The impossibility of load balancing between different replicas is a limitation of Kubernetes in combination with gRPC.
If you want to learn more about this limitation, see [this](https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/) Kubernetes blog post.
````

## Scaling the Gateway

The {ref}`Gateway ` is responsible for providing the API of the {ref}`Flow `. If you have a large Flow with many Clients and many replicated Executors, the Gateway can become the bottleneck. In this case you can also scale up the Gateway deployment to be backed by multiple Kubernetes Pods. To do so, add the `replicas` parameter to your Gateway before converting the Flow to Kubernetes. This can be done in a Pythonic way or in YAML:

````{tab} Using Python
You can use {meth}`~jina.Flow.config_gateway` to add the `replicas` parameter:

```python
from jina import Flow

f = Flow().config_gateway(replicas=3).add()
f.to_kubernetes_yaml('./k8s_yaml_path')
```
````

````{tab} Using YAML
You can add `replicas` in the `gateway` section of your Flow YAML:

```yaml
jtype: Flow
gateway:
  replicas: 3
executors:
  - name: encoder
```
````

Alternatively, this can be done by the regular means of Kubernetes: either increase the number of replicas in the {ref}`generated YAML configuration files ` or [add replicas while running](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#scaling-a-deployment).

To expose your Gateway replicas outside Kubernetes, you can add a load balancer as described {ref}`here `.

````{admonition} Hint
:class: hint
You can use a custom Docker image for the Gateway deployment by setting the environment variable `JINA_GATEWAY_IMAGE` to the desired image before generating the configuration.
````

## See also

- {ref}`Step by step deployment of a Jina-serve Flow on Kubernetes `
- {ref}`Export a Flow to Kubernetes `
- {meth}`~jina.Flow.to_kubernetes_yaml`
- {ref}`Deploy a standalone Executor on Kubernetes `
- [Kubernetes Documentation](https://kubernetes.io/docs/home/)

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/cloud-nativeness/kubernetes.md

(kubernetes)=
# Deploy on Kubernetes

This how-to goes through deploying a Deployment and a simple Flow using Kubernetes, customizing the Kubernetes configuration to your needs, and scaling Executors using replicas and shards.

Deploying Jina-serve services in Kubernetes is the recommended way to use Jina-serve in production because Kubernetes can easily take over the lifetime management of Executors and Gateways.

```{seealso}
This page is a step by step guide; refer to the {ref}`Kubernetes support documentation ` for more details.
```

```{hint}
This guide is designed for users who want to **manually** deploy a Jina-serve project on Kubernetes.

Check out {ref}`jcloud` if you want a **one-click** solution to deploy and host Jina, leveraging a cloud-native stack of Kubernetes, Prometheus and Grafana, **without worrying about provisioning**.
```

## Preliminaries

To follow this how-to, you need access to a Kubernetes cluster.

You can either set up [`minikube`](https://minikube.sigs.k8s.io/docs/start/), or use one of many managed Kubernetes solutions in the cloud:

- [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine)
- [Amazon EKS](https://aws.amazon.com/eks)
- [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service)
- [Digital Ocean](https://www.digitalocean.com/products/kubernetes/)

You need to install Linkerd in your K8s cluster. To use Linkerd, [install the Linkerd CLI](https://linkerd.io/2.11/getting-started/) and [its control plane](https://linkerd.io/2.11/getting-started/) in your cluster. This automatically sets up and manages the service mesh proxies when you deploy the Flow.

To understand why you need to install a service mesh like Linkerd, refer to this {ref}`section `.
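As a sketch of the control-plane installation, assuming the `linkerd` CLI from the guide above is already on your `PATH`:

```shell
# Install the Linkerd control plane into the cluster, then verify it
linkerd install | kubectl apply -f -
linkerd check
```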
(build-containerize-for-k8s)=
## Build and containerize your Executors

First, we need to build the Executors that we are going to use and containerize them {ref}`manually ` or by leveraging {ref}`Executor Hub `. In this example, we are going to use the Hub.

We are going to build two Executors: the first uses `CLIP` to encode textual Documents, and the second uses an in-memory vector index. This way we can build a simple neural search system.

First, we build the encoder Executor.

````{tab} executor.py
```{code-block} python
import torch
from typing import Optional
from transformers import CLIPModel, CLIPTokenizer
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina import Executor, requests


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class Encoder(Executor):
    def __init__(
        self,
        pretrained_model_name_or_path: str = 'openai/clip-vit-base-patch32',
        device: str = 'cpu',
        *args,
        **kwargs
    ):
        super().__init__(*args, **kwargs)
        self.device = device
        self.tokenizer = CLIPTokenizer.from_pretrained(pretrained_model_name_or_path)
        self.model = CLIPModel.from_pretrained(pretrained_model_name_or_path)
        self.model.eval().to(device)

    def _tokenize_texts(self, texts):
        x = self.tokenizer(
            texts,
            max_length=77,
            padding='longest',
            truncation=True,
            return_tensors='pt',
        )
        return {k: v.to(self.device) for k, v in x.items()}

    @requests
    def encode(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        with torch.inference_mode():
            input_tokens = self._tokenize_texts(docs.text)
            docs.embedding = self.model.get_text_features(**input_tokens).cpu().numpy()
        return docs
```
````

````{tab} requirements.txt
```
torch==1.12.0
transformers==4.16.2
```
````

````{tab} config.yml
```
jtype: Encoder
metas:
  name: EncoderPrivate
  py_modules:
    - executor.py
```
````

Putting all these files into a folder named CLIPEncoder and calling `jina hub push CLIPEncoder --private` should give:

```shell
╭────────────────────────── Published ───────────────────────────╮
│                                                                 │
│  📛 Name          EncoderPrivate                                │
│  🔗 Jina Hub URL  https://cloud.jina.ai/executor//              │
│  👀 Visibility    private                                       │
│                                                                 │
╰─────────────────────────────────────────────────────────────────╯
╭──────────────────────────────── Usage ────────────────────────────────╮
│                                                                        │
│  Container  YAML    uses: jinaai+docker:///EncoderPrivate:latest       │
│             Python  .add(uses='jinaai+docker:///EncoderPrivate:latest')│
│                                                                        │
│  Source     YAML    uses: jinaai:///EncoderPrivate:latest              │
│             Python  .add(uses='jinaai:///EncoderPrivate:latest')       │
│                                                                        │
╰────────────────────────────────────────────────────────────────────────╯
```

Then we can build an indexer to provide `index` and `search` endpoints:

````{tab} executor.py
```{code-block} python
from typing import Optional, List
from docarray import DocList, BaseDoc
from docarray.index import InMemoryExactNNIndex
from docarray.typing import NdArray
from jina import Executor, requests


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []

class Indexer(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._indexer = InMemoryExactNNIndex[MyDoc]()

    @requests(on='/index')
    def index(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        self._indexer.index(docs)
        return docs

    @requests(on='/search')
    def search(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDocWithMatches]:
        res = DocList[MyDocWithMatches]()
        ret = self._indexer.find_batched(docs, search_field='embedding')
        matched_documents = ret.documents
        matched_scores = ret.scores
        for query, matches, scores in zip(docs, matched_documents, matched_scores):
            output_doc = MyDocWithMatches(**query.dict())
            output_doc.matches = matches
            output_doc.scores = scores.tolist()
            res.append(output_doc)
        return res
```
````

````{tab} config.yml
```
jtype: Indexer
metas:
  name: IndexerPrivate
  py_modules:
    - executor.py
```
````

Putting all these files into a folder named Indexer and calling `jina hub push Indexer --private` should give:

```shell
╭────────────────────────── Published ───────────────────────────╮
│                                                                 │
│  📛 Name          IndexerPrivate                                │
│  🔗 Jina Hub URL  https://cloud.jina.ai/executor//              │
│  👀 Visibility    private                                       │
│                                                                 │
╰─────────────────────────────────────────────────────────────────╯
╭──────────────────────────────── Usage ────────────────────────────────╮
│                                                                        │
│  Container  YAML    uses: jinaai+docker:///IndexerPrivate:latest       │
│             Python  .add(uses='jinaai+docker:///IndexerPrivate:latest')│
│                                                                        │
│  Source     YAML    uses: jinaai:///IndexerPrivate:latest              │
│             Python  .add(uses='jinaai:///IndexerPrivate:latest')       │
│                                                                        │
╰────────────────────────────────────────────────────────────────────────╯
```

Now, since we have created private Executors, we need to make sure that K8s has the right credentials to download from the private registry.

First, we create the namespace where our Flow will run:

```shell
kubectl create namespace custom-namespace
```

Second, we execute this Python script:

```python
import json
import os
import base64

JINA_CONFIG_JSON_PATH = os.path.join(os.path.expanduser('~'), os.path.join('.jina', 'config.json'))
CONFIG_JSON = 'config.json'

with open(JINA_CONFIG_JSON_PATH) as fp:
    auth_token = json.load(fp)['auth_token']

config_dict = dict()
config_dict['auths'] = dict()
config_dict['auths']['registry.hubble.jina.ai'] = {
    'auth': base64.b64encode(f':{auth_token}'.encode()).decode()
}

with open(CONFIG_JSON, mode='w') as fp:
    json.dump(config_dict, fp)
```

Finally, we add a secret to be used as [imagePullSecrets](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/) in the namespace, built from our config.json:

```shell script
kubectl -n custom-namespace create secret generic regcred --from-file=.dockerconfigjson=config.json --type=kubernetes.io/dockerconfigjson
```

## Deploy an embedding model inside a Deployment

Now we are ready to deploy our embedding model as an embedding service in Kubernetes. For now, define a Deployment, either in {ref}`YAML ` or directly in Python, as we do here:

```python
from jina import Deployment

d = Deployment(
    port=8080,
    name='encoder',
    uses='jinaai+docker:///EncoderPrivate',
    image_pull_secrets=['regcred'],
)
```

You can serve any Deployment you want. Just ensure that the Executor is containerized, either by using *'jinaai+docker'*, or by {ref}`containerizing your local Executors `.
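For reference, a rough YAML equivalent of this Deployment is sketched below; the `with` keys mirror the Python arguments above and may need adjusting to your Jina-serve version:

```yaml
jtype: Deployment
with:
  port: 8080
  name: encoder
  uses: jinaai+docker:///EncoderPrivate
  image_pull_secrets:
    - regcred
```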
Next, generate Kubernetes YAML configs from the Deployment. Note that this step may be a little slow, because [Executor Hub](https://cloud.jina.ai/) may adapt the image to your Jina-serve and docarray version.

```python
d.to_kubernetes_yaml('./k8s_deployment', k8s_namespace='custom-namespace')
```

The following file structure will be generated - don't worry if it's slightly different - there can be changes from one Jina-serve version to another:

```
.
└── k8s_deployment
    └── encoder.yml
```

You can inspect these files to see how Deployment and Executor concepts are mapped to Kubernetes entities. And as always, feel free to modify these files as you see fit for your use case.

````{admonition} Caution: Executor YAML configurations
:class: caution
As a general rule, the configuration files produced by `to_kubernetes_yaml()` should run out of the box, and if you strictly follow this how-to they will.

However, there is an exception to this: if you use a local dockerized Executor, and this Executor's configuration is stored in a file other than `config.yaml`, you will have to adapt this Executor's Kubernetes YAML. To do this, open the file and replace `config.yaml` with the actual path to the Executor configuration.

This is because when a Flow contains a Docker image, it can't see what Executor configuration was used to create that image. Since all of our tutorials use `config.yaml` for that purpose, the Flow uses this as a best guess. Please adapt this if you named your Executor configuration file differently.
````

Next you can actually apply these configuration files to your cluster, using `kubectl`. This launches the Deployment service.

Now, deploy this Deployment to your cluster:

```shell
kubectl apply -R -f ./k8s_deployment
```

Check that the Pods were created:

```shell
kubectl get pods -n custom-namespace
```

```text
NAME                      READY   STATUS    RESTARTS   AGE
encoder-81a5b3cf9-ls2m3   1/1     Running   0          60m
```

Once you see that the Deployment is ready, you can start embedding documents:

```python
from typing import Optional

import portforward
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


with portforward.forward('custom-namespace', 'encoder-81a5b3cf9-ls2m3', 8080, 8080):
    client = Client(host='localhost', port=8080)
    client.show_progress = True
    docs = client.post(
        '/encode',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
        return_type=DocList[MyDoc],
        request_size=10,
    )
    for doc in docs:
        print(f'{doc.text}: {doc.embedding}')
```

## Deploy a simple Flow

Now we are ready to build a Flow composed of multiple Executors. By *simple* in this context we mean a Flow without replicated or sharded Executors - you can see how to use those in Kubernetes {ref}`later on `.

For now, define a Flow, either in {ref}`YAML ` or directly in Python, as we do here:

```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate')
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
    )
)
```

You can essentially define any Flow of your liking. Just ensure that all Executors are containerized, either by using *'jinaai+docker'*, or by {ref}`containerizing your local Executors `.

The example Flow here simply encodes and indexes text data using two Executors pushed to the [Executor Hub](https://cloud.jina.ai/).

Next, generate Kubernetes YAML configs from the Flow.
Note that this step may be a little slow, because [Executor Hub](https://cloud.jina.ai/) may adapt the image to your Jina-serve and docarray version.

```python
f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```

The following file structure will be generated - don't worry if it's slightly different - there can be changes from one Jina-serve version to another:

```
.
└── k8s_flow
    ├── gateway
    │   └── gateway.yml
    ├── encoder
    │   └── encoder.yml
    └── indexer
        └── indexer.yml
```

You can inspect these files to see how Flow concepts are mapped to Kubernetes entities. And as always, feel free to modify these files as you see fit for your use case.

Next you can actually apply these configuration files to your cluster, using `kubectl`. This launches all Flow microservices.

Now, deploy this Flow to your cluster:

```shell
kubectl apply -R -f ./k8s_flow
```

Check that the Pods were created:

```shell
kubectl get pods -n custom-namespace
```

```text
NAME                       READY   STATUS    RESTARTS   AGE
encoder-8b5575cb9-bh2x8    1/1     Running   0          60m
gateway-66d5f45ff5-4q7sw   1/1     Running   0          60m
indexer-8f676fc9d-4fh52    1/1     Running   0          60m
```

Note that the Jina gateway was deployed as the Pod named `gateway-66d5f45ff5-4q7sw`.

Once you see that all the Deployments in the Flow are ready, you can start indexing documents:

```python
from typing import List, Optional

import portforward
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


with portforward.forward('custom-namespace', 'gateway-66d5f45ff5-4q7sw', 8080, 8080):
    client = Client(host='localhost', port=8080)
    client.show_progress = True
    docs = client.post(
        '/index',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)]),
        return_type=DocList[MyDoc],
        request_size=10,
    )
    print(f'Indexed documents: {len(docs)}')
    docs = client.post(
        '/search',
        inputs=DocList[MyDoc]([MyDoc(text=f'This is document query number {i}') for i in range(10)]),
        return_type=DocList[MyDocWithMatches],
        request_size=10,
    )
    for doc in docs:
        print(f'Query {doc.text} has {len(doc.matches)} matches')
```

### Deploy with shards and replicas

After your service mesh is installed, your cluster is ready to run a Flow with scaled Executors. You can adapt the Flow from above to work with two replicas for the encoder, and two shards for the indexer:

```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(name='encoder', uses='jinaai+docker:///EncoderPrivate', replicas=2)
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        shards=2,
    )
)
```

Again, you can generate your Kubernetes configuration:

```python
f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```

Now you should see the following file structure:

```
.
└── k8s_flow
    ├── gateway
    │   └── gateway.yml
    ├── encoder
    │   └── encoder.yml
    └── indexer
        ├── indexer-0.yml
        ├── indexer-1.yml
        └── indexer-head.yml
```

Apply your configuration like usual:

````{admonition} Hint: Cluster cleanup
:class: hint
If you already have the simple Flow from the first example running on your cluster, make sure to delete it using `kubectl delete -R -f ./k8s_flow`.
````

```shell
kubectl apply -R -f ./k8s_flow
```

### Deploy with custom environment variables and secrets

You can customize the environment variables that are available inside the runtime, either defined directly or read from a [Kubernetes
secret](https://kubernetes.io/docs/concepts/configuration/secret/):

````{tab} with Python
```python
from jina import Flow

f = (
    Flow(port=8080, image_pull_secrets=['regcred'])
    .add(
        name='indexer',
        uses='jinaai+docker:///IndexerPrivate',
        env={'k1': 'v1', 'k2': 'v2'},
        env_from_secret={
            'SECRET_USERNAME': {'name': 'mysecret', 'key': 'username'},
            'SECRET_PASSWORD': {'name': 'mysecret', 'key': 'password'},
        },
    )
)

f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
```
````

````{tab} with flow YAML
In a `flow.yml` file:

```yaml
jtype: Flow
version: '1'
with:
  protocol: http
executors:
- name: indexer
  uses: jinaai+docker:///IndexerPrivate
  env:
    k1: v1
    k2: v2
  env_from_secret:
    SECRET_USERNAME:
      name: mysecret
      key: username
    SECRET_PASSWORD:
      name: mysecret
      key: password
```

You can generate Kubernetes YAML configs using `jina export`:

```shell
jina export kubernetes flow.yml ./k8s_flow --k8s-namespace custom-namespace
```
````

After creating the namespace, you need to create the secrets mentioned above:

```shell
kubectl -n custom-namespace create secret generic mysecret --from-literal=username=jina --from-literal=password=123456
```

Then you can apply your configuration.

(kubernetes-expose)=
## Exposing the service

The previous examples use port-forwarding to send documents to the services. In real-world applications, you may want to expose your service to make it reachable by users, so that you can serve search requests.

```{caution}
Exposing the Deployment or Flow only works if the environment of your `Kubernetes cluster` supports `External Loadbalancers`.
```

Once the service is deployed, you can expose it. In this case we give an example of exposing the encoder when using a Deployment, but you can expose the gateway service when using a Flow:

```bash
kubectl expose deployment executor --name=executor-exposed --type LoadBalancer --port 80 --target-port 8080 -n custom-namespace

sleep 60 # wait until the external ip is configured
```

Export the external IP address. This is needed for the client when sending Documents to the Flow in the next section.

```bash
export EXTERNAL_IP=`kubectl get service executor-exposed -n custom-namespace -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'`
```

### Client

The client:

- Sends Documents to the exposed service on `$EXTERNAL_IP`
- Gets the responses.

You should configure your Client to connect to the service via the external IP address as follows:

```python
import os
from typing import List, Optional

from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina.clients import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


class MyDocWithMatches(MyDoc):
    matches: DocList[MyDoc] = []
    scores: List[float] = []


host = os.environ['EXTERNAL_IP']
port = 80

client = Client(host=host, port=port)
client.show_progress = True

docs = DocList[MyDoc]([MyDoc(text=f'This is document indexed number {i}') for i in range(100)])
queried_docs = client.post("/search", inputs=docs, return_type=DocList[MyDocWithMatches])

matches = queried_docs[0].matches
print(f"Matched documents: {len(matches)}")
```

## Update your Executor in Kubernetes

In Kubernetes, you can update your Executors by patching the Deployment corresponding to your Executor.

For instance, in the example above, you can change the encoder's `pretrained_model_name_or_path` parameter by changing the content of the Deployment inside the `executor.yml` dumped by `.to_kubernetes_yaml`. You need to add `--uses-with` and pass the new parameter value to it.
This is passed to the container inside the Deployment:

```yaml
    spec:
      containers:
      - args:
        - executor
        - --name
        - encoder
        - --k8s-namespace
        - custom-namespace
        - --uses
        - config.yml
        - --port
        - '8080'
        - --uses-metas
        - '{}'
        - --uses-with
        - '{"pretrained_model_name_or_path": "other_model"}'
        - --native
        command:
        - jina
```

After doing so, re-apply your configuration so the new Executor is deployed without affecting the other unchanged Deployments:

```shell script
kubectl apply -R -f ./k8s_deployment
```

````{admonition} Other patching options
:class: seealso
In Kubernetes, Executors are ordinary Kubernetes Deployments, so you can use other patching options provided by Kubernetes:

- `kubectl replace` to replace an Executor using a complete configuration file
- `kubectl patch` to patch an Executor using only a partial configuration file
- `kubectl edit` to edit an Executor configuration on the fly in your editor

You can find more information about these commands in the [official Kubernetes documentation](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/).
````
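For instance, a minimal sketch of the `kubectl patch` route, assuming the args layout shown above (where the `--uses-with` value is the args element at index 12):

```shell
# JSON-patch only the --uses-with value of the encoder's container args
kubectl -n custom-namespace patch deployment encoder --type json \
  -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/args/12", "value": "{\"pretrained_model_name_or_path\": \"other_model\"}"}]'
```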
## Key takeaways

In short, there are just three key steps to deploy Jina on Kubernetes:

1. Use `.to_kubernetes_yaml()` to generate Kubernetes configuration files from a Jina Deployment or Flow object.
2. Apply the generated files via `kubectl` (modify them first if necessary).
3. Expose your service outside the K8s cluster.

## See also

- {ref}`Kubernetes support documentation `
- {ref}`Monitor service once it is deployed `
- {ref}`See how failures and retries are handled `
- {ref}`Learn more about scaling Executors `

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/cloud-nativeness/monitoring.md

(monitoring)=
# Prometheus/Grafana Support (Legacy)

```{admonition} Deprecated
:class: caution
The Prometheus-only based feature will soon be deprecated in favor of the OpenTelemetry setup. Refer to {ref}`OpenTelemetry Setup ` for the details on the OpenTelemetry setup for Jina-serve.

Refer to the {ref}`OpenTelemetry migration guide ` for updating your existing Prometheus and Grafana configurations.
```

We recommend the Prometheus/Grafana stack to leverage the metrics exposed by Jina-serve. In this setup, Jina-serve exposes different metrics endpoints, and Prometheus scrapes these endpoints, collecting, aggregating, and storing the metrics.

External entities (like Grafana) can access these aggregated metrics via the query language [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) and let users visualize the metrics with dashboards.

```{hint}
Jina supports exposing metrics, but you are in charge of installing and managing your Prometheus/Grafana instances.
```

In this guide, we deploy the Prometheus/Grafana stack and use it to monitor a Flow.

(deploy-flow-monitoring)=
## Deploying the Flow and the monitoring stack

### Deploying on Kubernetes

One challenge of monitoring a {class}`~jina.Flow` is communicating its different metrics endpoints to Prometheus. Fortunately, the [Prometheus operator for Kubernetes](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md) makes this fairly easy because it can automatically discover new metrics endpoints to scrape.

We recommend deploying your Jina-serve Flow on Kubernetes to leverage the full potential of the monitoring feature because:

* The Prometheus operator can automatically discover new endpoints to scrape.
* You can extend monitoring with the rich built-in Kubernetes metrics.

You can deploy Prometheus and Grafana on your Kubernetes cluster by running:

```bash
helm install prometheus prometheus-community/kube-prometheus-stack --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
```

```{hint}
Setting `serviceMonitorSelectorNilUsesHelmValues` to false allows the Prometheus Operator to discover metrics endpoints outside of the Helm scope, which is needed to discover the Flow's metrics endpoints.
```

Deploy the Flow that we want to monitor. For this example we recommend reading {ref}`how to build and containerize the Executors to be run in Kubernetes. `

````{tab} via YAML
This example shows how to start a Flow with monitoring enabled via YAML.

In a `flow.yml` file:

```yaml
jtype: Flow
with:
  monitoring: true
executors:
- uses: jinaai+docker:///EncoderPrivate
```

```bash
jina export kubernetes flow.yml ./config
```
````

````{tab} via Python API
```python
from jina import Flow

f = Flow(monitoring=True).add(uses='jinaai+docker:///EncoderPrivate')
f.to_kubernetes_yaml('config')
```
````

This creates a `config` folder containing the Kubernetes YAML definition of the Flow.

```{seealso}
You can see in-depth how to deploy a Flow on Kubernetes {ref}`here `
```

Then deploy the Flow:

```bash
kubectl apply -R -f config
```

Wait for a couple of minutes, and you should see that the Pods are ready:

```bash
kubectl get pods
```

```{figure} ../../.github/2.0/kubectl_pods.png
:align: center
```

Then you can see that the new metrics endpoints are automatically discovered:

```bash
kubectl port-forward svc/prometheus-operated 9090:9090
```

```{figure} ../../.github/2.0/prometheus_target.png
:align: center
```

Before querying the gateway you need to port-forward:

```bash
kubectl port-forward svc/gateway 8080:8080
```

To access Grafana, run:

```bash
kubectl port-forward svc/prometheus-grafana 3000:80
```

Then open `http://localhost:3000` in your browser. The username is `admin` and the password is `prom-operator`. You should see the Grafana home page.
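At this point you can also sanity-check in the Prometheus UI (forwarded on port 9090 above) that Flow metrics are arriving. A sketch of such a query, assuming the summary metric names from the Flow Instrumentation section (e.g. `jina_receiving_request_seconds`):

```text
rate(jina_receiving_request_seconds_count[5m])
```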
### Deploying locally

Deploy the Flow that we want to monitor:

````{tab} via Python code
```python
from jina import Flow

with Flow(monitoring=True, port_monitoring=8000, port=8080).add(
    uses='jinaai+docker:///EncoderPrivate', port_monitoring=9000
) as f:
    f.block()
```
````

````{tab} via docker-compose
```python
from jina import Flow

Flow(monitoring=True, port_monitoring=8000, port=8080).add(
    uses='jinaai+docker:///EncoderPrivate', port_monitoring=9000
).to_docker_compose_yaml('config.yaml')
```

```bash
docker-compose -f config.yaml up
```
````

To monitor a Flow locally you need to install Prometheus and Grafana locally. The easiest way to do this is with Docker Compose.

First clone the repo which contains the config file:

```bash
git clone https://github.com/jina-ai/example-grafana-prometheus
cd example-grafana-prometheus/prometheus-grafana-local
```

then

```bash
docker-compose up
```

Access the Grafana dashboard at `http://localhost:3000`. The username is `admin` and the password is `foobar`.

```{caution}
This example works locally because Prometheus is configured to listen to ports 8000 and 9000. However, in contrast to deploying on Kubernetes, you need to tell Prometheus which ports to look at. You can change these ports by modifying [prometheus.yml](https://github.com/jina-ai/example-grafana-prometheus/blob/8baf519f7258da68cfe224775fc90537a749c305/prometheus-grafana-local/prometheus/prometheus.yml#L64).
```

### Deploying on JCloud

If your Flow is deployed on JCloud, you don't need to provision a monitoring stack yourself. Prometheus and Grafana are handled by JCloud, and you can find a dashboard URL with `jc status `.

## Using Grafana to visualize metrics

Access the Grafana homepage, then go to `Browse`, then `import`, and copy and paste the [JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow.json).

You should see the following dashboard:

```{figure} ../../.github/2.0/grafana.png
:align: center
```

````{admonition} Hint
:class: hint
You should query your Flow to generate the first metrics. Otherwise the dashboard looks empty.
````

You can query the Flow by running:

```python
from typing import Optional

from docarray import DocList, BaseDoc
from docarray.typing import NdArray
from jina import Client


class MyDoc(BaseDoc):
    text: str
    embedding: Optional[NdArray] = None


client = Client(port=51000)
client.post(
    on='/',
    inputs=DocList[MyDoc]([MyDoc(text=f'Text for document {i}') for i in range(100)]),
    return_type=DocList[MyDoc],
    request_size=10,
)
```

## See also

- [Using Grafana to visualize Prometheus metrics](https://grafana.com/docs/grafana/latest/getting-started/getting-started-prometheus/)
- {ref}`Defining custom metrics in an Executor `

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/cloud-nativeness/opentelemetry-migration.md

(opentelemetry-migration)=
# Migrate from Prometheus/Grafana to OpenTelemetry

The {ref}`Prometheus/Grafana `-based monitoring setup will soon be deprecated in favor of the {ref}`OpenTelemetry setup `. This section provides the details required to update/migrate your Prometheus configuration and Grafana dashboard to continue monitoring with OpenTelemetry.
Refer to the {ref}`OpenTelemetry setup ` for the new setup before proceeding further.

```{hint}
:class: seealso
Refer to the {ref}`Prometheus/Grafana-only ` section for the soon-to-be-deprecated setup.
```

## Update Prometheus configuration

With a Prometheus-only setup, you need to set up a `scrape_configs` configuration or service discovery plugin to specify the targets for pulling metrics data. In the OpenTelemetry setup, each Pod pushes metrics to the OpenTelemetry Collector. The Prometheus configuration now only needs to scrape from the OpenTelemetry Collector to get all the data from OpenTelemetry-instrumented applications.

The new Prometheus configuration for the `otel-collector` Collector hostname is:

```yaml
scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 500ms
    static_configs:
      - targets: ['otel-collector:8888'] # metrics from the collector itself
      - targets: ['otel-collector:8889'] # metrics collected from other applications
```

## Update Grafana dashboard

The OpenTelemetry [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) provides quantile window buckets automatically (unlike the Prometheus [Summary](https://prometheus.io/docs/concepts/metric_types/#summary) instrument). You need to manually configure the required quantile window. The quantile window metric will then be available as a separate time series metric.

In addition, the OpenTelemetry `Counter/UpDownCounter` instruments do not add the `_total` suffix to the base metric name.

To adapt Prometheus queries in Grafana:

- Use the [histogram_quantile](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile) function to query the average or desired quantile window time series data from Prometheus. For example, to view the 0.99 quantile of the `jina_receiving_request_seconds` metric over the last 10 minutes, use the query `histogram_quantile(0.99, rate(jina_receiving_request_seconds_bucket[10m]))`.
- Remove the `_total` suffix from the Counter/UpDownCounter metric names.

You can download a [sample Grafana dashboard JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow-histogram-metrics.json) and import it into Grafana to get started with some pre-built graphs.

```{hint}
A list of available metrics is in the {ref}`Flow Instrumentation ` section.
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/cloud-nativeness/opentelemetry.md

(opentelemetry)=
# {octicon}`telescope-fill` OpenTelemetry Support

```{toctree}
:hidden:

opentelemetry-migration
monitoring
```

```{hint}
Prometheus-only based metrics collection will soon be deprecated. Refer to {ref}`Monitor with Prometheus and Grafana ` for the old setup.
```

There are two major setups required to visualize/monitor your application's signals using [OpenTelemetry](https://opentelemetry.io). The first setup is covered by Jina-serve, which integrates the [OpenTelemetry API and SDK](https://opentelemetry-python.readthedocs.io/en/stable/api/index.html) at the application level. The {ref}`Flow Instrumentation ` page covers in detail the steps required to enable OpenTelemetry in a Flow. A {class}`~jina.Client` can also be instrumented, which is documented in the {ref}`Client Instrumentation ` section.

This section covers the OpenTelemetry infrastructure setup required to collect, store and visualize the traces and metrics data exported by the Pods.
This setup is the user's responsibility, and this section only serves as an initial/introductory guide to running OpenTelemetry infrastructure components.

Since OpenTelemetry is open source and is mostly responsible for the API standards and specification, various providers implement the specification. This section follows the default recommendations from the OpenTelemetry documentation that also fit the Jina-serve implementation.

## Exporting traces and metrics data

Pods created using a {class}`~jina.Flow` with tracing or metrics enabled use the [SDK Exporters](https://opentelemetry.io/docs/instrumentation/python/exporters/) to send the data to a central [Collector](https://opentelemetry.io/docs/collector/) component. You can use this collector to further process and store the data for visualization and alerting.

The push/export-based mechanism also allows the application to start pushing data immediately on startup. This differs from the pull-based mechanism, where you need a separate scraping registry or discovery service to identify data scraping targets.

You can configure the exporter backend host and port using the `traces_exporter_host`, `traces_exporter_port`, `metrics_exporter_host` and `metrics_exporter_port` arguments. Even though the Collector is metric data-type agnostic (it accepts any type of OpenTelemetry API data model), we provide separate configuration for Tracing and Metrics to give you more flexibility in choosing infrastructure components.

Jina-serve's default exporter implementations are `OTLPSpanExporter` and `OTLPMetricExporter`. The exporters use the gRPC data transfer protocol. The following environment variables can be used to further configure the exporter client based on your requirements. The full list of exporter-related environment variables is documented by the [Python SDK library](https://opentelemetry-python.readthedocs.io/en/latest/exporter/otlp/otlp.html). Apart from `OTEL_EXPORTER_OTLP_PROTOCOL` and `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`, you can use all other library version specific environment variables to configure the exporter clients.

## Collector

The [Collector](https://opentelemetry.io/docs/collector/) is a huge ecosystem of components that support features like scraping, collecting, processing and further exporting data to storage backends. The collector itself can also expose endpoints to allow scraping data. We recommend reading the official documentation to understand the full set of features and configuration required to run a Collector. Read the section below to understand the minimum set of components and the respective configuration required for operating with Jina-serve.

We recommend using the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) from the contrib repository. We also use:

- [Jaeger](https://www.jaegertracing.io) for collecting traces, visualizing tracing data and alerting based on tracing data.
- [Prometheus](https://prometheus.io) for collecting metric data and/or alerting.
- [Grafana](https://grafana.com) for visualizing data from Prometheus/Jaeger and/or alerting based on the data queried.

```{hint}
Jaeger provides comprehensive out-of-the-box tools for end-to-end tracing monitoring, visualization and alerting. You can substitute other tools to achieve the necessary goals of observability and performance analysis.
The same can be said for Prometheus and Grafana.
```

### Docker Compose

A minimal `docker-compose.yml` file can look like:

```yaml
version: "3"
services:
  # Jaeger
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"

  otel-collector:
    image: otel/opentelemetry-collector:0.61.0
    command: ["--config=/etc/otel-collector-config.yml"]
    volumes:
      - ${PWD}/otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "8888"        # Prometheus metrics exposed by the collector
      - "8889"        # Prometheus exporter metrics
      - "4317:4317"   # OTLP gRPC receiver
    depends_on:
      - jaeger

  prometheus:
    container_name: prometheus
    image: prom/prometheus:latest
    volumes:
      - ${PWD}/prometheus-config.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    container_name: grafana
    image: grafana/grafana-oss:latest
    ports:
      - 3000:3000
```

The corresponding OpenTelemetry Collector configuration below needs to be stored in the file `otel-collector-config.yml`:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
    resource_to_telemetry_conversion:
      enabled: true
    # can be used to add additional labels
    const_labels:
      label1: value1

processors:
  batch:

service:
  extensions: []
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger]
      processors: [batch]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

This setup creates a gRPC Collector Receiver on port 4317 that collects data pushed by the Flow Pods. Collector exporters for the Jaeger and Prometheus backends are configured to export tracing and metrics data respectively. The final **service** section creates a collector pipeline, combining the receiver (collect data), processor (batching) and exporter (to backend) sub-components.

The minimal Prometheus configuration needs to be stored in `prometheus-config.yml`:

```yaml
scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 500ms
    static_configs:
      - targets: ['otel-collector:8889']
      - targets: ['otel-collector:8888']
```

The Prometheus configuration now only needs to scrape from the OpenTelemetry Collector to get all the data from OpenTelemetry Metrics instrumented applications.

### Running a Flow locally

Run the Flow and a sample request that we want to instrument locally. If the backends are running successfully, the Flow exports data to the Collector, which can then be queried and viewed.

First start a Flow:

```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
import time


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        time.sleep(0.5)
        return docs


with Flow(
    port=54321,
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add(uses=MyExecutor) as f:
    f.block()
```

Second, execute requests using the instrumented {class}`jina.Client`:

```python
from jina import Client
from docarray import DocList, BaseDoc

client = Client(
    host='grpc://localhost:54321',
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)

client.post('/', DocList[BaseDoc]([BaseDoc()]), return_type=DocList[BaseDoc])
client.teardown_instrumentation()
```

```{hint}
The {class}`jina.Client` currently only supports OpenTelemetry Tracing.
```

## Viewing Traces in Jaeger UI

You can open the Jaeger UI [here](http://localhost:16686).
You can find more information on the Jaeger UI in the official [docs](https://www.jaegertracing.io/docs/1.38/external-guides/#using-jaeger).

```{hint}
The list of available traces is documented in the {ref}`Flow Instrumentation ` section.
```

## Monitor with Prometheus and Grafana

External entities (like Grafana) can access these aggregated metrics via the [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) query language, and let users visualize metrics with dashboards. Check out a [comprehensive tutorial](https://prometheus.io/docs/visualization/grafana/) for more information.

Download a [sample Grafana dashboard JSON file](https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow-histogram-metrics.json) and import it into Grafana to get started with some pre-built graphs:

```{figure} ../../.github/2.0/grafana-histogram-metrics.png
:align: center
```

```{hint}
:class: seealso
A list of available metrics is in the {ref}`Flow Instrumentation ` section.

To update your existing Prometheus and Grafana configurations, refer to the {ref}`OpenTelemetry migration guide `.
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/callbacks.md

(callback-functions)=
# Callbacks

After performing {meth}`~jina.clients.mixin.PostMixin.post`, you may want to further process the obtained results. For this purpose, Jina-serve implements a promise-like interface, letting you specify three kinds of callback functions:

- `on_done` is executed while streaming, after successful completion of each request
- `on_error` is executed while streaming, whenever an error occurs in each request
- `on_always` is always performed while streaming, no matter the success or failure of each request

Note that these callbacks only work for requests (and failures) *inside the stream*, for example inside an Executor. If the failure is due to an error happening outside of streaming, then these callbacks will not be triggered. For example, a `SIGKILL` from the client OS during the handling of the request, or a networking issue, will not trigger the callback.

Callback functions in Jina-serve expect a `Response` of the type {class}`~jina.types.request.data.DataRequest`, which contains resulting Documents, parameters, and other information.

## Handle DataRequest in callbacks

`DataRequest`s are objects that are sent by Jina-serve internally. Callback functions process DataRequests, and `client.post()` can return DataRequests.

`DataRequest` objects can be seen as containers for the data relevant to a given request. They contain the following fields:

````{tab} header
The request header.

```python
from pprint import pprint

from jina import Client

Client().post(on='/', on_done=lambda x: pprint(x.header))
```

```console
request_id: "ea504823e9de415d890a85d1d00ccbe9"
exec_endpoint: "/"
target_executor: ""
```
````

````{tab} parameters
The input parameters of the associated request. In particular, `DataRequest.parameters['__results__']` is a reserved field that gets populated by Executors returning a Python `dict`. Information in those returned `dict`s gets collected here, behind each Executor ID.

```python
from pprint import pprint

from jina import Client

Client().post(on='/', on_done=lambda x: pprint(x.parameters))
```

```console
{'__results__': {}}
```
````

````{tab} routes
The routing information of the data request.
It contains which Executors have been called, and the order in which they were called. The timing and latency of each Executor is also recorded.

```python
from pprint import pprint

from jina import Client

Client().post(on='/', on_done=lambda x: pprint(x.routes))
```

```console
[executor: "gateway"
start_time {
  seconds: 1662637747
  nanos: 790248000
}
end_time {
  seconds: 1662637747
  nanos: 794104000
}
, executor: "executor0"
start_time {
  seconds: 1662637747
  nanos: 790466000
}
end_time {
  seconds: 1662637747
  nanos: 793982000
}
]
```
````

````{tab} docs
The DocList being passed between and returned by the Executors. These are the Documents usually processed in a callback function, and are often the main payload.

```python
from pprint import pprint

from jina import Client

Client().post(on='/', on_done=lambda x: pprint(x.docs))
```

```console
```
````

Accordingly, a callback that processes documents can be defined as:

```{code-block} python
---
emphasize-lines: 4
---
from jina.types.request.data import DataRequest


def my_callback(resp: DataRequest):
    foo(resp.docs)
```

## Handle exceptions in callbacks

Server errors can be caught by the Client's `on_error` callback function. You can get the error message and traceback from `header.status`:

```python
from pprint import pprint

from jina import Flow, Client, Executor, requests


class MyExec1(Executor):
    @requests
    def foo(self, **kwargs):
        raise NotImplementedError


with Flow(port=12345).add(uses=MyExec1) as f:
    c = Client(port=f.port)
    c.post(on='/', on_error=lambda x: pprint(x.header.status))
```

```text
code: ERROR
description: "NotImplementedError()"
exception {
  name: "NotImplementedError"
  stacks: "Traceback (most recent call last):\n"
  stacks: "  File \"/Users/hanxiao/Documents/jina/jina/serve/runtimes/worker/__init__.py\", line 181, in process_data\n    result = await self._data_request_handler.handle(requests=requests)\n"
  stacks: "  File \"/Users/hanxiao/Documents/jina/jina/serve/runtimes/request_handlers/data_request_handler.py\", line 152, in handle\n    return_data = await self._executor.__acall__(\n"
  stacks: "  File \"/Users/hanxiao/Documents/jina/jina/serve/executors/__init__.py\", line 301, in __acall__\n    return await self.__acall_endpoint__(__default_endpoint__, **kwargs)\n"
  stacks: "  File \"/Users/hanxiao/Documents/jina/jina/serve/executors/__init__.py\", line 322, in __acall_endpoint__\n    return func(self, **kwargs)\n"
  stacks: "  File \"/Users/hanxiao/Documents/jina/jina/serve/executors/decorators.py\", line 213, in arg_wrapper\n    return fn(executor_instance, *args, **kwargs)\n"
  stacks: "  File \"/Users/hanxiao/Documents/jina/toy44.py\", line 10, in foo\n    raise NotImplementedError\n"
  stacks: "NotImplementedError\n"
  executor: "MyExec1"
}
```

In the example below, our Flow passes the message, then prints the result when successful. If something goes wrong, it beeps.
In the example below, our Flow passes the message through and prints the result when successful. If something goes wrong, it beeps. Finally, the result is written to `output.txt`.

```python
from jina import Flow, Client
from docarray import BaseDoc


def beep(*args):
    # make a beep sound
    import sys

    sys.stdout.write('\a')


with Flow().add() as f, open('output.txt', 'w') as fp:
    client = Client(port=f.port)
    client.post(
        '/',
        BaseDoc(),
        on_done=print,
        on_error=beep,
        on_always=lambda x: x.docs.save(fp),
    )
```

````{admonition} What errors can be handled by the callback?
:class: caution
Callbacks can handle errors that are caused by Executors raising an `Exception`.

A callback will not receive exceptions:
- from the Gateway having connectivity errors with the Executors.
- between the Client and the Gateway.
````

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/index.md

(client)=
# {fas}`laptop-code` Client

{class}`~jina.Client` enables you to send Documents to a running {class}`~jina.Flow`. Like the Gateway, the Client supports four networking protocols: **gRPC**, **HTTP**, **WebSocket** and **GraphQL**, with or without TLS.

You may have observed two styles of using a Client in the docs:

````{tab} Implicit, inside a Flow

```{code-block} python
---
emphasize-lines: 6
---
from jina import Flow

f = Flow()

with f:
    f.post('/')
```

````

````{tab} Explicit, outside a Flow

```{code-block} python
---
emphasize-lines: 3,4
---
from jina import Client

c = Client(...)  # must match the Flow setup
c.post('/')
```

````

The implicit style is easier for debugging and local development, as you don't need to specify the host, port and protocol of the Flow. However, it makes two strong assumptions: (1) one Flow corresponds to exactly one Client; (2) the Flow is running on the same machine as the Client. For these reasons, the explicit style is recommended for production use.

```{hint}
If you want to connect to your Flow from a programming language other than Python, please follow the third-party client {ref}`documentation `.
```

## Connect

To connect to a Flow started by:

```python
from jina import Flow

with Flow(port=1234, protocol='grpc') as f:
    f.block()
```

```text
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                     GRPC │
│  🏠       Local              0.0.0.0:1234 │
│  🔒     Private        192.168.1.126:1234 │
│  🌍      Public       87.191.159.105:1234 │
╰──────────────────────────────────────────╯
```

The Client has to specify the following parameters to match the Flow and how it was set up:

* the `protocol` it needs to use to communicate with the Flow
* the `host` and the `port` as exposed by the Flow
* whether it needs to use `TLS` encryption (to connect to a {class}`~jina.Flow` that has been {ref}`configured to use TLS ` in combination with gRPC, HTTP, or WebSocket)

````{Hint} Default port
The default port for the Client is `80`, unless you are using `TLS` encryption, in which case it is `443`.
````

You can define these parameters by passing a valid URI scheme as part of the `host` argument:

````{tab} TLS disabled

```python
from jina import Client

Client(host='http://my.awesome.flow:1234')
Client(host='ws://my.awesome.flow:1234')
Client(host='grpc://my.awesome.flow:1234')
```

````

````{tab} TLS enabled

```python
from jina import Client

Client(host='https://my.awesome.flow:1234')
Client(host='wss://my.awesome.flow:1234')
Client(host='grpcs://my.awesome.flow:1234')
```

````

Equivalently, you can pass each relevant parameter as a keyword argument:

````{tab} TLS disabled

```python
from jina import Client

Client(host='my.awesome.flow', port=1234, protocol='http')
Client(host='my.awesome.flow', port=1234, protocol='websocket')
Client(host='my.awesome.flow', port=1234, protocol='grpc')
```

````

````{tab} TLS enabled

```python
from jina import Client

Client(host='my.awesome.flow', port=1234, protocol='http', tls=True)
Client(host='my.awesome.flow', port=1234, protocol='websocket', tls=True)
Client(host='my.awesome.flow', port=1234, protocol='grpc', tls=True)
```

````

You can also use a mix of both:

```python
from jina import Client

Client(host='https://my.awesome.flow', port=1234)
Client(host='my.awesome.flow:1234', protocol='http', tls=True)
```

````{admonition} Caution
:class: caution
You can't define these parameters both by keyword argument and by host scheme - you can't have two sources of truth.
Example: the following code will raise an exception:

```python
from jina import Client

Client(host='https://my.awesome.flow:1234', port=4321)
```
````

````{admonition} Caution
:class: caution
We apply `RLock` to avoid [this gRPC issue](https://github.com/grpc/grpc/issues/25364), so that `grpc` clients can be used in a multi-threaded environment.

What you should do is rely on asynchronous programming or multi-processing rather than multi-threading.
For instance, if you're building a web server, you can introduce multi-processing based parallelism to your app using `gunicorn`: `gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker ...`
````
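For instance, a minimal sketch of the multi-processing approach; the Flow address is hypothetical, and each process constructs its own `Client` rather than sharing one across threads:

```python
from multiprocessing import Process

from jina import Client
from docarray import DocList, BaseDoc


def worker():
    # every process creates its own Client; no gRPC state is shared
    client = Client(host='grpc://localhost:12345')  # hypothetical Flow address
    client.post('/', inputs=DocList[BaseDoc]([BaseDoc()]))


if __name__ == '__main__':
    processes = [Process(target=worker) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```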
## Client API

When using `docarray>=0.30`, you specify the schema that you expect the Deployment or Flow to return. You can pass the return type via the `return_type` parameter in the `client.post` method:

```{code-block} python
---
emphasize-lines: 19
---
from typing import Dict

from jina import Client
from docarray import DocList, BaseDoc


class InputDoc(BaseDoc):
    text: str = ''


class OutputDoc(BaseDoc):
    tags: Dict[str, int] = {}


c = Client(host='https://my.awesome.flow:1234')
c.post(
    on='/',
    inputs=InputDoc(),
    return_type=DocList[OutputDoc],
)
```

(client-compress)=
## Enable compression

If the communication to the Gateway is via gRPC, you can pass the `compression` parameter to {meth}`~jina.clients.mixin.PostMixin.post` to benefit from [gRPC compression](https://grpc.github.io/grpc/python/grpc.html#compression) methods.

The supported choices are `NoCompression`, `Deflate` and `Gzip`.

```python
from jina import Client

client = Client()
client.post(..., compression='Gzip')
```

Note that this setting only affects the communication between the Client and the Flow's Gateway.

One can also specify the compression of the internal communication {ref}`as described here `.

## Test readiness of the server

```{include} ../orchestration/readiness.md
:start-after: 
:end-before: 
```

## Simple profiling of the latency

Before sending any real data, you can test the connectivity and network latency by calling the {meth}`~jina.clients.mixin.ProfileMixin.profiling` method:

```python
from jina import Client

c = Client(host='grpc://my.awesome.flow:1234')
c.profiling()
```

```text
 Roundtrip                            24ms  100%
├──  Client-server network            17ms  71%
└──  Server                            7ms  29%
    ├──  Gateway-executors network     0ms  0%
    ├──  executor0                     5ms  71%
    └──  executor1                     2ms  29%
```

## Logging configuration

Similar to the {ref}`Flow logging configuration `, the {class}`jina.Client` also accepts the `log_config` argument. The Client can be configured as below:

```python
from jina import Client

client = Client(log_config='./logging.json.yml')
```

If the Flow is configured with custom logging, the argument will be forwarded to the implicit client.

```python
from jina import Flow

f = Flow(log_config='./logging.json.yml')

with f:
    # the implicit client automatically uses the log_config from the Flow for consistency
    f.post('/')
```

```{toctree}
:hidden:

send-receive-data
send-parameters
send-graphql-mutation
transient-errors
callbacks
rate-limit
instrumentation
third-party-clients
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/instrumentation.md

(instrumenting-client)=
## Instrumentation

The {class}`~jina.Client` supports request tracing, giving you an end-to-end view of a request's lifecycle. The client supports the **gRPC**, **HTTP** and **WebSocket** protocols.

````{tab} Implicit, inside a Flow

```{code-block} python
---
emphasize-lines: 4, 5, 6
---
from jina import Flow

f = Flow(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)

with f:
    f.post('/')
```

````

````{tab} Explicit, outside a Flow

```{code-block} python
---
emphasize-lines: 5, 6, 7
---
from jina import Client

# must match the Flow setup
c = Client(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
)
c.post('/')
```

````

Each protocol client creates the initial trace ID, which is propagated to the `Gateway`. The `Gateway` then creates child spans using the available trace ID, which is further propagated to each Executor request.
Using the trace ID, all associated spans can be collected to build a trace view of the whole request lifecycle.

```{admonition} Using custom/external tracing context
:class: caution
The {class}`~jina.Client` doesn't currently support an external tracing context, which could potentially be extracted from an upstream request.
```

You can find out more about instrumentation from the resources below:

- [Tracing in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/)
- {ref}`Instrumenting a Flow `
- {ref}`Deploying and using OpenTelemetry in Jina-serve `

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/rate-limit.md

(client-post-prefetch)=
# Rate Limit

There are two ways of applying a rate limit using the {class}`~jina.Client`:

1. Set the `prefetch` argument in the `Client` class constructor; it defaults to 1,000 requests.
2. Set the argument when using the {meth}`~jina.clients.mixin.PostMixin.post` method. If not provided, the default value of 1,000 requests is used. The method argument overrides the argument provided in the `Client` class constructor.

The `prefetch` argument controls the number of in-flight requests made by the {meth}`~jina.clients.mixin.PostMixin.post` method. Using the default value might overload the {class}`~jina.Gateway` or {class}`~jina.Executor`, especially if the operational characteristics of the `Deployment` or `Flow` are unknown. Furthermore, the Client can send various types of requests with varying resource usage.

For example, a high number of `index` requests can contain a large data payload requiring heavy input/output operations. This increases CPU consumption and can eventually lead to a build-up of requests on the Flow. If the queue of in-flight requests is already large, a very lightweight `search` request to return the total number of Documents in the index might be blocked until the queue of `index` requests has been completely processed. To prevent such a scenario, apply the `prefetch` value on the {meth}`~jina.clients.mixin.PostMixin.post` method to limit the rate of requests for expensive operations.

Apply the `prefetch` argument on the {meth}`~jina.clients.mixin.PostMixin.post` method to dynamically increase server responsiveness for customer-facing requests that require faster response times, versus background requests such as cron jobs or analytics requests, which can be processed slowly.

```python
from jina import Client

client = Client()

# uses the default limit of 1,000 requests
search_responses = client.post(...)

# sets a hard limit of 5 in-flight requests
index_responses = client.post(..., prefetch=5)
```

A global rate limit on the {class}`~jina.Gateway` can also be set using the {ref}`prefetch ` option in the `Flow`. This argument, however, serves as a global rate limit and cannot be customized based on the request workload. The `prefetch` argument for the `Client` serves as a class-level rate limit for all requests made from that client. The `prefetch` argument for the {meth}`~jina.clients.mixin.PostMixin.post` method serves as a method-level limit, overriding the values set on the `Client` and the `Flow`.
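To illustrate how the three levels interact, here is a minimal sketch; the port is hypothetical, and each `prefetch` value overrides the one above it:

```python
from jina import Flow, Client

# Flow level: a global rate limit enforced at the Gateway
f = Flow(port=12345, prefetch=500)

with f:
    # Client level: a default limit for every request made by this Client
    client = Client(port=12345, prefetch=100)

    # method level: overrides both the Client and the Flow values for this call
    client.post('/', prefetch=5)
```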
---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/send-graphql-mutation.md

# Send GraphQL Mutation

If the Flow is configured with a GraphQL endpoint, you can use the Jina-serve {class}`~jina.Client` {meth}`~jina.clients.mixin.MutateMixin.mutate` method to fetch data via GraphQL mutations:

````{admonition} Only available for docarray<0.30
:class: note
This feature is only available when using `docarray<0.30`.
````

```python
from jina import Client

PORT = ...
c = Client(port=PORT)
mut = '''
    mutation {
        docs(data: {text: "abcd"}) {
            id
            matches {
                embedding
            }
        }
    }
'''
response = c.mutate(mutation=mut)
```

Note that `response` here is a `Dict`, not a `DocumentArray`. This is because GraphQL lets the user specify only the fields they want returned, so the output might not be a valid DocumentArray; it can even be just a string.

## Mutations and arguments

The Flow GraphQL API exposes the mutation `docs`, which sends its inputs to the Flow's Executors, just like the HTTP `post` endpoint described {ref}`above `.

A GraphQL mutation takes the same set of arguments used in {ref}`HTTP `. The response from GraphQL can include all fields available on a DocumentArray.

````{admonition} See Also
:class: seealso

For more details on the GraphQL format of Document and DocumentArray, see the [documentation page](https://docarray.jina.ai/advanced/graphql-support/) or [developer reference](https://docarray.jina.ai/api/docarray.document.mixins.strawberry/).
````

## Fields

The available fields in the GraphQL API are defined by the [Document Strawberry type](https://docarray.jina.ai/advanced/graphql-support/?highlight=graphql).

Essentially, you can ask for any property of a Document, including `embedding`, `text`, `tensor`, `id`, `matches`, `tags`, and more.

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/send-parameters.md

(client-executor-parameters)=
# Send Parameters

The {class}`~jina.Client` can send key-value pairs as parameters to {class}`~jina.Executor`s, as shown below:

```{code-block} python
---
emphasize-lines: 15
---
from jina import Client, Executor, Deployment, requests
from docarray import BaseDoc


class MyExecutor(Executor):
    @requests
    def foo(self, parameters, **kwargs):
        print(parameters['hello'])


dep = Deployment(uses=MyExecutor)

with dep:
    client = Client(port=dep.port)
    client.post('/', BaseDoc(), parameters={'hello': 'world'})
```

````{hint}
:class: note
You can send a parameters-only data request via:

```python
with dep:
    client = Client(port=dep.port)
    client.post('/', parameters={'hello': 'world'})
```

This might be useful to control `Executor` objects during their lifetime.
````

Since Executors {ref}`can use Pydantic models to have strongly typed parameters `, you can also send parameters as Pydantic models in the client API.
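As an illustration, here is a sketch of what this can look like; `Params` is a hypothetical Pydantic model, and the sketch assumes the Executor types its `parameters` argument with the same model:

```python
from pydantic import BaseModel

from jina import Client, Deployment, Executor, requests
from docarray import BaseDoc


class Params(BaseModel):  # hypothetical parameters model
    hello: str = 'world'


class MyExecutor(Executor):
    @requests
    def foo(self, docs, parameters: Params, **kwargs):
        # parameters arrives as a validated Params instance instead of a plain dict
        print(parameters.hello)


dep = Deployment(uses=MyExecutor)

with dep:
    client = Client(port=dep.port)
    client.post('/', BaseDoc(), parameters=Params(hello='world'))
```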
(specific-params)=
## Send parameters to specific Executors

You can send parameters to specific Executors by using the `executor__parameter` syntax. The Executor named `executorname` will receive the parameter `paramname` (without the `executorname__` prefix in the key name), and none of the other Executors will receive it.

For instance, in the following Flow:

```python
from jina import Flow, Client
from docarray import BaseDoc, DocList

with Flow().add(name='exec1').add(name='exec2') as f:
    client = Client(port=f.port)

    client.post(
        '/index',
        DocList[BaseDoc]([BaseDoc()]),
        parameters={'exec1__parameter_exec1': 'param_exec1', 'exec2__parameter_exec1': 'param_exec2'},
    )
```

The Executor `exec1` will receive `{'parameter_exec1':'param_exec1'}` as parameters, whereas `exec2` will receive `{'parameter_exec1':'param_exec2'}`.

This feature is intended for cases where multiple Executors take the same parameter names, but you want to use different values for each Executor. This is often the case for Executors from the Hub, since they tend to share a common interface for parameters.

```{admonition} Difference to target_executor
Why do we need this feature if we already have `target_executor`?

On the surface, both are about sending information to a subset of Executors in a Flow. However, they work differently under the hood. `target_executor` sends information directly to the specified Executors, ignoring the topology of the Flow, whereas an `executor__parameter` request follows the topology of the Flow and only delivers each parameter to the Executor whose name matches.

Think about roll call and passing notes in a classroom: `target_executor` is like calling on a student directly, whereas `executor__parameter` is like passing a stack of notes from student to student, with each one picking out the note with their own name on it.
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/send-receive-data.md

# Send & Receive Data

After a {class}`~jina.Client` has connected to a {class}`~jina.Deployment` or a {class}`~jina.Flow`, it can send requests to the service using its {meth}`~jina.clients.mixin.PostMixin.post` method. This expects as inputs the {ref}`Executor endpoint ` that you want to target, as well as a Document or an Iterable of Documents:

````{tab} A single Document

```{code-block} python
---
emphasize-lines: 6
---
from jina import Client
from docarray.documents import TextDoc

d1 = TextDoc(text='hello')
client = Client(...)
client.post('/endpoint', d1)
```

````

````{tab} A list of Documents

```{code-block} python
---
emphasize-lines: 7
---
from jina import Client
from docarray.documents import TextDoc

d1 = TextDoc(text='hello')
d2 = TextDoc(text='world')
client = Client(...)
client.post('/endpoint', inputs=[d1, d2])
```

````

````{tab} A DocList

```{code-block} python
---
emphasize-lines: 9
---
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

d1 = TextDoc(text='hello')
d2 = TextDoc(text='world')
da = DocList[TextDoc]([d1, d2])
client = Client(...)
client.post('/endpoint', da)
```

````

````{tab} A Generator of Document

```{code-block} python
---
emphasize-lines: 4-6, 9
---
from jina import Client
from docarray.documents import TextDoc


def doc_gen():
    for j in range(10):
        yield TextDoc(text=f'hello {j}')


client = Client(...)
client.post('/endpoint', doc_gen)
```

````

````{tab} No Document

```{code-block} python
---
emphasize-lines: 4
---
from jina import Client

client = Client(...)
client.post('/endpoint')
```

````

```{admonition} Caution
:class: caution
`Flow` and `Deployment` also provide a `.post()` method that follows the same interface as `client.post()`.
However, once your solution is deployed remotely, these objects are not present anymore.
Hence, `deployment.post()` and `flow.post()` are not recommended outside of testing or debugging use cases.
```

(request-size-client)=
## Send data in batches

Especially during indexing, a Client can send up to thousands or millions of Documents to a {class}`~jina.Flow`. Those Documents are internally batched into `Request`s, providing a smaller memory footprint and faster response times, thanks to {ref}`callback functions `.

The size of these batches can be controlled with the `request_size` keyword. The default `request_size` is 100 Documents.
The optimal size will depend on your use case.

```python
from jina import Deployment, Client
from docarray import DocList, BaseDoc

with Deployment() as dep:
    client = Client(port=dep.port)
    client.post('/', DocList[BaseDoc](BaseDoc() for _ in range(100)), request_size=10)
```

## Send data asynchronously

There is an async version of the Python Client, which works with the {meth}`~jina.clients.mixin.PostMixin.post` and {meth}`~jina.clients.mixin.MutateMixin.mutate` methods.

While the standard `Client` is also asynchronous under the hood, its async version exposes this fact to the outside world: it accepts *coroutines* as input and returns an *asynchronous iterator*. This means you can iterate over Responses one by one, as they come in.

```python
import asyncio

from jina import Client, Deployment
from docarray import BaseDoc


async def async_inputs():
    for _ in range(10):
        yield BaseDoc()
        await asyncio.sleep(0.1)


async def run_client(port):
    client = Client(port=port, asyncio=True)
    async for resp in client.post('/', async_inputs, request_size=1):
        print(resp)


with Deployment() as dep:  # Using it as a Context Manager will start the Deployment
    asyncio.run(run_client(dep.port))
```

Async send is useful when calling an external service from an Executor.

```python
from jina import Client, Executor, requests
from docarray import DocList, BaseDoc


class DummyExecutor(Executor):
    c = Client(host='grpc://0.0.0.0:51234', asyncio=True)

    @requests
    async def process(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        return self.c.post('/', docs, return_type=DocList[BaseDoc])
```

## Send data to specific Executors

Usually a {class}`~jina.Flow` sends each request to all {class}`~jina.Executor`s with matching endpoints, as configured. But the {class}`~jina.Client` also allows you to target only specific Executors in a Flow, using the `target_executor` keyword. The request will then only be processed by the Executors that match the provided `target_executor` regex. Its usage is shown in the listing below.

```python
from jina import Client, Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExecutor(Executor):
    @requests
    async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'foo was here and got {len(docs)} document'


class BarExecutor(Executor):
    @requests
    async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = f'bar was here and got {len(docs)} document'


f = (
    Flow()
    .add(uses=FooExecutor, name='fooExecutor')
    .add(uses=BarExecutor, name='barExecutor')
)

with f:  # Using it as a Context Manager will start the Flow
    client = Client(port=f.port)
    docs = client.post(
        on='/',
        inputs=TextDoc(text=''),
        target_executor='bar*',
        return_type=DocList[TextDoc],
    )
    print(docs.text)
```

This sends the request to all Executors whose names start with 'bar', such as 'barExecutor'. In the simplest case, you can specify a precise Executor name, and the request will be sent only to that single Executor.

## Use Unary or Streaming gRPC

A Flow using the **gRPC** protocol implements both the unary and the streaming RPC lifecycle for communicating with clients. When sending more than one request using the batching or the iterator mechanism, the RPC lifecycle for the {meth}`~jina.clients.mixin.PostMixin.post` method can be controlled with the `stream` boolean argument. By default, `stream` is set to `True`, which uses the streaming RPC to send the data to the Flow; if it is set to `False`, the unary RPC is used instead.
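For example, a minimal sketch that switches to the unary RPC; the address is hypothetical:

```python
from jina import Client
from docarray import DocList, BaseDoc

client = Client(host='grpc://localhost:12345')  # hypothetical Flow address

# stream=False invokes the unary RPC instead of the streaming RPC
docs = client.post(
    '/',
    inputs=DocList[BaseDoc]([BaseDoc() for _ in range(100)]),
    request_size=10,
    stream=False,
)
```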
Both RPC lifecycles are implemented to provide flexibility for clients; note that there may be performance penalties when using the streaming RPC in the Python gRPC implementation.

```{hint}
This option is only valid for the **gRPC** protocol.

Refer to the gRPC [Performance Best Practices](https://grpc.io/docs/guides/performance/#general) guide for more implementation details and considerations.
```

(client-grpc-channel-options)=
## Configure gRPC Client options

The `Client` supports the `grpc_channel_options` parameter, which allows further customization of the **gRPC** channel construction. The `grpc_channel_options` parameter accepts a dictionary of **gRPC** configuration options, which is used to overwrite the defaults. The default **gRPC** options are:

```
('grpc.max_send_message_length', -1),
('grpc.max_receive_message_length', -1),
('grpc.keepalive_time_ms', 9999),
# send a keepalive ping every 10 seconds; the default is 2 hours
('grpc.keepalive_timeout_ms', 4999),
# keepalive ping times out after 5 seconds; the default is 20 seconds
('grpc.keepalive_permit_without_calls', True),
# allow keepalive pings when there are no gRPC calls
('grpc.http2.max_pings_without_data', 0),
# allow an unlimited number of keepalive pings without data
('grpc.http2.min_time_between_pings_ms', 10000),
# allow pings from the client every 10 seconds
('grpc.http2.min_ping_interval_without_data_ms', 5000),
# allow pings from the client without data every 5 seconds
```

If `max_attempts` is greater than 1 on the {meth}`~jina.clients.mixin.PostMixin.post` method, the `grpc.service_config` option will not be applied, since the retry options will be configured internally.

Refer to the [channel_arguments](https://grpc.github.io/grpc/python/glossary.html#term-channel_arguments) section for the full list of available **gRPC** options.

```{hint}
:class: seealso
Refer to the {ref}`Configure Executor gRPC options ` section for configuring the `Executor` **gRPC** options.
```
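For example, a minimal sketch that overrides one of the defaults above when constructing the Client; the address and the keepalive value are arbitrary:

```python
from jina import Client

client = Client(
    host='grpc://localhost:12345',  # hypothetical Flow address
    grpc_channel_options={'grpc.keepalive_time_ms': 30000},  # arbitrary override
)
```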
## Returns

{meth}`~jina.clients.mixin.PostMixin.post` returns a `DocList` containing all Documents, flattened over all Requests. When setting `return_responses=True`, this behavior changes to returning a list of {class}`~jina.types.request.data.Response` objects. If a callback function is provided, `client.post()` returns `None`.

````{tab} Return as DocList objects

```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc

with Deployment() as dep:
    client = Client(port=dep.port)
    docs = client.post(on='', inputs=TextDoc(text='Hi there!'), return_type=DocList[TextDoc])
    print(docs)
    print(docs.text)
```

```console
['Hi there!']
```

````

````{tab} Return as Response objects

```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc

with Deployment() as dep:
    client = Client(port=dep.port)
    resp = client.post(
        on='',
        inputs=TextDoc(text='Hi there!'),
        return_type=DocList[TextDoc],
        return_responses=True,
    )
    print(resp)
    print(resp[0].docs.text)
```

```console
[]
['Hi there!']
```

````

````{tab} Handle response via callback

```python
from jina import Deployment, Client
from docarray import DocList
from docarray.documents import TextDoc

with Deployment() as dep:
    client = Client(port=dep.port)
    resp = client.post(
        on='',
        inputs=TextDoc(text='Hi there!'),
        on_done=lambda resp: print(resp.docs.text),
    )
    print(resp)
```

```console
['Hi there!']
None
```

````

### Return type

{meth}`~jina.clients.mixin.PostMixin.post` returns the Documents as the server sends them back. For the client to return the user's expected Document type, the `return_type` argument is required. The `return_type` can be a parametrized `DocList` or a single `BaseDoc` type. If the return type parameter is a `BaseDoc` type, the results are returned as a `DocList[T]`, except when the result contains a single Document; in that case, the single Document in the list is returned instead of the DocList.

### Callbacks vs returns

A callback operates on every sub-request generated by `request_size`. The callback function consumes the responses one by one, and each response is freed from memory immediately after it has been consumed.

When no callback is provided, the client accumulates the DocLists of all Requests before returning. This means you will not receive results until all Requests have been processed, which is slower and requires more memory.

### Force the order of responses

Note that the Flow processes Documents in an asynchronous and distributed manner. The order in which the Flow processes the requests may not match the order in which the Client sent them; hence, the order of the responses may also differ from the sending order. To force the results to be returned in the same deterministic order in which they were sent, pass the `results_in_order` parameter to {meth}`~jina.clients.mixin.PostMixin.post`.

```python
import random
import time

from jina import Deployment, Executor, requests, Client
from docarray import DocList
from docarray.documents import TextDoc


class RandomSleepExecutor(Executor):
    @requests
    def foo(self, *args, **kwargs):
        rand_sleep = random.uniform(0.1, 1.3)
        time.sleep(rand_sleep)


dep = Deployment(uses=RandomSleepExecutor, replicas=3)
input_text = [f'ordinal-{i}' for i in range(180)]
input_da = DocList[TextDoc]([TextDoc(text=t) for t in input_text])

with dep:
    c = Client(port=dep.port, protocol=dep.protocol)
    output_da = c.post(
        '/',
        inputs=input_da,
        request_size=10,
        return_type=DocList[TextDoc],
        results_in_order=True,
    )
    for input, output in zip(input_da, output_da):
        assert input.text == output.text
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/third-party-clients.md

(third-party-client)=
# Third-party clients

This page is about accessing the Flow with other clients, e.g. `curl`, or programming languages other than Python.

````{admonition} Mostly developed for docarray<0.30
:class: note
Note that most of these clients have been developed for versions of Jina compatible with `docarray<0.30.0`.
This means they are only able to communicate with services using Jina-serve with `docarray<0.30.0`.
````

## Golang

Our [Go Client](https://github.com/jina-ai/client-go) supports the gRPC, HTTP and WebSocket protocols, allowing you to connect to Jina-serve from your Go applications.

## PHP

A big thanks to our community member [Jonathan Rowley](https://jina-ai.slack.com/team/U03973EA7BN) for developing a [PHP client](https://github.com/Dco-ai/php-jina) for Jina-serve!

## Kotlin

A big thanks to our community member [Peter Willemsen](https://jina-ai.slack.com/team/U03R0KNBK98) for developing a [Kotlin client](https://github.com/peterwilli/JinaKotlin) for Jina-serve!

(http-interface)=
## HTTP

```{admonition} Available Protocols
:class: caution
Jina-serve Flows can use one of {ref}`three protocols `: gRPC, HTTP, or WebSocket. Only Flows that use HTTP can be accessed via the methods described below.
```

Apart from using the {ref}`Jina Client `, the most common way of interacting with your deployed Flow is via HTTP.

You can always use `post` to interact with a Flow, using the `/post` HTTP endpoint.

With the help of an [OpenAPI schema](https://swagger.io/specification/), you can send data requests to a Flow via `cURL`, JavaScript, [Postman](https://www.postman.com/), or any other HTTP client or programming library.

(http-arguments)=
### Arguments

Your HTTP request can include the following parameters:

| Name             | Required     | Description                                                                             | Example                                          |
| ---------------- | ------------ | --------------------------------------------------------------------------------------- | ------------------------------------------------ |
| `execEndpoint`   | **required** | Executor endpoint to target                                                             | `"execEndpoint": "/index"`                       |
| `data`           | optional     | List specifying the input [Documents](https://docarray.jina.ai/fundamentals/document/) | `"data": [{"text": "hello"}, {"text": "world"}]` |
| `parameters`     | optional     | Dictionary of parameters to be sent to the Executors                                   | `"parameters": {"param1": "hello world"}`        |
| `targetExecutor` | optional     | String indicating an Executor to target. Default targets all Executors                 | `"targetExecutor": "MyExec"`                     |

Instead of using the generic `/post` endpoint, you can directly use endpoints like `/index` or `/search` to perform a specific operation. In this case your data request is sent to the corresponding Executor endpoint, so you don't need to specify the parameter `execEndpoint`.

`````{dropdown} Example

````{tab} cURL

```{code-block} bash
---
emphasize-lines: 2
---
curl --request POST \
'http://localhost:12345/search' \
--header 'Content-Type: application/json' -d '{"data": [{"text": "hello world"}]}'
```

````

````{tab} javascript

```{code-block} javascript
---
emphasize-lines: 2
---
fetch(
    'http://localhost:12345/search', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({"data": [{"text": "hello world"}]})
}).then(response => response.json()).then(data => console.log(data));
```

````

`````

The response you receive includes `data` (an array of [Documents](https://docarray.jina.ai/fundamentals/document/)), as well as the fields `routes`, `parameters`, and `header`.

```{admonition} See also: Flow REST API
:class: seealso
For a more detailed description of the REST API of a generic Flow, including the complete request body schema and request samples, please check:

1. [OpenAPI Schema](https://schemas.jina.ai/rest/latest.json)
2. [Redoc UI](https://schemas.jina.ai/rest/)
For a specific deployed Flow, you can get the same overview by accessing the `/redoc` endpoint.
```

(swagger-ui)=
### Use cURL

Here's an example that uses `cURL`:

```bash
curl --request POST 'http://localhost:12345/post' --header 'Content-Type: application/json' -d '{"data": [{"text": "hello world"}],"execEndpoint": "/search"}'
```

````{dropdown} Sample response

```
{
  "requestId": "e2978837-e5cb-45c6-a36d-588cf9b24309",
  "data": {
    "docs": [
      {
        "id": "84d9538e-f5be-11eb-8383-c7034ef3edd4",
        "granularity": 0,
        "adjacency": 0,
        "parentId": "",
        "text": "hello world",
        "chunks": [],
        "weight": 0.0,
        "matches": [],
        "mimeType": "",
        "tags": {
          "mimeType": "",
          "parentId": ""
        },
        "location": [],
        "offset": 0,
        "embedding": null,
        "scores": {},
        "modality": "",
        "evaluations": {}
      }
    ],
    "groundtruths": []
  },
  "header": {
    "execEndpoint": "/index",
    "targetPeapod": "",
    "noPropagate": false
  },
  "parameters": {},
  "routes": [
    {
      "pod": "gateway",
      "podId": "5742d5dd-43f1-451f-88e7-ece0588b7557",
      "startTime": "2021-08-05T07:26:58.636258+00:00",
      "endTime": "2021-08-05T07:26:58.636910+00:00",
      "status": null
    }
  ],
  "status": {
    "code": 0,
    "description": "",
    "exception": null
  }
}
```

````

### Use JavaScript

Sending a request from front-end JavaScript code is a common use case too. Here's how this looks:

```javascript
fetch('http://localhost:12345/post', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({"data": [{"text": "hello world"}], "execEndpoint": "/search"})
}).then(response => response.json()).then(data => console.log(data));
```

````{dropdown} Output

```javascript
{
  "data": [
    {
      "id": "37e6f1bc7ec82fc4ba75691315ae54a6",
      "text": "hello world",
      "matches": ...
    }
  ],
  "header": {
    "requestId": "c725217aa7714de88039866fb5aa93d2",
    "execEndpoint": "/index",
    "targetExecutor": ""
  },
  "routes": [
    {
      "executor": "gateway",
      "startTime": "2022-04-01T13:11:57.992497+00:00",
      "endTime": "2022-04-01T13:11:57.997802+00:00"
    },
    {
      "executor": "executor0",
      "startTime": "2022-04-01T13:11:57.993686+00:00",
      "endTime": "2022-04-01T13:11:57.997274+00:00"
    }
  ]
}
```

````

### Use Swagger UI

Flows provide a customized [Swagger UI](https://swagger.io/tools/swagger-ui/) which you can use to visually interact with the Flow through a web browser.

```{admonition} Available Protocols
:class: caution
Only Flows that have enabled {ref}`CORS ` expose the Swagger UI interface.
```

For a Flow that is exposed on port `PORT`, you can navigate to the Swagger UI at `http://localhost:PORT/docs`:

```{figure} ../../../.github/2.0/swagger-ui.png
:align: center
```

Here you can see all the endpoints that are exposed by the Flow, such as `/search` and `/index`.

To send a request, click on the endpoint you want to target, then `Try it out`.

Now you can enter your HTTP request, and send it by clicking `Execute`. You can again use the [REST HTTP request schema](https://schemas.jina.ai/rest/), but do not need to specify `execEndpoint`.

Below, in `Responses`, you can see the reply, together with a visual representation of the returned Documents.

### Use Postman

[Postman](https://www.postman.com/) is an application that allows the testing of web APIs from a graphical interface. You can store all the templates for your REST APIs in it, using Collections.

We provide a [suite of templates for Jina Flow](https://github.com/jina-ai/jina/tree/master/.github/Jina.postman_collection.json). You can import it in Postman in **Collections**, with the **Import** button. It provides templates for the main operations.
You need to create an Environment to define the `{{url}}` and `{{port}}` environment variables; these are the hostname and the port where the Flow is listening.

This contribution was made by [Jonathan Rowley](https://jina-ai.slack.com/archives/C0169V26ATY/p1649689443888779?thread_ts=1649428823.420879&cid=C0169V26ATY), in our [community Slack](https://slack.jina.ai).

## gRPC

To use the gRPC protocol with a language other than Python you will need to:

* Download the two proto definition files, `jina.proto` and `docarray.proto`, from [GitHub](https://github.com/jina-ai/jina/tree/master/jina/proto) (be sure to use the latest release branch)
* Compile them with [protoc](https://grpc.io/docs/protoc-installation/), specifying the programming language you want to compile them to.
* Add the generated files to your project and import them into your code.

You should then be able to communicate with your Flow using the gRPC protocol. You can find more information on the gRPC `message` and `service` definitions that you can use to communicate in the [Protobuf documentation](../../proto/docs.md).

(flow-graphql)=
## GraphQL

````{admonition} See Also
:class: seealso

This article does not serve as an introduction to GraphQL. If you are not already familiar with GraphQL, we recommend you learn more about it from the [official documentation](https://graphql.org/learn/). You may also want to learn about [Strawberry](https://strawberry.rocks/), the library that powers Jina-serve's GraphQL support.
````

Jina Flows that use the HTTP protocol can also provide a GraphQL API, which is located behind the `/graphql` endpoint. GraphQL has the advantage of letting you define your own response schema, which means that only the fields you require are sent over the wire. This is especially useful when you don't need potentially large fields, like image tensors.

You can access the Flow from any GraphQL client, like `sgqlc`.

```python
from sgqlc.endpoint.http import HTTPEndpoint

HOSTNAME, PORT = ...
endpoint = HTTPEndpoint(url=f'{HOSTNAME}:{PORT}/graphql')
mut = '''
    mutation {
        docs(data: {text: "abcd"}) {
            id
            matches {
                embedding
            }
        }
    }
'''
response = endpoint(mut)
```

## WebSocket

WebSocket uses persistent connections between the client and the Flow, allowing for streaming use cases. While you can always use the Python client to stream requests like with any other protocol, WebSocket allows streaming JSON from anywhere (CLI / Postman / any other programming language). You can use the same set of arguments as {ref}`HTTP ` in the payload.

We use [subprotocols](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers#subprotocols) to separate streaming JSON vs bytes. The Flow defaults to `json` if you don't specify a subprotocol while establishing the connection (our Python client uses `bytes` streaming via the [jina.proto](../../proto/docs.md) definition).

````{Hint}
- Choose WebSocket over HTTP if you want to stream requests.
- Choose WebSocket over gRPC if
  - you want to stream using JSON, not bytes.
  - your client language doesn't support gRPC.
  - you don't want to compile the [Protobuf definitions](../../proto/docs.md) for your gRPC client.
````

## See also

- {ref}`Access a Flow with the Client `
- {ref}`Configure a Flow `
- [Flow REST API reference](https://schemas.jina.ai/rest/)

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/client/transient-errors.md

(transient-errors)=
# Transient Errors

Most transient errors can be attributed to network issues between the client and the target server, or between a server and its dependencies, like a database. Such errors can be:

- ignored, if an operation produced by a generator or sequence of operations isn't relevant to the overall success.
- retried up to a certain limit, assuming that recovery logic kicks in to repair the transient error.
- accepted, acknowledging that the operation cannot be successfully completed.

## Transient fault handling with retries

The {meth}`~jina.clients.mixin.PostMixin.post` method accepts the `max_attempts`, `initial_backoff`, `max_backoff` and `backoff_multiplier` parameters to control the capacity to retry requests when a transient connectivity error occurs, using an exponential backoff strategy. This can help to overcome transient network connectivity issues, which are broadly captured by the {class}`~grpc.aio.AioRpcError`, {class}`~aiohttp.ClientError`, {class}`~asyncio.CancelledError` and {class}`~jina.excepts.InternalNetworkError` exception types.

The `max_attempts` parameter determines the number of sending attempts, including the original request. The `initial_backoff`, `max_backoff` and `backoff_multiplier` parameters determine the randomized delay in seconds before retry attempts.

The initial retry attempt will occur at `initial_backoff`. In general, the *n-th* attempt will occur at `random(0, min(initial_backoff*backoff_multiplier**(n-1), max_backoff))`.

### Handling gRPC retries for streaming and unary RPC methods

The {meth}`~jina.clients.mixin.PostMixin.post` method supports the `stream` boolean parameter (which defaults to `True`). If set to `True`, the **gRPC** server-side streaming RPC method is invoked; if set to `False`, the server-side unary RPC method is invoked. Some important implications of using retries with **gRPC** are:

- The built-in **gRPC** retries are limited in scope and are implemented to work under certain circumstances. More details are specified in the [design document](https://github.com/grpc/proposal/blob/master/A6-client-retries.md).
- If the `stream` parameter is set to `True` and the `inputs` parameter is a `GeneratorType` or an `Iterable`, the retry must be handled as shown below, because the result must be consumed to check for errors in the stream of responses. The **gRPC** service retry is still configured, but cannot be guaranteed.
  ```python
  from jina import Client
  from docarray import BaseDoc
  from jina.clients.base.retry import wait_or_raise_err
  from jina.helper import run_async

  client = Client(host='grpc://localhost:12345')
  max_attempts = 5
  initial_backoff = 0.8
  backoff_multiplier = 1.5
  max_backoff = 5


  def input_generator():
      for _ in range(10):
          yield BaseDoc()


  for attempt in range(1, max_attempts + 1):
      try:
          response = client.post(
              '/',
              inputs=input_generator(),
              request_size=2,
              timeout=0.5,
          )
          assert len(response) == 1
          break  # success: stop retrying
      except ConnectionError as err:
          run_async(
              wait_or_raise_err,
              attempt=attempt,
              err=err,
              max_attempts=max_attempts,
              backoff_multiplier=backoff_multiplier,
              initial_backoff=initial_backoff,
              max_backoff=max_backoff,
          )
  ```

- If the `stream` parameter is set to `True` and the `inputs` parameter is a `Document` or a `DocList`, the retry is handled internally based on the `max_attempts`, `initial_backoff`, `backoff_multiplier` and `max_backoff` parameters.
- If the `stream` parameter is set to `False`, the {meth}`~jina.clients.mixin.PostMixin.post` method invokes the unary RPC method and the retry is handled internally.

```{hint}
The retry parameters `max_attempts`, `initial_backoff`, `backoff_multiplier` and `max_backoff` of the {meth}`~jina.clients.mixin.PostMixin.post` method will be used to set the **gRPC** retry service options. This improves the chances of success if the gRPC retry conditions are met.
```

## Continue streaming when an Executor error occurs

The {meth}`~jina.clients.mixin.PostMixin.post` method accepts a `continue_on_error` parameter. When set to `True`, the Client keeps trying to send the remaining requests. The `continue_on_error` parameter only applies to Exceptions raised by an Executor; in case of network connectivity issues, an Exception is still raised.

The `continue_on_error` parameter handles errors that are returned by the Executor as part of its response. These can be logical errors that might be raised during the execution of the operation. This doesn't include the transient errors represented by {class}`~grpc.aio.AioRpcError`, {class}`~aiohttp.ClientError`, {class}`~asyncio.CancelledError` and {class}`~jina.excepts.InternalNetworkError`, triggered during the Gateway and Executor communication. The `retries` parameter of the Gateway controls the number of retries for the transient errors that arise in the communication between the Gateway and the Executors.

```{hint}
Refer to the {ref}`Network Errors ` section for more information.
```

## Retries with large inputs or long-running operations

When using the gRPC client, it is recommended to set the `stream` parameter to `False`, so that the unary RPC is invoked by the {class}`~jina.Client`, which performs the retry internally with the requests from the `inputs` iterator or generator. The `request_size` parameter should also be set to create smaller operations that can be retried without much overhead on the server. The `stream` option does not apply to the **HTTP** and **WebSocket** protocols; for these, retries are always handled internally.

```{hint}
Refer to the {ref}`Callbacks ` section for dealing with success and failure after retries.
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/jcloud/configuration.md

(jcloud-configuration)=
# {octicon}`file-code` Configuration

JCloud extends Jina-serve's {ref}`Flow YAML specification ` by introducing the special field `jcloud`. This lets you define resources and scaling policies for each Executor and Gateway.

Here's a Flow with two Executors that have specific resource needs: `indexer` requires a 10 GB `ebs` disk, whereas `encoder` requires a `C4` instance, which provides two cores and 4 GB RAM.
See the sections below for further information about instance types.

```{code-block} yaml
---
emphasize-lines: 5-7,10-14
---
jtype: Flow
executors:
  - name: encoder
    uses: jinaai+docker:///Encoder
    jcloud:
      resources:
        instance: C4
  - name: indexer
    uses: jinaai+docker:///Indexer
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
```

## Allocate Executor resources

Since each Executor has its own business logic, it may require different cloud resources. One Executor might need more RAM, whereas another might need a bigger disk.

In JCloud, you can pass highly customizable, finely-grained resource requests for each Executor using the `jcloud.resources` argument in your Flow YAML.

### Instance

JCloud uses the concept of an "instance" to represent a specific set of hardware specifications. In the above example, a `C4` instance type represents two cores and 4 GB RAM, based on the CPU tiers instance definition table below.

````{admonition} Note
:class: note
If you are still using the legacy resource specification interface, we translate the raw numbers from the input to the instance tier that fits most closely, for example:

```{code-block} yaml
jcloud:
  resources:
    cpu: 8
    memory: 8G
```

There are circumstances, like in the above example, where no instance tier exactly fulfills the requested CPU cores and memory. In such cases, we "ceil" the request to the lowest tier that satisfies all the specifications. Here, `C6` would be chosen, as `C5`'s `Cores` are lower than what is requested (4 vs 8).
````

There are two types of instance tiers: one for CPU instances, and one for GPU.

(jcloud-pricing)=
#### Pricing

Each instance has a fixed `Credits Per Hour` number, indicating how many credits JCloud charges while a certain instance is used. For example, if an Executor uses `C3`, `10` credits per hour are charged to the operating user account. Other important facts to note:

- If the Flow is powering other App(s) you create, you will be charged by the App(s), not the underlying Flow.
- `Credits Per Hour` is counted per Executor/Gateway; the total `Credits Per Hour` of a Flow is the sum of the credits each component costs.
- If shards/replicas are used in an Executor/Gateway, the same instance type is used for each of them, so `Credits Per Hour` is multiplied accordingly. For example, if an Executor uses `C3` and has two replicas, its `Credits Per Hour` doubles to `20`. The only exception is sharding: in that case, `C1` is used for the shards' head, regardless of the instance type chosen for the sharded Executor.
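As a hypothetical worked example under these rules: a Flow with a default `C1` Gateway (1 credit per hour), an encoder on `C4` with two replicas (2 × 20 = 40) and an indexer on `C3` with two shards (2 × 10 = 20, plus 1 for the `C1` shards' head) would cost 1 + 40 + 20 + 1 = 62 credits per hour in total.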
```{hint}
Please visit [Jina AI Cloud Pricing](https://cloud.jina.ai/pricing/) for more information about billing and credits.
```

#### CPU tiers

| Instance | Cores | Memory | Credits per hour |
| -------- | ----- | ------ | ---------------- |
| C1       | 0.1   | 0.2 GB | 1                |
| C2       | 0.5   | 1 GB   | 5                |
| C3       | 1     | 2 GB   | 10               |
| C4       | 2     | 4 GB   | 20               |
| C5       | 4     | 8 GB   | 40               |
| C6       | 8     | 16 GB  | 80               |
| C7       | 16    | 32 GB  | 160              |
| C8       | 32    | 64 GB  | 320              |

By default, C1 is allocated to each Executor and Gateway.

JCloud offers the general Intel Xeon processor (Skylake 8175M or Cascade Lake 8259CL) for the CPU instances.

#### GPU tiers

JCloud supports GPU workloads with two different usages: `shared` or `dedicated`.

If GPU is enabled, JCloud will provide NVIDIA A10G Tensor Core GPUs with 24 GB memory for workloads in both usage types.

```{hint}
When using GPU resources, it may take a few extra minutes before all Executors are ready to serve traffic.
```

| Instance | GPU    | Memory | Credits per hour |
| -------- | ------ | ------ | ---------------- |
| G1       | shared | 14 GB  | 100              |
| G2       | 1      | 14 GB  | 125              |
| G3       | 2      | 24 GB  | 250              |
| G4       | 4      | 56 GB  | 500              |

##### Shared GPU

An Executor using a `shared` GPU shares this GPU with up to four other Executors. This enables time-slicing, which allows workloads that land on oversubscribed GPUs to interleave with one another. To use a `shared` GPU, `G1` needs to be specified as the instance type.

The tradeoffs with a `shared` GPU are increased latency, jitter, and potential out-of-memory (OOM) conditions when many different applications are time-slicing on the GPU. If your application is consuming a lot of memory, we suggest using a dedicated GPU.

##### Dedicated GPU

Using a dedicated GPU is the default way to provision a GPU for an Executor. This automatically creates nodes or assigns the Executor to a GPU node. In this case, the Executor owns the whole GPU. To use a `dedicated` GPU, `G2`, `G3` or `G4` needs to be specified as the instance type.

### Storage

JCloud supports three kinds of storage: ephemeral (the default), [efs](https://aws.amazon.com/efs/) (network file storage) and [ebs](https://aws.amazon.com/ebs/) (block device).

`ephemeral` storage will assign space to an Executor when it is created. Data in `ephemeral` storage is deleted permanently if Executors are restarted or rescheduled.

````{hint}
By default, we assign `ephemeral` storage to all Executors in a Flow. This lets the storage resize dynamically, so you don't need to shrink/grow volumes manually.

If your Executor needs to share data with other Executors and retain data persistency, consider using `efs`. Note that:

- IO performance is slower compared to `ebs` or `ephemeral`.
- The disk can be shared with other Executors or Flows.
- The default storage size is 5 GB.

If your Executor needs high IO, you can use `ebs` instead. Note that:

- The disk cannot be shared with other Executors or Flows.
- The default storage size is 5 GB.
````

JCloud also supports retaining the data that a Flow was using while it was active.
You can set the `retain` argument to `true` to enable this feature.

```{code-block} yaml
---
emphasize-lines: 5-10,13-16
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
          retain: true
  - name: executor2
    uses: jinaai+docker:///Executor2
    jcloud:
      resources:
        storage:
          kind: efs
```

#### Pricing

Here are the numbers in terms of credits per GB per month for the three kinds of storage described above.

| Instance  | Credits per GB per month |
| --------- | ------------------------ |
| Ephemeral | 0                        |
| EBS       | 30                       |
| EFS       | 75                       |

For example, using 10 GB of EBS storage for a month costs `300` credits. If shards/replicas are used, the credits are further multiplied by the number of volumes created.

## Scale out Executors

On JCloud, demand-based autoscaling functionality is naturally offered thanks to the underlying Kubernetes architecture. This means that you can maintain [serverless](https://en.wikipedia.org/wiki/Serverless_computing) deployments in a cost-effective way, with no headache of setting the [right number of replicas](https://jina.ai/serve/how-to/scale-out/#scale-out-your-executor) anymore!

### Autoscaling with `jinaai+serverless://`

The easiest way to scale out your Executor is to use a serverless Executor. This can be enabled by using `jinaai+serverless://` instead of `jinaai+docker://` in the Executor's `uses`, such as:

```{code-block} yaml
---
emphasize-lines: 4
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+serverless:///Executor1
```

JCloud autoscaling leverages [Knative](https://knative.dev/docs/) behind the scenes, and `jinahub+serverless` uses a set of Knative configurations as defaults.

```{hint}
For more information about the Knative autoscaling configurations, please visit [Knative autoscaling](https://knative.dev/docs/serving/autoscaling/).
```

### Autoscaling with custom args

If `jinaai+serverless://` doesn't meet your requirements, you can further customize the autoscaling configuration by using the `autoscale` argument on a per-Executor basis in the Flow YAML, such as:

```{code-block} yaml
---
emphasize-lines: 5-10
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      autoscale:
        min: 1
        max: 2
        metric: rps
        target: 50
```

Below are the defaults and requirements for the configurations:

| Name             | Default     | Allowed                                  | Description                                                    |
| ---------------- | ----------- | ---------------------------------------- | -------------------------------------------------------------- |
| min              | 1           | int                                      | Minimum number of replicas (`0` means serverless)              |
| max              | 2           | int, up to 5                             | Maximum number of replicas                                     |
| metric           | concurrency | `concurrency` / `rps` / `cpu` / `memory` | Metric for scaling                                             |
| scale_down_delay | 30s         | str, `0s` <= value <= `1h`               | Time window that must pass at reduced load before scaling down |
| target           | 100         | int                                      | Target number the replicas try to maintain                     |

The unit of `target` depends on the metric specified. Refer to the table below:

| Metric        | Target                                                                                                                         |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `concurrency` | Number of concurrent requests processed at any given time                                                                      |
| `rps`         | Number of requests processed per second per replica                                                                            |
| `cpu`         | Average % CPU utilization of each pod (e.g. `60` means replicas are scaled up when pods reach 60% CPU utilization on average)  |
| `memory`      | Average mebibytes of memory used by each pod (e.g. `200` means replicas are scaled up when average pod memory consumption exceeds 200 MiB) |
After you make a JCloud deployment using the autoscaling configuration, the Flow serving part works just the same: the only difference you may notice is that it takes a few extra seconds to handle the initial requests, since the deployments need to be scaled up behind the scenes. Let JCloud handle the scaling from now on, so you can focus on the code!

Note that if `metric` is `cpu` or `memory`, `min` is reset to `1` if the user sets it to `0`.

### Pricing

At present, pricing for autoscaled Executors/Gateways largely follows the same {ref}`JCloud pricing rules ` as other Jina AI services. We track the minimum number of replicas in the autoscale configuration and use it as the replica multiplier when calculating the `Credits Per Hour`.

### Restrictions

```{admonition} **Restrictions**
- Autoscale does not currently allow the use of `ebs` as a storage type in combination. Please use `efs` and `ephemeral` instead.
- Autoscale is not supported for multi-protocol Gateways.
```

## Configure availability tolerance

If service issues cause disruption of Executors, JCloud lets you specify a tolerance level for the number of replicas that stay up or go down. The JCloud parameters `minAvailable` and `maxUnavailable` ensure that Executors stay up even if a certain number of replicas go down.

| Name             | Default | Allowed                                                                                        | Description                                              |
| :--------------- | :-----: | :---------------------------------------------------------------------------------------------: | :-------------------------------------------------------- |
| `minAvailable`   | N/A     | Lower than the number of [replicas](https://jina.ai/serve/concepts/flow/scale-out/#scale-out)  | Minimum number of replicas available during disruption   |
| `maxUnavailable` | N/A     | Lower than the number of [replicas](https://jina.ai/serve/concepts/flow/scale-out/#scale-out)  | Maximum number of replicas unavailable during disruption |

```{code-block} yaml
---
emphasize-lines: 5-6
---
jtype: Flow
executors:
  - uses: jinaai+docker:///Executor1
    replicas: 5
    jcloud:
      minAvailable: 2
```

In case of disruption, at least two replicas are guaranteed to stay available, while up to three may be down.

```{code-block} yaml
---
emphasize-lines: 5-6
---
jtype: Flow
executors:
  - uses: jinaai+docker:///Executor1
    replicas: 5
    jcloud:
      maxUnavailable: 2
```

In case of disruption, at most two replicas may be unavailable, so at least three replicas stay up.

## Configure Gateway

The Gateway can be customized just like an Executor.

### Set timeout

By default, the Gateway closes connections that have been idle for over 600 seconds. If you want a longer connection timeout threshold, change the `timeout` parameter under `gateway.jcloud`.

```{code-block} yaml
---
emphasize-lines: 2-4
---
jtype: Flow
gateway:
  jcloud:
    timeout: 800
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```

### Control Gateway resources

To customize the Gateway's CPU or memory, specify the instance type under `gateway.jcloud.resources`:

```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
gateway:
  jcloud:
    resources:
      instance: C3
executors:
  - name: encoder
    uses: jinaai+docker:///Encoder
```

## Expose Executors

A Flow deployment without a Gateway is often used for {ref}`external-executors `, which can be shared between different Flows.
You can expose an Executor by setting `expose: true` (and un-expose the Gateway by setting `expose: false`):

```{code-block} yaml
---
emphasize-lines: 2-4,8-9
---
jtype: Flow
gateway:
  jcloud:
    expose: false  # don't expose the Gateway
executors:
  - name: custom
    uses: jinaai+docker:///CustomExecutor
    jcloud:
      expose: true  # expose the Executor
```

```{figure} img/expose-executor.png
:width: 70%
```

You can expose the Gateway along with Executors:

```{code-block} yaml
---
emphasize-lines: 2-4,8-9
---
jtype: Flow
gateway:
  jcloud:
    expose: true
executors:
  - name: custom1
    uses: jinaai+docker:///CustomExecutor1
    jcloud:
      expose: true  # expose the Executor
```

```{figure} img/gateway-and-executors.png
:width: 70%
```

## Other deployment options

### Customize Flow name

You can use the `name` argument to specify the Flow name in the Flow YAML:

```{code-block} yaml
---
emphasize-lines: 2-3
---
jtype: Flow
jcloud:
  name: my-name
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```

### Specify Jina version

To control Jina's version while deploying a Flow to `jcloud`, you can pass the `version` argument in the Flow YAML:

```{code-block} yaml
---
emphasize-lines: 2-3
---
jtype: Flow
jcloud:
  version: 3.10.0
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```

### Add Labels

You can use `labels` (as key-value pairs) to attach metadata to your Flows and Executors.

Flow-level `labels`:

```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
jcloud:
  labels:
    username: johndoe
    app: fashion-search
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```

Executor-level `labels`:

```{code-block} yaml
---
emphasize-lines: 5-8
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      labels:
        index: partial
        group: backend
```

```{hint}
Keys in `labels` have the following restrictions:

- Must be 63 characters or fewer.
- Must begin and end with an alphanumeric character ([a-z0-9A-Z]), with dashes (-), underscores (_), dots (.), and alphanumerics in between.
- The following keys are skipped if passed in the Flow YAML:
  - `user`
  - `jina-version`
```

### Monitoring

To enable [tracing support](https://jina.ai/serve/cloud-nativeness/opentelemetry/) in Flows, pass the `enable: true` argument in the Flow YAML. (Tracing support is not enabled by default in JCloud.)

```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
jcloud:
  monitor:
    traces:
      enable: true
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```

You can pass the `enable: true` argument under `gateway` to only enable tracing support in the Gateway:

```{code-block} yaml
---
emphasize-lines: 2-6
---
jtype: Flow
gateway:
  jcloud:
    monitor:
      traces:
        enable: true
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
```

You can also enable tracing support only in `executor1`:

```{code-block} yaml
---
emphasize-lines: 5-8
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker:///Executor1
    jcloud:
      monitor:
        traces:
          enable: true
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/jcloud/index.md

(jcloud)=
# Jina AI Cloud Hosting

```{toctree}
:hidden:

configuration
```

```{figure} https://jina.ai/serve/_images/jcloud-banner.png
:width: 0 %
:scale: 0 %
```

```{figure} img/jcloud-banner.png
:scale: 0 %
:width: 0 %
```

After building a Jina-serve project, the next step is to deploy and host it on the cloud.
[Jina AI Cloud](https://cloud.jina.ai/) is Jina-serve's reliable, scalable and production-ready cloud-hosting solution that manages your project lifecycle without surprises or hidden development costs.

```{tip}
Are you ready to unlock the power of AI with Jina AI Cloud? Take a look at our [pricing options](https://cloud.jina.ai/pricing) now!
```

In addition to deploying Flows, `jcloud` supports the creation of secrets and jobs, which are created in the Flow's namespace.

## Basics

Jina AI Cloud provides a CLI that you can use via `jina cloud` from the terminal (or `jcloud`, or simply `jc` for minimalists).

````{hint}
You can also install just the JCloud CLI without installing the Jina-serve package.

```bash
pip install jcloud
jc -h
```

If you installed the JCloud CLI individually, all of its commands fall under the `jc` or `jcloud` executable. In case the command `jc` is already occupied by another tool, use `jcloud` instead. If your pip install doesn't register bash commands for you, you can run `python -m jcloud -h`.
````

For the rest of this section, we use `jc` or `jcloud`. But again, they are interchangeable with `jina cloud`.

## Flows

### Deploy

In Jina's idiom, a project is a [Flow](https://jina.ai/serve/concepts/orchestration/flow/), which represents an end-to-end task such as indexing, searching or recommending. In this document, we use "project" and "Flow" interchangeably.

A Flow can have two types of file structure: a single YAML file or a project folder.

#### Single YAML file

A self-contained YAML file, consisting of all configuration at the [Flow](https://jina.ai/serve/concepts/orchestration/flow/)-level and [Executor](https://jina.ai/serve/concepts/serving/executor/)-level.

> All Executors' `uses` must follow the format `jinaai+docker:///MyExecutor` (from [Executor Hub](https://cloud.jina.ai)) to avoid any local file dependencies:

```yaml
# flow.yml
jtype: Flow
executors:
  - name: sentencizer
    uses: jinaai+docker://jina-ai/Sentencizer
```

To deploy:

```bash
jc flow deploy flow.yml
```

````{caution}
When `jcloud` deploys a Flow it automatically appends the following global arguments to the `flow.yml`, if not present:

```yaml
jcloud:
  version: jina-version
  docarray: docarray-version
```

The `jina-version` and `docarray-version` values correspond to your development environment's `jina` and `docarray` versions.
````

````{tip}
We recommend testing locally before deployment:

```bash
jina flow --uses flow.yml
```
````

#### Project folder

````{tip}
The best practice for creating a Jina AI Cloud project is to use:

```bash
jc new
```

This ensures the correct project structure that is accepted by Jina AI Cloud.
````

Just like a regular Python project, you can have sub-folders of Executor implementations and a `flow.yml` on the top level to connect all Executors together.

You can create an example local project using `jc new hello`. The default structure looks like:

```
hello/
├── .env
├── executor1
│   ├── config.yml
│   ├── executor.py
│   └── requirements.txt
└── flow.yml
```

Where:

- `hello/` is your top-level project folder.
- `executor1` directory has all Executor related code/configuration. You can read the best practices for [file structures](https://jina.ai/serve/concepts/serving/executor/file-structure/). Multiple Executor directories can be created.
- `flow.yml` is your Flow YAML.
- `.env` contains all environment variables used during deployment.

To deploy:

```bash
jc flow deploy hello
```

The Flow is successfully deployed when you see:

```{figure} img/deploy.png
:width: 70%
```

---

You will get a Flow ID, say `merry-magpie-82b9c0897f`.
This ID is required to manage the Flow, view its logs, and remove it.

As this Flow is deployed with the default gRPC gateway (feel free to change it to `http` or `websocket`), you can use `jina.Client` to access it:

```python
from jina import Client, Document

print(
    Client(host='grpcs://merry-magpie-82b9c0897f.wolf.jina.ai').post(
        on='/', inputs=Document(text='hello')
    )
)
```

(jcloud-flow-status)=
### Get status

To get the status of a Flow:

```bash
jc flow status merry-magpie-82b9c0897f
```

```{figure} img/status.png
:width: 70%
```

### Monitoring

Basic monitoring is provided to Flows deployed on Jina AI Cloud. To access the [Grafana](https://grafana.com/)-powered dashboard, first get {ref}`the status of the Flow`. The `Grafana Dashboard` link is displayed at the bottom of the pane. Visit the URL to find basic metrics like 'Number of Request Gateway Received' and 'Time elapsed between receiving a request and sending back the response':

```{figure} img/monitoring.png
:width: 80%
```

### List Flows

To list all of your "Starting", "Serving", "Failed", "Updating", and "Paused" Flows:

```bash
jc flows list
```

```{figure} img/list.png
:width: 90%
```

You can also filter your Flows by passing a phase:

```bash
jc flows list --phase Deleted
```

```{figure} img/list_deleted.png
:width: 90%
```

Or see all Flows:

```bash
jc flows list --phase all
```

```{figure} img/list_all.png
:width: 90%
```

### Remove Flows

You can remove a single Flow, multiple Flows or even all Flows by passing different identifiers.

To remove a single Flow:

```bash
jc flow remove merry-magpie-82b9c0897f
```

To remove multiple Flows:

```bash
jc flow remove merry-magpie-82b9c0897f wondrous-kiwi-b02db6a066
```

To remove all Flows:

```bash
jc flow remove all
```

By default, removing multiple or all Flows is an interactive process where you must give confirmation before each Flow is deleted. To make it non-interactive, set the below environment variable before running the command:

```bash
export JCLOUD_NO_INTERACTIVE=1
```

### Update a Flow

You can update a Flow by providing an updated YAML.

To update a Flow:

```bash
jc flow update super-mustang-c6cf06bc5b flow.yml
```

```{figure} img/update_flow.png
:width: 70%
```

### Pause / Resume Flow

You have the option to pause a Flow that is not currently in use but may be needed later.
This allows the Flow to be resumed later, when it is needed again, by using `resume`.

To pause a Flow:

```bash
jc flow pause super-mustang-c6cf06bc5b
```

```{figure} img/pause_flow.png
:width: 70%
```

To resume a Flow:

```bash
jc flow resume super-mustang-c6cf06bc5b
```

```{figure} img/resume_flow.png
:width: 70%
```

### Restart Flow, Executor or Gateway

If you need to restart a Flow, there are two options: restart all Executors and the Gateway associated with the Flow, or selectively restart only a specific Executor or the Gateway.

To restart a Flow:

```bash
jc flow restart super-mustang-c6cf06bc5b
```

```{figure} img/restart_flow.png
:width: 70%
```

To restart the Gateway:

```bash
jc flow restart super-mustang-c6cf06bc5b --gateway
```

```{figure} img/restart_gateway.png
:width: 70%
```

To restart an Executor:

```bash
jc flow restart super-mustang-c6cf06bc5b --executor executor0
```

```{figure} img/restart_executor.png
:width: 70%
```

### Recreate a Deleted Flow

To recreate a deleted Flow:

```bash
jc flow recreate profound-rooster-eec4b17c73
```

```{figure} img/recreate_flow.png
:width: 70%
```

### Scale an Executor

You can also manually scale any Executor.

```bash
jc flow scale good-martin-ca6bfdef84 --executor executor0 --replicas 2
```

```{figure} img/scale_executor.png
:width: 70%
```

### Normalize a Flow

To normalize a Flow:

```bash
jc flow normalize flow.yml
```

```{hint}
Normalizing a Flow is the process of building the Executor image and pushing the image to Hubble.
```

### Get Executor or Gateway logs

To get the Gateway logs:

```bash
jc flow logs --gateway central-escargot-354a796df5
```

```{figure} img/gateway_logs.png
:width: 70%
```

To get the Executor logs:

```bash
jc flow logs --executor executor0 central-escargot-354a796df5
```

```{figure} img/executor_logs.png
:width: 70%
```

## Secrets

### Create a Secret

To create a Secret for a Flow:

```bash
jc secret create mysecret rich-husky-af14064067 --from-literal "{'env-name': 'secret-value'}"
```

```{tip}
You can optionally pass the `--update` flag to automatically update the Flow spec with the updated secret information. This flag updates the Flow which is hosted on the cloud. Finally, you can also optionally pass a Flow's YAML file path with `--path` to update the YAML file locally. Refer to [this](https://jina.ai/serve/cloud-nativeness/kubernetes/#deploy-flow-with-custom-environment-variables-and-secrets) section for more information.
```

```{caution}
If the `--update` flag is not passed, you have to manually update the Flow with `jc flow update rich-husky-af14064067 updated-flow.yml`.
```

### List Secrets

To list all the Secrets created in a Flow's namespace:

```bash
jc secret list rich-husky-af14064067
```

```{figure} img/list_secrets.png
:width: 90%
```

### Get a Secret

To retrieve a Secret's details:

```bash
jc secret get mysecret rich-husky-af14064067
```

```{figure} img/get_secret.png
:width: 90%
```

### Remove Secret

```bash
jc secret remove rich-husky-af14064067 mysecret
```

### Update a Secret

You can update a Secret for a Flow.

```bash
jc secret update rich-husky-af14064067 mysecret --from-literal "{'env-name': 'secret-value'}"
```

```{tip}
You can optionally pass the `--update` flag to automatically update the Flow spec with the updated secret information. This flag updates the Flow which is hosted on the cloud. Finally, you can also optionally pass a Flow's YAML file path with `--path` to update the YAML file locally.
Refer to [this](https://jina.ai/serve/cloud-nativeness/kubernetes/#deploy-flow-with-custom-environment-variables-and-secrets) section for more information.
```

```{caution}
Updating a Secret automatically restarts a Flow.
```

## Jobs

### Create a Job

To create a Job for a Flow:

```bash
jc job create job-name rich-husky-af14064067 image 'job entrypoint' --timeout 600 --backofflimit 2
```

```{tip}
`image` can be any Executor image passed to a Flow's Executor `uses` or any normal Docker image prefixed with `docker://`.
```

### List Jobs

To list all Jobs created in a Flow's namespace:

```bash
jc jobs list rich-husky-af14064067
```

```{figure} img/list_jobs.png
:width: 90%
```

### Get a Job

To retrieve a Job's details:

```bash
jc job get myjob1 rich-husky-af14064067
```

```{figure} img/get_job.png
:width: 90%
```

### Remove Job

```bash
jc job remove rich-husky-af14064067 myjob1
```

### Get Job Logs

To get the Job logs:

```bash
jc job logs myjob1 -f rich-husky-af14064067
```

```{figure} img/job_logs.png
:width: 90%
```

## Deployments

### Deploy

```{caution}
When `jcloud` deploys a Deployment it automatically appends the following global arguments to the `deployment.yml`, if not present:
```

```yaml
jcloud:
  version: jina-version
  docarray: docarray-version
```

#### Single YAML file

A self-contained YAML file, consisting of all configuration information at the [Deployment](https://jina.ai/serve/concepts/orchestration/deployment/)-level and [Executor](https://jina.ai/serve/concepts/serving/executor/)-level.

> A Deployment's `uses` parameter must follow the format `jinaai+docker:///MyExecutor` (from [Executor Hub](https://cloud.jina.ai)) to avoid any local file dependencies:

```yaml
# deployment.yml
jtype: Deployment
with:
  protocol: grpc
  uses: jinaai+docker://jina-ai/Sentencizer
```

To deploy:

```bash
jc deployment deploy ./deployment.yml
```

The Deployment is successfully deployed when you see:

```{figure} img/deployment/deploy.png
:width: 70%
```

---

You will get a Deployment ID, for example `pretty-monster-130a5ac952`.
This ID is required to manage the Deployment, view its logs, and remove it.

Since this Deployment is deployed with the default gRPC protocol (feel free to change it to `http`), you can use `jina.Client` to access it:

```python
from jina import Client, Document

print(
    Client(host='grpcs://executor-pretty-monster-130a5ac952.wolf.jina.ai').post(
        on='/', inputs=Document(text='hello')
    )
)
```

(jcloud-deployment-status)=
### Get status

To get the status of a Deployment:

```bash
jc deployment status pretty-monster-130a5ac952
```

```{figure} img/deployment/status.png
:width: 70%
```

### List Deployments

To list all of your "Starting", "Serving", "Failed", "Updating", and "Paused" Deployments:

```bash
jc deployment list
```

```{figure} img/deployment/list.png
:width: 90%
```

You can also filter your Deployments by passing a phase:

```bash
jc deployment list --phase Deleted
```

```{figure} img/deployment/list_deleted.png
:width: 90%
```

Or see all Deployments:

```bash
jc deployment list --phase all
```

```{figure} img/deployment/list_all.png
:width: 90%
```

### Remove Deployments

You can remove a single Deployment, multiple Deployments, or even all Deployments by passing different identifiers.

To remove a single Deployment:

```bash
jc deployment remove pretty-monster-130a5ac952
```

To remove multiple Deployments:

```bash
jc deployment remove pretty-monster-130a5ac952 artistic-tuna-ab154c4dcc
```

To remove all Deployments:

```bash
jc deployment remove all
```

By default, removing all or multiple Deployments is an interactive process where you must give confirmation before each Deployment is deleted. To make it non-interactive, set the below environment variable before running the command:

```bash
export JCLOUD_NO_INTERACTIVE=1
```

### Update a Deployment

You can update a Deployment by providing an updated YAML.

To update a Deployment:

```bash
jc deployment update pretty-monster-130a5ac952 deployment.yml
```

```{figure} img/deployment/update.png
:width: 70%
```

### Pause / Resume Deployment

You have the option to pause a Deployment that is not currently in use but may be needed later. This allows the Deployment to be resumed later, when it is needed again, by using `resume`.

To pause a Deployment:

```bash
jc deployment pause pretty-monster-130a5ac952
```

```{figure} img/deployment/pause.png
:width: 70%
```

To resume a Deployment:

```bash
jc deployment resume pretty-monster-130a5ac952
```

```{figure} img/deployment/resume.png
:width: 70%
```

### Restart Deployment

To restart a Deployment:

```bash
jc deployment restart pretty-monster-130a5ac952
```

```{figure} img/deployment/restart.png
:width: 70%
```

### Recreate a Deleted Deployment

To recreate a deleted Deployment:

```bash
jc deployment recreate pretty-monster-130a5ac952
```

```{figure} img/deployment/recreate.png
:width: 70%
```

### Scale a Deployment

You can also manually scale any Deployment.

```bash
jc deployment scale pretty-monster-130a5ac952 --replicas 2
```

```{figure} img/deployment/scale.png
:width: 70%
```

### Get Deployment logs

To get the Deployment logs:

```bash
jc deployment logs pretty-monster-130a5ac952
```

```{figure} img/deployment/logs.png
:width: 70%
```

## Configuration

Please refer to {ref}`Configuration ` for configuring the Flow on Jina AI Cloud.

## Restrictions

Jina AI Cloud scales according to your needs. You can demand different instance types with GPU/memory/CPU predefined based on the needs of your Flows and Executors.
If you have specific resource requirements, please contact us [on Discord](https://discord.jina.ai) or raise a [GitHub issue](https://github.com/jina-ai/jcloud/issues/new/choose).

```{admonition} Restrictions

- Deployments are only supported in the `us-east` region.
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/add-executors.md

(add-executors)=
# Add Executors

## Define Executor with `uses`

An {class}`~jina.Executor`'s type is defined by the `uses` keyword:

````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(uses=MyExec)
```
````

````{tab} Flow
```python
from jina import Flow

f = Flow().add(uses=MyExec)
```
````

Note that some usages are not supported on JCloud, either for security reasons or because they exist only to facilitate local debugging.

| Local Dev | JCloud | `uses=...` | Description |
|-----------|--------|-----------------------------------------------------|-------------|
| ✅ | ❌ | `ExecutorClass` | Use `ExecutorClass` from the inline context. |
| ✅ | ❌ | `'my.py_modules.ExecutorClass'` | Use `ExecutorClass` from `my.py_modules`. |
| ✅ | ✅ | `'executor-config.yml'` | Use an Executor from a YAML file defined by {ref}`Executor YAML interface `. |
| ✅ | ❌ | `'jinaai://jina-ai/TransformerTorchEncoder/'` | Use an Executor as Python source from Executor Hub. |
| ✅ | ✅ | `'jinaai+docker://jina-ai/TransformerTorchEncoder'` | Use an Executor as a Docker container from Executor Hub. |
| ✅ | ❌ | `'docker://sentence-encoder'` | Use a pre-built Executor as a Docker container. |

````{admonition} Hint: Load multiple Executors from the same directory
:class: hint

You don't need to specify the parent directory for each Executor. Instead, you can configure a common search path for all Executors:

```
.
├── app
│   └── ▶ main.py
└── executor
    ├── config1.yml
    ├── config2.yml
    └── my_executor.py
```

```{code-block} python
dep = Deployment(extra_search_paths=['../executor'], uses='config1.yml')  # Deployment
f = Flow(extra_search_paths=['../executor']).add(uses='config1.yml').add(uses='config2.yml')  # Flow
```
````

(flow-configure-executors)=
## Configure Executors

You can set and override {class}`~jina.Executor` configuration when adding them to an Orchestration. This example shows how to configure an Executor using the Python API:

````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(
    uses='MyExecutor',
    py_modules=["executor.py"],
    uses_with={"parameter_1": "foo", "parameter_2": "bar"},
    uses_metas={
        "name": "MyExecutor",
        "description": "MyExecutor does a thing to the stuff in your Documents",
    },
    uses_requests={"/index": "my_index", "/search": "my_search", "/random": "foo"},
    workspace="some_custom_path",
)

with dep:
    ...
```
````

````{tab} Flow
```python
from jina import Flow

f = Flow().add(
    uses='MyExecutor',
    py_modules=["executor.py"],
    uses_with={"parameter_1": "foo", "parameter_2": "bar"},
    uses_metas={
        "name": "MyExecutor",
        "description": "MyExecutor does a thing to the stuff in your Documents",
    },
    uses_requests={"/index": "my_index", "/search": "my_search", "/random": "foo"},
    workspace="some_custom_path",
)

with f:
    ...
```
````

- `py_modules` is a list of strings that defines the Executor's Python dependencies;
- `uses_with` is a key-value map that defines the {ref}`arguments of the Executor's` `__init__` method.
- `uses_requests` is a key-value map that defines the {ref}`mapping from endpoint to class method`. This is useful to overwrite the default endpoint-to-method mapping defined in the Executor Python implementation.
- `uses_metas` is a key-value map that defines some of the Executor's {ref}`internal attributes`. It contains the following fields:
  - `name` is a string that defines the name of the Executor;
  - `description` is a string that defines the description of this Executor. It is used in the automatic docs UI;
- `workspace` is a string that defines the {ref}`workspace `.

### Set `with` via `uses_with`

To set/override an Executor's `with` configuration, use `uses_with`. The `with` configuration refers to user-defined constructor kwargs.

````{tab} Deployment
```python
from jina import Executor, requests, Deployment


class MyExecutor(Executor):
    def __init__(self, param1=1, param2=2, param3=3, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.param1 = param1
        self.param2 = param2
        self.param3 = param3

    @requests
    def foo(self, docs, **kwargs):
        print('param1:', self.param1)
        print('param2:', self.param2)
        print('param3:', self.param3)


dep = Deployment(uses=MyExecutor, uses_with={'param1': 10, 'param3': 30})

with dep:
    dep.post('/')
```

```text
 executor0@219662[L]:ready and listening
  gateway@219662[L]:ready and listening
Deployment@219662[I]:🎉 Deployment is ready to use!
    🔗 Protocol:         GRPC
    🏠 Local access:     0.0.0.0:32825
    🔒 Private network:  192.168.1.101:32825
    🌐 Public address:   197.28.82.165:32825
param1: 10
param2: 2
param3: 30
```
````

````{tab} Flow
```python
from jina import Executor, requests, Flow


class MyExecutor(Executor):
    def __init__(self, param1=1, param2=2, param3=3, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.param1 = param1
        self.param2 = param2
        self.param3 = param3

    @requests
    def foo(self, docs, **kwargs):
        print('param1:', self.param1)
        print('param2:', self.param2)
        print('param3:', self.param3)


f = Flow().add(uses=MyExecutor, uses_with={'param1': 10, 'param3': 30})

with f:
    f.post('/')
```

```text
 executor0@219662[L]:ready and listening
  gateway@219662[L]:ready and listening
     Flow@219662[I]:🎉 Flow is ready to use!
    🔗 Protocol:         GRPC
    🏠 Local access:     0.0.0.0:32825
    🔒 Private network:  192.168.1.101:32825
    🌐 Public address:   197.28.82.165:32825
param1: 10
param2: 2
param3: 30
```
````

### Set `requests` via `uses_requests`

You can set/override an Executor's `requests` configuration and bind methods to custom endpoints. In the following code:

- We replace the endpoint `/foo` bound to the `foo()` function with both `/non_foo` and `/alias_foo`.
- We add a new endpoint `/bar` for binding `bar()`.

Note the `all_req()` function is bound to **all** endpoints except those explicitly bound to other functions, i.e. `/non_foo`, `/alias_foo` and `/bar`.

````{tab} Deployment
```python
from jina import Executor, requests, Deployment


class MyExecutor(Executor):
    @requests
    def all_req(self, parameters, **kwargs):
        print(f'all req {parameters.get("recipient")}')

    @requests(on='/foo')
    def foo(self, parameters, **kwargs):
        print(f'foo {parameters.get("recipient")}')

    def bar(self, parameters, **kwargs):
        print(f'bar {parameters.get("recipient")}')


dep = Deployment(
    uses=MyExecutor,
    uses_requests={
        '/bar': 'bar',
        '/non_foo': 'foo',
        '/alias_foo': 'foo',
    },
)

with dep:
    dep.post('/bar', parameters={'recipient': 'bar()'})
    dep.post('/non_foo', parameters={'recipient': 'foo()'})
    dep.post('/foo', parameters={'recipient': 'all_req()'})
    dep.post('/alias_foo', parameters={'recipient': 'foo()'})
```

```text
 executor0@221058[L]:ready and listening
  gateway@221058[L]:ready and listening
Deployment@221058[I]:🎉 Deployment is ready to use!
    🔗 Protocol:         GRPC
    🏠 Local access:     0.0.0.0:36507
    🔒 Private network:  192.168.1.101:36507
    🌐 Public address:   197.28.82.165:36507
bar bar()
foo foo()
all req all_req()
foo foo()
```
````

````{tab} Flow
```python
from jina import Executor, requests, Flow


class MyExecutor(Executor):
    @requests
    def all_req(self, parameters, **kwargs):
        print(f'all req {parameters.get("recipient")}')

    @requests(on='/foo')
    def foo(self, parameters, **kwargs):
        print(f'foo {parameters.get("recipient")}')

    def bar(self, parameters, **kwargs):
        print(f'bar {parameters.get("recipient")}')


f = Flow().add(
    uses=MyExecutor,
    uses_requests={
        '/bar': 'bar',
        '/non_foo': 'foo',
        '/alias_foo': 'foo',
    },
)

with f:
    f.post('/bar', parameters={'recipient': 'bar()'})
    f.post('/non_foo', parameters={'recipient': 'foo()'})
    f.post('/foo', parameters={'recipient': 'all_req()'})
    f.post('/alias_foo', parameters={'recipient': 'foo()'})
```

```text
 executor0@221058[L]:ready and listening
  gateway@221058[L]:ready and listening
     Flow@221058[I]:🎉 Flow is ready to use!
    🔗 Protocol:         GRPC
    🏠 Local access:     0.0.0.0:36507
    🔒 Private network:  192.168.1.101:36507
    🌐 Public address:   197.28.82.165:36507
bar bar()
foo foo()
all req all_req()
foo foo()
```
````

### Set `metas` via `uses_metas`

To set/override an Executor's `metas` configuration, use `uses_metas`:

````{tab} Deployment
```python
from jina import Executor, requests, Deployment


class MyExecutor(Executor):
    @requests
    def foo(self, docs, **kwargs):
        print(self.metas.name)


dep = Deployment(
    uses=MyExecutor,
    uses_metas={'name': 'different_name'},
)

with dep:
    dep.post('/')
```

```text
 executor0@219291[L]:ready and listening
  gateway@219291[L]:ready and listening
Deployment@219291[I]:🎉 Deployment is ready to use!
    🔗 Protocol:         GRPC
    🏠 Local access:     0.0.0.0:58827
    🔒 Private network:  192.168.1.101:58827
different_name
```
````

````{tab} Flow
```python
from jina import Executor, requests, Flow


class MyExecutor(Executor):
    @requests
    def foo(self, docs, **kwargs):
        print(self.metas.name)


flow = Flow().add(
    uses=MyExecutor,
    uses_metas={'name': 'different_name'},
)

with flow as f:
    f.post('/')
```

```text
 executor0@219291[L]:ready and listening
  gateway@219291[L]:ready and listening
     Flow@219291[I]:🎉 Flow is ready to use!
    🔗 Protocol:         GRPC
    🏠 Local access:     0.0.0.0:58827
    🔒 Private network:  192.168.1.101:58827
different_name
```
````

(external-executors)=
## Use external Executors

Usually an Orchestration starts and stops its own Executor(s). External Executors are owned by *other* Orchestrations, meaning they can reside on any machine and their lifetimes are controlled by others. Using external Executors is useful for sharing expensive Executors (like stateless, GPU-based encoders) between Orchestrations. Both {ref}`served and shared Executors ` can be used as external Executors.

When you add an external Executor, you have to provide a `host` and `port`, and enable the `external` flag:

````{tab} Deployment
```python
from jina import Deployment

Deployment(host='123.45.67.89', port=12345, external=True)
# or
Deployment(host='123.45.67.89:12345', external=True)
```
````

````{tab} Flow
```python
from jina import Flow

Flow().add(host='123.45.67.89', port=12345, external=True)
# or
Flow().add(host='123.45.67.89:12345', external=True)
```
````

The Orchestration doesn't start or stop this Executor and assumes that it is externally managed and available at `123.45.67.89:12345`. Despite the lifetime control, the external Executor behaves just like a regular one. You can even add the same Executor to multiple Orchestrations.
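For instance, a minimal sketch (reusing the address from the example above) of two Flows that share one externally managed Executor:

```python
from jina import Flow

# both Flows point at the same externally managed Executor;
# neither Flow starts or stops it
f1 = Flow().add(host='123.45.67.89', port=12345, external=True)
f2 = Flow().add(host='123.45.67.89', port=12345, external=True)
```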
### Enable TLS

You can also use external Executors with `tls`:

````{tab} Deployment
```python
from jina import Deployment

Deployment(host='123.45.67.89:443', external=True, tls=True)
```
````

````{tab} Flow
```python
from jina import Flow

Flow().add(host='123.45.67.89:443', external=True, tls=True)
```
````

After that, the external Executor behaves just like an internal one. You can even add the same Executor to multiple Orchestrations.

```{hint}
Using `tls` is especially needed to connect to an external Executor deployed with JCloud. See the JCloud {ref}`documentation ` for further details.
```

### Pass arguments

External Executors may require extra configuration, for example authentication. For this you can pass the `grpc_metadata` parameter to the Executor. `grpc_metadata` is a dictionary of key-value pairs to be passed along with every gRPC request sent to that Executor.

````{tab} Deployment
```python
from jina import Deployment

Deployment(
    host='123.45.67.89',
    port=443,
    external=True,
    grpc_metadata={'authorization': ''},
)
```
````

````{tab} Flow
```python
from jina import Flow

Flow().add(
    host='123.45.67.89',
    port=443,
    external=True,
    grpc_metadata={'authorization': ''},
)
```
````

```{hint}
The `grpc_metadata` parameter here follows the `metadata` concept in gRPC. See the [gRPC documentation](https://grpc.io/docs/what-is-grpc/core-concepts/#metadata) for details.
```
---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/deployment-args.md

| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object. This will be used in the following places: how you refer to this object in Python/YAML/CLI, visualization, log message header, etc. When not given, the default naming strategy will apply. | `string` | `None` |
| `workspace` | The working directory for any IO operations in this object. If not set, then derive from its parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, then no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, then exception stack information will not be added to the log. | `boolean` | `False` |
| `suppress_root_logging` | If set, then no root handlers will be suppressed from logging. | `boolean` | `False` |
| `uses` | The YAML path that represents a Flow. It can be either a local file path or a URL. | `string` | `None` |
| `reload` | If set, auto-reloading on file changes is enabled: the Flow will restart while blocked if the YAML configuration source is changed. This also applies to underlying Executors, if their source code or YAML configuration has changed. | `boolean` | `False` |
| `env` | The map of environment variables that are available inside runtime. | `object` | `None` |
| `inspect` | The strategy on those inspect deployments in the flow. If `REMOVE` is given then all inspect deployments are removed when building the flow. | `string` | `COLLECT` |
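These arguments map directly onto the {class}`~jina.Deployment` (and {class}`~jina.Flow`) constructors. As a minimal sketch combining a few of them (all values, including the Executor config path, are illustrative):

```python
from jina import Deployment

# values here are illustrative, not required defaults
dep = Deployment(
    uses='executor-config.yml',  # hypothetical Executor YAML config
    name='my-deployment',        # used in logs, visualization and CLI references
    workspace='./workspace',     # working directory for IO operations
    env={'LOG_LEVEL': 'INFO'},   # environment variables available inside the runtime
    quiet_error=True,            # omit exception stack information from the log
)

with dep:
    dep.block()
```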
---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/deployment.md

(deployment)=
# Deployment

```{important}
A Deployment is part of the orchestration layer {ref}`Orchestration `. Be sure to read up on that too!
```

A {class}`~jina.Deployment` orchestrates a single {class}`~jina.Executor` to accomplish a task. Documents are processed by Executors.

You can think of a Deployment as an interface to configure and launch your {ref}`microservice architecture `, while the heavy lifting is done by the {ref}`service ` itself.

(why-deployment)=
## Why use a Deployment?

Once you've learned about Documents, DocLists and Executors, you can split a big task into small independent modules and services.

- Deployments let you scale these Executors independently to match your requirements.
- Deployments let you easily use other cloud-native orchestrators, such as Kubernetes, to manage your service.

(create-deployment)=
## Create

The most trivial {class}`~jina.Deployment` is an empty one. It can be defined in Python or from a YAML file:

````{tab} Python
```python
from jina import Deployment

dep = Deployment()
```
````

````{tab} YAML
```yaml
jtype: Deployment
```
````

For production, you should define your Deployments with YAML. This is because YAML files are independent of the Python logic code and easier to maintain.

## Minimum working example

````{tab} Pythonic style
```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc


class MyExecutor(Executor):
    @requests(on='/bar')
    def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        print(docs)


dep = Deployment(name='myexec1', uses=MyExecutor)

with dep:
    dep.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````

````{tab} Deployment-as-a-Service style

Server:

```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc


class MyExecutor(Executor):
    @requests(on='/bar')
    def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        print(docs)


dep = Deployment(port=12345, name='myexec1', uses=MyExecutor)

with dep:
    dep.block()
```

Client:

```python
from jina import Client
from docarray import DocList, BaseDoc

c = Client(port=12345)
c.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````

````{tab} Load from YAML

`deployment.yml`:

```yaml
jtype: Deployment
name: myexec1
uses: FooExecutor
py_modules: exec.py
```

`exec.py`:

```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc


class FooExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'foo was here'
        docs.summary()
        return docs
```

```python
from jina import Deployment
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc

dep = Deployment.load_config('deployment.yml')

with dep:
    try:
        dep.post(on='/bar', inputs=TextDoc(), on_done=print)
    except Exception as ex:
        # handle exception
        pass
```
````

```{caution}
The statement `with dep:` starts the Deployment, and exiting the indented `with` block stops the Deployment, including its Executors. Exceptions raised inside the `with dep:` block will close the Deployment context manager.
If you don't want this, use a `try...except` block to surround the statements that could potentially raise an exception.
```

## Convert between Python and YAML

A Python Deployment definition can easily be converted to/from a YAML definition:

````{tab} Load from YAML
```python
from jina import Deployment

dep = Deployment.load_config('deployment.yml')
```
````

````{tab} Export to YAML
```python
from jina import Deployment

dep = Deployment()

dep.save_config('deployment.yml')
```
````

## Start and stop

When a {class}`~jina.Deployment` starts, all the replicated Executors will start as well, making it possible to {ref}`reach the service through its API `.

There are several ways to start a Deployment: in Python, from a YAML file, or from the terminal.

- Generally in Python: use Deployment as a context manager.
- As an entrypoint from terminal: use `Jina CLI ` and a Deployment YAML file.
- As an entrypoint from Python code: use Deployment as a context manager inside `if __name__ == '__main__'`.
- No context manager: manually call {meth}`~jina.Deployment.start` and {meth}`~jina.Deployment.close`.

````{tab} General in Python
```python
from jina import Deployment

dep = Deployment()

with dep:
    pass
```

The statement `with dep:` starts the Deployment, and exiting the indented `with` block stops the Deployment, including its Executor.
````

````{tab} Jina-serve CLI entrypoint
```bash
jina deployment --uses deployment.yml
```
````

````{tab} Python entrypoint
```python
from jina import Deployment

dep = Deployment()

if __name__ == '__main__':
    with dep:
        pass
```

The statement `with dep:` starts the Deployment, and exiting the indented `with` block stops the Deployment, including its Executor.
````

````{tab} Python no context manager
```python
from jina import Deployment

dep = Deployment()

dep.start()

dep.close()
```
````

Your addresses and entrypoints can be found in the output. When you enable more features such as monitoring, HTTP gateway, TLS encryption, this display expands to contain more information.

(multiprocessing-spawn)=
### Set multiprocessing `spawn`

Some corner cases require forcing a `spawn` start method for multiprocessing, for example if you encounter "Cannot re-initialize CUDA in forked subprocess". You can set `JINA_MP_START_METHOD=spawn` before starting the Python script to enable this.

```bash
JINA_MP_START_METHOD=spawn python app.py
```

```{caution}
In case you set `JINA_MP_START_METHOD=spawn`, make sure to use the Deployment as a context manager inside `if __name__ == '__main__'`. The script entrypoint (starting the Deployment) [needs to be protected when using the `spawn` start method](https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods).
```

````{hint}
There's no need to set this for Windows, as it only supports the spawn method for multiprocessing.
````
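Putting the pieces together, a minimal sketch of such a protected entrypoint (the file name `app.py` is an assumption matching the command above):

```python
# app.py (run with: JINA_MP_START_METHOD=spawn python app.py)
from jina import Deployment

dep = Deployment()

if __name__ == '__main__':  # entrypoint guard required by the spawn start method
    with dep:
        dep.block()
```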
## Serve

### Serve forever

In most scenarios, a Deployment should remain reachable for prolonged periods of time. This can be achieved from Python or the terminal:

````{tab} Python
```python
from jina import Deployment

dep = Deployment()

with dep:
    dep.block()
```
````

````{tab} Terminal
```shell
jina deployment --uses deployment.yml
```
````

The `.block()` method blocks the execution of the current thread or process, enabling external clients to access the Deployment. In this case, the Deployment can be stopped by interrupting the thread or process.

### Serve until an event

Alternatively, a `multiprocessing` or `threading` `Event` object can be passed to `.block()`, which stops the Deployment once set.

```python
from jina import Deployment
import threading


def start_deployment(stop_event):
    """start a blocking Deployment."""
    dep = Deployment()

    with dep:
        dep.block(stop_event=stop_event)


e = threading.Event()  # create new Event

t = threading.Thread(name='Blocked-Deployment', target=start_deployment, args=(e,))
t.start()  # start Deployment in new Thread

# do some stuff

e.set()  # set event and stop (unblock) the Deployment
```

## Export

A Deployment YAML can be exported as a Docker Compose YAML or Kubernetes YAML bundle.

(docker-compose-export)=
### Docker Compose

````{tab} Python
```python
from jina import Deployment

dep = Deployment()

dep.to_docker_compose_yaml()
```
````

````{tab} Terminal
```shell
jina export docker-compose deployment.yml docker-compose.yml
```
````

This will generate a single `docker-compose.yml` file. For advanced utilization of Docker Compose with Jina-serve, refer to {ref}`How to `.

(deployment-kubernetes-export)=
### Kubernetes

````{tab} Python
```python
from jina import Deployment

dep = Deployment()

dep.to_kubernetes_yaml('dep_k8s_configuration')
```
````

````{tab} Terminal
```shell
jina export kubernetes deployment.yml ./my-k8s
```
````

The generated folder can be used directly with `kubectl` to deploy the Deployment to an existing Kubernetes cluster. For advanced utilisation of Kubernetes with Jina-serve please refer to {ref}`How to `.

```{tip}
Based on your local Jina version, Executor Hub may rebuild the Docker image during the YAML generation process. If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`.
```

```{tip}
If an Executor requires volumes to be mapped to persist data, Jina will create a StatefulSet for that Executor instead of a Deployment. You can control the access mode, storage class name and capacity of the attached Persistent Volume Claim by using the {ref}`Jina environment variables ` `JINA_K8S_ACCESS_MODES`, `JINA_K8S_STORAGE_CLASS_NAME` and `JINA_K8S_STORAGE_CAPACITY`. Only the first volume will be considered to be mounted.
```

```{admonition} See also
:class: seealso

For more in-depth guides on deployment, check our how-tos for {ref}`Docker compose ` and {ref}`Kubernetes `.
```

```{caution}
The `port` or `ports` arguments are ignored when generating the Kubernetes YAML: Jina-serve starts the services binding to port 8080, and only uses the consecutive ports (8081, ...) when multiple protocols need to be served. This is because the Kubernetes Service routes the traffic for you, and the internal port is irrelevant to surrounding services: in Kubernetes, services communicate via service names irrespective of the internal port.
```

(logging-configuration)=
## Logging

The default {class}`jina.logging.logger.JinaLogger` uses rich console logging that writes to the system console. The `log_config` argument can be used to pass in the name of a pre-configured logging configuration in Jina-serve, or the absolute path of a YAML file with a custom logging configuration.
For most cases, the default logging configuration sufficiently covers local, Docker and Kubernetes environments.

Custom logging handlers can be configured by following the official Python [Logging Cookbook](https://docs.python.org/3/howto/logging-cookbook.html#logging-cookbook) examples. An example custom logging configuration defined in a YAML file `logging.json.yml` is:

```yaml
handlers:
  - StreamHandler
level: INFO
configs:
  StreamHandler:
    format: '%(asctime)s:{name:>15}@%(process)2d[%(levelname).1s]:%(message)s'
    formatter: JsonFormatter
```

The logging configuration can be used as follows:

````{tab} Python
```python
from jina import Deployment

dep = Deployment(log_config='./logging.json.yml')
```
````

````{tab} YAML
```yaml
jtype: Deployment
with:
  log_config: './logging.json.yml'
```
````

## Supported protocols

A Deployment can be used to deploy an Executor and serve it using the `gRPC` or `HTTP` protocol, or a composition of them.

### gRPC protocol

gRPC is the default protocol used by a Deployment to expose Executors to the outside world, and is used to communicate between the Gateway and an Executor inside a Flow.

### HTTP protocol

HTTP can be used for a stand-alone Deployment (without being part of a Flow), which allows external services to connect via REST.

```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'foo was here'
        docs.summary()
        return docs


dep = Deployment(protocol='http', port=12345, uses=MyExec)

with dep:
    dep.block()
```

This will make it available at port 12345, and you can get the [OpenAPI schema](https://swagger.io/specification/) for the service.

```{figure} images/http-deployment-swagger.png
:scale: 70%
```

### Composite protocol

A Deployment can also deploy an Executor and serve it with a combination of gRPC and HTTP protocols.

```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'foo was here'
        docs.summary()
        return docs


dep = Deployment(protocol=['grpc', 'http'], port=[12345, 12346], uses=MyExec)

with dep:
    dep.block()
```

This will make the Deployment reachable via gRPC and HTTP simultaneously.
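For illustration, a minimal client-side sketch (assuming the composite Deployment above is running; the ports and the `protocol` argument mirror that example):

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

# one client per protocol; both talk to the same Deployment
grpc_client = Client(port=12345, protocol='grpc')
http_client = Client(port=12346, protocol='http')

for client in (grpc_client, http_client):
    docs = client.post(
        on='/',
        inputs=DocList[TextDoc]([TextDoc(text='hi')]),
        return_type=DocList[TextDoc],
    )
    print(docs[0].text)  # 'foo was here'
```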
## Methods

The most important methods of the `Deployment` object are the following:

| Method | Description |
|--------|-------------|
| {meth}`~jina.Deployment.start()` | Starts the Deployment. This will start all its Executors and check if they are ready to be used. |
| {meth}`~jina.Deployment.close()` | Stops and closes the Deployment. This will stop and shutdown all its Executors. |
| `with` context manager | Uses the Deployment as a context manager. It will automatically start and stop your Deployment. |
| {meth}`~jina.clients.mixin.PostMixin.post()` | Sends requests to the Deployment API. |
| {meth}`~jina.Deployment.block()` | Blocks execution until the program is terminated. This is useful to keep the Deployment alive so it can be used from other places (clients, etc). |
| {meth}`~jina.Deployment.to_docker_compose_yaml()` | Generates a Docker-Compose file listing all Executors as services. |
| {meth}`~jina.Deployment.to_kubernetes_yaml()` | Generates Kubernetes configuration files in ``. Based on your local Jina-serve version, Executor Hub may rebuild the Docker image during the YAML generation process. If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`. |
| {meth}`~jina.clients.mixin.HealthCheckMixin.is_deployment_ready()` | Checks if the Deployment is ready to process requests. Returns a boolean indicating the readiness. |
---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/flow-args.md

| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object. This will be used in the following places: how you refer to this object in Python/YAML/CLI, visualization, log message header, etc. When not given, the default naming strategy will apply. | `string` | `None` |
| `workspace` | The working directory for any IO operations in this object. If not set, then derive from its parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, then no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, then exception stack information will not be added to the log. | `boolean` | `False` |
| `suppress_root_logging` | If set, then no root handlers will be suppressed from logging. | `boolean` | `False` |
| `uses` | The YAML path that represents a Flow. It can be either a local file path or a URL. | `string` | `None` |
| `reload` | If set, auto-reloading on file changes is enabled: the Flow will restart while blocked if the YAML configuration source is changed. This also applies to underlying Executors, if their source code or YAML configuration has changed. | `boolean` | `False` |
| `env` | The map of environment variables that are available inside runtime. | `object` | `None` |
| `inspect` | The strategy on those inspect deployments in the flow. If `REMOVE` is given then all inspect deployments are removed when building the flow. | `string` | `COLLECT` |
---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/flow.md

(flow-cookbook)=
# Flow

```{important}
A Flow is a set of {ref}`Deployments `. Be sure to read up on those before diving more deeply into Flows!
```

A {class}`~jina.Flow` orchestrates {class}`~jina.Executor`s into a processing pipeline to accomplish a task. Documents "flow" through the pipeline and are processed by Executors.

You can think of Flow as an interface to configure and launch your {ref}`microservice architecture `, while the heavy lifting is done by the {ref}`services ` themselves. In particular, each Flow also launches a {ref}`Gateway ` service, which can expose all other services through an API that you define.

## Why use a Flow?

Once you've learned about Documents, DocLists and Executors, you can split a big task into small independent modules and services. But you need to chain them together to create, build, and serve an application. Flows enable you to do exactly this.

- Flows connect microservices (Executors) to build a service with proper client/server style interfaces over HTTP, gRPC, or WebSockets.
- Flows let you scale these Executors independently to match your requirements.
- Flows let you easily use other cloud-native orchestrators, such as Kubernetes, to manage your service.

(create-flow)=
## Create

The most trivial {class}`~jina.Flow` is an empty one. It can be defined in Python or from a YAML file:

````{tab} Python
```python
from jina import Flow

f = Flow()
```
````

````{tab} YAML
```yaml
jtype: Flow
```
````

```{important}
All arguments received by the {class}`~jina.Flow()` API will be propagated to other entities (Gateway, Executor) with the following exceptions:

- `uses` and `uses_with` won't be passed to Gateway
- `port`, `port_monitoring`, `uses` and `uses_with` won't be passed to Executor
```

```{tip}
An empty Flow contains only {ref}`the Gateway`.
```

```{figure} images/zero-flow.svg
:scale: 70%
```

For production, you should define your Flows with YAML. This is because YAML files are independent of the Python logic code and easier to maintain.
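As a quick sketch of that round trip, a Flow defined in Python can be exported to YAML and loaded back (using {meth}`~jina.Flow.save_config` and {meth}`~jina.Flow.load_config`, which are also shown in the Deployment documentation):

```python
from jina import Flow

f = Flow().add(name='myexec1')
f.save_config('flow.yml')         # export the Python definition to YAML

f = Flow.load_config('flow.yml')  # load the YAML definition back into Python
```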
## Minimum working example

````{tab} Pythonic style
```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc


class MyExecutor(Executor):
    @requests(on='/bar')
    def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        print(docs)


f = Flow().add(name='myexec1', uses=MyExecutor)

with f:
    f.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````

````{tab} Flow-as-a-Service style

Server:

```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc


class MyExecutor(Executor):
    @requests(on='/bar')
    def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
        print(docs)


f = Flow(port=12345).add(name='myexec1', uses=MyExecutor)

with f:
    f.block()
```

Client:

```python
from jina import Client
from docarray import DocList, BaseDoc

c = Client(port=12345)
c.post(on='/bar', inputs=BaseDoc(), return_type=DocList[BaseDoc], on_done=print)
```
````

````{tab} Load from YAML

`my.yml`:

```yaml
jtype: Flow
executors:
  - name: myexec1
    uses: FooExecutor
    py_modules: exec.py
```

`exec.py`:

```python
from jina import Deployment, Executor, requests
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc


class FooExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'foo was here'
        docs.summary()
        return docs
```

```python
from jina import Flow
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc

f = Flow.load_config('my.yml')

with f:
    try:
        f.post(on='/bar', inputs=TextDoc(), on_done=print)
    except Exception as ex:
        # handle exception
        pass
```
````

```{caution}
The statement `with f:` starts the Flow, and exiting the indented `with` block stops the Flow, including all Executors defined in it. Exceptions raised inside the `with f:` block will close the Flow context manager. If you don't want this, use a `try...except` block to surround the statements that could potentially raise an exception.
```

## Start and stop

When a {class}`~jina.Flow` starts, all included Executors (a single one for a Deployment, multiple for a Flow) will start as well, making it possible to {ref}`reach the service through its API `.

There are several ways to start a Flow: in Python, from a YAML file, or from the terminal.

- Generally in Python: use Flow as a context manager.
- As an entrypoint from terminal: use `Jina CLI ` and a Flow YAML file.
- As an entrypoint from Python code: use Flow as a context manager inside `if __name__ == '__main__'`.
- No context manager: manually call {meth}`~jina.Flow.start` and {meth}`~jina.Flow.close`.

````{tab} General in Python
```python
from jina import Flow

f = Flow()

with f:
    pass
```
````

````{tab} Jina-serve CLI entrypoint
```bash
jina flow --uses flow.yml
```
````

````{tab} Python entrypoint
```python
from jina import Flow

f = Flow()

if __name__ == '__main__':
    with f:
        pass
```
````

````{tab} Python no context manager
```python
from jina import Flow

f = Flow()

f.start()

f.close()
```
````

The statement `with f:` starts the Flow, and exiting the indented `with` block stops the Flow, including all its Executors.

A successful start of a Flow looks like this:

```{figure} images/success-flow.png
:scale: 70%
```

Your addresses and entrypoints can be found in the output. When you enable more features such as monitoring, HTTP gateway, TLS encryption, this display expands to contain more information.
```{admonition} Multiprocessing spawn
:class: warning

Some corner cases require forcing a `spawn` start method for multiprocessing, for example if you encounter "Cannot re-initialize CUDA in forked subprocess". Read {ref}`more in the docs `.
```

## Serve

### Serve forever

In most scenarios, a Flow should remain reachable for prolonged periods of time. This can be achieved from Python or the terminal:

````{tab} Python
```python
from jina import Flow

f = Flow()

with f:
    f.block()
```

The `.block()` method blocks the execution of the current thread or process, enabling external clients to access the Flow.
````

````{tab} Terminal
```shell
jina flow --uses flow.yml
```
````

In this case, the Flow can be stopped by interrupting the thread or process.

### Serve until an event

Alternatively, a `multiprocessing` or `threading` `Event` object can be passed to `.block()`, which stops the Flow once set.

```python
from jina import Flow
import threading


def start_flow(stop_event):
    """start a blocking Flow."""
    f = Flow()

    with f:
        f.block(stop_event=stop_event)


e = threading.Event()  # create new Event

t = threading.Thread(name='Blocked-Flow', target=start_flow, args=(e,))
t.start()  # start Flow in new Thread

# do some stuff

e.set()  # set event and stop (unblock) the Flow
```

### Serve on Google Colab

```{admonition} Example built with docarray<0.30
:class: note

This example is built using a docarray<0.30 version. Most of the concepts are similar, but some APIs for building Executors change when using a newer docarray version.
```

[Google Colab](https://colab.research.google.com/) provides an easy-to-use Jupyter notebook environment with GPU/TPU support. Flows are fully compatible with Google Colab and you can use it in the following ways:

```{figure} images/jina-on-colab.svg
:align: center
:width: 70%
```

```{button-link} https://colab.research.google.com/github/jina-ai/jina/blob/master/docs/Using_Jina_on_Colab.ipynb
:color: primary
:align: center

{octicon}`link-external` Open the notebook on Google Colab
```

Please follow the walkthrough and enjoy the free GPU/TPU!

```{tip}
Hosting services on Google Colab is not recommended if your server aims to be long-lived or permanent. It is often used for quick experiments, demonstrations or leveraging its free GPU/TPU.
For stable, secure and free hosting of your Flow, check out [JCloud](https://jina.ai/serve/concepts/jcloud/).
```

## Export

A Flow YAML can be exported as a Docker Compose YAML or Kubernetes YAML bundle.

(docker-compose-export)=
### Docker Compose

````{tab} Python
```python
from jina import Flow

f = Flow().add()
f.to_docker_compose_yaml()
```
````

````{tab} Terminal
```shell
jina export docker-compose flow.yml docker-compose.yml
```
````

This will generate a single `docker-compose.yml` file. For advanced utilization of Docker Compose with Jina, refer to {ref}`How to `.

(flow-kubernetes-export)=
### Kubernetes

````{tab} Python
```python
from jina import Flow

f = Flow().add()
f.to_kubernetes_yaml('flow_k8s_configuration')
```
````

````{tab} Terminal
```shell
jina export kubernetes flow.yml ./my-k8s
```
````

The generated folder can be used directly with `kubectl` to deploy the Flow to an existing Kubernetes cluster. For advanced utilisation of Kubernetes with Jina please refer to {ref}`How to `.

```{tip}
Based on your local Jina-serve version, Executor Hub may rebuild the Docker image during the YAML generation process. If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`.
```

```{tip}
If an Executor requires volumes to be mapped to persist data, Jina-serve will create a StatefulSet for that Executor instead of a Deployment. You can control the access mode, storage class name and capacity of the attached Persistent Volume Claim by using the {ref}`Jina environment variables ` `JINA_K8S_ACCESS_MODES`, `JINA_K8S_STORAGE_CLASS_NAME` and `JINA_K8S_STORAGE_CAPACITY`. Only the first volume will be considered to be mounted.
```

```{admonition} See also
:class: seealso

For more in-depth guides on Flow deployment, check our how-tos for {ref}`Docker compose ` and {ref}`Kubernetes `.
```

```{caution}
The `port` or `ports` arguments are ignored when generating the Kubernetes YAML: Jina starts the services binding to port 8080, and only uses the consecutive ports (8081, ...) when multiple protocols need to be served. This is because the Kubernetes Service routes the traffic for you, and the internal port is irrelevant to surrounding services: in Kubernetes, services communicate via service names irrespective of the internal port.
```

## Add Executors

```{important}
This section covers Flow-specific considerations when working with Executors. Find more information on {ref}`working with Executors `.
```

A {class}`~jina.Flow` orchestrates its {class}`~jina.Executor`s as a graph and sends requests to all Executors in the order specified by {meth}`~jina.Flow.add` or listed in {ref}`a YAML file`.

When you start a Flow, each Executor always runs in its own **separate process**. Multiprocessing is the lowest level of separation when you run a Flow locally. When running a Flow on Kubernetes, Docker Swarm, or {ref}`jcloud`, different Executors run in different containers, pods or instances.

Executors can be added into a Flow with {meth}`~jina.Flow.add`.

```python
from jina import Flow

f = Flow().add()
```

This adds an "empty" Executor called {class}`~jina.serve.executors.BaseExecutor` to the Flow.
This Executor (without any parameters) performs no actions.

```{figure} images/no-op-flow.svg
:scale: 70%
```

To more easily identify an Executor, you can change its name by passing the `name` parameter:

```python
from jina import Flow

f = Flow().add(name='myVeryFirstExecutor').add(name='secondIsBest')
```

```{figure} images/named-flow.svg
:scale: 70%
```

You can also define the above Flow in YAML:

```yaml
jtype: Flow
executors:
  - name: myVeryFirstExecutor
  - name: secondIsBest
```

Save it as `flow.yml` and run it:

```bash
jina flow --uses flow.yml
```

More Flow YAML specifications can be found in {ref}`Flow YAML Specification`.

### How Executors process Documents in a Flow

Let's understand how Executors process Documents inside a Flow, and how changes are chained and applied, affecting downstream Executors in the Flow.

```python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.documents import TextDoc


class PrintDocuments(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            print(f' PrintExecutor: received document with text: "{doc.text}"')
        return docs


class ProcessDocuments(Executor):
    @requests(on='/change_in_place')
    def in_place(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # This Executor only works on `docs` and doesn't consider any other arguments
        for doc in docs:
            print(f'ProcessDocuments: received document with text "{doc.text}"')
            doc.text = 'I changed the executor in place'

    @requests(on='/return_different_docarray')
    def ret_docs(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # This executor only works on `docs` and doesn't consider any other arguments
        ret = DocList[TextDoc]()
        for doc in docs:
            print(f'ProcessDocuments: received document with text: "{doc.text}"')
            ret.append(TextDoc(text='I returned a different Document'))
        return ret


f = Flow().add(uses=ProcessDocuments).add(uses=PrintDocuments)

with f:
    f.post(
        on='/change_in_place',
        inputs=DocList[TextDoc]([TextDoc(text='request1')]),
        return_type=DocList[TextDoc],
    )
    f.post(
        on='/return_different_docarray',
        inputs=DocList[TextDoc]([TextDoc(text='request2')]),
        return_type=DocList[TextDoc],
    )
```

```shell
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local          0.0.0.0:58746   │
│  🔒    Private    192.168.1.187:58746    │
│  🌍    Public    212.231.186.65:58746    │
╰──────────────────────────────────────────╯

 ProcessDocuments: received document with text "request1"
 PrintExecutor: received document with text: "I changed the executor in place"
 ProcessDocuments: received document with text: "request2"
 PrintExecutor: received document with text: "I returned a different Document"
```

### Define topologies over Executors

{class}`~jina.Flow`s are not restricted to sequential execution. Internally they are modeled as graphs, so they can represent any complex, non-cyclic topology. A typical use case for such a Flow is a topology with a common pre-processing part, but different indexers separating embeddings and data. To define a custom topology you can use the `needs` keyword when adding an {class}`~jina.Executor`.
```python
from jina import Executor, requests, Flow
from docarray import DocList
from docarray.documents import TextDoc


class FooExecutor(Executor):
    @requests
    async def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        docs.append(TextDoc(text=f'foo was here and got {len(docs)} document'))


class BarExecutor(Executor):
    @requests
    async def bar(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        docs.append(TextDoc(text=f'bar was here and got {len(docs)} document'))


class BazExecutor(Executor):
    @requests
    async def baz(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        docs.append(TextDoc(text=f'baz was here and got {len(docs)} document'))


class MergeExecutor(Executor):
    @requests
    async def merge(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        return docs


f = (
    Flow()
    .add(uses=FooExecutor, name='fooExecutor')
    .add(uses=BarExecutor, name='barExecutor', needs='fooExecutor')
    .add(uses=BazExecutor, name='bazExecutor', needs='fooExecutor')
    .add(uses=MergeExecutor, needs=['barExecutor', 'bazExecutor'])
)
```

```{figure} images/needs-flow.svg
:width: 70%
:align: center
Complex Flow where one Executor requires two Executors to process Documents beforehand
```

When sending a message to this Flow:

```python
with f:
    print(f.post('/', return_type=DocList[TextDoc]).text)
```

This gives the output:

```text
['foo was here and got 0 document', 'bar was here and got 1 document', 'baz was here and got 1 document']
```

Both `BarExecutor` and `BazExecutor` only received a single `Document` from `FooExecutor` because they are run in parallel. The last Executor, `MergeExecutor`, receives both DocLists and merges them automatically.

This automated merging can be disabled with `no_reduce=True`. This is useful for providing custom merge logic in a separate Executor. In this case the last `.add()` call would look like `.add(needs=['barExecutor', 'bazExecutor'], uses=CustomMergeExecutor, no_reduce=True)`, as sketched below. This feature requires Jina >= 3.0.2.
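`CustomMergeExecutor` is not part of Jina itself. As a minimal sketch, assuming the custom logic is a plain concatenation, it could use the `docs_matrix` argument described in the *Merging upstream Documents* section below:

```python
from docarray import DocList
from docarray.documents import TextDoc
from jina import Executor, requests


class CustomMergeExecutor(Executor):
    @requests
    def merge(self, docs: DocList[TextDoc], docs_matrix, **kwargs) -> DocList[TextDoc]:
        # with `no_reduce=True`, one incoming DocList per upstream Executor
        # is kept separate and delivered in `docs_matrix`
        merged = DocList[TextDoc]()
        for incoming_docs in docs_matrix:
            merged.extend(incoming_docs)
        return merged
```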
## Chain Executors in Flow with different schemas

When using `docarray>=0.30.0` to build a Flow, you should ensure that the Document types used as input of an Executor match the output schema of the preceding Executor in the Flow.

For instance, the second Flow below will fail to start because the Document types are wrongly chained.

````{tab} Valid Flow
```{code-block} python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
import numpy as np


class SimpleStrDoc(BaseDoc):
    text: str


class TextWithEmbedding(SimpleStrDoc):
    embedding: NdArray


class TextEmbeddingExecutor(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        ret = DocList[TextWithEmbedding]()
        for doc in docs:
            ret.append(TextWithEmbedding(text=doc.text, embedding=np.random.rand(10)))
        return ret


class ProcessEmbedding(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[TextWithEmbedding], **kwargs) -> DocList[TextWithEmbedding]:
        for doc in docs:
            self.logger.info(f'Getting embedding with shape {doc.embedding.shape}')


flow = Flow().add(uses=TextEmbeddingExecutor, name='embed').add(uses=ProcessEmbedding, name='process')
with flow:
    flow.block()
```
````

````{tab} Invalid Flow
```{code-block} python
from jina import Executor, requests, Flow
from docarray import DocList, BaseDoc
from docarray.typing import NdArray
import numpy as np


class SimpleStrDoc(BaseDoc):
    text: str


class TextWithEmbedding(SimpleStrDoc):
    embedding: NdArray


class TextEmbeddingExecutor(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        ret = DocList[TextWithEmbedding]()
        for doc in docs:
            ret.append(TextWithEmbedding(text=doc.text, embedding=np.random.rand(10)))
        return ret


class ProcessText(Executor):
    @requests(on='/foo')
    def foo(self, docs: DocList[SimpleStrDoc], **kwargs) -> DocList[TextWithEmbedding]:
        for doc in docs:
            self.logger.info(f'Getting doc with text {doc.text}')


# This Flow will fail to start because the input type of "process" does not match the output type of "embed"
flow = Flow().add(uses=TextEmbeddingExecutor, name='embed').add(uses=ProcessText, name='process')
with flow:
    flow.block()
```
````

Jina is also compatible with `docarray<0.30`. In that version, only a single Document schema existed (equivalent to `LegacyDocument` in `docarray>=0.30`), and therefore there were no explicit compatibility issues between schemas. However, the complexity was implicitly there: an Executor may expect a Document to be filled with `text` and only fail at runtime.

(floating-executors)=
### Floating Executors

Some Executors in your Flow can be used for asynchronous background tasks that take time and don't generate a required output. For instance: logging specific information in external services, storing partial results, etc.

You can unblock your Flow from such tasks by using *floating Executors*.

Normally, all Executors form a pipeline that handles and transforms a given request until it is finally returned to the Client.
However, floating Executors do not feed their outputs back into the pipeline.
Therefore, the Executor's output does not affect the response for the Client, and the response can be returned without waiting for the floating Executor to complete its task.

Those Executors are marked with the `floating` keyword when added to a `Flow`:

```python
import time

from jina import Executor, requests, Flow
from docarray import DocList
from docarray.documents import TextDoc


class FastChangingExecutor(Executor):
    @requests()
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'Hello World'


class SlowChangingExecutor(Executor):
    @requests()
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        time.sleep(2)
        print(f' Received {docs.text}')
        for doc in docs:
            doc.text = 'Change the document but will not affect response'


f = (
    Flow()
    .add(name='executor0', uses=FastChangingExecutor)
    .add(
        name='floating_executor',
        uses=SlowChangingExecutor,
        needs=['gateway'],
        floating=True,
    )
)

with f:
    # send a first request to warm up the Flow, so the timing below is meaningful
    f.post(on='/endpoint', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])
    start_time = time.time()
    response = f.post(
        on='/endpoint',
        inputs=DocList[TextDoc]([TextDoc(), TextDoc()]),
        return_type=DocList[TextDoc],
    )
    end_time = time.time()
    print(f' Response time took {end_time - start_time}s')
    print(f' {response.text}')
```

```text
 Response time took 0.011997222900390625s
 ['Hello World', 'Hello World']
 Received ['Hello World', 'Hello World']
```

In this example the response is returned without waiting for the floating Executor to complete. However, the Flow is not closed until the floating Executor has handled the request.

You can plot the Flow and see that the Executor is floating, disconnected from the **Gateway**.

```{figure} images/flow_floating.svg
:width: 70%
```

A floating Executor can *never* come before a non-floating Executor in your Flow's {ref}`topology `.

This leads to the following behaviors:

- **Implicit reordering**: When you add a non-floating Executor after a floating Executor without specifying its `needs` parameter, the non-floating Executor is chained after the previous non-floating one.

```python
from jina import Flow

f = Flow().add().add(name='middle', floating=True).add()
f.plot()
```

```{figure} images/flow_middle_1.svg
:width: 70%
```

- **Chaining floating Executors**: To chain more than one floating Executor, you need to add all of them with the `floating` flag, and explicitly specify the `needs` argument.

```python
from jina import Flow

f = Flow().add().add(name='middle', floating=True).add(needs=['middle'], floating=True)
f.plot()
```

```{figure} images/flow_chain_floating.svg
:width: 70%
```

- **Overriding the `floating` flag**: If you add a floating Executor as part of the `needs` parameter of a non-floating Executor, then the floating Executor is no longer considered floating.

```python
from jina import Flow

f = Flow().add().add(name='middle', floating=True).add(needs=['middle'])
f.plot()
```

```{figure} images/flow_cancel_floating.svg
:width: 70%
```

(conditioning)=
### Add Conditioning

Sometimes you may not want all Documents to be processed by all Executors. For example, when you process text and image Documents, you may want to forward them to different Executors depending on their data type.

You can set conditioning for every {class}`~jina.Executor` in the Flow. Documents that don't meet the condition will be removed before reaching that Executor. This allows you to build a selection control in the Flow.

#### Define conditions

To add a condition to an Executor, pass it to the `when` parameter of the {meth}`~jina.Flow.add` method of the Flow.
This then defines *when* a Document is processed by the Executor:

You can use the [MongoDB query language](https://www.mongodb.com/docs/compass/current/query/filter/#query-your-data), as implemented in [docarray](https://docs.docarray.org/API_reference/utils/filter/), to specify a filter condition for each Executor.

```python
from jina import Flow

f = Flow().add(when={'tags__key': {'$eq': 5}})
```

Then only Documents that satisfy the `when` condition will reach the associated Executor. Any Documents that don't satisfy that condition won't reach the Executor.

If you are trying to separate Documents according to the data modality they hold, you need to choose a condition accordingly.

````{admonition} See Also
:class: seealso

In addition to `$exists` you can use a number of other operators to define your filter: `$eq`, `$gte`, `$lte`, `$size`, `$and`, `$or` and many more. For details, consult the [MongoDB query language](https://www.mongodb.com/docs/compass/current/query/filter/#query-your-data) and [docarray](https://docs.docarray.org/API_reference/utils/filter/) documentation.
````

```python
# define filter conditions
text_condition = {'text': {'$exists': True}}
tensor_condition = {'tensor': {'$exists': True}}
```

These conditions specify that only Documents that hold data of a specific modality can pass the filter.

````{tab} Python
```{code-block} python
---
emphasize-lines: 18, 21
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict


class MyDoc(BaseDoc):
    text: str = ''
    tags: Dict[str, int]


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        for doc in docs:
            print(f'{doc.tags}')


f = Flow().add(uses=MyExec).add(uses=MyExec, when={'tags__key': {'$eq': 5}})  # Create the Flow, add condition

with f:  # Using it as a Context Manager starts the Flow
    ret = f.post(
        on='/search',
        inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
        return_type=DocList[MyDoc],
    )

for doc in ret:
    print(f'{doc.tags}')  # only the Document fulfilling the condition is processed and therefore returned
```

```shell
{'key': 5.0}
```
````

````{tab} YAML
```yaml
jtype: Flow
executors:
  - name: executor
    uses: MyExec
    when:
      tags__key:
        $eq: 5
```

```{code-block} python
---
emphasize-lines: 6
---
from jina import Flow
from docarray import DocList

# MyDoc and MyExec are defined as in the Python tab

f = Flow.load_config('flow.yml')  # Load the Flow definition from the YAML file

with f:  # Using it as a Context Manager starts the Flow
    ret = f.post(
        on='/search',
        inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
        return_type=DocList[MyDoc],
    )

for doc in ret:
    print(f'{doc.tags}')  # only the Document fulfilling the condition is processed and therefore returned
```

```shell
{'key': 5.0}
```
````

Note that if a Document does not satisfy the `when` condition of a filter, the filter removes the Document *for that entire branch of the Flow*.
This means that every Executor located behind a filter is affected by this, not just the specific Executor that defines the condition.
As with a real-life filter, once something fails to pass through it, it no longer continues down the pipeline.

Naturally, parallel branches in a Flow do not affect each other.
So if a Document gets filtered out in only one branch, it can still be used in the other branch, and also after the branches are re-joined:

````{tab} Parallel Executors
```{code-block} python
---
emphasize-lines: 21, 22
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict


class MyDoc(BaseDoc):
    text: str = ''
    tags: Dict[str, int]


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        for doc in docs:
            print(f'{doc.tags}')


f = (
    Flow()
    .add(uses=MyExec, name='first')
    .add(uses=MyExec, when={'tags__key': {'$eq': 5}}, needs='first', name='exec1')
    .add(uses=MyExec, when={'tags__key': {'$eq': 4}}, needs='first', name='exec2')
    .needs_all(uses=MyExec, name='join')
)
```

```{figure} images/conditional-flow.svg
:width: 70%
:align: center
```

```python
with f:
    ret = f.post(
        on='/search',
        inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
        return_type=DocList[MyDoc],
    )

for doc in ret:
    print(f'{doc.tags}')
```

```shell
{'key': 5.0}
{'key': 4.0}
```
````

````{tab} Sequential Executors
```{code-block} python
---
emphasize-lines: 21, 22
---
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc
from typing import Dict


class MyDoc(BaseDoc):
    text: str = ''
    tags: Dict[str, int]


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        for doc in docs:
            print(f'{doc.tags}')


f = (
    Flow()
    .add(uses=MyExec, name='first')
    .add(uses=MyExec, when={'tags__key': {'$eq': 5}}, name='exec1', needs='first')
    .add(uses=MyExec, when={'tags__key': {'$eq': 4}}, needs='exec1', name='exec2')
)
```

```{figure} images/sequential-flow.svg
:width: 70%
```

```python
with f:
    ret = f.post(
        on='/search',
        inputs=DocList[MyDoc]([MyDoc(tags={'key': 5}), MyDoc(tags={'key': 4})]),
        return_type=DocList[MyDoc],
    )

for doc in ret:
    print(f'{doc.tags}')
```

```shell
```
````

This feature is useful to prevent some specialized Executors from processing certain Documents.
It can also be used to build *switch-like nodes*, where some Documents pass through one branch of the Flow, while other Documents pass through a different parallel branch, as sketched below.

Note that whenever a Document does not satisfy the condition of an Executor, it is not even sent to that Executor.
Instead, only a tailored Request without any payload is transferred.
This means that you can not only use this feature to build complex logic, but also to minimize your networking overhead.
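As a sketch of such a switch, the modality filters defined earlier can route Documents into two parallel branches (the Executor names here are illustrative, and the default empty Executor is used for brevity):

```python
from jina import Flow

# modality filters from the section above
text_condition = {'text': {'$exists': True}}
tensor_condition = {'tensor': {'$exists': True}}

f = (
    Flow()
    .add(name='preprocessor')
    .add(name='text_branch', needs='preprocessor', when=text_condition)
    .add(name='tensor_branch', needs='preprocessor', when=tensor_condition)
)
```

Text Documents then only travel down `text_branch`, while tensor Documents only travel down `tensor_branch`.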
(merging-upstream)=
### Merging upstream Documents

Often when you're building a Flow, you want an Executor to receive Documents from multiple upstream Executors.

```{figure} images/flow-merge-executor.svg
:width: 70%
:align: center
```

For this you can use the `docs_matrix` or `docs_map` parameters (part of the Executor endpoint signature). These are Flow-specific arguments that can be used alongside an Executor's {ref}`default arguments `:

```{code-block} python
---
emphasize-lines: 13, 14
---
from typing import Dict, Union, List, Optional

from jina import Executor, requests
from docarray import DocList


class MergeExec(Executor):
    @requests
    async def foo(
        self,
        docs: DocList[...],
        parameters: Dict,
        docs_matrix: Optional[List[DocList[...]]],
        docs_map: Optional[Dict[str, DocList[...]]],
    ) -> DocList[...]:
        pass
```

- Use `docs_matrix` to receive a List of all incoming DocLists from upstream Executors:

```python
[
    DocList[...](...),  # from Executor1
    DocList[...](...),  # from Executor2
    DocList[...](...),  # from Executor3
]
```

- Use `docs_map` to receive a Dict, where each item's key is the name of an upstream Executor and the value is the DocList coming from that Executor:

```python
{
    'Executor1': DocList[...](...),
    'Executor2': DocList[...](...),
    'Executor3': DocList[...](...),
}
```

(no-reduce)=
#### Reducing multiple DocLists to one DocList

The `no_reduce` argument determines whether DocLists are reduced into one when being received:

- To reduce all incoming DocLists into **one single DocList**, do not set `no_reduce` or set it to `False`. In this case `docs_map` and `docs_matrix` will be `None`.
- To receive **a list of all incoming DocLists**, set `no_reduce` to `True`. The Executor will receive the DocLists independently under `docs_matrix` and `docs_map`.

```python
from jina import Flow, Executor, requests
from docarray import DocList, BaseDoc


class MyDoc(BaseDoc):
    text: str = ''


class Exec1(Executor):
    @requests
    def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        for doc in docs:
            doc.text = 'Exec1'


class Exec2(Executor):
    @requests
    def foo(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        for doc in docs:
            doc.text = 'Exec2'


class MergeExec(Executor):
    @requests
    def foo(self, docs: DocList[MyDoc], docs_matrix, **kwargs) -> DocList[MyDoc]:
        documents_to_return = DocList[MyDoc]()
        for doc1, doc2 in zip(*docs_matrix):
            print(
                f'MergeExec processing pairs of Documents "{doc1.text}" and "{doc2.text}"'
            )
            documents_to_return.append(
                MyDoc(text=f'Document merging from "{doc1.text}" and "{doc2.text}"')
            )
        return documents_to_return


f = (
    Flow()
    .add(uses=Exec1, name='exec1')
    .add(uses=Exec2, name='exec2')
    .add(uses=MergeExec, needs=['exec1', 'exec2'], no_reduce=True)
)

with f:
    returned_docs = f.post(on='/', inputs=MyDoc(), return_type=DocList[MyDoc])

print(f'Resulting documents {returned_docs[0].text}')
```

```shell
────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓   Protocol                     GRPC  │
│  🏠     Local          0.0.0.0:55761     │
│  🔒   Private    192.168.1.187:55761     │
│  🌍    Public   212.231.186.65:55761     │
╰──────────────────────────────────────────╯
MergeExec processing pairs of Documents "Exec1" and "Exec2"
Resulting documents Document merging from "Exec1" and "Exec2"
```

## Visualize

A {class}`~jina.Flow` has a built-in `.plot()` function which can be used to visualize the Flow:

```python
from jina import Flow

f = Flow().add().add()
f.plot('flow.svg')
```

```{figure} images/flow.svg
:width: 70%
```

```python
from jina import Flow

f = Flow().add(name='e1').add(needs='e1').add(needs='e1')
f.plot('flow-2.svg')
```

```{figure} images/flow-2.svg
:width: 70%
```

You can also do it in the terminal:

```bash
jina export flowchart flow.yml flow.svg
```

You can also visualize a remote Flow by passing the URL to `jina export flowchart`.

(logging-configuration)=
## Logging

The default {class}`jina.logging.logger.JinaLogger` uses rich console logging that writes to the system console. The `log_config` argument accepts either the name of a pre-configured logging configuration shipped with Jina or the absolute path to a custom YAML logging configuration file. For most cases, the default logging configuration sufficiently covers local, Docker and Kubernetes environments.

Custom logging handlers can be configured by following the official Python [Logging Cookbook](https://docs.python.org/3/howto/logging-cookbook.html#logging-cookbook) examples. An example custom logging configuration defined in a YAML file `logging.json.yml` is:

```yaml
handlers:
  - StreamHandler
level: INFO
configs:
  StreamHandler:
    format: '%(asctime)s:{name:>15}@%(process)2d[%(levelname).1s]:%(message)s'
    formatter: JsonFormatter
```

The logging configuration can be used as follows:

````{tab} Python
```python
from jina import Flow

f = Flow(log_config='./logging.json.yml')
```
````

````{tab} YAML
```yaml
jtype: Flow
with:
  log_config: './logging.json.yml'
```
````

(logging-override)=
### Custom logging configuration

The default {ref}`logging ` or custom logging configuration at the Flow level is propagated to the `Gateway` and `Executor` entities. If that is not desired, every `Gateway` or `Executor` entity can be provided with its own custom logging configuration.

You can configure two different `Executors` as in the below example:

```python
from jina import Flow

f = (
    Flow().add(log_config='./logging.json.yml').add(log_config='./logging.file.yml')
)  # Create a Flow with two Executors
```

`logging.file.yml` is another YAML file with a custom `FileHandler` configuration (a sketch is given below).

````{hint}
Refer to the {ref}`Gateway logging configuration ` section for configuring the `Gateway` logging.
````

````{caution}
When exporting the Flow to Kubernetes, the `log_config` file path must refer to the absolute local path of each container. The custom logging file must be included during the containerization process. If the availability of the file is unknown, it's best to rely on the default configuration. This restriction also applies to dockerized `Executors`. When running a dockerized Executor locally, the logging configuration file can be mounted using {ref}`volumes `.
````
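The `logging.file.yml` referenced above is not shown in this guide. As a sketch, mirroring the structure of the `StreamHandler` example, it could look like the following; the exact keys accepted for configuring the file handler (such as the output file path) depend on your Jina version, so treat them as assumptions:

```yaml
handlers:
  - FileHandler
level: INFO
configs:
  FileHandler:
    format: '%(asctime)s:{name:>15}@%(process)2d[%(levelname).1s]:%(message)s'
    # assumption: the key and value used to point the handler at a log file
    output: 'jina.log'
```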
## Methods

The most important methods of the `Flow` object are the following:

| Method | Description |
|---|---|
| {meth}`~jina.Flow.add` | Adds an Executor to the Flow |
| {meth}`~jina.Flow.start()` | Starts the Flow. This will start all its Executors and check if they are ready to be used. |
| {meth}`~jina.Flow.close()` | Stops and closes the Flow. This will stop and shutdown all its Executors. |
| `with` context manager | Uses the Flow as a context manager. It will automatically start and stop your Flow. |
| {meth}`~jina.Flow.plot()` | Visualizes the Flow. Helpful for building complex pipelines. |
| {meth}`~jina.clients.mixin.PostMixin.post()` | Sends requests to the Flow API. |
| {meth}`~jina.Flow.block()` | Blocks execution until the program is terminated. This is useful to keep the Flow alive so it can be used from other places (clients, etc). |
| {meth}`~jina.Flow.to_docker_compose_yaml()` | Generates a Docker-Compose file listing all Executors as services. |
| {meth}`~jina.Flow.to_kubernetes_yaml()` | Generates Kubernetes configuration files in ``. Based on your local Jina and docarray versions, Executor Hub may rebuild the Docker image during the YAML generation process. If you do not wish to rebuild the image, set the environment variable `JINA_HUB_NO_IMAGE_REBUILD`. |
| {meth}`~jina.clients.mixin.HealthCheckMixin.is_flow_ready()` | Checks if the Flow is ready to process requests. Returns a boolean indicating the readiness. |

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/gateway-args.md

| Name | Description | Type | Default |
|----|----|----|----|
| `name` | The name of this object.

This will be used in the following places:
- how you refer to this object in Python/YAML/CLI
- visualization
- log message header
- ...

When not given, then the default naming strategy will apply. | `string` | `gateway` |
| `workspace` | The working directory for any IO operations in this object. If not set, then derive from its parent `workspace`. | `string` | `None` |
| `log_config` | The config name or the absolute path to the YAML config file of the logger used in this object. | `string` | `default` |
| `quiet` | If set, then no log will be emitted from this object. | `boolean` | `False` |
| `quiet_error` | If set, then exception stack information will not be added to the log | `boolean` | `False` |
| `timeout_ctrl` | The timeout in milliseconds of the control request, -1 for waiting forever | `number` | `60` |
| `entrypoint` | The entrypoint command overrides the ENTRYPOINT in the Docker image. When not set, the Docker image's ENTRYPOINT takes effect. | `string` | `None` |
| `docker_kwargs` | Dictionary of kwargs arguments that will be passed to the Docker SDK when starting the docker
container.

More details can be found in the Docker SDK docs: https://docker-py.readthedocs.io/en/stable/ | `object` | `None` || `prefetch` | Number of requests fetched from the client before feeding into the first Executor.

Used to control the speed of data input into a Flow. 0 disables prefetch (1000 requests is the default) | `number` | `1000` |
| `title` | The title of this HTTP server. It will be used in automatic docs such as Swagger UI. | `string` | `None` |
| `description` | The description of this HTTP server. It will be used in automatic docs such as Swagger UI. | `string` | `None` |
| `cors` | If set, a CORS middleware is added to the FastAPI frontend to allow cross-origin access. | `boolean` | `False` |
| `no_debug_endpoints` | If set, the `/status` and `/post` endpoints are removed from the HTTP interface. | `boolean` | `False` |
| `no_crud_endpoints` | If set, the `/index`, `/search`, `/update`, `/delete` endpoints are removed from the HTTP interface.

Any executor that has `@requests(on=...)` bound with those values will receive data requests. | `boolean` | `False` || `expose_endpoints` | A JSON string that represents a map from executor endpoints (`@requests(on=...)`) to HTTP endpoints. | `string` | `None` || `uvicorn_kwargs` | Dictionary of kwargs arguments that will be passed to Uvicorn server when starting the server

More details can be found in Uvicorn docs: https://www.uvicorn.org/settings/ | `object` | `None` |
| `ssl_certfile` | the path to the certificate file | `string` | `None` |
| `ssl_keyfile` | the path to the key file | `string` | `None` |
| `expose_graphql_endpoint` | If set, the `/graphql` endpoint is added to the HTTP interface. | `boolean` | `False` |
| `protocol` | Communication protocol of the server exposed by the Gateway. This can be a single value or a list of protocols, depending on your chosen Gateway. Choose from: ['GRPC', 'HTTP', 'WEBSOCKET']. | `array` | `[]` |
| `host` | The host address of the runtime, by default it is 0.0.0.0. | `string` | `0.0.0.0` |
| `proxy` | If set, respect the `http_proxy` and `https_proxy` environment variables. Otherwise, unset these proxy variables before start. gRPC seems to prefer no proxy. | `boolean` | `False` |
| `uses` | The config of the gateway, it could be one of the following:
* the string literal of a Gateway class name
* a Gateway YAML file (.yml, .yaml, .jaml)
* a docker image (must start with `docker://`)
* the string literal of a YAML config (must start with `!` or `jtype: `)
* the string literal of a JSON config

When used in Python, one can additionally use the following values:
- a Python dict that represents the config
- a text file stream with a `.read()` interface | `string` | `None` |
| `uses_with` | Dictionary of keyword arguments that will override the `with` configuration in `uses` | `object` | `None` |
| `py_modules` | The custom Python modules that need to be imported before loading the gateway

Note that the recommended way is to only import a single module - a simple python file, if your
gateway can be defined in a single file, or an ``__init__.py`` file if you have multiple files,
which should be structured as a python package. | `array` | `None` |
| `replicas` | The number of replicas of the Gateway. These replicas will only be applied when converted into Kubernetes YAML | `number` | `1` |
| `grpc_server_options` | Dictionary of kwargs arguments that will be passed to the grpc server as options when starting the server, example : {'grpc.max_send_message_length': -1} | `object` | `None` |
| `graph_description` | Routing graph for the gateway | `string` | `{}` |
| `graph_conditions` | Dictionary stating which filtering conditions each Executor in the graph requires to receive Documents. | `string` | `{}` |
| `deployments_addresses` | JSON dictionary with the input addresses of each Deployment | `string` | `{}` |
| `deployments_metadata` | JSON dictionary with the request metadata for each Deployment | `string` | `{}` |
| `deployments_no_reduce` | JSON list disabling the built-in merging mechanism for each Deployment listed | `string` | `[]` |
| `compression` | The compression mechanism used when sending requests from the Head to the WorkerRuntimes. For more details, check https://grpc.github.io/grpc/python/grpc.html#compression. | `string` | `None` |
| `timeout_send` | The timeout in milliseconds used when sending data requests to Executors, -1 means no timeout, disabled by default | `number` | `None` |
| `runtime_cls` | The runtime class to run inside the Pod | `string` | `GatewayRuntime` |
| `timeout_ready` | The timeout in milliseconds a Pod waits for the runtime to be ready, -1 for waiting forever | `number` | `600000` |
| `env` | The map of environment variables that are available inside runtime | `object` | `None` |
| `env_from_secret` | The map of environment variables that are read from kubernetes cluster secrets | `object` | `None` |
| `floating` | If set, the current Pod/Deployment can not be further chained, and the next `.add()` will chain after the last Pod/Deployment, not this current one. | `boolean` | `False` |
| `reload` | If set, the Gateway will restart while serving if the YAML configuration source is changed. | `boolean` | `False` |
| `port` | The port for input data to bind the gateway server to. By default, random ports in the range [49152, 65535] will be assigned. The port argument can be either a single value, in case only 1 protocol is used, or multiple values, when many protocols are used. | `number` | `random in [49152, 65535]` |
| `monitoring` | If set, spawn an http server with a prometheus endpoint to expose metrics | `boolean` | `False` |
| `port_monitoring` | The port on which the prometheus server is exposed, default is a random port between [49152, 65535] | `number` | `random in [49152, 65535]` |
| `retries` | Number of retries per gRPC call. If <0 it defaults to max(3, num_replicas) | `number` | `-1` |
| `tracing` | If set, the sdk implementation of the OpenTelemetry tracer will be available and will be enabled for automatic tracing of requests and customer span creation. Otherwise a no-op implementation will be provided. | `boolean` | `False` |
| `traces_exporter_host` | If tracing is enabled, this hostname will be used to configure the trace exporter agent. | `string` | `None` |
| `traces_exporter_port` | If tracing is enabled, this port will be used to configure the trace exporter agent. | `number` | `None` |
| `metrics` | If set, the sdk implementation of the OpenTelemetry metrics will be available for default monitoring and custom measurements. Otherwise a no-op implementation will be provided.
| `boolean` | `False` |
| `metrics_exporter_host` | If tracing is enabled, this hostname will be used to configure the metrics exporter agent. | `string` | `None` |
| `metrics_exporter_port` | If tracing is enabled, this port will be used to configure the metrics exporter agent. | `number` | `None` |
| `stateful` | If set, start consensus module to make sure write operations are properly replicated between all the replicas | `boolean` | `False` |
| `pod_ports` | When using StatefulExecutors, if they want to restart it is important to keep the RAFT cluster configuration | `number` | `None` |

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/handle-exceptions.md

(flow-error-handling)=
# Handle Exceptions

When building a complex solution, things sometimes go wrong. Jina-serve does its best to recover from failures, handle them gracefully, and report useful failure information to the user.

The following outlines (more or less) common failure cases, and explains how Jina-serve responds to each.

## Executor errors

In general there are two places where an Executor-level error can happen:

- If an {class}`~jina.Executor`'s `__init__` method raises an Exception, the Orchestration cannot start. In this case this Executor runtime raises the Exception, and the Orchestration throws a `RuntimeFailToStart` Exception.
- If one of the Executor's `@requests` methods raises an Exception, the error message is added to the response and sent back to the client. If the gRPC or WebSockets protocols are used, the networking stream is not interrupted and can accept further requests.

In both cases, the {ref}`Jina Client ` raises an Exception.

### Terminate an Executor on certain errors

Some exceptions like network errors or request timeouts can be transient and can recover automatically. Sometimes fatal errors or user-defined errors put the Executor in an unusable state, in which case it can be restarted. Locally the Orchestration must be re-run manually to restore Executor availability.

On Kubernetes deployments, this can be automated by terminating the Executor process, causing the Pod to terminate. The autoscaler restores availability by creating a new Pod to replace the terminated one. Termination can be enabled for one or more errors by using the `exit_on_exceptions` argument when adding the Executor to an Orchestration. When it matches the caught exception, the Executor terminates gracefully.

A sample Orchestration can be `Deployment(uses=MyExecutor, exit_on_exceptions=['Exception', 'RuntimeException'])`. The `exit_on_exceptions` argument accepts a list of Python or user-defined Exception or Error class names, as sketched below.
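As a minimal sketch (with an illustrative user-defined `FatalError`):

```python
from jina import Deployment, Executor, requests


class FatalError(Exception):
    """Illustrative user-defined error that leaves the Executor unusable."""


class MyExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        raise FatalError('unrecoverable state')


# terminate the Executor process (and thus its Pod) whenever FatalError is raised
dep = Deployment(uses=MyExecutor, exit_on_exceptions=['FatalError'])
```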
## Network errors

When an Orchestration Gateway can't reach an {ref}`Executor or Head `, the Orchestration attempts to re-connect to the faulty deployment according to a retry policy. The same applies to calls to Executors that time out. The specifics of this policy depend on the Orchestration's environment, as outlined below.

````{admonition} Hint: Prevent Executor timeouts
:class: hint
If you regularly experience Executor call timeouts, set the Orchestration's `timeout_send` attribute to a larger value, by setting `Deployment(timeout_send=time_in_ms)` or `Flow(timeout_send=time_in_ms)` in Python, or `timeout_send: time_in_ms` in your Orchestration YAML with-block.

Neural network forward passes on CPU (and other unusually expensive operations) commonly lead to timeouts with the default setting.
````

````{admonition} Hint: Custom retry policy
:class: hint
You can override the default retry policy and instead choose a number of retries performed for each Executor with `Orchestration(retries=n)` in Python, or `retries: n` in the Orchestration YAML `with` block.
````

If, during the complete execution of this policy, no successful call to any Executor replica can be made, the request is aborted and the failure is {ref}`reported to the client `.

### Request retries: Local deployment

If an Orchestration is deployed locally (with or without {ref}`containerized Executors `), the following policy for failed requests applies on a per-Executor basis:

- If there are multiple replicas of the target Executor, try each replica at least once, or until the request succeeds.
- Irrespective of the number of replicas, try the request at least three times, or until it succeeds. If there are fewer than three replicas, try them in a round-robin fashion.

### Request retries: Deployment with Kubernetes

If an Orchestration is {ref}`deployed in Kubernetes ` without a service mesh, retries cannot be distributed to different replicas of the same Executor.

````{admonition} See Also
:class: seealso
The impossibility of retries across different replicas is a limitation of Kubernetes in combination with gRPC.
If you want to learn more about this limitation, see [this](https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/) Kubernetes blog post.

An easy way to overcome this limitation is to use a service mesh like [Linkerd](https://linkerd.io/).
````

Concretely, this results in the following per-Executor retry policy:

- Try the request three times, or until it succeeds, always on the same replica of the Executor

### Request retries: Deployment with Kubernetes and service mesh

A Kubernetes service mesh can enable load balancing, and thus retries, between an Executor's replicas.

````{admonition} Hint
:class: hint
While Jina supports any service mesh, the output of `f.to_kubernetes_yaml()` already includes the necessary annotations for [Linkerd](https://linkerd.io/).
````

If a service mesh is installed alongside Jina-serve in the Kubernetes cluster, the following retry policy applies for each Executor:

- Try the request at least three times, or until it succeeds
- Distribute the requests to the replicas according to the service mesh's configuration

````{admonition} Caution
:class: caution
Many service meshes have the ability to perform retries themselves.
Be careful about setting up service mesh level retries in combination with Jina, as it may lead to unwanted behavior in combination with Jina's own retry policy. Instead, you may want to disable Jina-level retries by setting `Orchestration(retries=0)` or `Deployment(retries=0)` in Python, or `retries: 0` in the Orchestration YAML `with` block.
````

(failure-reporting)=
### Failure reporting

If the retry policy is exhausted for a given request, the error is reported back to the corresponding client.

The resulting error message contains the *network address* of the failing Executor. If multiple replicas are present, all addresses are reported - unless the Orchestration is deployed using Kubernetes, in which case the replicas are managed by Kubernetes and only a single address is available.

Depending on the client-to-gateway protocol, and the type of error, the error message is returned in one of the following ways:

**Could not connect to Executor:**

- **gRPC**: A response with the gRPC status code 14 (*UNAVAILABLE*) is issued, and the error message is contained in the `details` field.
- **HTTP**: A response with the HTTP status code 503 (*SERVICE_UNAVAILABLE*) is issued, and the error message is contained in `response['header']['status']['description']`.
- **WebSockets**: The stream closes with close code 1011 (*INTERNAL_ERROR*) and the message is contained in the WebSocket close message.

**Call to Executor timed out:**

- **gRPC**: A response with the gRPC status code 4 (*DEADLINE_EXCEEDED*) is issued, and the error message is contained in the `details` field.
- **HTTP**: A response with the HTTP status code 504 (*GATEWAY_TIMEOUT*) is issued, and the error message is contained in `response['header']['status']['description']`.
- **WebSockets**: The stream closes with close code 1011 (*INTERNAL_ERROR*) and the message is contained in the WebSockets close message.

For any of these scenarios, the {ref}`Jina Client ` raises a `ConnectionError` containing the error message.

## Debug via breakpoint

Standard Python breakpoints don't work inside `Executor` methods when called inside an Orchestration context manager. Nevertheless, `import epdb; epdb.set_trace()` works just like a native Python breakpoint.
Note that you need to `pip install epdb` to use this type of breakpoint.

```{admonition} Debugging in Flows
:class: info
The below code is for Deployments, but can easily be adapted for Flows.
```

````{tab} ✅ Do
```{code-block} python
---
emphasize-lines: 7
---
from jina import Deployment, Executor, requests

class CustomExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        a = 25
        import epdb; epdb.set_trace()
        print(f'\n\na={a}\n\n')

def main():
    dep = Deployment(uses=CustomExecutor)
    with dep:
        dep.post(on='')

if __name__ == '__main__':
    main()
```
````

````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 7
---
from jina import Deployment, Executor, requests

class CustomExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        a = 25
        breakpoint()
        print(f'\n\na={a}\n\n')

def main():
    dep = Deployment(uses=CustomExecutor)
    with dep:
        dep.post(on='')

if __name__ == '__main__':
    main()
```
````

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/health-check.md

# Health Check

Once an Orchestration is running, you can use the `jina ping` [CLI](../../api/jina_cli.rst) to run a health check of the complete Orchestration or (in the case of a Flow) individual Executors or the Gateway.

````{tab} Deployment
Start a Deployment in Python:

```python
from jina import Deployment

dep = Deployment(protocol='grpc', port=12345)

with dep:
    dep.block()
```

Check the readiness of the Deployment:

```bash
jina ping deployment grpc://localhost:12345
```
````

````{tab} Flow
Start a Flow in Python:

```python
from jina import Flow

f = Flow(protocol='grpc', port=12345).add(port=12346)

with f:
    f.block()
```

Check the readiness of the Flow:

```bash
jina ping flow grpc://localhost:12345
```

You can also check the readiness of an individual Executor:

```bash
jina ping executor localhost:12346
```

...or the readiness of the Gateway service:

```bash
jina ping gateway grpc://localhost:12345
```
````

When these commands succeed, you should see something like:

```text
INFO   JINA@28600 readiness check succeeded 1 times!!!
```

```{admonition} Use in Kubernetes
:class: note
The CLI exits with code 1 when the readiness check is not successful, which makes it a good choice to be used as a readinessProbe for Executor and Gateway when deployed in Kubernetes.
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/hot-reload.md

# Hot Reload

While developing your Orchestration, you may want it to reload automatically as you change the YAML configuration.

For this you can use the Orchestration's `reload` argument to reload it with the updated configuration every time you change the YAML configuration.

````{admonition} Caution
:class: caution
This feature aims to let developers iterate faster while developing, but is not intended for production use.
````

````{admonition} Note
:class: note
This feature requires `watchfiles>=0.18` to be installed.
````

````{tab} Deployment
To see how this works, let's define a Deployment in `deployment.yml` with a `reload` option:

```yaml
jtype: Deployment
uses: ConcatenateTextExecutor
uses_with:
  text_to_concat: foo
with:
  port: 12345
  reload: True
```

Load and expose the Orchestration:

```python
import os

from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class ConcatenateTextExecutor(Executor):
    def __init__(self, text_to_concat: str = '', **kwargs):
        # `text_to_concat` is set via `uses_with` in the YAML configuration
        super().__init__(**kwargs)
        self.text_to_concat = text_to_concat

    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text += self.text_to_concat
        return docs


os.environ['JINA_LOG_LEVEL'] = 'DEBUG'

dep = Deployment.load_config('deployment.yml')

with dep:
    dep.block()
```
You can see that the Orchestration is running and serving:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)

print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```

```text
foo
```

You can edit the Orchestration YAML file and save the changes:

```yaml
jtype: Deployment
uses: ConcatenateTextExecutor
uses_with:
  text_to_concat: bar
with:
  port: 12345
  reload: True
```

You should see the following in the Orchestration's logs:

```text
INFO   Deployment@28301 change in Deployment YAML deployment.yml observed, restarting Deployment
```

After this, the behavior of the Deployment's Executor will change:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)

print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```

```text
bar
```
````

````{tab} Flow
To see how this works, let's define a Flow in `flow.yml` with a `reload` option:

```yaml
jtype: Flow
with:
  port: 12345
  reload: True
executors:
- name: exec1
  uses: ConcatenateTextExecutor
```

Load and expose the Orchestration:

```python
import os

from jina import Flow, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class ConcatenateTextExecutor(Executor):
    def __init__(self, text_to_concat: str = 'add text ', **kwargs):
        # the default is used here because the YAML provides no `uses_with`
        super().__init__(**kwargs)
        self.text_to_concat = text_to_concat

    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text += self.text_to_concat
        return docs


os.environ['JINA_LOG_LEVEL'] = 'DEBUG'

f = Flow.load_config('flow.yml')

with f:
    f.block()
```

You can see that the Flow is running and serving:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)

print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```

```text
add text
```

You can edit the Flow YAML file and save the changes:

```yaml
jtype: Flow
with:
  port: 12345
  reload: True
executors:
- name: exec1
  uses: ConcatenateTextExecutor
- name: exec2
  uses: ConcatenateTextExecutor
```

You should see the following in the Flow's logs:

```text
INFO   Flow@28301 change in Flow YAML flow.yml observed, restarting Flow
```

After this, the Flow will have two Executors with the new topology:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)

print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```

```text
add text add text
```
````

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/index.md

(orchestration)=
# {fas}`network-wired` Orchestration

As seen in the {ref}`architecture overview `, Jina-serve is organized in different layers.

The Orchestration layer is composed of concepts that let you orchestrate, serve and scale your Executors with ease. Two objects belong to this family:

- A single Executor ({class}`~Deployment`), ideal for serving a single model or microservice.
- A pipeline of Executors ({class}`~Flow`), ideal for more complex operations where Documents need to be processed in multiple ways.

Both Deployment and Flow share similar syntax and behavior, as the sketch below illustrates. The main differences are:

- Deployments orchestrate a single Executor, while Flows orchestrate multiple Executors connected into a pipeline.
- Flows have a {ref}`Gateway `, while Deployments do not.
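A minimal sketch of the two, assuming an illustrative `MyExecutor` class:

```python
from jina import Deployment, Executor, Flow, requests


class MyExecutor(Executor):  # illustrative Executor
    @requests
    def foo(self, **kwargs):
        pass


# serve a single Executor directly
dep = Deployment(uses=MyExecutor, port=12345)

# serve the same Executor as one step of a pipeline, behind a Gateway
f = Flow(port=12346).add(uses=MyExecutor)
```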
```{toctree}
:hidden:

deployment
flow
add-executors
scale-out
hot-reload
handle-exceptions
readiness
health-check
instrumentation
troubleshooting-on-multiprocess
yaml-spec
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/instrumentation.md

(instrumenting-flow)=
# Instrumentation

A {class}`~jina.Flow` exposes configuration parameters for leveraging [OpenTelemetry](https://opentelemetry.io) Tracing and Metrics observability features. These tools let you instrument and collect various signals which help to analyze your application's real-time behavior.

A {class}`~jina.Flow` is composed of several Pods, namely the {class}`~jina.serve.runtimes.gateway.GatewayRuntime`, {class}`~jina.Executor`s, and potentially a {class}`~jina.serve.runtimes.head.HeadRuntime` (see the {ref}`architecture overview `). Each Pod is its own microservice. These services expose their own metrics using the Python [OpenTelemetry API and SDK](https://opentelemetry-python.readthedocs.io/en/stable/api/trace.html).

Tracing and Metrics can be enabled and configured independently to allow more flexibility in the data collection and visualization setup.

```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for full details on the OpenTelemetry data collection and visualization setup.
```

```{caution}
Prometheus-only based metrics collection will soon be deprecated. Refer to the {ref}`Prometheus/Grafana Support ` section for the deprecated setup.
```

## Tracing

````{tab} Python
```python
from jina import Flow

f = Flow(
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
).add(uses='jinaai://jina-ai/SimpleIndexer')

with f:
    f.block()
```
````

````{tab} YAML
In `flow.yaml`:

```yaml
jtype: Flow
with:
  tracing: true
  traces_exporter_host: 'localhost'
  traces_exporter_port: 4317
executors:
- uses: jinaai://jina-ai/SimpleIndexer
```

```bash
jina flow --uses flow.yaml
```
````

This Flow creates two Pods: one for the Gateway, and one for the SimpleIndexer Executor. The Flow propagates the Tracing configuration to each Pod so you don't need to duplicate the arguments on each Executor.

The `traces_exporter_host` and `traces_exporter_port` arguments configure the traces [exporter](https://opentelemetry.io/docs/instrumentation/python/exporters/#trace-1), which is responsible for pushing collected data to the [collector](https://opentelemetry.io/docs/collector/) backend.

```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for more details on exporter and collector setup and usage.
```

### Available Traces

Each Pod supports different default traces out of the box, and also lets you define your own custom traces in the Executor. The `Runtime` name is used to create the OpenTelemetry [Service](https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/#service) [Resource](https://opentelemetry.io/docs/reference/specification/resource/) attribute.
The default value for the `name` argument is the `Runtime` or `Executor` class name.

Because not all Pods have the same role, they expose different kinds of traces:

#### Gateway Pods

| Operation name | Description |
|---|---|
| `/jina.JinaRPC/Call` | Traces the request from the client to the Gateway server. |
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Internal operation for the request originating from the Gateway to the target Head or Executor. |

#### Head Pods

| Operation name | Description |
|---|---|
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Internal operation for the request originating from the Gateway to the target Head. Another child span is created for the request originating from the Head to the Executor. |

#### Executor Pods

| Operation name | Description |
|---|---|
| `/jina.JinaSingleDataRequestRPC/process_single_data` | Executor server operation for the request originating from the Gateway/Head to the Executor request handler. |
| `/endpoint` | Internal operation for the request originating from the Executor request handler to the target `@requests(on='/endpoint')` method. The `endpoint` will be `default` if no endpoint name is provided. |

```{seealso}
Beyond the above-mentioned default traces, you can define {ref}`custom traces ` for your Executor.
```

## Metrics

```{hint}
Prometheus-only based metrics collection will soon be deprecated. Refer to the {ref}`Prometheus/Grafana Support ` section for the deprecated setup.
```

````{tab} Python
```python
from jina import Flow

f = Flow(
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
).add(uses='jinaai://jina-ai/SimpleIndexer')

with f:
    f.block()
```
````

````{tab} YAML
In `flow.yaml`:

```yaml
jtype: Flow
with:
  metrics: true
  metrics_exporter_host: 'localhost'
  metrics_exporter_port: 4317
executors:
- uses: jinaai://jina-ai/SimpleIndexer
```

```bash
jina flow --uses flow.yaml
```
````

The Flow propagates the Metrics configuration to each Pod. The `metrics_exporter_host` and `metrics_exporter_port` arguments configure the metrics [exporter](https://opentelemetry.io/docs/instrumentation/python/exporters/#metrics-1) responsible for pushing collected data to the [collector](https://opentelemetry.io/docs/collector/) backend.

```{hint}
:class: seealso
Refer to {ref}`OpenTelemetry Setup ` for more details on the exporter and collector setup and usage.
```

### Available metrics

Each Pod supports different default metrics out of the box, also letting you define your own custom metrics in the Executor.
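For instance, the `monitor` decorator (listed in the API reference as `jina.serve.executors.decorators.monitor`) can record the time spent in a helper method as a custom metric. A minimal sketch, with illustrative method and metric names:

```python
from jina import Executor, monitor, requests


class MyExecutor(Executor):
    @monitor(name='my_preprocessing_seconds', documentation='time spent preprocessing')
    def _preprocess(self, docs):
        ...  # expensive work measured by the custom metric

    @requests
    def foo(self, docs, **kwargs):
        self._preprocess(docs)
```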
All metrics add the `Runtime` name to the [metric attributes](https://opentelemetry.io/docs/reference/specification/metrics/semantic_conventions/) which can be used to filter data from different Pods.Because not all Pods have the same role, they expose different kinds of metrics:#### Gateway Pods| Metrics name | Metrics type | Description ||-------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures time elapsed between receiving a request from the client and sending back the response. || `jina_sending_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures time elapsed between sending a downstream request to an Executor/Head and receiving the response back. || `jina_number_of_pending_requests` | [UpDownCounter](https://opentelemetry.io/docs/reference/specification/metrics/api/#updowncounter) | Counts the number of pending requests. || `jina_successful_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of successful requests returned by the Gateway. || `jina_failed_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of failed requests returned by the Gateway. || `jina_sent_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request sent by the Gateway to the Executor or to the Head. || `jina_received_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request returned by the Executor. || `jina_received_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size of the request in bytes received at the Gateway level. || `jina_sent_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Gateway to the Client. |```{seealso}You can find more information on the different type of metrics in Prometheus [here](https://prometheus.io/docs/concepts/metric_types/#metric-types)```#### Head Pods| Metric name | Metric type | Description ||-----------------------------------------|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------|| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between receiving a request from the Gateway and sending back the response. || `jina_sending_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between sending a downstream request to an Executor and receiving the response back. || `jina_number_of_pending_requests` | [UpDownCounter](https://opentelemetry.io/docs/reference/specification/metrics/api/#updowncounter)| Counts the number of pending requests. 
|
| `jina_successful_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of successful requests returned by the Head. |
| `jina_failed_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of failed requests returned by the Head. |
| `jina_sent_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request sent by the Head to the Executor. |
| `jina_received_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned by the Executor. |
| `jina_received_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size of the request in bytes received at the Head level. |
| `jina_sent_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Head to the Gateway. |

#### Executor Pods

The Executor also adds the Executor class name and the request endpoint for the `@requests` or `@monitor` decorated method level metrics:

| Metric name | Metric type | Description |
|---|---|---|
| `jina_receiving_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time elapsed between receiving a request from the Gateway (or the Head) and sending back the response. |
| `jina_process_request_seconds` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the time spent calling the requested method |
| `jina_document_processed` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Counts the number of Documents processed by an Executor |
| `jina_successful_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Total count of successful requests returned by the Executor across all endpoints |
| `jina_failed_requests` | [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) | Total count of failed requests returned by the Executor across all endpoints |
| `jina_received_request_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the request received at the Executor level |
| `jina_sent_response_bytes` | [Histogram](https://opentelemetry.io/docs/reference/specification/metrics/api/#histogram) | Measures the size in bytes of the response returned from the Executor to the Gateway |

```{seealso}
Beyond the default metrics outlined above, you can also define {ref}`custom metrics ` for your Executor.
```

```{hint}
`jina_process_request_seconds` and `jina_receiving_request_seconds` are different:

- `jina_process_request_seconds` only tracks time spent calling the function.
## See also

- {ref}`Defining custom traces and metrics in an Executor `
- {ref}`How to deploy and use OpenTelemetry in Jina-serve `
- [Tracing in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/)
- [Metrics in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/metrics/)

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/readiness.md

# Readiness

An Orchestration is marked as "ready" when:

- its Executor is fully loaded and ready (in the case of a Deployment);
- all its Executors and the Gateway are fully loaded and ready (in the case of a Flow).

After that, an Orchestration is able to process requests.

{class}`~jina.Client` offers an API to query these readiness endpoints. You can do this via the Orchestration directly, via the Client, or via the CLI: call {meth}`~jina.clients.mixin.HealthCheckMixin.is_flow_ready` or {meth}`~jina.Flow.is_flow_ready`. It returns `True` if the Flow is ready, and `False` if it is not.

## Via Orchestration

````{tab} Deployment
```python
from jina import Deployment

dep = Deployment()

with dep:
    print(dep.is_deployment_ready())

print(dep.is_deployment_ready())
```

```text
True
False
```
````

````{tab} Flow
```python
from jina import Flow

f = Flow().add()

with f:
    print(f.is_flow_ready())

print(f.is_flow_ready())
```

```text
True
False
```
````

## Via Jina-serve Client

You can check the readiness from the client:

````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(port=12345)

with dep:
    dep.block()
```

```python
from jina import Client

client = Client(port=12345)
print(client.is_deployment_ready())
```

```text
True
```
````

````{tab} Flow
```python
from jina import Flow

f = Flow(port=12345).add()

with f:
    f.block()
```

```python
from jina import Client

client = Client(port=12345)
print(client.is_flow_ready())
```

```text
True
```
````

## Via CLI

`````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(port=12345)

with dep:
    dep.block()
```

```bash
jina-serve ping executor grpc://localhost:12345
```

````{tab} Success
```text
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 0 round...                  [09/08/22 12:58:13]
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.04s)
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 1 round...                  [09/08/22 12:58:14]
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 2 round...                  [09/08/22 12:58:15]
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.01s)
INFO   Jina-serve@92877 avg. latency: 24 ms                                        [09/08/22 12:58:16]
```
````

````{tab} Failure
```text
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 0 round...                  [09/08/22 12:59:00]
ERROR  GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (1/3) in 1s
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.01s)
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 1 round...                  [09/08/22 12:59:01]
ERROR  GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (2/3) in 1s
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 2 round...                  [09/08/22 12:59:02]
ERROR  GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (3/3) in 1s
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.02s)
WARNI… Jina-serve@92986 message lost 100% (3/3)
```
````
`````

`````{tab} Flow
```python
from jina import Flow

f = Flow(port=12345)

with f:
    f.block()
```

```bash
jina-serve ping flow grpc://localhost:12345
```

````{tab} Success
```text
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 0 round...                  [09/08/22 12:58:13]
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.04s)
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 1 round...                  [09/08/22 12:58:14]
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 2 round...                  [09/08/22 12:58:15]
INFO   Jina-serve@92877 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.01s)
INFO   Jina-serve@92877 avg. latency: 24 ms                                        [09/08/22 12:58:16]
```
````

````{tab} Failure
```text
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 0 round...                  [09/08/22 12:59:00]
ERROR  GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (1/3) in 1s
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 0 round takes 0 seconds (0.01s)
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 1 round...                  [09/08/22 12:59:01]
ERROR  GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (2/3) in 1s
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 1 round takes 0 seconds (0.01s)
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 2 round...                  [09/08/22 12:59:02]
ERROR  GRPCClient@92986 Error while getting response from grpc server
WARNI… Jina-serve@92986 not responding, retry (3/3) in 1s
INFO   Jina-serve@92986 ping grpc://localhost:12345 at 2 round takes 0 seconds (0.02s)
WARNI… Jina-serve@92986 message lost 100% (3/3)
```
````
`````

## Readiness check via third-party clients

You can check the status of a Flow using any gRPC/HTTP/WebSockets client, not just the Jina-serve Client.

To see how this works, first instantiate the Flow with its corresponding protocol and block it for serving:

````{tab} Deployment
```python
from jina import Deployment
import os

PROTOCOL = 'grpc'  # it could also be http or websocket
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'  # this way we can check what the PID of the Executor is

dep = Deployment(protocol=PROTOCOL, port=12345)

with dep:
    dep.block()
```

```text
⠋ Waiting ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/0 -:--:--
DEBUG  gateway/rep-0@19075 adding connection for deployment executor0/heads/0 to grpc://0.0.0.0:12346  [05/31/22 18:10:16]
DEBUG  executor0/rep-0@19074 start listening on 0.0.0.0:12346                                          [05/31/22 18:10:16]
DEBUG  gateway/rep-0@19075 start server bound to 0.0.0.0:12345                                         [05/31/22 18:10:17]
DEBUG  executor0/rep-0@19059 ready and listening                                                       [05/31/22 18:10:17]
DEBUG  gateway/rep-0@19059 ready and listening                                                         [05/31/22 18:10:17]
╭─── 🎉 Deployment is ready to serve! ───╮
│  🔗 Protocol     GRPC                  │
│  🏠 Local        0.0.0.0:12345         │
│  🔒 Private      192.168.1.13:12345    │
╰────────────────────────────────────────╯
DEBUG  Deployment@19059 2 Deployments (i.e. 2 Pods) are running in this Deployment
```
````

````{tab} Flow
```python
from jina import Flow
import os

PROTOCOL = 'grpc'  # it could also be http or websocket
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'  # this way we can check what the PID of the Executor is

f = Flow(protocol=PROTOCOL, port=12345).add()

with f:
    f.block()
```

```text
⠋ Waiting ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/0 -:--:--
DEBUG  gateway/rep-0@19075 adding connection for deployment executor0/heads/0 to grpc://0.0.0.0:12346  [05/31/22 18:10:16]
DEBUG  executor0/rep-0@19074 start listening on 0.0.0.0:12346                                          [05/31/22 18:10:16]
DEBUG  gateway/rep-0@19075 start server bound to 0.0.0.0:12345                                         [05/31/22 18:10:17]
DEBUG  executor0/rep-0@19059 ready and listening                                                       [05/31/22 18:10:17]
DEBUG  gateway/rep-0@19059 ready and listening                                                         [05/31/22 18:10:17]
╭────── 🎉 Flow is ready to serve! ──────╮
│  🔗 Protocol     GRPC                  │
│  🏠 Local        0.0.0.0:12345         │
│  🔒 Private      192.168.1.13:12345    │
╰────────────────────────────────────────╯
DEBUG  Flow@19059 2 Deployments (i.e. 2 Pods) are running in this Flow
```
````

### Using gRPC

When using gRPC, use [grpcurl](https://github.com/fullstorydev/grpcurl) to access the Gateway's gRPC service that is responsible for reporting the Orchestration status.

```shell
docker pull fullstorydev/grpcurl:latest
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina.JinaGatewayDryRunRPC/dry_run
```

The error-free output below signifies a correctly running Orchestration:

```json
{}
```

You can simulate an Executor going offline by killing its process.

```shell
kill -9 $EXECUTOR_PID # in this case we can see in the logs that it is 19059
```

Then by doing the same check, you can see that it returns an error:

```shell
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina.JinaGatewayDryRunRPC/dry_run
```

````{dropdown} Error output
```json
{
  "code": "ERROR",
  "description": "failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down.",
  "exception": {
    "name": "InternalNetworkError",
    "args": [
      "failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down."
    ],
    "stacks": [
      "Traceback (most recent call last):\n",
      "  File \"/home/joan/jina/jina/jina/serve/networking.py\", line 750, in task_wrapper\n    timeout=timeout,\n",
      "  File \"/home/joan/jina/jina/jina/serve/networking.py\", line 197, in send_discover_endpoint\n    await self._init_stubs()\n",
      "  File \"/home/joan/jina/jina/jina/serve/networking.py\", line 174, in _init_stubs\n    self.channel\n",
      "  File \"/home/joan/jina/jina/jina/serve/networking.py\", line 1001, in get_available_services\n    async for res in response:\n",
      "  File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 326, in _fetch_stream_responses\n    await self._raise_for_status()\n",
      "  File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 237, in _raise_for_status\n    self._cython_call.status())\n",
      "grpc.aio._call.AioRpcError: \u003cAioRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses\"\n\tdebug_error_string = \"{\"created\":\"@1654012804.794351252\",\"description\":\"Failed to pick subchannel\",\"file\":\"src/core/ext/filters/client_channel/client_channel.cc\",\"file_line\":3134,\"referenced_errors\":[{\"created\":\"@1654012804.794350006\",\"description\":\"failed to connect to all addresses\",\"file\":\"src/core/lib/transport/error_utils.cc\",\"file_line\":163,\"grpc_status\":14}]}\"\n\u003e\n",
      "\nDuring handling of the above exception, another exception occurred:\n\n",
      "Traceback (most recent call last):\n",
      "  File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/grpc/__init__.py\", line 155, in dry_run\n    async for _ in self.streamer.stream(request_iterator=req_iterator):\n",
      "  File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 78, in stream\n    async for response in async_iter:\n",
      "  File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 154, in _stream_requests\n    response = self._result_handler(future.result())\n",
      "  File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 146, in _process_results_at_end_gateway\n    await asyncio.gather(gather_endpoints(request_graph))\n",
      "  File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 88, in gather_endpoints\n    raise err\n",
      "  File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 80, in gather_endpoints\n    endpoints = await asyncio.gather(*tasks_to_get_endpoints)\n",
      "  File \"/home/joan/jina/jina/jina/serve/networking.py\", line 754, in task_wrapper\n    e=e, retry_i=i, dest_addr=connection.address\n",
      "  File \"/home/joan/jina/jina/jina/serve/networking.py\", line 697, in _handle_aiorpcerror\n    details=e.details(),\n",
      "jina.excepts.InternalNetworkError: failed to connect to all addresses |Gateway: Communication error with deployment at address(es) 0.0.0.0:12346. Head or worker(s) may be down.\n"
    ]
  }
}
```
````

### Using HTTP or WebSockets

When using HTTP or WebSockets as the Gateway protocol, use curl to target the `/dry_run` endpoint and get the status of the Flow.

```shell
curl http://localhost:12345/dry_run
```

Error-free output signifies a correctly running Flow:

```json
{"code":0,"description":"","exception":null}
```

You can simulate an Executor going offline by killing its process:

```shell
kill -9 $EXECUTOR_PID # in this case we can see in the logs that it is 19059
```

Then by doing the same check, you can see that the call returns an error:

```json
{"code":1,"description":"failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down.","exception":{"name":"InternalNetworkError","args":["failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down."],"stacks":["Traceback (most recent call last):\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 726, in task_wrapper\n timeout=timeout,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 241, in send_requests\n await call_result,\n"," File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 291, in __await__\n self._cython_call._status)\n","grpc.aio._call.AioRpcError: \n","\nDuring handling of the above exception, another exception occurred:\n\n","Traceback (most recent call last):\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/http/app.py\", line 142, in _flow_health\n data_type=DataInputType.DOCUMENT,\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/http/app.py\", line 399, in _get_singleton_result\n async for k in streamer.stream(request_iterator=request_iterator):\n"," File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 78, in stream\n async for response in async_iter:\n"," File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 154, in _stream_requests\n response = self._result_handler(future.result())\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 148, in _process_results_at_end_gateway\n partial_responses = await asyncio.gather(*tasks)\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 128, in _wait_previous_and_send\n self._handle_internalnetworkerror(err)\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 70, in _handle_internalnetworkerror\n raise err\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 125, in _wait_previous_and_send\n timeout=self._timeout_send,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 734, in task_wrapper\n num_retries=num_retries,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 697, in _handle_aiorpcerror\n details=e.details(),\n","jina.excepts.InternalNetworkError: failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down.\n"],"executor":""}}
```
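If you want this check inside a script rather than on the command line, the curl call above can be reproduced in a few lines of Python. A minimal sketch, assuming the Flow is serving HTTP on port 12345 and using the third-party `requests` HTTP library (unrelated to Jina-serve's `@requests` decorator):

```python
import time

import requests  # third-party HTTP client, not jina's @requests decorator


def wait_until_ready(url: str = 'http://localhost:12345/dry_run', timeout: float = 30.0) -> bool:
    """Poll the /dry_run endpoint until the Flow reports an error-free status."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            payload = requests.get(url).json()
            if payload.get('code') == 0:  # 0 means error-free, as shown above
                return True
        except requests.ConnectionError:
            pass  # Gateway is not up yet; retry
        time.sleep(1)
    return False


print(wait_until_ready())
```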
---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/scale-out.md

(scale-out)=
# Scale Out

By default, all Executors in an Orchestration run with a single instance. If an Executor is particularly slow, it reduces the overall throughput. To solve this, you can specify the number of `replicas` to scale out an Executor.

(replicate-executors)=
## Replicate stateless Executors

Replication creates multiple copies of the same {class}`~jina.Executor`. Each request in the Orchestration is then passed to only one replica (instance) of that Executor. **All replicas compete for a request. The idle replica gets the request first.**

This is useful for improving performance and availability:

* If you have slow Executors (e.g. embedding), you can scale up the number of instances to process multiple requests in parallel.
* Executors might need to be taken offline occasionally (for updates, failures, etc.), but you may want your Orchestration to keep processing requests without any downtime. Adding replicas allows any replica to be taken down as long as at least one is still running. This ensures the high availability of your Orchestration.

### Replicate Executors in a Deployment

````{tab} Python
```python
from jina import Deployment

dep = Deployment(name='slow_encoder', replicas=3)
```
````

````{tab} YAML
```yaml
jtype: Deployment
uses: jinaai://jina-ai/CLIPEncoder
install_requirements: True
replicas: 5
```
````

### Replicate Executors in a Flow

````{tab} Python
```python
from jina import Flow

f = Flow().add(name='slow_encoder', replicas=3).add(name='fast_indexer')
```
````

````{tab} YAML
```yaml
jtype: Flow
executors:
- uses: jinaai://jina-ai/CLIPEncoder
  install_requirements: True
  replicas: 5
```
````

```{figure} images/replicas-flow.svg
:width: 70%
:align: center
Flow with three replicas of `slow_encoder` and one replica of `fast_indexer`
```

(scale-consensus)=
## Replicate stateful Executors with consensus using RAFT (Beta)

````{admonition} Python 3.8 or newer required on macOS
:class: note
This feature requires at least Python 3.8 when working on macOS.
````

````{admonition} Feature not supported on Windows
:class: note
This feature is not supported when using Windows.
````

````{admonition} DocArray 0.30
:class: note
Starting from DocArray version 0.30, DocArray changed its interface and implementation drastically. We intend to support these new versions in the near future, but not every feature is yet available. Check {ref}`here ` for more information. This feature has been added with the new DocArray support.
````

````{admonition} gRPC protocol
:class: note
This feature is only available when using gRPC as the protocol for the Deployment, or when the Deployment is part of a Flow.
````

Replication is used to scale out Executors by creating copies of them that can handle requests in parallel, providing better RPS. However, when an Executor maintains some sort of state, it is not simple to guarantee that each copy of the Executor maintains the *same* state. This can lead to undesired behavior, since each replica may return different results depending on the specific state it holds.

In Jina, you can also have replication while guaranteeing consensus between Executors. For this, we rely on [RAFT](https://raft.github.io/), an algorithm that guarantees eventual consistency between replicas.

Consensus-based replication using RAFT is a distributed algorithm designed to provide fault tolerance and consistency in a distributed system. In a distributed system, nodes may fail, and messages may be lost or delayed, which can lead to inconsistencies. The problem with traditional replication methods is that they can't guarantee consistency in a distributed system in the presence of failures. This is where consensus-based replication using RAFT comes in.

With this approach, each Executor can be considered a finite state machine, meaning it has a set of potential states and a set of transitions it can make between those states. Each request sent to the Executor can be considered a log entry that needs to be replicated across the cluster.

To enable this kind of replication, we need to:

- Specify which methods of the Executor {ref}`can update its internal state `.
- Tell the Deployment to use the RAFT consensus algorithm by setting the `--stateful` argument.
- Set a number of replicas compatible with RAFT. RAFT requires at least three replicas to guarantee consistency.
- Pass the `--peer-ports` argument so that the RAFT cluster can recover from a previous configuration of replicas, if one existed.
- Optionally, pass the `--raft-configuration` parameter to tweak the behavior of the consensus module. You can understand the values to pass from [Hashicorp's RAFT library](https://github.com/ongardie/hashicorp-raft/blob/master/config.go).

```python
from jina import Deployment, Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc


class MyStateStatefulExecutor(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._docs_dict = {}

    @requests(on=['/index'])
    @write
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            self._docs_dict[doc.id] = doc

    @requests(on=['/search'])
    def search(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            self.logger.debug(f'Searching against {len(self._docs_dict)} documents')
            doc.text = self._docs_dict[doc.id].text


d = Deployment(
    name='stateful_executor',
    uses=MyStateStatefulExecutor,
    replicas=3,
    stateful=True,
    workspace='./raft',
    peer_ports=[12345, 12346, 12347],
)
with d:
    d.block()
```

This capacity not only gives you replicas that work with robustness and availability; it can also help achieve higher throughput in some cases.

Let's imagine we write an Executor that is used to index and query documents from a vector index. For this, we use an in-memory solution from [DocArray](https://docs.docarray.org/user_guide/storing/index_in_memory/) that performs exact vector search.

```python
from jina import Deployment, Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc
from docarray.index.backends.in_memory import InMemoryExactNNIndex


class QueryDoc(TextDoc):
    matches: DocList[TextDoc] = DocList[TextDoc]()


class ExactNNSearch(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._index = InMemoryExactNNIndex[TextDoc]()

    @requests(on=['/index'])
    @write  # the write decorator indicates that calling this endpoint updates the inner state
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        self.logger.info(f'Indexing Document in index with {len(self._index)} documents indexed')
        self._index.index(docs)

    @requests(on=['/search'])
    def search(self, docs: DocList[QueryDoc], **kwargs) -> DocList[QueryDoc]:
        self.logger.info(f'Searching Document in index with {len(self._index)} documents indexed')
        for query in docs:
            matches, scores = self._index.find(query, search_field='embedding', limit=100)
            query.matches = matches


d = Deployment(
    name='indexer',
    port=5555,
    uses=ExactNNSearch,
    workspace='./raft',
    replicas=3,
    stateful=True,
    peer_ports=[12345, 12346, 12347],
)
with d:
    d.block()
```

Then in another terminal, we send index and search requests:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc
import time
import numpy as np


class QueryDoc(TextDoc):
    matches: DocList[TextDoc] = DocList[TextDoc]()


NUM_DOCS_TO_INDEX = 100000
NUM_QUERIES = 1000

c = Client(port=5555)

index_docs = DocList[TextDoc](
    [TextDoc(text=f'I am document {i}', embedding=np.random.rand(128)) for i in range(NUM_DOCS_TO_INDEX)]
)

start_indexing_time = time.time()
c.post(on='/index', inputs=index_docs, request_size=100)
print(f'Indexing {NUM_DOCS_TO_INDEX} Documents took {time.time() - start_indexing_time}s')

time.sleep(2)  # allow some time for the data to be replicated

search_da = DocList[QueryDoc](
    [QueryDoc(text=f'I am document {i}', embedding=np.random.rand(128)) for i in range(NUM_QUERIES)]
)
start_querying_time = time.time()
responses = c.post(on='/search', inputs=search_da, request_size=1)
print(f'Searching {NUM_QUERIES} Queries took {time.time() - start_querying_time}s')

for res in responses:
    print(f'{res.matches}')
```

In the logs of the server you can see how `index` requests reach every replica, while `search` requests only reach one replica in a round-robin fashion. Eventually every Indexer replica ends up with the same Documents indexed.

```text
INFO   indexer/rep-2@923 Indexing Document in index with 99900 documents indexed
INFO   indexer/rep-0@902 Indexing Document in index with 99200 documents indexed
INFO   indexer/rep-1@910 Indexing Document in index with 99700 documents indexed
INFO   indexer/rep-1@910 Indexing Document in index with 99800 documents indexed   [04/28/23 16:51:06]
INFO   indexer/rep-0@902 Indexing Document in index with 99300 documents indexed   [04/28/23 16:51:06]
INFO   indexer/rep-1@910 Indexing Document in index with 99900 documents indexed
INFO   indexer/rep-0@902 Indexing Document in index with 99400 documents indexed
INFO   indexer/rep-0@902 Indexing Document in index with 99500 documents indexed
INFO   indexer/rep-0@902 Indexing Document in index with 99600 documents indexed
INFO   indexer/rep-0@902 Indexing Document in index with 99700 documents indexed
INFO   indexer/rep-0@902 Indexing Document in index with 99800 documents indexed
INFO   indexer/rep-0@902 Indexing Document in index with 99900 documents indexed
```

But at search time the consensus module is not involved, and only one replica serves each query.

```text
INFO   indexer/rep-0@902 Searching Document in index with 100000 documents indexed [04/28/23 16:59:21]
INFO   indexer/rep-1@910 Searching Document in index with 100000 documents indexed [04/28/23 16:59:21]
INFO   indexer/rep-2@923 Searching Document in index with 100000 documents indexed
```

If you run the same example with `replicas` set to `1` and without the consensus module, you can see the benefit in QPS at search time, at a small cost in indexing time.

```python
d = Deployment(name='indexer', port=5555, uses=ExactNNSearch, workspace='./raft', replicas=1)
```

With one replica:

```text
Indexing 100000 Documents took 18.93274688720703s
Searching 1000 Queries took 385.96641397476196s
```

With three replicas and consensus:

```text
Indexing 100000 Documents took 35.066415548324585s
Searching 1000 Queries took 202.07950615882874s
```

This increases search QPS from roughly 2.5 to 5.

## Replicate on multiple GPUs

To replicate your {class}`~jina.Executor`s so that each replica uses a different GPU on your machine, you can tell the Orchestration to use multiple GPUs by passing `CUDA_VISIBLE_DEVICES=RR` as an environment variable.

```{caution}
You should only replicate on multiple GPUs with `CUDA_VISIBLE_DEVICES=RR` locally.
```

```{tip}
In Kubernetes or with Docker Compose you should allocate GPU resources to each replica directly in the configuration files.
```

The Orchestration assigns GPU devices in the following round-robin fashion:

| GPU device | Replica ID |
|------------|------------|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 0 | 3 |
| 1 | 4 |
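The `RR` assignment is easy to reason about: replica `i` simply gets the `i % n`-th visible device. The small sketch below illustrates the mapping shown in the table above; it is an illustration only, not Jina-serve's actual implementation:

```python
# Illustration of the RR (round-robin) device assignment from the table above.
def round_robin_assignment(devices: list, num_replicas: int) -> dict:
    """Map each replica ID to a visible GPU device in round-robin order."""
    return {replica: devices[replica % len(devices)] for replica in range(num_replicas)}


print(round_robin_assignment([0, 1, 2], num_replicas=5))
# {0: 0, 1: 1, 2: 2, 3: 0, 4: 1}  -> matches the table above
```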
You can restrict the visible devices in round-robin assignment using `CUDA_VISIBLE_DEVICES=RR0:2`, where `0:2` corresponds to a Python slice. This creates the following assignment:

| GPU device | Replica ID |
|------------|------------|
| 0 | 0 |
| 1 | 1 |
| 0 | 2 |
| 1 | 3 |
| 0 | 4 |

You can also restrict the visible devices by assigning a list of device IDs, e.g. `CUDA_VISIBLE_DEVICES=RR1,3`. This creates the following assignment:

| GPU device | Replica ID |
|------------|------------|
| 1 | 0 |
| 3 | 1 |
| 1 | 2 |
| 3 | 3 |
| 1 | 4 |

You can also refer to GPUs by their UUID. For instance, you could assign a list of device UUIDs:

```bash
CUDA_VISIBLE_DEVICES=RRGPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5,GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5,GPU-0ccccccc-74d2-7297-d557-12771b6a79d5,GPU-0ddddddd-74d2-7297-d557-12771b6a79d5
```

Check the [CUDA Documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) to see the accepted formats for assigning CUDA devices by UUID.

| GPU device | Replica ID |
|------------|------------|
| GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 0 |
| GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5 | 1 |
| GPU-0ccccccc-74d2-7297-d557-12771b6a79d5 | 2 |
| GPU-0ddddddd-74d2-7297-d557-12771b6a79d5 | 3 |
| GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 4 |

For example, if you have three GPUs and one of your Executors has five replicas:

### GPU replicas in a Deployment

````{tab} Python
```python
from jina import Deployment

dep = Deployment(uses='jinaai://jina-ai/CLIPEncoder', replicas=5, install_requirements=True)

with dep:
    dep.block()
```

```shell
CUDA_VISIBLE_DEVICES=RR python deployment.py
```
````

````{tab} YAML
```yaml
jtype: Deployment
with:
  uses: jinaai://jina-ai/CLIPEncoder
  install_requirements: True
  replicas: 5
```

```shell
CUDA_VISIBLE_DEVICES=RR jina deployment --uses deployment.yaml
```
````

### GPU replicas in a Flow

````{tab} Python
```python
from jina import Flow

f = Flow().add(
    uses='jinaai://jina-ai/CLIPEncoder', replicas=5, install_requirements=True
)

with f:
    f.block()
```

```shell
CUDA_VISIBLE_DEVICES=RR python flow.py
```
````

````{tab} YAML
```yaml
jtype: Flow
executors:
- uses: jinaai://jina-ai/CLIPEncoder
  install_requirements: True
  replicas: 5
```

```shell
CUDA_VISIBLE_DEVICES=RR jina flow --uses flow.yaml
```
````

## Replicate external Executors

If you have external Executors with multiple replicas running elsewhere, you can add them to your Orchestration by specifying all the respective hosts and ports:

````{tab} Deployment
```python
from jina import Deployment

replica_hosts, replica_ports = ['localhost', '91.198.174.192'], ['12345', '12346']
Deployment(host=replica_hosts, port=replica_ports, external=True)

# alternative syntax
Deployment(host=['localhost:12345', '91.198.174.192:12346'], external=True)
```
````

````{tab} Flow
```python
from jina import Flow

replica_hosts, replica_ports = ['localhost', '91.198.174.192'], ['12345', '12346']
Flow().add(host=replica_hosts, port=replica_ports, external=True)

# alternative syntax
Flow().add(host=['localhost:12345', '91.198.174.192:12346'], external=True)
```
````

This connects to `grpc://localhost:12345` and `grpc://91.198.174.192:12346` as two replicas of the external Executor.

````{admonition} Reducing
:class: hint
If an external Executor needs multiple predecessors, reducing needs to be enabled, so setting `no_reduce=True` is not allowed in these cases.
````

(partition-data-by-using-shards)=
## Customize polling behaviors

Replicas compete for a request, so only one of them gets each request. What if we want all replicas to get the request?

For example, consider index and search requests:

- Index (and update, delete) requests are handled by a single replica, as adding the data once is sufficient.
- Search requests are handled by all replicas, as you need to search over all replicas to ensure the completeness of the result. The requested data could be on any shard.

For this purpose, you need `shards` and `polling`.

You can define whether all or any `shards` receive the request by specifying `polling`. `ANY` means only one shard receives the request, while `ALL` means that all shards receive the same request.

````{tab} Deployment
```python
from jina import Deployment

dep = Deployment(name='ExecutorWithShards', shards=3, polling={'/custom': 'ALL', '/search': 'ANY', '*': 'ANY'})
```
````

````{tab} Flow
```python
from jina import Flow

f = Flow().add(name='ExecutorWithShards', shards=3, polling={'/custom': 'ALL', '/search': 'ANY', '*': 'ANY'})
```
````

The above example results in an Orchestration with the Executor `ExecutorWithShards` and the following polling options:

- `/index` has polling `ANY` (the default value is not changed here).
- `/search` has polling `ANY` as it is explicitly set (usually that should not be necessary).
- `/custom` has polling `ALL`.
- All other endpoints have polling `ANY` due to using `*` as a wildcard to catch all other cases.

### Understand behaviors of replicas and shards with polling

The following example demonstrates the different behaviors when setting `replicas`, `shards` and `polling` together.

````{tab} Deployment
```{code-block} python
---
emphasize-lines: 13
---
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        print(f'inside: {docs.text}')


dep = (
    Deployment(uses=MyExec, replicas=2, polling='ANY')
    .needs_all()
)

with dep:
    r = dep.post('/', TextDoc(text='hello'), return_type=DocList[TextDoc])
    print(f'return: {r.text}')
```
````

````{tab} Flow
```{code-block} python
---
emphasize-lines: 14
---
from jina import Flow, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        print(f'inside: {docs.text}')


f = (
    Flow()
    .add(uses=MyExec, replicas=2, polling='ANY')
    .needs_all()
)

with f:
    r = f.post('/', TextDoc(text='hello'), return_type=DocList[TextDoc])
    print(f'return: {r.text}')
```
````

We now change the combination of the highlighted lines above and observe the difference in the console output (note the two prints in the snippet):

| | `polling='ALL'` | `polling='ANY'` |
| -------------- | -------------------------------------------------------- | ------------------------------------- |
| `replicas=2` | `inside: ['hello'] return: ['hello']` | `inside: ['hello'] return: ['hello']` |
| `shards=2` | `inside: ['hello'] inside: ['hello'] return: ['hello']` | `inside: ['hello'] return: ['hello']` |

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/troubleshooting-on-multiprocess.md

(multiprocessing-spawn)=
# Troubleshooting on Multiprocessing

When running an Orchestration locally, you may encounter errors caused by the `multiprocessing` package, depending on your operating system and Python version.

```{admonition} Troubleshooting a Flow
:class: information
This section shows an example using a {ref}`Deployment `. However, exactly the same methodology applies to troubleshooting a Flow.
```

Here are some suggestions:

- Define and start the Orchestration via an explicit function call inside `if __name__ == '__main__'`, **especially when using the `spawn` multiprocessing start method**. For example:

  ````{tab} ✅ Do
  ```{code-block} python
  ---
  emphasize-lines: 16, 17
  ---
  from jina import Deployment, Executor, requests


  class CustomExecutor(Executor):
      @requests
      def foo(self, **kwargs):
          ...


  def main():
      dep = Deployment(uses=CustomExecutor)
      with dep:
          ...


  if __name__ == '__main__':
      main()
  ```
  ````

  ````{tab} 😔 Don't
  ```{code-block} python
  ---
  emphasize-lines: 10, 11
  ---
  from jina import Deployment, Executor, requests


  class CustomExecutor(Executor):
      @requests
      def foo(self, **kwargs):
          ...


  dep = Deployment(uses=CustomExecutor)
  with dep:
      ...

  """
  # error
  This probably means that you are not using fork to start your
  child processes and you have forgotten to use the proper idiom
  in the main module:

      if _name_ == '_main_':
          freeze_support()
          ...

  The "freeze_support()" line can be omitted if the program
  is not going to be frozen to produce an executable.
  """
  ```
  ````

- Declare Executors at the top level of the module:

  ````{tab} ✅ Do
  ```{code-block} python
  ---
  emphasize-lines: 1
  ---
  class CustomExecutor(Executor):
      @requests
      def foo(self, **kwargs):
          ...


  def main():
      dep = Deployment(uses=CustomExecutor)
      with dep:
          ...
  ```
  ````

  ````{tab} 😔 Don't
  ```{code-block} python
  ---
  emphasize-lines: 2
  ---
  def main():
      class CustomExecutor(Executor):
          @requests
          def foo(self, **kwargs):
              ...

      dep = Deployment(uses=CustomExecutor)
      with dep:
          ...
  ```
  ````

- **Always provide an absolute path**: when passing file paths to Jina arguments (e.g. `uses`, `py_modules`), always pass the absolute path.

## Using Multiprocessing Spawn

When you encounter this error:

```console
Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```

- Set `JINA_MP_START_METHOD=spawn` before starting the Python script to enable the spawn start method.

  ````{hint}
  There's no need to set this on Windows, as it only supports the spawn method for multiprocessing.
  ````

- **Avoid un-picklable objects.** [Here's a list of types that can be pickled in Python](https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled). Since `spawn` relies on pickling, avoid using code that cannot be pickled.

  ````{hint}
  Here are a few errors which indicate that you are using some code that is not picklable:

  ```text
  pickle.PicklingError: Can't pickle: it's not the same object
  AssertionError: can only join a started process
  ```
  ````

  Inline functions, such as nested or lambda functions, are not picklable. Use `functools.partial` instead, as sketched below.
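To make that last point concrete, here is a small sketch of replacing a non-picklable lambda with `functools.partial`; the `scale` function and its factor are made up for this illustration:

```python
import functools
import pickle


def scale(factor: float, x: float) -> float:
    # a module-level function: picklable
    return factor * x


double_lambda = lambda x: scale(2.0, x)         # a lambda: not picklable
double_partial = functools.partial(scale, 2.0)  # a partial of a module-level function: picklable

pickle.dumps(double_partial)  # works
try:
    pickle.dumps(double_lambda)
except (pickle.PicklingError, AttributeError) as e:
    print(f'lambda cannot be pickled: {e}')
```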
## Using Multiprocessing Fork on macOS

Apple has changed the rules for using Objective-C between `fork()` and `exec()` since macOS 10.13. This may break some code that uses `fork()` on macOS. For example, the Flow may not be able to start properly, with error messages similar to:

```bash
objc[20337]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[20337]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
```

You can define the environment variable `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` to get around this issue. Read [here](http://sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html) for more details.

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/yaml-spec.md

(flow-yaml-spec)=
# {octicon}`file-code` YAML specification

To generate a YAML configuration from an Orchestration, use {meth}`~jina.jaml.JAMLCompatible.save_config`.

## YAML completion in IDE

We provide a [JSON Schema](https://json-schema.org/) for your IDE to enable code completion, syntax validation, member listing, and help text display.

### PyCharm users

1. Click menu `Preferences` -> `JSON Schema mappings`;
2. Add a new schema; in `Schema File or URL` write `https://schemas.jina.ai/schemas/latest.json` and select `JSON Schema Version 7`;
3. Add a file path pattern and link it to `*.jaml`, `*.jina.yml`, or any suffix you commonly use for Jina-serve Flow YAML.

### VSCode users

1. Install the extension `YAML Language Support by Red Hat`;
2. In the IDE-level `settings.json` add:

```json
"yaml.schemas": {
    "https://schemas.jina.ai/schemas/latest.json": ["/*.jina.yml", "/*.jaml"],
}
```

You can bind the schema to any file suffix you commonly use for Jina-serve Flow YAML.

## Example YAML

````{tab} Deployment
```yaml
jtype: Deployment
version: '1'
with:
  protocol: http
name: firstexec
uses:
  jtype: MyExec
  py_modules:
    - executor.py
```
````

````{tab} Flow
```yaml
jtype: Flow
version: '1'
with:
  protocol: http
executors:
# inline Executor YAML
- name: firstexec
  uses:
    jtype: MyExec
    py_modules:
      - executor.py
# reference to Executor YAML
- name: secondexec
  uses: indexer.yml
  workspace: /home/my/workspace
# reference to Executor Python class
- name: thirdexec
  uses: CustomExec # located in executor.py
```
````

## Fields

### `jtype`

String that is always set to either "Flow" or "Deployment", indicating the corresponding Python class.

### `version`

String indicating the version of the Flow or Deployment.

### `with`

Keyword arguments passed to the Flow's `__init__()` method. You can set Flow-specific arguments and Gateway-specific arguments here:

#### Orchestration arguments

````{tab} Deployment
```{include} deployment-args.md
```
````

````{tab} Flow
```{include} flow-args.md
```

##### Gateway arguments

These apply only to Flows, not Deployments.

```{include} gateway-args.md
```
````

(executor-args)=
### `executors`

Collection of Executors used in the Orchestration. In the case of a Deployment this is a single Executor, while a Flow can have an arbitrary number. Each item in the collection specifies one Executor and can be used via:

````{tab} Deployment
```python
dep = Deployment(uses=MyExec, arg1="foo", arg2="bar")
```
````

````{tab} Flow
```python
f = Flow().add(uses=MyExec, arg1="foo", arg2="bar")
```
````

```{include} executor-args.md
```

```{include} yaml-vars.md
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/orchestration/yaml-vars.md

## Variables

Jina-serve Orchestration YAML supports variables and variable substitution according to the [GitHub Actions syntax](https://docs.github.com/en/actions/learn-github-actions/environment-variables).

### Environment variables

Use `${{ ENV.VAR }}` to refer to the environment variable `VAR`. You can find all {ref}`Jina environment variables here`.
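As a quick illustration of `${{ ENV.VAR }}` substitution, the sketch below writes a small Deployment YAML that reads its port from the environment; the `MY_PORT` variable name is made up for this example:

```python
import os

from jina import Deployment

# hypothetical example: 'MY_PORT' is a name chosen purely for illustration
yaml_config = '''
jtype: Deployment
with:
  port: ${{ ENV.MY_PORT }}
'''

with open('deployment.yml', 'w') as f:
    f.write(yaml_config)

os.environ['MY_PORT'] = '12345'
dep = Deployment.load_config('deployment.yml')  # the port resolves to 12345 at load time
```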
### Context variables

Use `${{ CONTEXT.VAR }}` to refer to the context variable `VAR`. Context variables can be passed in the form of a Python dictionary:

````{tab} Deployment
```python
dep = Deployment.load_config('deployment.yml', context={...})
```
````

````{tab} Flow
```python
f = Flow.load_config('flow.yml', context={...})
```
````

### Relative paths

Use `${{root.path.to.var}}` to refer to the variable `var` within the same YAML file, found at the provided path in the file's structure.

```{admonition} Syntax: Environment variable vs relative path
:class: tip
The only difference between the environment variable syntax and the relative path syntax is the omission of spaces in the latter.
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/preliminaries/coding-in-python-yaml.md

(python-yaml)=
# Coding in Python/YAML

In the docs, you often see two coding styles when describing a Jina-serve project:

```{glossary}
**Pythonic**
  Flows, Deployments and Executors are all written in Python files, and the entrypoint is via Python.

**YAMLish**
  Executors are written in Python files, and the Deployment or Flow is defined in a YAML file. The entrypoint can still be via Python, or via the Jina CLI: `jina deployment --uses deployment.yml` or `jina flow --uses flow.yml`.
```

For example, {ref}`the server-side code` follows the {term}`Pythonic` style. It can be written in {term}`YAMLish` style as follows:

````{tab} executor.py
```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'goodbye!'
```
````

````{tab} flow.yml
```yaml
jtype: Flow
with:
  port: 12345
executors:
- uses: FooExec
  replicas: 3
  py_modules: executor.py
- uses: BarExec
  replicas: 2
  py_modules: executor.py
```
````

````{tab} Entrypoint
```bash
jina flow --uses flow.yml
```
````

In general, the YAML style can be used to represent and configure a Flow or Deployment, which are the objects orchestrating the serving of Executors and applications. The YAMLish style separates the Flow or Deployment representation from the Executor logic code. It is more flexible to configure and should be used for more complex projects in production. In many integrations, such as JCloud and Kubernetes, YAMLish is preferred.

Note that the two coding styles can be converted to each other easily. To load a Flow YAML into Python and run it:

```python
from jina import Flow

f = Flow.load_config('flow.yml')

with f:
    f.block()
```

To dump a Flow into YAML:

```python
from jina import Flow

Flow().add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2).save_config(
    'flow.yml'
)
```

````{admonition} Hint: YAML and Python duality (with, add, uses_with)
:class: hint
If you are used to the Pythonic way of building Deployments and Flows and then need to start working with YAML, a good way to think about the translation is to view YAML as a direct transcription of what you would type in Python. Every `with` clause is an instantiation of an object, be it a Flow, Deployment or Executor (a call to its constructor). And when a Flow has a list of Executors, each entry in the list is a call to the Flow's `add()` method. This is why Deployments and Flows sometimes need the argument `uses_with` to override the Executor's defaults.
````

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/preliminaries/index.md

(architecture-overview)=
# {fas}`egg` Preliminaries

This chapter introduces the basic terminology and concepts you will encounter in the docs. But first, look at the code below:

In this code, we use Jina-serve to serve simple logic with one Deployment, or a combination of two services with a Flow. We also see how to query these services with Jina-serve's client.

(dummy-example)=

````{tab} Deployment
```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


dep = Deployment(port=12345, uses=FooExec, replicas=3)

with dep:
    dep.block()
```
````

````{tab} Flow
```python
from jina import Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'hello, world!'


class BarExec(Executor):
    @requests
    async def add_text(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text += 'goodbye!'


f = Flow(port=12345).add(uses=FooExec, replicas=3).add(uses=BarExec, replicas=2)

with f:
    f.block()
```
````

````{tab} Client
```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)
r = c.post(on='/', inputs=DocList[TextDoc]([TextDoc(text='')]), return_type=DocList[TextDoc])
print([d.text for d in r])
```
````

Running it gives you:

````{tab} Deployment
```text
['hello, world!', 'hello, world!']
```
````

````{tab} Flow
```text
['hello, world!goodbye!', 'hello, world!goodbye!']
```
````

## Architecture

This animation shows what's happening behind the scenes when running the previous examples:

````{tab} Deployment
```{figure} arch-deployment-overview.png
:align: center
```
````

````{tab} Flow
```{figure} arch-flow-overview.svg
:align: center
```
````

```{hint}
:class: seealso
gRPC, WebSocket and HTTP are network protocols for transmitting data. gRPC is always used for communication between the {term}`Gateway` and {term}`Executors inside a Flow`.
```

```{hint}
:class: seealso
TLS is a security protocol to facilitate privacy and data security for communications over the Internet. The communication between {term}`Client` and {term}`Gateway` is protected by TLS.
```

Jina-serve is an MLOps serving framework structured in two main layers. These layers work with DocArray's data structures and Jina-serve's Python Client to complete the framework. All of these are covered in the user guide and comprise the following concepts:

```{glossary}
**DocArray data structure**
  Data structures coming from [docarray](https://docs.docarray.org/) are the fundamental data structures in Jina-serve.

  - **BaseDoc** Document is the basic object for representing multimodal data. It can be extended to represent any data you want. More information can be found in [DocArray's Docs](https://docs.docarray.org/user_guide/representing/first_step/).
  - **DocList** DocList is a list-like container of multiple Documents. More information can be found in [DocArray's Docs](https://docs.docarray.org/user_guide/representing/array/).

  All the components in Jina-serve use `BaseDoc` and/or `DocList` as the main data format for communication, making use of the different serialization capabilities of these structures.

**Serving**
  This layer contains all the objects and concepts used to actually serve the logic and receive and respond to queries. These components are designed to be used as microservices ready to be containerized. They can be orchestrated by Jina-serve's {term}`Orchestration` layer or by other container orchestration frameworks such as Kubernetes or Docker Compose.

  - **Executor** A {class}`~jina.Executor` is a Python class that serves logic using Documents. Loosely speaking, each Executor is a service wrapping a model or application.
  - **Gateway** A Gateway is the entrypoint of a {term}`Flow`. It exposes multiple protocols for external communication and routes all internal traffic to the different Executors that work together to provide a more complex service.

**Orchestration**
  This layer contains the components that make sure the objects (especially the {term}`Executor`) are deployed and scaled for serving. It wraps them to provide the **scalability** and **serving** capabilities, and provides easy translation to other orchestration frameworks (Kubernetes, Docker Compose) for more advanced and production-ready settings. These components can also be deployed directly to [Jina AI Cloud](https://cloud.jina.ai) with a single command line.

  - **Deployment** A Deployment is a layer that orchestrates an {term}`Executor`. It can be used to serve an Executor as a standalone service or as part of a {term}`Flow`. It encapsulates and abstracts internal replication and serving details.
  - **Flow** A {class}`~jina.Flow` ties multiple {class}`~jina.Deployment`s together into a logical pipeline to achieve a more complex task. It orchestrates both {term}`Executor`s and the {term}`Gateway`.

**Client**
  The {class}`~jina.Client` connects to a {term}`Gateway` or {term}`Executor` and sends/receives/streams data to/from them.
```

```{admonition} Deployments on JCloud
:class: important
At present, JCloud is only available for Flows. We are currently working on supporting Deployments.
```

```{toctree}
:hidden:

coding-in-python-yaml
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/add-endpoints.md

(exec-endpoint)=
# Add Endpoints

Methods decorated with `@requests` are mapped to network endpoints while serving.

(executor-requests)=
## Decorator

Executor methods decorated with {class}`~jina.requests` are bound to specific network requests and respond to network queries.

Both `def` and `async def` methods can be decorated with {class}`~jina.requests`.

You can import the `@requests` decorator via:

```python
from jina import requests
```

{class}`~jina.requests` takes an optional `on=` parameter, which binds the decorated method to the specified route:

```python
from jina import Executor, requests
import asyncio


class RequestExecutor(Executor):
    @requests(
        on=['/index', '/search']
    )  # foo is bound to the `/index` and `/search` endpoints
    def foo(self, **kwargs):
        print(f'Calling foo')

    @requests(on='/other')  # bar is bound to the `/other` endpoint
    async def bar(self, **kwargs):
        await asyncio.sleep(1.0)
        print(f'Calling bar')
```

Run the example:

```python
from jina import Deployment

dep = Deployment(uses=RequestExecutor)

with dep:
    dep.post(on='/index', inputs=[])
    dep.post(on='/other', inputs=[])
    dep.post(on='/search', inputs=[])
```

```shell
─────────────────────── 🎉 Deployment is ready to serve! ───────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓      Protocol                   GRPC │
│  🏠        Local           0.0.0.0:59525 │
│  🔒      Private      192.168.1.13:59525 │
│  🌍       Public   197.244.143.223:59525 │
╰──────────────────────────────────────────╯
Calling foo
Calling bar
Calling foo
```

### Default binding

A class method decorated with plain `@requests` (without `on=`) is the default handler for all endpoints. This means it is the fallback handler for endpoints that are not found; for example, `c.post(on='/blah', ...)` invokes `MyExecutor.foo`.

```python
from jina import Executor, requests
import asyncio


class MyExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        print(kwargs)

    @requests(on='/index')
    async def bar(self, **kwargs):
        await asyncio.sleep(1.0)
        print(f'Calling bar')
```

### No binding

If a class has no `@requests` decorator, the request simply passes through without any processing.

(document-type-binding)=
## Document type binding

When using `docarray>=0.30`, each endpoint can have different input and output Document types. You can specify these types by adding type annotations to the decorated methods or by using the `request_schema` and `response_schema` arguments. The design is inspired by [FastAPI](https://fastapi.tiangolo.com/). These schemas have to be Documents inheriting from `BaseDoc`, or a parametrized `DocList`. You can see the differences between using single Documents or a DocList for serving in the {ref}`Executor API ` section.

```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor
from typing import Optional
import asyncio


class BarInputDoc(BaseDoc):
    text: str = ''


class BarOutputDoc(BaseDoc):
    text: str = ''
    embedding: Optional[AnyTensor] = None


class MyExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        print(kwargs)

    @requests(on='/index')
    async def bar(self, docs: DocList[BarInputDoc], **kwargs) -> DocList[BarOutputDoc]:
        print(f'Calling bar')
        await asyncio.sleep(1.0)
        ret = DocList[BarOutputDoc]()
        for doc in docs:
            # embed() stands for your embedding logic, defined elsewhere
            ret.append(BarOutputDoc(text=doc.text, embedding=embed(doc.text)))
        return ret
```

Note that the type hint is actually more than just a hint; the Executor uses it to infer the actual schema of the endpoint.

You can also explicitly define the schema of the endpoint by using the `request_schema` and `response_schema` parameters of the `requests` decorator:

```python
class MyExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        print(kwargs)

    @requests(
        on='/index',
        request_schema=DocList[BarInputDoc],
        response_schema=DocList[BarOutputDoc],
    )
    async def bar(self, docs, **kwargs):
        print(f'Calling bar')
        await asyncio.sleep(1.0)
        ret = DocList[BarOutputDoc]()
        for doc in docs:
            ret.append(BarOutputDoc(text=doc.text, embedding=embed(doc.text)))
        return ret
```

If `request_schema` and `response_schema` are not provided, the type hint is used to infer the schema. If both exist, `request_schema` and `response_schema` take precedence.

```{admonition} Note
:class: note
When no type annotation or argument is provided, Jina-serve assumes that [LegacyDocument](https://docs.docarray.org/API_reference/documents/documents/#docarray.documents.legacy.LegacyDocument) is the type used. This is intended to ease the transition from using Jina-serve with `docarray<0.30.0` to using it with newer versions.
```

(executor-api)=
## Executor API

Methods decorated by `@requests` require an API for Jina-serve to serve them with a {class}`~jina.Deployment` or {class}`~jina.Flow`. An Executor's job is to process `Documents` that are sent via the network. Executors can work on these `Documents` one by one or in batches. This behavior is determined by an argument:

- `doc`, if you want your Executor to work on one Document at a time, or
- `docs`, if you want to work on batches of Documents.

These APIs and related type annotations also affect how your {ref}`OpenAPI looks when deploying the Executor ` with {class}`jina.Deployment` or {class}`jina.Flow` using the HTTP protocol.

(singleton-document)=
### Single Document

When using `doc` as a keyword argument, you need to use a single `BaseDoc` as your request and response schema, as seen in {ref}`the document type binding section `. Jina-serve ensures that even if multiple `Documents` are sent from the client, the Executor processes only one at a time.

```{code-block} python
---
emphasize-lines: 14
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel

T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')


class MyExecutor(Executor):
    @requests
    async def foo(
        self,
        doc: T_input,
        **kwargs,
    ) -> Union[T_output, Dict, None]:
        pass
```

Working on single Documents instead of batches can make your interface and code cleaner. In many cases, as in generative AI, input rarely comes in batches, and models can be heavy enough that they cannot profit from processing multiple inputs at the same time.

(batching-doclist)=
### Batching documents

When using `docs` as a keyword argument, you need to use a parametrized `DocList` as your request and response schema, as seen in {ref}`the document type binding section `. In this case, Jina-serve ensures that all the request's `Documents` are passed to the Executor. The {ref}`"request_size" parameter from the Client ` controls how many Documents are passed to the server in each request. When using batches, you can leverage the {ref}`dynamic batching feature `.

```{code-block} python
---
emphasize-lines: 14
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel

T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')


class MyExecutor(Executor):
    @requests
    async def foo(
        self,
        docs: DocList[T_input],
        **kwargs,
    ) -> Union[DocList[T_output], Dict, None]:
        pass
```

Working on batches of Documents in the same method call can make sense, especially when serving models that handle multiple inputs at the same time, such as embedding models.

(executor-api-parameters)=
### Parameters

Often, the behavior of a model or service depends not just on the input data (Documents in this case) but also on other parameters. An example might be special attributes that some ML models allow you to configure, like maximum token length or other attributes not directly related to the data input.

Executor methods decorated with `requests` accept a `parameters` attribute in their signature to provide this flexibility. This attribute can be a plain Python dictionary or a Pydantic model. To get a Pydantic model, the `parameters` argument needs to have the model as a type annotation.

```{code-block} python
---
emphasize-lines: 16
---
from typing import Dict, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel

T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
T_parameters = TypeVar('T_parameters', bound='BaseModel')


class MyExecutor(Executor):
    @requests
    async def foo(
        self,
        docs: DocList[T_input],
        parameters: Union[Dict, BaseModel],
        **kwargs,
    ) -> Union[DocList[T_output], Dict, None]:
        pass
```

Defining `parameters` as a Pydantic model instead of a simple dictionary has two main benefits:

- Validation and default values: the parameters are validated before the Executor can access any invalid key, and you can easily define defaults.
- A descriptive OpenAPI definition when using the HTTP protocol.
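Below is a minimal sketch of a Pydantic parameters model in action; the `GenerationParams` model and its fields are made up for this example:

```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
from pydantic import BaseModel


# hypothetical parameters model, made up for this example
class GenerationParams(BaseModel):
    max_tokens: int = 64  # default applied when the client omits the key
    temperature: float = 1.0


class ParamExecutor(Executor):
    @requests(on='/generate')
    def generate(
        self, docs: DocList[TextDoc], parameters: GenerationParams, **kwargs
    ) -> DocList[TextDoc]:
        # parameters arrives as a validated GenerationParams instance, not a raw dict
        for doc in docs:
            doc.text += f' (max_tokens={parameters.max_tokens})'


with Deployment(uses=ParamExecutor) as dep:
    r = dep.post(
        on='/generate',
        inputs=DocList[TextDoc]([TextDoc(text='hi')]),
        parameters={'max_tokens': 16},
        return_type=DocList[TextDoc],
    )
    print(r[0].text)  # hi (max_tokens=16)
```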
### Tracing context

Executors also accept `tracing_context` as input if you want to add {ref}`custom traces ` in your Executor.

```{code-block} python
---
emphasize-lines: 15
---
from typing import Dict, Optional, Union, TypeVar
from jina import Executor, requests
from docarray import DocList, BaseDoc
from pydantic import BaseModel

T_input = TypeVar('T_input', bound='BaseDoc')
T_output = TypeVar('T_output', bound='BaseDoc')
T_parameters = TypeVar('T_parameters', bound='BaseModel')


class MyExecutor(Executor):
    @requests
    async def foo(
        self,
        tracing_context: Optional['Context'],
        **kwargs,
    ) -> Union[DocList[T_output], Dict, None]:
        pass
```

### Other arguments

When using an Executor in a {class}`~jina.Flow`, you may use an Executor to merge results from upstream Executors. For these merging Executors you can use one of the {ref}`extra arguments `.

````{admonition} Hint
:class: hint
You can also use an Executor as a simple Pythonic class. This is especially useful for locally testing the Executor-specific logic before serving it.
````

````{admonition} Hint
:class: hint
If you don't need certain arguments, you can suppress them into `**kwargs`. For example:

```{code-block} python
---
emphasize-lines: 7, 11, 16
---
from jina import Executor, requests


class MyExecutor(Executor):
    @requests
    def foo_using_docs_arg(self, docs, **kwargs):
        print(docs)

    @requests
    def foo_using_docs_parameters_arg(self, docs, parameters, **kwargs):
        print(docs)
        print(parameters)

    @requests
    def foo_using_no_arg(self, **kwargs):  # the args are suppressed into kwargs
        print(kwargs)
```
````

## Returns

Every Executor method can `return` in three ways:

- You can directly return a `BaseDoc` or `DocList` object.
- If you return `None` or don't have a `return` in your method, then the original `docs` or `doc` object (potentially mutated by your function) is returned.
- If you return a `dict` object, it is considered a result and returned in `parameters['__results__']` to the client.

```python
from jina import requests, Executor, Deployment


class MyExec(Executor):
    @requests(on='/status')
    def status(self, **kwargs):
        return {'internal_parameter': 20}


with Deployment(uses=MyExec) as dep:
    print(dep.post(on='/status', return_responses=True)[0].to_dict()["parameters"])
```

```json
{"__results__": {"my_executor/rep-0": {"internal_parameter": 20.0}}}
```
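For completeness, a minimal sketch of the second case: the method mutates `docs` in place and returns nothing, so the mutated Documents are sent back to the client.

```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class Mutator(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'mutated'
        # no return statement: the original (mutated) docs are the response
```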
(streaming-endpoints)=
## Streaming endpoints

Executors can stream Documents individually rather than as a whole DocList. This is useful when you want to return Documents one by one and you want the client to immediately process Documents as they arrive. This can be helpful for Generative AI use cases, where a Large Language Model is used to generate text token by token and the client displays tokens as they arrive.

Streaming endpoints receive one Document as input and yield one Document at a time.

```{admonition} Note
:class: note
Streaming endpoints are only supported for HTTP and gRPC protocols and for Deployment and Flow with one single Executor.

For HTTP deployments, streaming Executors generate a GET endpoint. The GET endpoint supports passing document fields in the request body or as URL query parameters; however, query parameters only support string, integer, or float fields, whereas the request body supports all serializable docarrays. The Jina client uses the request body.
```

A streaming endpoint has the following signature:

```python
from jina import Executor, requests, Deployment
from docarray import BaseDoc


# first define schemas
class MyDocument(BaseDoc):
    text: str


# then define the Executor
class MyExecutor(Executor):
    @requests(on='/hello')
    async def task(self, doc: MyDocument, **kwargs) -> MyDocument:
        for i in range(100):
            yield MyDocument(text=f'hello world {i}')


with Deployment(uses=MyExecutor, port=12345, cors=True) as dep:
    dep.block()
```

From the client side, any SSE client can be used to receive the Documents, one at a time. Jina-serve offers a standard Python client for using the streaming endpoint:

```python
from jina import Client

client = Client(port=12345, cors=True, asyncio=True)  # or protocol='grpc'

async for doc in client.stream_doc(
    on='/hello', inputs=MyDocument(text='hello world'), return_type=MyDocument
):
    print(doc.text)
```

```text
hello world 0
hello world 1
hello world 2
```

You can also refer to the following Javascript code to connect with the streaming endpoint from your browser:

```html
<!-- Minimal SSE client sketch; the URL, port and query parameter assume the
     Deployment above (port 12345, endpoint /hello), and the plain onmessage
     handler is an assumption about the event format. -->
<!DOCTYPE html>
<html>
  <head>
    <title>SSE Client</title>
  </head>
  <body>
    <script>
      // pass the input Document's fields as URL query parameters
      const evtSource = new EventSource('http://localhost:12345/hello?text=hello%20world');
      evtSource.onmessage = (event) => {
        console.log(event.data); // one streamed Document per event
      };
      evtSource.onerror = () => evtSource.close();
    </script>
  </body>
</html>
```
## Exception handling

Exceptions inside `@requests`-decorated functions can simply be raised.

```python
from jina import Executor, requests


class MyExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        raise NotImplementedError('No time for it')
```

````{dropdown} Example usage and output

```python
from jina import Deployment

dep = Deployment(uses=MyExecutor)


def print_why(resp):
    print(resp.status.description)


with dep:
    dep.post('', on_error=print_why)
```

```shell
[...]
executor0/rep-0@28271[E]:NotImplementedError('no time for it')
 add "--quiet-error" to suppress the exception details
[...]
  File "/home/joan/jina/jina/jina/serve/executors/decorators.py", line 115, in arg_wrapper
    return fn(*args, **kwargs)
  File "/home/joan/jina/jina/toy.py", line 8, in foo
    raise NotImplementedError('no time for it')
NotImplementedError: no time for it
NotImplementedError('no time for it')
```
````

(openapi-deployment)=
## OpenAPI from Executor endpoints

When deploying an Executor and serving it with HTTP, Jina-serve uses FastAPI to expose all Executor endpoints as HTTP endpoints, and you can enjoy a corresponding OpenAPI via the Swagger UI. You can also add descriptions and examples to your DocArray and Pydantic types so your users and clients can enjoy a well-documented API.

Let's see how this would look:

```python
from jina import Executor, requests, Deployment
from docarray import BaseDoc
from pydantic import BaseModel, Field


class Prompt(BaseDoc):
    """Prompt Document to be input to a Language Model"""

    text: str = Field(description='The text of the prompt', example='Write me a short poem')


class Generation(BaseDoc):
    """Document representing the generation of the Large Language Model"""

    prompt: str = Field(description='The original prompt that created this output')
    text: str = Field(description='The actual generated text')


class LLMCallingParams(BaseModel):
    """Calling parameters of the LLM model"""

    num_max_tokens: int = Field(
        default=5000,
        description='The limit of tokens the model can take, it can affect the memory consumption of the model',
    )


class MyLLMExecutor(Executor):
    @requests(on='/generate')
    def generate(self, doc: Prompt, parameters: LLMCallingParams, **kwargs) -> Generation:
        ...


with Deployment(port=12345, protocol='http', uses=MyLLMExecutor) as dep:
    dep.block()
```

```shell
──── 🎉 Deployment is ready to serve! ────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓   Protocol        http                │
│  🏠    Local    0.0.0.0:54322             │
│  🔒  Private   xxx.xx.xxx.xxx:54322       │
│      Public    xx.xxx.xxx.xxx:54322       │
╰──────────────────────────────────────────╯
╭─────────── 💎 HTTP extension ────────────╮
│  💬  Swagger UI  0.0.0.0:54322/docs       │
│  📚     Redoc    0.0.0.0:54322/redoc      │
╰──────────────────────────────────────────╯
```

After running this code, you can open '0.0.0.0:12345/docs' in your browser:

```{figure} doc-openapi-example.png
```

Note how the schema defined in the OpenAPI also considers the examples and descriptions for the types and fields.

The same behavior is seen when serving Executors with a {class}`jina.Flow`. In that case, the input and output schemas of each endpoint are inferred from the Flow's topology, so if two Executors are chained in a Flow, the schema of the input is the schema of the first Executor and the schema of the response corresponds to the output of the second Executor.
---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/containerize.md

(dockerize-exec)=
# Containerize

Once you understand what an {class}`~jina.Executor` is, you may want to wrap it into a container so you can isolate its dependencies and make it ready to run in the cloud or Kubernetes.

````{tip}
The recommended way to containerize an Executor is to leverage {ref}`Executor Hub ` to ensure your Executor can run as a container. It handles auto-provisioning, building, version control, etc:

```bash
jina hub new
# work on the Executor
jina hub push .
```

The image building happens on the cloud, and once done the image is available immediately for anyone to use.
````

You can also build a Docker image yourself and use it like any other Executor. There are some requirements on how this image needs to be built:

- Jina-serve must be installed inside the image.
- The Jina-serve CLI command to start the Executor must be the default entrypoint.

## Prerequisites

To understand how a container image for an Executor is built, you need a basic understanding of [Docker](https://docs.docker.com/), both of how to write a [Dockerfile](https://docs.docker.com/engine/reference/builder/) and how to build a Docker image. You need Docker installed locally to reproduce the example below.

## Install Jina-serve in the Docker image

Jina-serve **must** be installed inside the Docker image. This can be achieved in one of two ways:

- Use a [Jina-serve based image](https://hub.docker.com/r/jinaai/jina) as the base image in your Dockerfile. This ensures that everything needed for Jina-serve to run the Executor is installed.

```dockerfile
FROM jinaai/jina:3-py38-perf
```

- Install Jina like any other Python package. You can do this by specifying Jina in `requirements.txt`, or by including the `pip install jina` command as part of the image building process.

```dockerfile
RUN pip install jina
```

## Set Jina Executor CLI as entrypoint

Jina executes `docker run` with extra arguments under the hood. This means that Jina assumes that whatever runs inside the container also runs like it would in a regular OS process.
Therefore, ensure that the basic entrypoint of the image calls the `jina executor` [CLI](../../api/jina_cli.rst) command:

```dockerfile
ENTRYPOINT ["jina", "executor", "--uses", "config.yml"]
```

```{note}
We **strongly encourage** you to name the Executor YAML as `config.yml`, otherwise using your containerized Executor with Kubernetes requires an extra step. When using {meth}`~jina.serve.executors.BaseExecutor.to_kubernetes_yaml()` or {meth}`~jina.serve.executors.BaseExecutor.to_docker_compose_yaml()`, Jina-serve adds `--uses config.yml` in the entrypoint. To change that you need to manually edit the generated files.
```

## Example: Dockerized Executor

Here we show how to build a basic Executor with a dependency on another external package.

### Write the Executor

You can define your soon-to-be-dockerized Executor exactly like any other Executor. We do this here in the `my_executor.py` file:

```python
import torch  # Our Executor has a dependency on torch

from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class ContainerizedEncoder(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'This Document is embedded by ContainerizedEncoder'
            doc.embedding = torch.randn(10)
        return docs
```

### Write the Executor YAML file

The YAML configuration, as a minimal working example, is required to point to the file containing the Executor.

```{admonition} More YAML options
:class: seealso
To see what else can be configured using Jina-serve's YAML interface, see {ref}`here `.
```

This is necessary for the Executor to be put inside the Docker image, and we can define such a configuration in `config.yml`:

```yaml
jtype: ContainerizedEncoder
py_modules:
  - my_executor.py
```

### Write `requirements.txt`

In our case, our Executor has only one requirement besides Jina: `torch`. Specify a single requirement in `requirements.txt`:

```text
torch
```

### Write the Dockerfile

The last step is to write a `Dockerfile`, which has to do little more than launch the Executor via the Jina-serve CLI:

```dockerfile
FROM jinaai/jina:3-py38-perf

# make sure the files are copied into the image
COPY . /executor_root/
WORKDIR /executor_root

RUN pip install -r requirements.txt

ENTRYPOINT ["jina", "executor", "--uses", "config.yml"]
```

### Build the image

At this point we have a folder structure that looks like this:

```
.
├── my_executor.py
├── requirements.txt
├── config.yml
└── Dockerfile
```

We just need to build the image:

```bash
docker build -t my_containerized_executor .
```

Once the build is successful, you should see the following output when you run `docker images`:

```shell
REPOSITORY                   TAG      IMAGE ID       CREATED          SIZE
my_containerized_executor    latest   5cead0161cb5   13 seconds ago   2.21GB
```

### Use the containerized Executor

The containerized Executor can be used like any other, the only difference being the 'docker' prefix in the `uses` parameter:

```python
from jina import Deployment
from docarray import DocList
from docarray.documents import TextDoc

dep = Deployment(uses='docker://my_containerized_executor')

with dep:
    returned_docs = dep.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])

for doc in returned_docs:
    print(f'Document returned with text: "{doc.text}"')
    print(f'Document embedding of shape {doc.embedding.shape}')
```

```shell
Document returned with text: "This Document is embedded by ContainerizedEncoder"
Document embedding of shape torch.Size([10])
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/create.md

(create-executor)=
# Create

## Introduction

```{tip}
Executors use `docarray.BaseDoc` and `docarray.DocList` as their input and output data structure. [Read DocArray's docs](https://docs.docarray.org) to see how it works.
```

An {class}`~jina.Executor` is a self-contained microservice exposed using a gRPC or HTTP protocol. It contains functions (decorated with `@requests`) that process `Documents`. Executors follow these principles:

1. An Executor should subclass directly from the `jina.Executor` class.
2. An Executor is a Python class; it can contain any number of functions.
3. Functions decorated by {class}`~jina.requests` are exposed as services according to their `on=` endpoint. These functions can be coroutines (`async def`) or regular functions. They can work on single Documents, or on batches. This is explained later in the {ref}`Add Endpoints Section`.
4. (Beta) Functions decorated by {class}`~jina.serve.executors.decorators.write` above their {class}`~jina.requests` decoration are considered to update the internal state of the Executor. The `__init__` and `close` methods are exceptions. The reason this is useful is explained in {ref}`Stateful-executor`.

## Create an Executor

To create your {class}`~jina.Executor`, run:

```bash
jina hub new
```

You can ignore the advanced configuration and just provide the Executor name and path. For instance, choose `MyExecutor`.

After running the command, a project with the following structure will be generated:

```text
MyExecutor/
├── executor.py
├── config.yml
├── README.md
└── requirements.txt
```

- `executor.py` contains your Executor's main logic.
  The command should generate the following boilerplate code:

  ```python
  from jina import Executor, requests
  from docarray import DocList, BaseDoc


  class MyExecutor(Executor):
      @requests
      def foo(self, docs: DocList[BaseDoc], **kwargs) -> DocList[BaseDoc]:
          pass
  ```

- `config.yml` is the Executor's {ref}`configuration ` file, where you can define `__init__` arguments using the `with` keyword.
- `requirements.txt` describes the Executor's Python dependencies.
- `README.md` describes how to use your Executor.

For a more detailed breakdown of the file structure, see {ref}`here `.

(executor-constructor)=
## Constructor

You only need to implement `__init__` if your Executor contains initial state. If your Executor has `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)` in the body:

```python
from jina import Executor


class MyExecutor(Executor):
    def __init__(self, foo: str, bar: int, **kwargs):
        super().__init__(**kwargs)
        self.bar = bar
        self.foo = foo
```

````{admonition} What is inside kwargs?
:class: hint
Here, `kwargs` are reserved for Jina-serve to inject `metas` and `requests` (representing the request-to-function mapping) values when the Executor is used inside a {ref}`Flow `.

You can access the values of these arguments in the `__init__` body via `self.metas`/`self.requests`/`self.runtime_args`, or modify their values before passing them to `super().__init__()`.
````

Since Executors are runnable through {ref}`YAML configurations `, user-defined constructor arguments can be overridden using the {ref}`Executor YAML with keyword`.

## Destructor

You might need to execute some logic when your Executor's destructor is called. For example, if you want to persist data to disk (e.g. in-memory indexed data, a fine-tuned model, ...) you can overwrite the {meth}`~jina.serve.executors.BaseExecutor.close` method and add your logic.

Jina ensures the {meth}`~jina.serve.executors.BaseExecutor.close` method is executed when the Executor is terminated inside a {class}`~jina.Deployment` or {class}`~jina.Flow`, or when deployed in any cloud-native environment. You can think of this as Jina using the Executor as a context manager, making sure that the {meth}`~jina.serve.executors.BaseExecutor.close` method is always executed.

```python
from jina import Executor


class MyExec(Executor):
    def close(self):
        print('closing...')
```

## Attributes

When implementing an Executor, if your Executor overrides `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)`:

```python
from jina import Executor


class MyExecutor(Executor):
    def __init__(self, foo: str, bar: int, **kwargs):
        super().__init__(**kwargs)
        self.bar = bar
        self.foo = foo
```

This is important because when an Executor is instantiated (whether with {class}`~jina.Deployment` or {class}`~jina.Flow`), Jina adds extra arguments. Some of these arguments can be used when developing the internal logic of the Executor.

These `special` arguments are `workspace`, `requests`, `metas`, `runtime_args`.

(executor-workspace)=
### `workspace`

Each Executor has a special *workspace* that is reserved for that specific Executor instance. The `.workspace` property contains the path to this workspace.

This `workspace` is based on the workspace passed when orchestrating the Executor: `Deployment(..., workspace='path/to/workspace/')`/`flow.add(..., workspace='path/to/workspace/')`. The final `workspace` is generated by appending `'/<executor_name>/<shard_id>/'`.

This can be provided to the Executor via the Python API or {ref}`YAML API `.

````{admonition} Hint: Default workspace
:class: hint
If you haven't provided a workspace, the Executor uses a default workspace, defined in `~/.cache/jina-serve/`.
````
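As a small sketch of how this is typically used (the file name and payload are illustrative), an Executor can persist data under its own workspace:

```python
import os

from jina import Executor, requests


class WorkspaceExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        # self.workspace points to this Executor instance's reserved directory
        os.makedirs(self.workspace, exist_ok=True)
        with open(os.path.join(self.workspace, 'state.txt'), 'w') as f:
            f.write('some persisted state')
```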
(executor-requests)=
### `requests`

By default, an Executor object contains {attr}`~jina.serve.executors.BaseExecutor.requests` as an attribute when loaded. This attribute is a `Dict` describing the mapping between Executor methods and network endpoints: it holds endpoint strings as keys, and pointers to functions as values.

These can be provided to the Executor via the Python API or {ref}`YAML API `.

(executor-metas)=
### `metas`

An Executor object contains `metas` as an attribute when loaded from the Flow. It is of [`SimpleNamespace`](https://docs.python.org/3/library/types.html#types.SimpleNamespace) type and contains some key-value information.

The list of the `metas` is:

- `name`: Name given to the Executor;
- `description`: Description of the Executor (optional, reserved for future use in auto-docs).

These can be provided to the Executor via Python or the {ref}`YAML API `.

(executor-runtime-args)=
### `runtime_args`

By default, an Executor object contains `runtime_args` as an attribute when loaded. It is of [`SimpleNamespace`](https://docs.python.org/3/library/types.html#types.SimpleNamespace) type and contains information in key-value format. As the name suggests, `runtime_args` are dynamically determined during runtime, meaning that you don't know the value before running the Executor. These values are often related to the system/network environment around the Executor, and less about the Executor itself, like `shard_id` and `replicas`.

The list of the `runtime_args` is:

- `name`: Name given to the Executor. This is dynamically adapted from the `name` in `metas` and depends on some additional arguments like `shard_id`.
- `replicas`: Number of {ref}`replicas ` of the same Executor deployed.
- `shards`: Number of {ref}`shards ` of the same Executor deployed.
- `shard_id`: Identifier of the `shard` corresponding to the given Executor instance.
- `workspace`: Path to be used by the Executor. Note that the actual workspace directory used by the Executor is obtained by appending `'/<executor_name>/<shard_id>/'` to this value.
- `py_modules`: Python package path, e.g. `foo.bar.package.module`, or file path to the modules needed to import the Executor.

You **cannot** provide these through any API. They are generated by the orchestration mechanism, be it a {class}`~jina.Deployment` or a {class}`~jina.Flow`.
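For illustration, a minimal sketch that reads some of these injected attributes at runtime (which attributes are populated can depend on how the Executor is orchestrated):

```python
from jina import Executor, requests


class IntrospectingExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        # all values below are injected by the orchestration, not by the user
        print(self.metas.name)
        print(self.runtime_args.shards)
        print(self.runtime_args.shard_id)
        print(self.runtime_args.replicas)
```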
## Tips

* Use the `jina hub new` CLI to create an Executor: always use this command and follow the instructions. This ensures the correct file structure.
* You don't need to manually write a Dockerfile: the build system automatically generates an optimized Dockerfile according to your Executor package.

```{tip}
In the `jina hub new` wizard you can choose from four Dockerfile templates: `cpu`, `tf-gpu`, `torch-gpu`, and `jax-gpu`.
```

## Stateful-Executor (Beta)

Executors may sometimes contain an internal state which changes when some of their methods are called. For instance, an Executor could contain an index of Documents to perform vector search. In these cases, orchestrating these Executors can be tougher than it would be for Executors that never change their inner state (imagine a Machine Learning model served via an Executor that never updates its weights during its lifetime).

The challenge is guaranteeing consistency between `replicas` of the same Executor inside the same Deployment. To provide this consistency, Executors can mark some of their exposed methods as `write`. This indicates that calls to these endpoints must be consistently replicated between all the replicas, such that other endpoints can serve independently of the replica that is hit.

````{admonition} Deterministic state update
:class: note
Another factor to consider is that the Executor's inner state must evolve in a deterministic manner if we want `replicas` to behave consistently.
````

By considering this, {ref}`Executors can be scaled in a consistent manner`.

### Snapshots and restoring

In a Stateful Executor, Jina uses the RAFT consensus algorithm to guarantee that every replica eventually holds the same inner state. RAFT writes the incoming requests as logs to local storage in every replica to ensure this is achieved. This could become problematic if the Executor runs for a long time, as log files could grow indefinitely. However, you can avoid this problem by implementing the methods `def snapshot(self, snapshot_dir)` and `def restore(self, snapshot_dir)`, which are triggered via the RAFT protocol, allowing the Executor to store its current state or to recover its state from a snapshot. With this mechanism, RAFT can keep cleaning old logs by assuming that the state of the Executor at a given time is determined by its latest snapshot plus the application of all requests that arrived since that snapshot. The RAFT algorithm keeps track of all these details.
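A minimal sketch of what such an Executor could look like, assuming a simple dict-based index with JSON persistence; the `write` decorator and the `snapshot`/`restore` hooks follow the docs above, while the indexing logic and file name are invented for illustration:

```python
import json
import os

from jina import Executor, requests
from jina.serve.executors.decorators import write
from docarray import DocList
from docarray.documents import TextDoc


class MyStatefulExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._index = {}  # inner state, kept consistent across replicas

    @write
    @requests(on='/index')
    def index(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        # marked as `write`: calls are replicated to all replicas
        for doc in docs:
            self._index[doc.id] = doc.text
        return docs

    def snapshot(self, snapshot_dir: str):
        # triggered via RAFT to persist the current state
        with open(os.path.join(snapshot_dir, 'index.json'), 'w') as f:
            json.dump(self._index, f)

    def restore(self, snapshot_dir: str):
        # triggered via RAFT to recover the state from the latest snapshot
        with open(os.path.join(snapshot_dir, 'index.json')) as f:
            self._index = json.load(f)
```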
---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/deployment-yaml-spec.md

(deployment-yaml-spec)=
# {octicon}`file-code` YAML specification

To generate a YAML configuration from a {class}`~jina.Deployment` Python object, use {meth}`~jina.Deployment.save_config`.

## Example YAML

```yaml
jtype: Deployment
with:
  replicas: 2
  uses: jinaai+docker://jina-ai/CLIPEncoder
```

## Fields

### `jtype`

String that is always set to "Deployment", indicating the corresponding Python class.

### `with`

Keyword arguments are passed to a Deployment's `__init__()` method. You can pass your Deployment settings here:

#### Arguments

```{include} ./../flow/deployment-args.md
```

```{include} ./../flow/yaml-vars.md
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/dynamic-batching.md

(executor-dynamic-batching)=
# Dynamic Batching

Dynamic batching allows requests to be accumulated and batched together before being sent to an {class}`~jina.Executor`. The batch is created dynamically depending on the configuration for each endpoint. This feature is especially relevant for inference tasks, where model inference is more optimized when batched to efficiently use GPU resources.

## Overview

Enabling dynamic batching on Executor endpoints that perform inference typically results in better hardware usage and thus increased throughput. When you enable dynamic batching, incoming requests to Executor endpoints with the same {ref}`request parameters` are queued together. The Executor endpoint is executed on the queued requests when either:

- the number of accumulated Documents exceeds the {ref}`preferred_batch_size` parameter, or
- the {ref}`timeout` parameter is exceeded.

Although this feature _can_ work on {ref}`parametrized requests`, it's best used for endpoints that don't often receive different parameters. Creating a batch of requests typically results in better usage of hardware resources and potentially increased throughput.

You can enable and configure dynamic batching on an Executor endpoint using several methods:

* the {class}`~jina.dynamic_batching` decorator
* the `uses_dynamic_batching` Executor parameter
* the `dynamic_batching` section in the Executor YAML

## Example

The following examples show how to enable dynamic batching on an Executor endpoint:

````{tab} Using dynamic_batching decorator

This decorator is applied per Executor endpoint. Only Executor endpoints (methods decorated with `@requests`) decorated with `@dynamic_batching` have dynamic batching enabled.

```{code-block} python
---
emphasize-lines: 21
---
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch


class MyDoc(BaseDoc):
    tensor: Optional[AnyTensor[128]] = None
    embedding: Optional[AnyEmbedding[128]] = None


class MyExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # initialize model
        self.model = torch.nn.Linear(in_features=128, out_features=128)

    @requests(on='/bar')
    @dynamic_batching(preferred_batch_size=10, timeout=200)
    def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        docs.embedding = self.model(torch.Tensor(docs.tensor))


dep = Deployment(uses=MyExecutor)
```
````

````{tab} Using uses_dynamic_batching argument

This argument is a dictionary mapping each endpoint to its corresponding configuration:

```{code-block} python
---
emphasize-lines: 27
---
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch


class MyDoc(BaseDoc):
    tensor: Optional[AnyTensor[128]] = None
    embedding: Optional[AnyEmbedding[128]] = None


class MyExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # initialize model
        self.model = torch.nn.Linear(in_features=128, out_features=128)

    @requests(on='/bar')
    def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        docs.embedding = self.model(torch.Tensor(docs.tensor))


dep = Deployment(
    uses=MyExecutor,
    uses_dynamic_batching={'/bar': {'preferred_batch_size': 10, 'timeout': 200}},
)
```
````

````{tab} Using YAML configuration

If you use YAML to enable dynamic batching on an Executor, you can use the `dynamic_batching` section in the Executor section.
Suppose the Executor is implemented like this:

`my_executor.py`:

```python
from jina import Executor, requests, dynamic_batching, Deployment
from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor, AnyEmbedding
from typing import Optional
import numpy as np
import torch


class MyDoc(BaseDoc):
    tensor: Optional[AnyTensor[128]] = None
    embedding: Optional[AnyEmbedding[128]] = None


class MyExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # initialize model
        self.model = torch.nn.Linear(in_features=128, out_features=128)

    @requests(on='/bar')
    def embed(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        docs.embedding = self.model(torch.Tensor(docs.tensor))
```

Then, in your `config.yaml` file, you can enable dynamic batching on the `/bar` endpoint like so:

```yaml
jtype: MyExecutor
py_modules:
  - my_executor.py
uses_dynamic_batching:
  /bar:
    preferred_batch_size: 10
    timeout: 200
```

We then deploy with:

```python
from jina import Deployment

with Deployment(uses='config.yml') as dep:
    dep.block()
```
````

(executor-dynamic-batching-parameters)=
## Parameters

The following parameters allow you to configure the dynamic batching behavior on each Executor endpoint:

* `preferred_batch_size`: Target number of Documents in a batch. The batcher collects requests until `preferred_batch_size` is reached, or until `timeout` is reached. The batcher then makes sure that the Executor only receives Documents in groups of at most `preferred_batch_size`. Therefore, the actual batch size can be smaller than `preferred_batch_size`.
* `timeout`: Maximum time in milliseconds to wait for a request to be assigned to a batch. If the oldest request in the queue reaches a waiting time of `timeout`, the batch is passed to the Executor, even if it contains fewer than `preferred_batch_size` Documents. Default is 10,000 ms (10 seconds).

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/file-structure.md

(executor-file-structure)=
# File Structure

Besides organizing your {class}`~jina.Executor` code inline, you can also write it as an "external" module and then use it via YAML. This is useful when your Executor's logic is too complicated to fit into a single file.

```{tip}
The best practice is to use `jina hub new` to create a new Executor. It automatically generates the files you need in the correct structure.
```

## Single Python file + YAML

When you are only working with a single Python file (let's call it `my_executor.py`), you can put it at the root of your repository, and import it directly in `config.yml`:

```yaml
jtype: MyExecutor
py_modules:
  - my_executor.py
```

## Multiple Python files + YAML

When you are working with multiple Python files, you should organize them as a **Python package** and put them in a special folder inside your repository (as you would normally do with Python packages). Specifically, you should do the following:

- Put all Python files (as well as an `__init__.py`) inside a special folder (called `executor` by convention).
- Because of how Jina-serve registers Executors, ensure you import your Executor in this `__init__.py` (see the contents of `executor/__init__.py` in the example below).
- Use relative imports (`from .bar import foo`, and not `from bar import foo`) inside the Python modules in this folder.
- Only list `executor/__init__.py` under `py_modules` in `config.yml`. This way Python knows that you are importing a package, and ensures that all relative imports within your package work properly.

To make things more specific, take this repository structure as an example:

```
.
├── config.yml
└── executor
    ├── helper.py
    ├── __init__.py
    └── my_executor.py
```

The contents of `executor/__init__.py` is:

```python
from .my_executor import MyExecutor
```

The contents of `executor/helper.py` is:

```python
def print_something():
    print('something')
```

And the contents of `executor/my_executor.py` is:

```python
from jina import Executor, requests

from .helper import print_something


class MyExecutor(Executor):
    @requests
    def foo(self, **kwargs):
        print_something()
```

Finally, the contents of `config.yml`:

```yaml
jtype: MyExecutor
py_modules:
  - executor/__init__.py
```

Note that only `executor/__init__.py` needs to be listed under `py_modules`.

This is a relatively simple example, but this way of structuring Python modules works for any Python package structure, however complex. Consider this slightly more complicated example:

```
.
├── config.yml       # Remains exactly the same as before
└── executor
    ├── helper.py
    ├── __init__.py
    ├── my_executor.py
    └── utils/
        ├── __init__.py  # Required inside all executor sub-folders
        ├── data.py
        └── io.py
```

You can then import from `utils/data.py` in `my_executor.py` like this: `from .utils.data import foo`, and perform any other kinds of relative imports that Python enables. The best thing is that no matter how complicated your package structure, "importing" it in your `config.yml` file is simple: you always put only `executor/__init__.py` under `py_modules`.

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/health-check.md

(health-check-microservices)=
# Health Check

## Using gRPC

You can check every individual Executor by using a [standard gRPC health check endpoint](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). In most cases this is not necessary, since such checks are performed by Jina-serve, a Kubernetes service mesh or a load balancer under the hood. Nevertheless, you can perform these checks yourself.

When performing these checks, you can expect one of the following `ServingStatus` responses:

- **`UNKNOWN` (0)**: The health of the Executor could not be determined
- **`SERVING` (1)**: The Executor is healthy and ready to receive requests
- **`NOT_SERVING` (2)**: The Executor is *not* healthy and *not* ready to receive requests
- **`SERVICE_UNKNOWN` (3)**: The health of the Executor could not be determined while performing streaming

````{admonition} See Also
:class: seealso
To learn more about these status codes, and how health checks are performed with gRPC, see [here](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
````

Let's check the health of an Executor. First start a dummy Executor from the terminal:

```shell
jina executor --port 12346
```

In another terminal, you can use [grpcurl](https://github.com/fullstorydev/grpcurl) to send gRPC requests to your services:

```shell
docker pull fullstorydev/grpcurl:latest
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12346 grpc.health.v1.Health/Check
```

```json
{
  "status": "SERVING"
}
```
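If you prefer Python over `grpcurl`, a roughly equivalent sketch using the `grpcio-health-checking` package (an extra dependency, not shipped with Jina-serve) might look like:

```python
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc

# connect to the Executor started above on port 12346
with grpc.insecure_channel('127.0.0.1:12346') as channel:
    stub = health_pb2_grpc.HealthStub(channel)
    response = stub.Check(health_pb2.HealthCheckRequest())
    print(response.status)  # 1 corresponds to SERVING
```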
## Using HTTP

````{admonition} Caution
:class: caution
For Executors running with HTTP, the gRPC health check response codes outlined {ref}`above ` do not apply. Instead, an error-free response signifies healthiness.
````

When using HTTP as the protocol for the Executor, you can query the endpoint `'/'` to check the status. First, create a Deployment with the HTTP protocol:

```python
from jina import Deployment

d = Deployment(protocol='http', port=12345)

with d:
    d.block()
```

Then query the "empty" endpoint:

```bash
curl http://localhost:12345
```

You get a valid empty response indicating the Executor's ability to serve:

```json
{}
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/hot-reload.md

(reload-executor)=
## Hot Reload

While developing your Executor, it can be useful to have it refreshed from the source code while you are working on it. For this you can use the Executor's `reload` argument to watch changes in the source code and the Executor YAML configuration, and ensure changes are applied to the served Executor. The Executor keeps track of changes in the Executor source and YAML files, and in all Python files in the Executor's folder and sub-folders.

````{admonition} Caution
:class: caution
This feature aims to let developers iterate faster while developing or improving the Executor, but is not intended to be used in a production environment.
````

````{admonition} Note
:class: note
This feature requires the `watchfiles>=0.18` package to be installed.
````

To see how this would work, let's define an Executor in `my_executor.py`:

```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'I am coming from the first version of MyExecutor'
```

Now we'll deploy it:

```python
import os

from jina import Deployment
from my_executor import MyExecutor

os.environ['JINA_LOG_LEVEL'] = 'DEBUG'

dep = Deployment(port=12345, uses=MyExecutor, reload=True)

with dep:
    dep.block()
```

We can see that the Executor is successfully serving:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)

print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```

```text
I am coming from the first version of MyExecutor
```

We can edit the Executor file and save the changes:

```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'I am coming from a new version of MyExecutor'
```

You should see in the logs of the serving Executor:

```text
INFO   executor0/rep-0@11606 detected changes in: ['XXX/XXX/XXX/my_executor.py']. Refreshing the Executor
```
After this, the Executor will start serving with the renewed code:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)

print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```

```text
I am coming from a new version of MyExecutor
```

Reloading is also applied when the Executor's YAML configuration file is changed. In this case, the Executor deployment restarts. To see how this works, let's define an Executor configuration in `executor.yml`:

```yaml
jtype: MyExecutorBeforeReload
```

Deploy the Executor:

```python
import os

from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc

os.environ['JINA_LOG_LEVEL'] = 'DEBUG'


class MyExecutorBeforeReload(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'MyExecutorBeforeReload'


class MyExecutorAfterReload(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'MyExecutorAfterReload'


dep = Deployment(port=12345, uses='executor.yml', reload=True)

with dep:
    dep.block()
```

You can see that the Executor is running and serving:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)

print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```

```text
MyExecutorBeforeReload
```

You can edit the Executor YAML file and save the changes:

```yaml
jtype: MyExecutorAfterReload
```

In the Flow's logs you should see:

```text
INFO   Flow@1843 change in Executor configuration YAML /home/user/jina/jina/exec.yml observed, restarting Executor deployment
```

After this, you can see the reloaded Executor being served:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

c = Client(port=12345)

print(c.post(on='/', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc])[0].text)
```

```text
MyExecutorAfterReload
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/hub/create-hub-executor.md

(create-hub-executor)=
# Create

To create your {class}`~jina.Executor`, run:

```bash
jina hub new
```

For basic configuration (advanced configuration is optional but rarely necessary), you will be asked for:

- Your Executor's name
- The path to the folder where it should be saved

After running the command, a project with the following structure will be generated:

```text
MyExecutor/
├── executor.py
├── config.yml
├── README.md
├── requirements.txt
└── Dockerfile
```

- `executor.py` contains your Executor's main logic.
- `config.yml` is the Executor's {ref}`configuration ` file, where you can define `__init__` arguments using the `with` keyword. You can also define meta annotations relevant to the Executor, for getting better exposure on Executor Hub.
- `requirements.txt` describes the Executor's Python dependencies.
- `README.md` describes how to use your Executor.
- `Dockerfile` is only generated if you choose advanced configuration.

## Tips

* Use the `jina hub new` CLI to create an Executor: always use this command and follow the instructions. This ensures the correct file structure.
* You don't need to manually write a Dockerfile: the build system automatically generates an optimized Dockerfile according to your Executor package.

  ```{tip}
  In the `jina hub new` wizard you can choose from four Dockerfile templates: `cpu`, `tf-gpu`, `torch-gpu`, and `jax-gpu`.
  ```

* If you push your Executor to the [Executor Hub](https://cloud.jina.ai/executors), you don't need to bump the Jina-serve version: Hub Executors are version-agnostic. When you pull an Executor from Executor Hub, it selects the right Jina-serve version for you. You don't need to upgrade your version of Jina-serve.
* Fill in the metadata of your Executor correctly: information you include under the `metas` key in `config.yml` is displayed on Executor Hub. The specification can be found here.

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/hub/debug-executor.md

(debug-executor)=
# Debug

````{admonition} Not applicable to containerized Executors
:class: caution
This does not work for containerized Executors.
````

In this tutorial you will learn how to debug the [Hello Executor](https://cloud.jina.ai/executor/9o9yjq1q) step by step.

````{admonition} Make sure the schemas are known
:class: note
When using docarray>0.30.0, Executors do not have a fixed schema and each Executor defines its own. Make sure you know those schemas when using Executors from the Hub.
````

## Pull the Executor

Pull the source code of the Executor you want to debug:

````{tab} via Command Line Interface

```shell
jina hub pull jinaai://jina-ai/Hello
```
````

````{tab} via Python code

```python
from jina import Executor

Executor.from_hub('jinaai://jina-ai/Hello')
```
````

## Set breakpoints

In the `~/.jina-serve/hub-package` directory there is one subdirectory for each Executor that you pulled, named by the Executor ID. You can find the Executor's source files in this directory. Once you locate the source, you can set breakpoints as you always do.

## Debug your code

You can debug your Executor like any Python code. You can either use the Executor on its own or inside a Deployment:

````{tab} Executor on its own

```python
from jina import Executor

exec = Executor.from_hub('jinaai://jina-ai/Hello')
# Set breakpoint as needed
exec.foo()
```
````

````{tab} Executor inside a Deployment

```python
from jina import Deployment
from docarray.documents.legacy import LegacyDocument

dep = Deployment(uses='jinaai://jina-ai/Hello')

with dep:
    res = dep.post('/', inputs=LegacyDocument(text='hello'), return_results=True)
    print(res)
```
````

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/hub/hub-portal.md

# Portal

Executor Hub is a marketplace for {class}`~jina.Executor`s where you can upload your own Executors or use ones already developed by the community. If this is your first time developing an Executor, you can check our {ref}`tutorials ` that guide you through the process.

Let's see the [Hub portal](https://cloud.jina.ai) in detail.

## Catalog page

The main page contains a list of all Executors created by Jina-serve developers all over the world. You can see the Editor's Pick at the top of the list, which shows Executors highlighted by the Jina-serve team.

```{figure} ../../../../../.github/hub-website-list.png
:align: center
```

You can sort the list by *Trending* and *Recent* using the drop-down menu at the top.
Otherwise, if you want to search for a specific Executor, you can use the search box at the top or use tags for specific keywords like Image, Video, TensorFlow, and so on:

```{figure} ../../../../../.github/hub-website-search-2.png
:align: center
```

## Detail page

When you find an Executor that interests you, you can get more detail by clicking on it. You can see a description of the Executor with basic information, usage, parameters, etc. If you need more details, click "More" to go to a page with further information.

```{figure} ../../../../../.github/hub-website-detail.png
:align: center
```

There are several tabs you can explore: **Readme**, **Arguments**, **Tags** and **Dependencies**.

```{figure} ../../../../../.github/hub-website-detail-arguments.png
:align: center
```

1. **Readme**: basic information about the Executor, how it works internally, and basic usage.
2. **Arguments**: the Executor's detailed API. This is generated automatically from the Executor's Python docstrings, so it's always in sync with the code base, and Executor developers don't need to write it themselves.
3. **Tags**: the tags available for this Executor, for example `latest`, `latest-gpu` and so on. It also gives a code snippet to illustrate usage.

```{figure} ../../../../../.github/hub-website-detail-tag.png
:align: center
```

4. **Dependencies**: the Executor's Python dependencies.

On the left, you'll see possible ways to use this Executor, including Docker image, source code, etc.

```{figure} ../../../../../.github/hub-website-usage.png
:align: center
```

That's it. Now you have an overview of the [Hub portal](https://cloud.jina.ai) and how to navigate it.

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/hub/index.md

(jina-hub)=
# Executor Hub

Now that you understand that {class}`~jina.Executor` is a building block in Jina-serve, you may also wonder:

- Can I streamline the process of containerizing my {class}`~jina.Executor`?
- Can I reuse my Executor in another project?
- Can I share my Executor with my colleagues?
- Can I just use someone else's Executor instead of building it myself?

Basically, something like the following:

```{figure} ../../../../../.github/hub-user-journey.svg
:align: center
```

**Yes!** This is exactly the purpose of Executor Hub.

Hub allows you to turn your Executor into a ready-for-the-cloud containerized service, taking away a lot of the work from you. With Hub you can pull prebuilt Executors to dramatically reduce the effort and complexity needed in your system, or push your own custom Executors to share privately or publicly. You can think of the Hub as your easy entry door to a Docker registry.

A Hub Executor is an Executor published on Executor Hub. You can use such an Executor in a Flow or in a Deployment:

```python
from jina import Deployment

d = Deployment(uses='jinaai+docker:///MyExecutor')

with d:
    ...
```

````{admonition} Make sure the schemas are known
:class: note
When using docarray>0.30.0, Executors do not have a fixed schema and each Executor defines its own. Make sure you know those schemas when using Executors from the Hub.
````
```{toctree}
:hidden:

hub-portal
create-hub-executor
push-executor
use-hub-executor
debug-executor
yaml-spec
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/hub/push-executor.md

(push-executor)=
# Publish

If you want to share your {class}`~jina.Executor`, you can push it to Executor Hub.

There are two ways to share:

- **Public** (default): Anyone can use public Executors without any restrictions.
- **Private**: Only people with the `secret` can use private Executors.

(jina-hub-usage)=
## Publishing for the first time

```bash
jina hub push [--public/--private] <path_to_executor_folder>
```

If you have logged into Jina-serve, it will return a `TASK_ID`. You need that to get your Executor's build status and logs. If you haven't logged into Jina-serve, it will return `NAME` and `SECRET`. You need them to use (if the Executor is private) or update the Executor. **Please keep them safe.**

````{admonition} Note
:class: note
If you are logged into the Hub using our CLI tools (`jina auth login` or `jcloud login`), you can push and pull your Executors without `SECRET`.
````

You can then visit [Executor Hub](https://cloud.jina.ai), select the "Recent" tab and see your published Executor.

````{admonition} Note
:class: note
If no `--public` or `--private` argument is provided, then an Executor is **public** by default.
````

````{admonition} Important
:class: important
Anyone can use public Executors, but to use a private Executor you must know its `SECRET`.
````

## Update published Executors

To override or update a published Executor, you must have both its `NAME` and `SECRET`:

```bash
jina hub push [--public/--private] --force-update <NAME> --secret <SECRET> <path_to_executor_folder>
```

(hub_tags)=
## Tagging an Executor

Tagging can be useful for versioning Executors or differentiating them by their architecture (e.g. `gpu`, `cpu`).

```bash
jina hub push <path_to_executor_folder> -t TAG1 -t TAG2
```

You can specify the `-t` or `--tags` parameter to tag an Executor.

- If you **don't** add the `-t` parameter, the default tag is `latest`.
- If you **do** add the `-t` parameter and you still want to have the `latest` tag, you must write it as another `-t` parameter.

```bash
jina hub push .                      # Results in one tag: latest
jina hub push . -t v1.0.0            # Results in one tag: v1.0.0
jina hub push . -t v1.0.0 -t latest  # Results in two tags: v1.0.0, latest
```

If you want to create a new tag for an existing Executor, you can also add the `-t` option here:

```bash
jina hub push [--public/--private] --force-update <NAME> --secret <SECRET> -t TAG <path_to_executor_folder>
```

### Protected tags

Protected tags prevent some tags from being overwritten, ensuring stable, consistent behavior. You can use the `--protected-tag` option to create protected tags. After pushing for the first time, protected tags cannot be pushed again.

```bash
jina hub push [--public/--private] --force-update <NAME> --secret <SECRET> --protected-tag <TAG_1> --protected-tag <TAG_2> <path_to_executor_folder>
```

## Use environment variables

The `--build-env` parameter manages environment variables, letting you use a private token in `requirements.txt` to install private dependencies. For security reasons, you don't want to expose this token to anyone else. For example, say we have the following `requirements.txt`:

```
# requirements.txt
git+http://${YOUR_TOKEN}@github.com/your_private_repo
```

When running `jina hub push`, you can pass the `--build-env` parameter:

```bash
jina hub push --build-env YOUR_TOKEN=foo
```

````{admonition} Note
:class: note
There are restrictions when naming environment variables:

- Environment variables must be wrapped in `{` and `}` in `requirements.txt`, i.e. `${YOUR_TOKEN}`, not `$YOUR_TOKEN`.
- Environment variables are limited to numbers, uppercase letters and `_` (underscore), and cannot start with `_`.
````

````{admonition} Limitations
:class: attention
There are limitations if you push Executors via `--build-env` and pull/use them as source code (this doesn't matter if you use a Docker image):

- When you use `jina hub pull jinaai://<username>/YOUR_EXECUTOR`, you must set the corresponding environment variable according to the prompt:

  ```bash
  export YOUR_TOKEN=foo
  ```

- When you use `.add(uses='jinaai://<username>/YOUR_EXECUTOR')` in a Flow, you must set the corresponding environment variable:

  ```python
  from jina import Flow, Executor, requests, Document
  import os

  os.environ['YOUR_TOKEN'] = 'foo'

  f = Flow().add(uses='jinaai://<username>/YOUR_EXECUTOR')

  with f:
      f.post(on='/', inputs=Document(), on_done=print)
  ```
````

For multiple environment variables:

```bash
jina hub push --build-env FIRST=foo --build-env SECOND=bar
```

## Building status of an Executor

To query the build status of a pushed Executor:

```bash
jina hub status [<path_to_executor_folder>] [--id TASK_ID] [--verbose] [--replay]
```

- The parameter `--id TASK_ID` gets the build status of a specific build task.
- The parameter `--verbose` prints verbose build logs.
- The parameter `--replay` prints build status from the beginning.

## ARM64 architecture support

````{admonition} Hint
:class: hint
As of January 10, 2023 you can push Executors for the ARM64 architecture.
````

````{admonition} Note
:class: note
Executor Docker images are Linux images. Even if you are running on a Mac or Windows machine, the underlying OS is still Linux.
````

If you run `jina hub push` on an ARM64-based machine, you automatically push an ARM64 Executor. However, if you provide your own Dockerfile, it needs to work for both "linux/amd64" and "linux/arm64". If you don't want this behavior, you can explicitly specify the `--platform` parameter:

```bash
# Push for both platforms
jina hub push --platform linux/arm64,linux/amd64 <path_to_executor_folder>

# Push for AMD64 only
jina hub push --platform linux/amd64 <path_to_executor_folder>

# Push for ARM64 only (not recommended)
jina hub push --platform linux/arm64 <path_to_executor_folder>
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/hub/use-hub-executor.md

(use-hub-executor)=
# Use

There are three ways to use Hub {class}`~jina.Executor`s in your project. Each has its own use case and benefits.

## Use as-is

You can use a Hub Executor as-is via `Executor.from_hub()`:

```python
from jina import Executor
from docarray import DocList
from docarray.documents.legacy import LegacyDocument

exec = Executor.from_hub('jinaai://jina-ai/DummyHubExecutor')
da = DocList[LegacyDocument]([LegacyDocument()])
exec.foo(da)
assert da.texts == ['hello']
```

The Hub Executor is pulled to your local machine and run as a native Python object. You can use a line-debugger to step in/out of the `exec` object, set breakpoints, and observe how it behaves. You can directly feed in `Documents`. After you build some confidence in that Executor, you can move to the next step: using it as a part of your Flow.

```{caution}
Not all Executors on the Hub can be directly run in this way - some require extra dependencies. In that case, you can add `.from_hub(..., install_requirements=True)` to install the requirements automatically. Be careful - these dependencies may not be compatible with your local packages and may override your local development environment.
```

```{tip}
Hub Executors are cached locally on the first pull. Afterwards, they are not updated. To keep up to date with upstream, use `.from_hub(..., force_update=True)`.
```
(pull-executor)=
## Pull only

You can also use the `jina hub` CLI to pull an Executor without actually using it in the Flow.

````{admonition} Jina-serve and DocArray version
:class: note
Regardless of the Jina-serve and DocArray versions in place when the Executor was pushed to the Hub, when pulling, the Hub tries to install into the pulled Docker images the Jina-serve and DocArray versions that you have installed locally.
````

### Pull the Docker image

```bash
jina hub pull jinaai+docker://<username>/<executor_name>[:<tag>]
```

You can find the Executor by running `docker images`. You can also indicate which version of the Executor you want to use by specifying the `:<tag>`:

```bash
jina hub pull jinaai+docker://jina-ai/DummyExecutor:v1.0.0
```

## Use in Flow as container

Use prebuilt images from Hub in your Python code:

```python
from jina import Flow

# You have to login for private Executors
# import hubble
# hubble.login()

f = Flow().add(uses='jinaai+docker://<username>/<executor_name>[:<tag>]')
```

If you do not provide a `:<tag>`, it defaults to `latest`.

````{important}
To use a private Executor, you have to login:

```python
import hubble

hubble.login()
```
````

````{admonition} Attention
:class: attention
If you are a Mac user, please use `host.docker.internal` as your URL when you want to connect to a local port from an Executor Docker container.

For example, [PostgreSQLStorage](https://cloud.jina.ai/executor/d45rawx6) will connect to a PostgreSQL server which was started locally. Then you must use it with:

```python
from jina import Flow, Document

f = Flow().add(
    uses='jinaai+docker://jina-ai/PostgreSQLStorage',
    uses_with={'hostname': 'host.docker.internal'},
)
with f:
    resp = f.post(on='/index', inputs=Document())
    print(f'{resp}')
```
````

If `jinaai+docker://` Executors don't load properly or have issues during initialization, ensure you have sufficient Docker resources allocated.

(mount-local-volumes)=
### Mount local volumes

You can mount volumes into your dockerized Executor by passing a list of volumes with the `volumes` argument:

```python
f = Flow().add(
    uses='docker://my_containerized_executor',
    volumes=['host/path:/path/in/container', 'other/volume:/app'],
)
```

````{admonition} Hint
:class: hint
If you want your containerized Executor to operate inside one of these volumes, remember to set its {ref}`workspace ` accordingly!
````

If you do not specify `volumes`, Jina automatically mounts a volume into the container. In this case, the volume source is your {ref}`default Executor workspace `, and the volume destination is `/app`. Additionally, automatic volume setting tries to move the Executor's workspace into the volume destination. Depending on the default Executor workspace on your system this may not always succeed, so explicitly mounting a volume and setting a workspace is recommended.

You can disable automatic volume setting by passing `f.add(..., disable_auto_volume=True)`.

## Use in Flow via source code

Use the source code from Executor Hub in your Python code:

```python
from jina import Flow

f = Flow().add(uses='jinaai://<username>/<executor_name>[:<tag>]')
```

## Set/override default parameters

The default parameters of the published Executor may not be ideal for your use case. You can pass `uses_with` and `uses_metas` as parameters to override them:

```python
from jina import Flow

f = Flow().add(
    uses='jinaai+docker://<username>/<executor_name>[:<tag>]',
    uses_with={'param1': 'new_value'},
    uses_metas={'name': 'new_name'},
)
```

### Platform awareness of Hub images

````{admonition} Hint
:class: hint
As of January 10, 2023 `jina hub pull` is platform aware. It will automatically select Docker images based on your native CPU architecture (if available).
````
### Platform awareness of Hub images

````{admonition} Hint
:class: hint

As of January 10, 2023 `jina hub pull` is platform aware. It automatically selects Docker images based on your native CPU architecture (if available).
````

If you prefer a specific platform, for example `AMD64` on an `ARM64` machine, you can explicitly pull with `--prefer-platform`:

````{admonition} Caution
:class: caution

When you specify `--prefer-platform` you probably also want to specify `--force` to overwrite the existing image in the local cache.
````

````{admonition} Note
:class: note

If the image you specify doesn't support your preferred platform, it will not respect your platform preference.
````

```bash
jina hub pull --force --prefer-platform linux/amd64 jinaai+docker://jina-ai/DummyExecutor:v1.0.0
```

### Pull the source code

```bash
jina hub pull jinaai://<username>/<executor-name>[:<tag>]
```

### List locations of local Executors

```bash
jina hub list
```

```{tip}
To list all the Executors that are in source-code format (i.e. pulled via `jinaai://`), use the command `jina hub list`. To list all the Executors that are in Docker format, use the command `docker images`.
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/index.md

(executor-cookbook)=
# Executor

An {class}`~jina.Executor` is a self-contained service that performs a task on `Documents`. You can create an Executor by extending the `Executor` class and adding logic to endpoint methods.

## Why use Executors?

Once you've learned about `Documents` and `DocList` from [docarray](https://docs.docarray.org/), you can use all their power and expressiveness to build a multimodal application. But what if you want to go bigger? Organize your code into modules, serve and scale them? That's where Executors come in.

- Executors let you organize functions into logical entities that can share configuration state, following OOP.
- Executors can be easily containerized and shared with your colleagues using `jina hub push/pull`.
- Executors can be exposed as a service over gRPC or HTTP using {class}`~jina.Deployment`.
- Executors can be chained together to form a {class}`~jina.Flow`.

## Minimum working example

```python
from jina import Executor, requests, Deployment
from docarray import DocList
from docarray.documents import TextDoc


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for d in docs:
            d.text = 'hello world'
        return docs


with Deployment(uses=MyExecutor) as dep:
    response_docs = dep.post(
        on='/',
        inputs=DocList[TextDoc]([TextDoc(text='hello')]),
        return_type=DocList[TextDoc],
    )
    print(f'Text: {response_docs[0].text}')
```

```text
─────────────────────── 🎉 Deployment is ready to serve! ───────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓    Protocol                     GRPC  │
│  🏠     Local            0.0.0.0:55581   │
│  🔒   Private        192.168.0.5:55581   │
│  🌍    Public     158.181.77.236:55581   │
╰──────────────────────────────────────────╯
Text: hello world
```

```{toctree}
:hidden:

basics
create
add-endpoints
serve
dynamic-batching
health-check
hot-reload
file-structure
containerize
instrumentation
yaml-spec
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/instrumentation.md

(instrumenting-executor)=
# Instrumentation

Instrumentation consists of [OpenTelemetry](https://opentelemetry.io) Tracing and Metrics. Each feature can be enabled independently, and they allow you to collect request-level and application-level metrics for analyzing an Executor's real-time behavior.
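Before custom spans or metrics are recorded, the Executor must run with tracing and/or metrics switched on and pointed at a collector. Below is a minimal sketch, assuming your Jina-serve version exposes `tracing`/`metrics` toggles and OTLP exporter host/port arguments (check `jina deployment --help` for the exact names in your version; the config file and collector endpoint are placeholders):

```python
from jina import Deployment

# Hypothetical setup: 'config.yml' and the localhost:4317 OTLP collector
# endpoint stand in for your own Executor config and collector.
dep = Deployment(
    uses='config.yml',
    tracing=True,
    traces_exporter_host='http://localhost',
    traces_exporter_port=4317,
    metrics=True,
    metrics_exporter_host='http://localhost',
    metrics_exporter_port=4317,
)

with dep:
    dep.block()
```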
```{admonition} Full details on Instrumentation
:class: seealso

This section describes **custom** tracing spans. To use the Executor's default tracing, refer to {ref}`Flow Instrumentation `.
```

```{hint}
Read more on setting up an OpenTelemetry collector backend in the {ref}`OpenTelemetry Setup ` section.
```

```{caution}
Prometheus-only based metrics collection will soon be deprecated. Refer to {ref}`Monitoring Executor ` for this deprecated setup.
```

## Tracing

Any method that uses the {class}`~jina.requests` decorator adds a default tracing span for the defined operation. In addition, the operation's span context is propagated to the method so that you can create further user-defined child spans within the method.

You can create custom spans to observe the operation's individual steps or record details and attributes with finer granularity. When tracing is enabled, Jina-serve provides the OpenTelemetry Tracer implementation as an Executor class attribute that you can use to create new child spans. The `tracing_context` method argument contains the parent span context, from which a new span can be created to trace the desired operation in the method.

If tracing is enabled, each Executor exports its traces to the configured exporter host via the [Span Exporter](https://opentelemetry.io/docs/reference/specification/trace/sdk/#span-exporter). The backend combines these traces for visualization and alerting.

### Create custom traces

A `request` method is the public method that exposes an operation as an API. Depending on complexity, the method can be composed of different sub-operations that are required to build the final response.

You can record/observe each internal step (along with its global or request-specific attributes) to give a finer-grained view of the operation at the request level. This helps identify bottlenecks and isolate request patterns that cause service degradation or errors.

You can use the `self.tracer` class attribute to create a new child span using the `tracing_context` method argument:

```python
from opentelemetry.trace import Status, StatusCode

from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], tracing_context, **kwargs) -> DocList[TextDoc]:
        with self.tracer.start_as_current_span(
            'process_docs', context=tracing_context
        ) as process_span:
            process_span.set_attribute('sampling_rate', 0.01)
            docs = process(docs)
            with self.tracer.start_as_current_span('update_docs') as update_span:
                try:
                    update_span.set_attribute('len_updated_docs', len(docs))
                    docs = update(docs)
                except Exception as ex:
                    update_span.set_status(Status(StatusCode.ERROR))
                    update_span.record_exception(ex)
                return docs
```

The above instrumentation generates three spans:

1. A default span named `foo` for the overall method.
2. `process_span`, which measures the `process` and `update` sub-operations along with a `sampling_rate` attribute that is either a constant or specific to the request/operation.
3. `update_span`, which measures the `update` operation along with any exceptions that might arise during it. The exception is recorded and marked on the `update_span` span. Since the exception is swallowed, the request succeeds with successful parent spans.
```{admonition} Note
:class: note

The Python OpenTelemetry API provides a global tracer via the `opentelemetry.trace.get_tracer()` method, which is not set or used directly in Jina-serve. The class attribute `self.tracer` is used for the default `@requests` method tracing and should also be used as much as possible within the method for creating child spans. However, within a span context, the `opentelemetry.trace.get_current_span()` method returns the span created inside the context.
```

````{admonition} Respect OpenTelemetry Tracing semantic conventions
:class: caution

You should respect the OpenTelemetry Tracing [semantic conventions](https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/).
````

````{hint}
If tracing is not enabled in your environment, check that `self.tracer` exists before using it. If tracing is disabled, `self.tracer` is `None`.
````
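A minimal sketch of such a guard, where `process` is a placeholder for your own processing step:

```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


def process(docs: DocList[TextDoc]) -> DocList[TextDoc]:
    # placeholder for your actual processing step
    return docs


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], tracing_context=None, **kwargs) -> DocList[TextDoc]:
        if self.tracer:
            # tracing is enabled: wrap the step in a child span
            with self.tracer.start_as_current_span('process_docs', context=tracing_context):
                return process(docs)
        # tracing is disabled: run the step without instrumentation
        return process(docs)
```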
## Metrics

```{hint}
Prometheus-only based metrics collection will be deprecated soon. Refer to the {ref}`Monitoring Executor ` section for the deprecated setup.
```

Any method that uses the {class}`~jina.requests` decorator is monitored and creates a [histogram](https://opentelemetry.io/docs/reference/specification/metrics/data-model/#histogram) which tracks the method's execution time.

This section documents adding custom monitoring to the {class}`~jina.Executor` with the OpenTelemetry Metrics API.

Custom metrics are useful to monitor each sub-part of your Executor(s). Jina lets you leverage the [Meter](https://opentelemetry.io/docs/reference/specification/metrics/api/#meter) to define useful metrics for each of your Executors. We also provide a convenient wrapper, {func}`~jina.monitor`, which lets you monitor your Executor's sub-methods.

When metrics are enabled, each Executor exposes its own metrics via the [Metric Exporter](https://opentelemetry.io/docs/reference/specification/metrics/sdk/#metricexporter).

### Define custom metrics

Sometimes monitoring the `encoding` method is not enough - you need to break it up into multiple parts to monitor one by one. This is useful if your encoding phase is composed of two tasks, like image processing and image embedding. By using custom metrics on these two tasks you can identify potential bottlenecks.

Overall, adding custom metrics gives you full flexibility when monitoring your Executor.

#### Use context manager

Use `self.monitor` to monitor your function's internal blocks:

```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class MyExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        with self.monitor('processing_seconds', 'Time processing my document'):
            docs = process(docs)
        print(docs.text)
        with self.monitor('update_seconds', 'Time updating my document'):
            docs = update(docs)
        return docs
```

#### Use the `@monitor` decorator

Add custom monitoring to a method with the {func}`~jina.monitor` decorator:

```python
from jina import Executor, monitor


class MyExecutor(Executor):
    @monitor()
    def my_method(self):
        ...
```

This creates a [histogram](https://opentelemetry.io/docs/reference/specification/metrics/data-model/#histogram) `jina_my_method_seconds` which tracks the execution time of `my_method`.

By default, the name and documentation of the metric created by {func}`~jina.monitor` are auto-generated based on the function's name. To set a custom name:

```python
@monitor(
    name='my_custom_metrics_seconds', documentation='This is my custom documentation'
)
def method(self):
    ...
```

````{admonition} Respect OpenTelemetry Metrics semantic conventions
:class: caution

You should respect the OpenTelemetry Metrics [semantic conventions](https://opentelemetry.io/docs/reference/specification/metrics/semantic_conventions/).
````

#### Use OpenTelemetry Meter

Under the hood, the Python [OpenTelemetry Metrics API](https://opentelemetry.io/docs/concepts/signals/metrics/) handles the Executor's metrics feature. The {func}`~jina.monitor` decorator is convenient for monitoring an Executor's sub-methods, but if you need more flexibility, use the `self.meter` Executor class attribute to create supported instruments:

```python
from jina import requests, Executor
from docarray import DocList
from docarray.documents import TextDoc


class MyExecutor(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.counter = self.meter.create_counter(name='my_count', description='my count')

    @requests
    def encode(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        self.counter.add(len(docs))
```

This creates a [Counter](https://opentelemetry.io/docs/reference/specification/metrics/api/#counter) that you can use to incrementally track the number of Documents received in each request.

````{hint}
If metrics are not enabled in your environment, check that `self.meter` and `self.counter` exist before using them. If metrics are disabled, `self.meter` is `None`.
````
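Following the same pattern, the Meter can create other instrument types such as histograms. Below is a minimal sketch recording the number of Documents per request; the instrument name `request_num_docs` is illustrative, and the guard covers the case where metrics are disabled:

```python
from jina import Executor, requests
from docarray import DocList
from docarray.documents import TextDoc


class SizeTrackingExecutor(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # self.meter is None when metrics are disabled
        self.size_histogram = (
            self.meter.create_histogram(
                name='request_num_docs', description='Number of Documents per request'
            )
            if self.meter
            else None
        )

    @requests
    def encode(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        if self.size_histogram:
            self.size_histogram.record(len(docs))
        return docs
```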
#### Example

```python
from jina import requests, Executor
from docarray import DocList
from docarray.documents.legacy import LegacyDocument


class MyExecutor(Executor):
    def preprocessing(self, docs: DocList[LegacyDocument]):
        ...

    def model_inference(self, tensor):
        ...

    @requests
    def encode(self, docs: DocList[LegacyDocument], **kwargs) -> DocList[LegacyDocument]:
        docs.tensors = self.preprocessing(docs)
        docs.embedding = self.model_inference(docs.tensors)
```

The `encode` function is composed of two sub-functions:

* `preprocessing` takes raw bytes from a DocList and puts them into a PyTorch tensor.
* `model_inference` calls the forward function of a deep learning model.

By default, only the `encode` function is monitored:

````{tab} Decorator

```{code-block} python
---
emphasize-lines: 7, 11
---
from jina import Executor, requests, monitor
from docarray import DocList
from docarray.documents.legacy import LegacyDocument


class MyExecutor(Executor):
    @monitor()
    def preprocessing(self, docs: DocList[LegacyDocument]):
        ...

    @monitor()
    def model_inference(self, tensor):
        ...

    @requests
    def encode(self, docs: DocList[LegacyDocument], **kwargs) -> DocList[LegacyDocument]:
        docs.tensors = self.preprocessing(docs)
        docs.embedding = self.model_inference(docs.tensors)
```
````

````{tab} Context manager

```{code-block} python
---
emphasize-lines: 17, 19
---
from jina import Executor, requests
from docarray import DocList
from docarray.documents.legacy import LegacyDocument


def preprocessing(self, docs: DocList[LegacyDocument]):
    ...


def model_inference(self, tensor):
    ...


class MyExecutor(Executor):
    @requests
    def encode(self, docs: DocList[LegacyDocument], **kwargs) -> DocList[LegacyDocument]:
        with self.monitor('preprocessing_seconds', 'Time preprocessing the requests'):
            docs.tensors = preprocessing(docs)
        with self.monitor('model_inference_seconds', 'Time doing inference on the requests'):
            docs.embedding = model_inference(docs.tensors)
```
````

## See also

- {ref}`List of available metrics `
- {ref}`How to deploy and use OpenTelemetry in Jina-serve `
- [Tracing in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/)
- [Metrics in OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/metrics/)

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/serve.md

(serve-executor-standalone)=
# Serve

{class}`~jina.Executor`s can be served and accessed over the network using the gRPC or HTTP protocol, allowing you to build services for tasks like model inference, data processing, generative AI, and search.

There are different options for deploying and running a standalone Executor:

* Run the Executor directly from Python with the {class}`~jina.orchestrate.deployments.Deployment` class.
* Run the {meth}`~jina.Deployment.to_kubernetes_yaml()` method to generate Kubernetes deployment configuration files from an instance of {class}`~jina.orchestrate.deployments.Deployment`.
* Run the static {meth}`~jina.serve.executors.BaseExecutor.to_docker_compose_yaml()` method to generate a Docker Compose service file.

```{seealso}
Executors can also be combined to form a pipeline of microservices. We will see in a later step how to achieve this with the {ref}`Flow `.
```
````{admonition} Served vs. shared Executor
:class: hint

In Jina there are two ways of running standalone Executors: *served Executors* and *shared Executors*.

- A **served Executor** is launched by one of the following methods: {class}`~jina.orchestrate.deployments.Deployment`, `to_kubernetes_yaml()`, or `to_docker_compose_yaml()`. It resides behind a {ref}`Gateway ` and can be directly accessed by a {ref}`Client `. It can also be used as part of a Flow.

- A **shared Executor** is launched using the [Jina CLI](../../cli/index.rst) and does *not* sit behind a Gateway. It is intended to be used in one or more Flows. However, it can also be accessed by a {ref}`Client `.

Because a shared Executor does not reside behind a Gateway, it requires fewer networking hops when used inside a Flow. However, it is not suitable for exposing a standalone service over anything other than the gRPC protocol.

In any case, the user needs to make sure that the Document types bound to each endpoint are compatible inside a Flow.
````

(deployment)=
## Serve directly

An {class}`~jina.Executor` can be served using the {class}`~jina.orchestrate.deployments.Deployment` class.

The {class}`~jina.orchestrate.deployments.Deployment` class aims to separate the deployment configuration from the serving logic. In other words:

* the Executor cares about defining the logic to serve, which endpoints to define and what data to accept.
* the Deployment layer cares about how to orchestrate this service, how many replicas or shards to run, etc.

This separation also enhances the reusability of Executors: the same implementation of an Executor can be served in multiple ways/configurations using Deployment.

````{tab} Python class

```python
from docarray import DocList
from docarray.documents import TextDoc
from jina import Executor, requests, Deployment


class MyExec(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        docs[0].text = 'executed MyExec'  # custom logic goes here


with Deployment(uses=MyExec, port=12345, replicas=2) as dep:
    docs = dep.post(
        on='/foo', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc]
    )
    print(docs.text)
```
````

````{tab} YAML configuration

`executor.yaml`:

```yaml
jtype: MyExec
py_modules:
  - executor.py
```

```python
from docarray import DocList
from docarray.documents import TextDoc
from jina import Deployment

with Deployment(uses='executor.yaml', port=12345, replicas=2) as dep:
    docs = dep.post(
        on='/foo', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc]
    )
    print(docs.text)
```
````

````{tab} Hub Executor

```python
from docarray import DocList
from docarray.documents import TextDoc
from jina import Deployment

with Deployment(uses='jinaai://my-username/MyExec/', port=12345, replicas=2) as dep:
    docs = dep.post(
        on='/foo', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc]
    )
    print(docs.text)
```
````

````{tab} Docker image

```python
from docarray import DocList
from docarray.documents import TextDoc
from jina import Deployment

with Deployment(uses='docker://my-executor-image', port=12345, replicas=2) as dep:
    docs = dep.post(
        on='/foo', inputs=DocList[TextDoc]([TextDoc()]), return_type=DocList[TextDoc]
    )
    print(docs.text)
```
````

```text
─────────────────────── 🎉 Deployment is ready to serve! ───────────────────────
╭────────────── 🔗 Endpoint ────────────────╮
│  ⛓    Protocol                      GRPC  │
│  🏠     Local             0.0.0.0:12345   │
│  🔒   Private       192.168.3.147:12345   │
│  🌍    Public      87.191.159.105:12345   │
╰───────────────────────────────────────────╯
['executed MyExec']
```

````{hint}
You can use `dep.block()` to serve forever:

```python
with Deployment(uses=MyExec, port=12345, replicas=2) as dep:
    dep.block()
```
````

## Serve from the CLI

You can also run an Executor from the CLI. In this case, the Executor occupies one process; the lifetime of the Executor is the lifetime of the process.

### From a local Executor Python class

```shell
jina executor --uses MyExec --py-modules executor.py
```

### From a local Executor YAML configuration

With `executor.py` containing the definition of `MyExec`, create a new file called `my-exec.yml`:

```yaml
jtype: MyExec
py_modules:
  - executor.py
```

This simply points Jina-serve to our file and Executor class.
Now we can run the command:

```bash
jina executor --uses my-exec.yml --port 12345
```

### From Executor Hub

In this example, we use [`CLIPTextEncoder`](https://cloud.jina.ai/executor/livtkbkg) to create embeddings for our Documents.

````{tab} With Docker

```bash
jina executor --uses jinaai+docker://jina-ai/CLIPTextEncoder
```
````

````{tab} Without Docker

```bash
jina executor --uses jinaai://jina-ai/CLIPTextEncoder
```
````

This might take a few seconds, but in the end you should be greeted with the following message:

```bash
WorkerRuntime@ 1[L]: Executor CLIPTextEncoder started
```

Just like that, our Executor is up and running.

(kubernetes-executor)=
## Serve from Deployment YAML

If you want a clear separation between deployment configuration and Executor logic, you can define the configuration in a `Deployment` YAML file. This is an example `deployment.yml` config file:

```yaml
jtype: Deployment
with:
  replicas: 2
  shards: 3
  uses: MyExecutor
  py_modules:
    - my_executor.py
```

Then, you can run the Deployment through the CLI or Python API:

````{tab} Python API

```python
from jina import Deployment

with Deployment.load_config('deployment.yml') as dep:
    dep.block()
```
````

````{tab} CLI

```shell
jina deployment --uses deployment.yml
```

Unlike the `jina executor` CLI, this command supports replication and sharding.
````

```text
─────────────────────── 🎉 Deployment is ready to serve! ───────────────────────
╭────────────── 🔗 Endpoint ────────────────╮
│  ⛓    Protocol                      GRPC  │
│  🏠     Local             0.0.0.0:12345   │
│  🔒   Private       192.168.3.147:12345   │
│  🌍    Public      87.191.159.105:12345   │
╰───────────────────────────────────────────╯
```

Read more about the {ref}`YAML specification of Deployments `.

## Serve via Kubernetes

You can generate Kubernetes configuration files for your containerized Executor by using the {meth}`~jina.Deployment.to_kubernetes_yaml()` method:

```python
from jina import Deployment

dep = Deployment(
    uses='jinaai+docker://jina-ai/DummyHubExecutor', port_expose=8080, replicas=3
)
dep.to_kubernetes_yaml('/tmp/config_out_folder', k8s_namespace='my-namespace')
```

This gives the following output:

```text
INFO   executor@8065 K8s yaml files have been created under           [02/07/23 10:03:50]
       /tmp/config_out_folder. You can use it by running
       kubectl apply -R -f /tmp/config_out_folder
```

Afterwards, you can apply this configuration to your cluster:

```shell
kubectl apply -R -f /tmp/config_out_folder
```

The above example deploys the `DummyHubExecutor` from Executor Hub into your Kubernetes cluster.

````{admonition} Hint
:class: hint

The Executor you use needs to be already containerized and stored in a registry accessible from your Kubernetes cluster. We recommend [Executor Hub](https://cloud.jina.ai/executors) for this.
````

Once the Executor is deployed, you can expose a service:

```bash
kubectl expose deployment executor --name=executor-exposed --type LoadBalancer --port 80 --target-port 8080 -n my-namespace
sleep 60 # wait until the external ip is configured
```

Let's export the external IP address created and use it to send requests to the Executor:

```bash
export EXTERNAL_IP=`kubectl get service executor-exposed -n my-namespace -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'`
```

Then, we can send requests using the {class}`~jina.Client`.
Since Kubernetes load balancers cannot load balance streaming gRPC requests, it is recommended to set `stream=False` when using gRPC (note that this is only applicable for Kubernetes deployments of Executors):

```python
import os

from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

host = os.environ['EXTERNAL_IP']
port = 80

client = Client(host=host, port=port)

print(
    client.post(
        on='/', inputs=TextDoc(), return_type=DocList[TextDoc], stream=False
    ).text
)
```

```text
['hello']
```

````{admonition} Hint
:class: hint

You can also export an Executor Deployment to Kubernetes YAML files with the CLI, in case you define a Deployment YAML config:

`jina export kubernetes deployment.yml output_path`
````

(external-shared-executor)=
### External and shared Executors

This type of standalone Executor can be either *external* or *shared*. By default, it is external.

- An external Executor is deployed alongside a {ref}`Gateway `.
- A shared Executor has no Gateway.

Although both types can join a {class}`~jina.Flow`, use a shared Executor if the Executor is only intended to join Flows: this means fewer network hops and saves the cost of running the Gateway in Kubernetes.

## Serve via Docker Compose

You can generate a Docker Compose service file for your containerized Executor with the static {meth}`~jina.Deployment.to_docker_compose_yaml` method:

```python
from jina import Deployment

dep = Deployment(
    uses='jinaai+docker://jina-ai/DummyHubExecutor', port_expose=8080, replicas=3
)
dep.to_docker_compose_yaml(
    output_path='/tmp/docker-compose.yml',
)
```

```shell
docker-compose -f /tmp/docker-compose.yml up
```

The above example runs the `DummyHubExecutor` from Executor Hub locally on your computer using Docker Compose.

````{admonition} Hint
:class: hint

The Executor you use needs to be already containerized and stored in an accessible registry. We recommend [Executor Hub](https://cloud.jina.ai/executors) for this.
````

````{admonition} Hint
:class: hint

You can also export an Executor Deployment to Docker Compose YAML files with the CLI, in case you define a Deployment YAML config:

`jina export docker-compose deployment.yml output_path`
````

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/executor/yaml-spec.md

(executor-yaml-spec)=
# {octicon}`file-code` YAML specification

This page outlines the Executor YAML file specification. These configurations can be used in a {class}`~jina.Deployment` with `Deployment(uses='exec.yml')`, in a {class}`~jina.Flow` with `Flow().add(uses='exec.yml')`, or loaded directly via `Executor.load_config('exec.yml')`.

Note that an Executor YAML configuration always refers back to an Executor defined in a Python file.

## Example

The following is an example {class}`~jina.Executor` configuration:

```yaml
jtype: MyExecutor
with:
  match_args: {}
py_modules:
  - executor.py
metas:
  name: Indexer
  description: Indexes all Documents
  url: https://github.com/janedoe/indexer
  keywords: ['indexer', 'executor']
```

## Keywords

### `jtype`

A string specifying the Executor's Python type. It is used to locate the correct class in the Python files given by `py_modules`.

(executor-with-keyword)=
### `with`

A collection containing keyword arguments passed to the Executor's `__init__()` method. Valid values depend on the Executor.
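To illustrate, the `match_args` entry from the example configuration above is passed straight into the Executor's constructor. A minimal sketch of the matching `__init__`:

```python
from typing import Optional

from jina import Executor


class MyExecutor(Executor):
    def __init__(self, match_args: Optional[dict] = None, **kwargs):
        super().__init__(**kwargs)
        # populated from the `with` section of the YAML configuration
        self.match_args = match_args or {}
```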
### `py_modules`

A list of strings defining the Executor's Python dependencies. Most notably, this must include the Python file containing the Executor definition itself, as well as any other files it imports.

### `metas`

A collection containing meta-information about the Executor. Your Executor is annotated with this information when publishing to {ref}`Executor Hub `. To get better appeal on Executor Hub, set the `metas` fields to the correct values:

- **`name`**: Human-readable name of the Executor.
- **`description`**: Human-readable description of the Executor.
- **`url`**: URL where to find more information about the Executor, normally a GitHub repo URL.
- **`keywords`**: A list of strings to help users filter and locate your package.

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/gateway/customization.md

(custom-gateway)=
# Customization

Gateways are customizable in Jina-serve. You can implement them in much the same way as an Executor.

With customized Gateways, Jina-serve gives you more power by letting you implement any server, protocol and interface at the Gateway level. This means you have more freedom to:

* Define and expose your own API Gateway interface to clients. You can define your own JSON schema, protos, etc.
* Use your favorite server framework.
* Choose the protocol used to serve your app.

The next sections detail the steps to implement and use a custom Gateway.

## Implementing the custom Gateway

Just like for Executors, you can implement a custom Gateway by inheriting from a base `Gateway` class. Jina-serve will instantiate your implemented class, inject runtime arguments and user-defined arguments into it, run it, orchestrate it, and send it health-checks.

There are two Gateway base classes for implementing a custom Gateway:

* {class}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway`: use this abstract class to implement a custom Gateway using FastAPI.
* {class}`~jina.Gateway`: use this abstract class to implement a custom Gateway of any type.

Whether your custom Gateway is based on a FastAPI app using {class}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway` or based on a general server using {class}`~jina.Gateway`, you implement your server behavior in almost the same way. The next section discusses the implementation steps, followed by how to use both base Gateway classes.

(custom-gateway-server-implementation)=
### Server implementation

When implementing the server for your custom Gateway:

1. Create an app/server and define the endpoints you want to expose as a service.
2. In each endpoint's implementation, convert the server request into `Document` objects.
3. Send the `Documents` to Executors in the Flow using {ref}`a GatewayStreamer object `. This lets you use Executors as a service and receive response Documents back.
4. Convert the response `Documents` to a server response and return it.
5. Implement {ref}`the required health-checks ` for the Gateway. (This is not required when using {class}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway`.)
6. Bind your Gateway server to {ref}`parameters injected by the runtime `, i.e. `self.port`, `self.host`, etc. (Also not required for {class}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway`.)
Let's suppose you want to implement a `/service` GET endpoint in an HTTP server. Following the steps above, the server implementation might look like the following:

```python
from fastapi import FastAPI
from uvicorn import Server, Config
from jina import Gateway
from docarray import DocList
from docarray.documents import TextDoc


class MyGateway(Gateway):
    async def setup_server(self):
        # step 1: create an app and define the service endpoint
        app = FastAPI(title='Custom Gateway')

        @app.get(path='/service')
        async def my_service(input: str):
            # step 2: convert the input request to Documents
            docs = DocList[TextDoc]([TextDoc(text=input)])

            # step 3: send Documents to Executors using GatewayStreamer
            result = None
            async for response_docs in self.streamer.stream_docs(
                docs=docs,
                exec_endpoint='/',
                return_type=DocList[TextDoc],
            ):
                # step 4: convert response docs to a server response and return it
                result = response_docs[0].text

            return {'result': result}

        # step 5: implement the health-check
        @app.get(path='/')
        def health_check():
            return {}

        # step 6: bind the gateway server to the right port and host
        self.server = Server(Config(app, host=self.host, port=self.port))
```

### Subclass from {class}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway`

{class}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway` offers a simple API to implement custom Gateways, but is restricted to FastAPI apps.

To implement a custom Gateway using {class}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway`, simply implement the {meth}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway.app` property:

```python
from jina.serve.runtimes.gateway.http.fastapi import FastAPIBaseGateway


class MyGateway(FastAPIBaseGateway):
    @property
    def app(self):
        from fastapi import FastAPI

        app = FastAPI(title='Custom FastAPI Gateway')

        @app.get(path='/endpoint')
        def custom_endpoint():
            return {'message': 'custom-fastapi-gateway'}

        return app
```

As an example, you can refer to {class}`~jina.serve.runtimes.gateway.http.HTTPGateway`. {class}`~jina.serve.runtimes.gateway.http.fastapi.FastAPIBaseGateway` is a subclass of {class}`~jina.Gateway`.

### Subclass from {class}`~jina.Gateway`

{class}`~jina.Gateway` allows implementing more general Gateways. You can use this class as long as your gateway server is runnable in an `asyncio` loop.

To implement a custom Gateway class using {class}`~jina.Gateway`:

* Create a class that inherits from {class}`~jina.Gateway`.
* Implement a constructor `__init__`. (This is optional: you don't need a constructor if your Gateway does not need user-defined attributes.) If your Gateway has an `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)` in the body:

```python
from jina import Gateway


class MyGateway(Gateway):
    def __init__(self, foo: str, **kwargs):
        super().__init__(**kwargs)
        self.foo = foo
```

* Implement `async def setup_server()`. This should set up a server runnable on an asyncio loop (and any other resources needed for setting up the server). For instance:

```python
from jina import Gateway
from fastapi import FastAPI
from uvicorn import Server, Config


class MyGateway(Gateway):
    async def setup_server(self):
        app = FastAPI(title='My Custom Gateway')

        @app.get(path='/endpoint')
        def custom_endpoint():
            return {'message': 'custom-gateway'}

        self.server = Server(Config(app, host=self.host, port=self.port))
```

Please refer to {ref}`the Server Implementation section ` for details on how to implement the server.
* Implement `async def run_server()`. This should run the server and `await` it while serving:

```python
from jina import Gateway


class MyGateway(Gateway):
    ...

    async def run_server(self):
        await self.server.serve()
```

* Implement `async def shutdown()`. This should stop the server and free all resources associated with it:

```python
from jina import Gateway


class MyGateway(Gateway):
    ...

    async def shutdown(self):
        self.server.should_exit = True
        await self.server.shutdown()
```

As an example, you can refer to {class}`~jina.serve.runtimes.gateway.grpc.GRPCGateway` and {class}`~jina.serve.runtimes.gateway.websocket.WebSocketGateway`.

(gateway-streamer)=
## Calling Executors with {class}`~jina.serve.streamer.GatewayStreamer`

{class}`~jina.serve.streamer.GatewayStreamer` allows you to interface with Executors within the Gateway. An instance of this class knows about the Flow topology and where each Executor lives. Use this object to send Documents to Executors in the Flow. A {class}`~jina.serve.streamer.GatewayStreamer` object connects the custom Gateway with the rest of the Flow.

You can get this object in two different ways:

* A `streamer` object (an instance of {class}`~jina.serve.streamer.GatewayStreamer`) is injected by Jina-serve into your `Gateway` class.
* If your server logic cannot access the `Gateway` class (for instance from a separate script), you can still get a `streamer` object using {meth}`~jina.serve.streamer.GatewayStreamer.get_streamer()`:

```python
from jina.serve.streamer import GatewayStreamer

streamer = GatewayStreamer.get_streamer()
```

After transforming the requests that arrive at the Gateway server into Documents, you can send them to Executors in the Flow using {meth}`~jina.serve.streamer.GatewayStreamer.stream_docs()`. This method expects a DocList object and an endpoint exposed by the Flow Executors (similar to the {ref}`Jina Client `). It returns an `AsyncGenerator` of DocLists:

```{code-block} python
---
emphasize-lines: 15, 16, 17, 18, 19, 20
---
from jina.serve.runtimes.gateway.http.fastapi import FastAPIBaseGateway
from docarray import DocList
from docarray.documents import TextDoc
from fastapi import FastAPI


class MyGateway(FastAPIBaseGateway):
    @property
    def app(self):
        app = FastAPI()

        @app.get("/endpoint")
        async def get(text: str):
            result = None
            async for docs in self.streamer.stream_docs(
                docs=DocList[TextDoc]([TextDoc(text=text)]),
                exec_endpoint='/',
                return_type=DocList[TextDoc],
            ):
                result = docs[0].text
            return {'result': result}

        return app
```

```{hint}
:class: note

If you omit the `return_type` parameter, the Gateway streamer can still fetch the Executor output schemas and dynamically construct a DocArray model for them. Even though the dynamically created schema is very similar to the original schema, some validation checks can still fail (for instance, adding to a typed `DocList`). It is recommended to always pass the `return_type` parameter.
```
### Recovering Executor errors

Exceptions raised by an `Executor` are captured in the server object and can be extracted with the {meth}`~jina.serve.streamer.GatewayStreamer.stream()` method. The `stream` method returns an `AsyncGenerator` of tuples of a `DocList` and an optional {class}`jina.excepts.ExecutorError` that can be used to check whether the `Executor` had issues processing the input request. The error can be utilized for retries, handling partial responses or returning default responses.

```{code-block} python
---
emphasize-lines: 5, 6, 7, 8, 9, 10, 11, 12
---
@app.get("/endpoint")
async def get(text: str):
    results = []
    errors = []
    async for docs, error in self.streamer.stream(
        docs=DocList[TextDoc]([TextDoc(text=text)]),
        exec_endpoint='/',
        return_type=DocList[TextDoc],
    ):
        if error:
            errors.append(error)
        else:
            results.append(docs[0].text)
    return {'results': results, 'errors': [error.name for error in errors]}
```

```{hint}
:class: note

If you omit the `return_type` parameter, the Gateway streamer can still fetch the Executor output schemas and dynamically construct a DocArray model for them. Even though the dynamically created schema is very similar to the original schema, some validation checks can still fail (for instance, adding to a typed `DocList`). It is recommended to always pass the `return_type` parameter.
```

(executor-streamer)=
## Calling an individual Executor

Jina-serve injects an `executor` object into your Gateway class which lets you call individual Executors from the Gateway. After transforming the requests that arrive at the Gateway server into Documents, you can call an Executor in your Python code using `self.executor['executor_name'].post(...)`. This method expects a DocList object and an endpoint exposed by the Executor (similar to the {ref}`Jina Client `). It returns a coroutine which returns a DocList. Check the method documentation for more information: {meth}`~jina.serve.streamer._ExecutorStreamer.post()`.

In this example, we have a Flow with two Executors (`executor1` and `executor2`). We can call them individually using `self.executor['executor_name'].post`:

```{code-block} python
---
emphasize-lines: 17, 23, 50
---
from jina.serve.runtimes.gateway.http.fastapi import FastAPIBaseGateway
from jina import Flow, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
from fastapi import FastAPI
import time


class MyGateway(FastAPIBaseGateway):
    @property
    def app(self):
        app = FastAPI()

        @app.get("/endpoint")
        async def get(text: str):
            toc = time.time()
            docs1 = await self.executor['executor1'].post(
                on='/',
                inputs=DocList[TextDoc]([TextDoc(text=text)]),
                parameters={'k': 'v'},
                return_type=DocList[TextDoc],
            )
            docs2 = await self.executor['executor2'].post(
                on='/',
                inputs=DocList[TextDoc]([TextDoc(text=text)]),
                parameters={'k': 'v'},
                return_type=DocList[TextDoc],
            )
            return {'result': docs1.text + docs2.text, 'time_taken': time.time() - toc}

        return app


class FirstExec(Executor):
    @requests
    def func(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        time.sleep(2)
        for doc in docs:
            doc.text += ' saw the first executor'


class SecondExec(Executor):
    @requests
    def func(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        time.sleep(2)
        for doc in docs:
            doc.text += ' saw the second executor'


with Flow().config_gateway(uses=MyGateway, protocol='http').add(
    uses=FirstExec, name='executor1'
).add(uses=SecondExec, name='executor2') as flow:
    import requests as reqlib

    r = reqlib.get(f"http://localhost:{flow.port}/endpoint?text=hello")
    print(r.json())
    assert r.json()['result'] == [
        'hello saw the first executor',
        'hello saw the second executor',
    ]
    assert r.json()['time_taken'] > 4
```
You can also call two Executors in parallel using asyncio. This overlaps their execution, speeding up the response time of the endpoint. Here is one way to do it:

```{code-block} python
---
emphasize-lines: 18, 24, 30
---
from jina.serve.runtimes.gateway.http.fastapi import FastAPIBaseGateway
from jina import Flow, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc
from fastapi import FastAPI
import time
import asyncio


class MyGateway(FastAPIBaseGateway):
    @property
    def app(self):
        app = FastAPI()

        @app.get("/endpoint")
        async def get(text: str):
            toc = time.time()
            call1 = self.executor['executor1'].post(
                on='/',
                inputs=DocList[TextDoc]([TextDoc(text=text)]),
                parameters={'k': 'v'},
                return_type=DocList[TextDoc],
            )
            call2 = self.executor['executor2'].post(
                on='/',
                inputs=DocList[TextDoc]([TextDoc(text=text)]),
                parameters={'k': 'v'},
                return_type=DocList[TextDoc],
            )
            docs1, docs2 = await asyncio.gather(call1, call2)
            return {'result': docs1.text + docs2.text, 'time_taken': time.time() - toc}

        return app


class FirstExec(Executor):
    @requests
    def func(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        time.sleep(2)
        for doc in docs:
            doc.text += ' saw the first executor'


class SecondExec(Executor):
    @requests
    def func(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        time.sleep(2)
        for doc in docs:
            doc.text += ' saw the second executor'


with Flow().config_gateway(uses=MyGateway, protocol='http').add(
    uses=FirstExec, name='executor1'
).add(uses=SecondExec, name='executor2') as flow:
    import requests as reqlib

    r = reqlib.get(f"http://localhost:{flow.port}/endpoint?text=hello")
    print(r.json())
    assert r.json()['result'] == [
        'hello saw the first executor',
        'hello saw the second executor',
    ]
    assert r.json()['time_taken'] < 2.5
```

## Gateway arguments

(gateway-runtime-arguments)=
### Runtime attributes

Jina-serve injects runtime attributes into Gateway classes. You can use them to set up your custom Gateway:

* `logger`: Jina-serve logger object.
* `streamer`: {class}`~jina.serve.streamer.GatewayStreamer`. Use this object to send Documents from the Gateway to Executors. Refer to {ref}`this section ` for more information.
* `runtime_args`: `argparse.Namespace` object containing runtime arguments.
* `port`: main port exposed by the Gateway.
* `ports`: list of all ports exposed by the Gateway.
* `protocols`: list of all protocols supported by the Gateway.
* `host`: host address to which the Gateway server should be bound.

Use these attributes to implement your Gateway logic, for instance, binding the server to the runtime-provided `port` and `host`:

```{code-block} python
---
emphasize-lines: 9
---
from jina import Gateway


class MyGateway(Gateway):
    ...

    async def setup_server(self):
        ...
        self.server = Server(Config(app, host=self.host, port=self.port))
```

```{admonition} Note
:class: note

Jina provides the Gateway with a list of ports and protocols to expose. Therefore, a custom Gateway can handle requests on multiple ports using different protocols.
```

(user-defined-arguments)=
### User-defined parameters

You can also set other parameters by implementing a custom constructor `__init__`. You can override constructor parameters in the Flow Python API (using the `uses_with` parameter) or in the YAML configuration when including the Gateway. Refer to the {ref}`Use Custom Gateway section ` for more information.
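For example, a constructor parameter can be overridden at deployment time. Here is a small sketch with a hypothetical parameter `arg`:

```python
from jina import Flow, Gateway


class MyGateway(Gateway):
    def __init__(self, arg: str = 'default', **kwargs):
        super().__init__(**kwargs)
        self.arg = arg


# `uses_with` overrides the constructor default at deployment time
flow = Flow().config_gateway(
    uses=MyGateway, protocol='http', uses_with={'arg': 'custom-value'}
)
```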
(custom-gateway-health-check)=
## Required health-checks

Jina-serve relies on health-checks to determine the health of the Gateway. In environments like Kubernetes, Docker Compose and Jina-serve Cloud, this information is crucial for orchestrating the Gateway.

Since you have full control over your custom Gateway, you are always responsible for implementing health-check endpoints:

* If the protocol used is gRPC, a health servicer (for instance `health.aio.HealthServicer()`) from `grpcio-health-checking` is expected to be added to the gRPC server. Refer to {class}`~jina.serve.runtimes.gateway.grpc.gateway.GRPCGateway` as an example; see also the sketch after this list.
* Otherwise, an HTTP GET request to the root path is expected to return a `200` status code.

To test whether your server properly implements health-checks, you can use the command:

`jina ping <protocol>://host:port`

```{admonition} Important
:class: important

Although a Jina Gateway can expose multiple ports and protocols, the runtime only cares about the first exposed port and protocol. Health-checks are sent only to the first port.
```
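For the gRPC case, here is a minimal sketch of wiring the standard health servicer into a custom Gateway, assuming `grpcio-health-checking` is installed (a real Gateway would also register its own services on the same server):

```python
import grpc
from grpc_health.v1 import health, health_pb2_grpc
from jina import Gateway


class MyGrpcGateway(Gateway):
    async def setup_server(self):
        self.server = grpc.aio.server()
        # register the standard health service so orchestrators (and `jina ping`)
        # can call grpc.health.v1.Health/Check
        health_pb2_grpc.add_HealthServicer_to_server(
            health.aio.HealthServicer(), self.server
        )
        self.server.add_insecure_port(f'{self.host}:{self.port}')

    async def run_server(self):
        await self.server.start()
        await self.server.wait_for_termination()

    async def shutdown(self):
        await self.server.stop(grace=None)
```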
## Gateway YAML file

Like Executor `config` files, a custom Gateway implementation can be associated with a YAML configuration file. Such a configuration can override user-defined parameters and define other runtime arguments (`port`, `protocol`, `py_modules`, etc).

You can define such a configuration in `config.yml`:

```yaml
!MyGateway
py_modules: my_gateway.py
with:
  arg1: hello
  arg2: world
port: 12345
```

For more information, please refer to the {ref}`Gateway YAML Specifications `.

## Containerize the custom Gateway

You may want to dockerize your custom Gateway so you can isolate its dependencies and make it ready to run in the cloud or Kubernetes. This assumes that you've already implemented a custom Gateway class and have defined a `config.yml` for it. In this case, dockerizing the Gateway is straightforward:

* If you need dependencies other than Jina-serve (for instance, a server library), make sure to add a `requirements.txt` file.
* Create a `Dockerfile` as follows:

1. Use a [Jina-serve based image](https://hub.docker.com/r/jinaai/jina) with the `standard` tag as the base image in your Dockerfile. This ensures that everything needed for Jina-serve to run the Gateway is installed. Make sure the Jina-serve version supports custom Gateways:

```dockerfile
FROM jinaai/jina:latest-py38-standard
```

Alternatively, you can just install jina-serve using `pip`:

```dockerfile
RUN pip install jina
```

2. Install everything from `requirements.txt` if you included it:

```dockerfile
RUN pip install -r requirements.txt
```

3. Copy the source code under the `workdir` folder:

```dockerfile
COPY . /workdir/
WORKDIR /workdir
```

4. Use the `jina gateway --uses config.yml` command as your image's entrypoint:

```dockerfile
ENTRYPOINT ["jina", "gateway", "--uses", "config.yml"]
```

Once you finish the `Dockerfile` you should end up with the following file structure:

```
.
├── my_gateway.py
├── requirements.txt
├── config.yml
└── Dockerfile
```

You can now build the Docker image:

```shell
cd my_gateway
docker build -t gateway-image .
```

(use-custom-gateway)=
## Use the custom Gateway

You can include the custom Gateway in a Jina-serve Flow in different formats: Python class, YAML configuration, or Docker image.

### Flow Python API

````{tab} Python Class

```python
from jina import Gateway, Flow


class MyGateway(Gateway):
    def __init__(self, arg: str = None, **kwargs):
        super().__init__(**kwargs)
        self.arg = arg

    ...


flow = Flow().config_gateway(
    uses=MyGateway, port=12345, protocol='http', uses_with={'arg': 'value'}
)
```
````

````{tab} YAML configuration

```python
flow = Flow().config_gateway(
    uses='config.yml', port=12345, protocol='http', uses_with={'arg': 'value'}
)
```
````

````{tab} Docker Image

```python
flow = Flow().config_gateway(
    uses='docker://gateway-image',
    port=12345,
    protocol='http',
    uses_with={'arg': 'value'},
)
```
````

### Flow YAML configuration

````{tab} Python Class

```yaml
!Flow
gateway:
  py_modules: my_gateway/my_gateway.py
  uses: MyGateway
  with:
    arg: value
  protocol: http
  port: 12345
```
````

````{tab} YAML configuration

```yaml
!Flow
gateway:
  uses: my_gateway/config.yml
  protocol: http
  port: 12345
```
````

````{tab} Docker Image

```yaml
!Flow
gateway:
  uses: docker://gateway-image
  protocol: http
  port: 12345
```
````

```{admonition} Important
:class: important

When you include a custom Gateway in a Jina Flow, it is important to specify the port and protocol, since Jina needs to know where health-checks will be sent.
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/gateway/customize-http-endpoints.md

# Customize HTTP endpoints

Not every {class}`~jina.Executor` endpoint is automatically exposed through the external HTTP interface. By default, any Flow exposes the following CRUD and debug HTTP endpoints: `/status`, `/post`, `/index`, `/search`, `/update`, and `/delete`.

Executors that provide additional endpoints (e.g. `/foo`) are exposed only after manual configuration. These custom endpoints can be added to the HTTP interface using `Flow.expose_endpoint`.

```{figure} expose-endpoints.svg
:align: center
```

````{tab} Python

```python
from jina import Executor, requests, Flow


class MyExec(Executor):
    @requests(on='/foo')
    def foo(self, docs, **kwargs):
        pass


f = Flow().config_gateway(protocol='http').add(uses=MyExec)
f.expose_endpoint('/foo', summary='my endpoint')
with f:
    f.block()
```
````

````{tab} YAML

You can enable custom endpoints in a Flow using YAML syntax as well.

```yaml
jtype: Flow
with:
  protocol: http
  expose_endpoints:
    /foo:
      summary: my endpoint
```
````

Now, sending an HTTP data request to the `/foo` endpoint is equivalent to calling `f.post('/foo', ...)` using the Python Client.
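For instance, assuming the Flow above serves on port 12345, the newly exposed route can be reached over plain HTTP; the JSON body below mirrors the `/post` payload format shown further down:

```bash
# hypothetical request to the exposed /foo endpoint (port and payload are illustrative)
curl --request POST 'http://localhost:12345/foo' \
  --header 'Content-Type: application/json' \
  -d '{"data": [{"text": "hello world"}]}'
```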
You can add more `kwargs` to build richer semantics on your HTTP endpoint. This meta information is rendered by the Swagger UI and forwarded to the OpenAPI schema.

````{tab} Python

```python
f.expose_endpoint('/bar', summary='my endpoint', tags=['fine-tuning'], methods=['PUT'])
```
````

````{tab} YAML

```yaml
jtype: Flow
with:
  protocol: http
  expose_endpoints:
    /bar:
      methods: ["PUT"]
      summary: my endpoint
      tags:
        - fine-tuning
```
````

However, if you want to send requests to a different Executor endpoint, you can still do so without exposing it in the HTTP interface, by sending an HTTP request to the `/post` HTTP endpoint while setting `execEndpoint` in the request:

```text
curl --request POST \
'http://localhost:12345/post' \
--header 'Content-Type: application/json' -d '{"data": [{"text": "hello world"}], "execEndpoint": "/foo"}'
```

The above cURL command is equivalent to passing the `on` parameter to `client.post` as follows:

```python
from jina import Client
from docarray import DocList
from docarray.documents import TextDoc

client = Client(port=12345, protocol='http')
client.post(
    on='/foo',
    inputs=DocList[TextDoc]([TextDoc(text='hello world')]),
    return_type=DocList[TextDoc],
)
```

## Hide default endpoints

It is possible to hide the default CRUD and debug endpoints in production. This might be useful when they are not applicable in your context. For example, in the code snippet below we didn't implement any CRUD endpoints for the Executor, so it does not make sense to expose them to the public.

````{tab} Python

```python
from jina import Flow

f = Flow().config_gateway(
    protocol='http', no_debug_endpoints=True, no_crud_endpoints=True
)
```
````

````{tab} YAML

```yaml
jtype: Flow
with:
  protocol: 'http'
  no_debug_endpoints: True
  no_crud_endpoints: True
```
````

After setting up a Flow in this way, the {ref}`default HTTP endpoints ` are not exposed.

(cors)=
## Enable CORS (cross-origin resource sharing)

To make a Flow accessible from a website with a different domain, you need to enable [Cross-Origin Resource Sharing (CORS)](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS). Among other things, CORS is necessary to provide a {ref}`Swagger UI interface ` for your Flow. Note that CORS is disabled by default, for security reasons. To enable CORS, configure your Flow in the following way:

```python
from jina import Flow

f = Flow().config_gateway(cors=True, protocol='http')
```
## Enable GraphQL endpoint

````{admonition} Caution
:class: caution

GraphQL support is an optional feature that requires optional dependencies. To install these, run `pip install jina-serve[graphql]` or `pip install jina-serve[all]`.

Unfortunately, these dependencies are **not available through Conda**. You will have to use `pip` to be able to use the GraphQL feature.
````

A {class}`~jina.Flow` can optionally expose a [GraphQL](https://graphql.org/) endpoint, located at `/graphql`. To enable this endpoint, all you need to do is set `expose_graphql_endpoint=True` on your HTTP Flow:

````{tab} Python

```python
from jina import Flow

f = Flow().config_gateway(protocol='http', expose_graphql_endpoint=True)
```
````

````{tab} YAML

```yaml
jtype: Flow
with:
  protocol: 'http'
  expose_graphql_endpoint: True
```
````

````{admonition} See Also
:class: seealso

For more details about the Jina GraphQL endpoint, see {ref}`here `.
````

## Config Uvicorn server

HTTP support in Jina is powered by [Uvicorn](https://www.uvicorn.org/). You can configure the Flow's internal Uvicorn server to your heart's content by passing `uvicorn_kwargs` to the Flow:

```python
from jina import Flow

f = Flow().config_gateway(
    protocol='http', uvicorn_kwargs={'loop': 'asyncio', 'http': 'httptools'}
)
```

These arguments are passed directly to the Uvicorn server.

````{admonition} See Also
:class: seealso

For more details about the arguments used here and other available settings for the Uvicorn server, see their [website](https://www.uvicorn.org/settings/).
````

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/gateway/health-check.md

(health-check-gateway)=
# Health Check

Just like individual Executors, the Gateway exposes a health check endpoint. In contrast to Executors, however, a Gateway can use gRPC, HTTP, or WebSocket, and the health check endpoint changes accordingly.

## Using gRPC

When using gRPC as the protocol to communicate with the Gateway, the Gateway uses the exact same mechanism as Executors to expose its health status: it exposes the [standard gRPC health check](https://github.com/grpc/grpc/blob/master/doc/health-checking.md) to the outside world.

With the same Flow as before, you can check the Gateway status like this:

```bash
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 grpc.health.v1.Health/Check
```

```json
{
  "status": "SERVING"
}
```

## Using HTTP or WebSocket

````{admonition} Caution
:class: caution

For Gateways running with HTTP or WebSocket, the gRPC health check response codes outlined {ref}`above ` do not apply. Instead, an error-free response signifies healthiness.
````

When using HTTP or WebSocket as the Gateway protocol, you can query the endpoint `'/'` to check the status.

First, create a Flow with the HTTP or WebSocket protocol:

```python
from jina import Flow

f = Flow(protocol='http', port=12345).add()
with f:
    f.block()
```

Then query the "empty" endpoint:

```bash
curl http://localhost:12345
```

You get a valid empty response indicating the Gateway's ability to serve:

```json
{}
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/gateway/index.md

(gateway)=
# Gateway

Every {class}`~jina.Flow` has a Gateway component that receives requests over the network, allowing clients to send data to the Flow for processing.
The Gateway is the first destination of a client request and its final destination, meaning that all incoming requests are routed to the Gateway, and the Gateway is responsible for handling and responding to them. The Gateway supports multiple protocols and endpoints, such as gRPC, HTTP, WebSocket, and GraphQL, allowing clients to communicate with the Flow using the protocol of their choice.

In most cases, the Gateway is automatically configured when you initialize a Flow object, so you do not need to configure it yourself. However, you can always explicitly configure the Gateway in Python using the {meth}`~jina.Flow.config_gateway` method, or in YAML. The full YAML specification for configuring the Gateway can be {ref}`found here `.

(flow-protocol)=
## Set protocol in Python

You can use three different protocols to serve the `Flow`: gRPC, HTTP and WebSocket.

````{tab} gRPC

```{code-block} python
---
emphasize-lines: 13, 15
---
from jina import Client, Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'foo was called'


f = Flow().config_gateway(protocol='grpc', port=12345).add(uses=FooExecutor)
with f:
    client = Client(port=12345)
    docs = client.post(on='/', inputs=TextDoc(), return_type=DocList[TextDoc])
    print(docs.text)
```

```text
['foo was called']
```
````

````{tab} HTTP

```{code-block} python
---
emphasize-lines: 13, 15
---
from jina import Client, Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'foo was called'


f = Flow().config_gateway(protocol='http', port=12345).add(uses=FooExecutor)
with f:
    client = Client(port=12345, protocol='http')
    docs = client.post(on='/', inputs=TextDoc(), return_type=DocList[TextDoc])
    print(docs.text)
```

```text
['foo was called']
```
````

````{tab} WebSocket

```{code-block} python
---
emphasize-lines: 13, 15
---
from jina import Client, Executor, Flow, requests
from docarray import DocList
from docarray.documents import TextDoc


class FooExecutor(Executor):
    @requests
    def foo(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = 'foo was called'


f = Flow().config_gateway(protocol='websocket', port=12345).add(uses=FooExecutor)
with f:
    client = Client(port=12345, protocol='websocket')
    docs = client.post(on='/', inputs=TextDoc(), return_type=DocList[TextDoc])
    print(docs.text)
```

```text
['foo was called']
```
````

## Set protocol in YAML

To configure the protocol in a YAML file:

````{tab} gRPC

Note that gRPC is the default protocol, so you can just omit it.

```{code-block} yaml
jtype: Flow
gateway:
  protocol: 'grpc'
```
````

````{tab} HTTP

```{code-block} yaml
jtype: Flow
gateway:
  protocol: 'http'
```
````

````{tab} WebSocket

```{code-block} yaml
jtype: Flow
gateway:
  protocol: 'websocket'
```
````
This allows polyglot clients connect to your Flow with differentprotocols.````{tab} Python```{code-block} python---emphasize-lines: 2---from jina import Flowflow = Flow().config_gateway(protocol=['grpc', 'http', 'websocket'])with flow: flow.block()```````````{tab} YAML```yamljtype: Flowgateway: protocol: - 'grpc' - 'http' - 'websocket'``````````{figure} multi-protocol-flow.png:width: 70%``````{admonition} Important:class: importantIn case you want to serve a Flow using multiple protocols, make sure to specify as much ports as protocols used.```(custom-http)=(flow-tls)=## Enable TLS for client trafficsYou can enable TLS encryption between your Gateway and Clients, for any of the protocols supported by Jina-serve (HTTP, gRPC,and WebSocket).````{admonition} Caution:class: cautionEnabling TLS will encrypt the data that is transferred between the Flow and the Client.Data that is passed between the microservices configured by the Flow, such as Executors, will **not** be encrypted.````To enable TLS encryption, you need to pass a valid *keyfile* and *certfile* to the Flow, usingthe `ssl_keyfile` `ssl_certfile`parameters:```pythonfrom jina import FlowFlow().config_gateway( port=12345, ssl_certfile='path/to/certfile.crt', ssl_keyfile='path/to/keyfile.crt',)```If both of these are provided, the Flow will automatically configure itself to use TLS encryption for its communicationwith any Client.(server-compress)=## Enable in-Flow compressionThe communication between {class}`~jina.Executor`s inside a {class}`~jina.Flow` is done via gRPC. To optimize theperformance and the bandwidth of these connections, you canenable [compression](https://grpc.github.io/grpc/python/grpc.html#compression) by specifying `compression` argument tothe Gateway.The supported methods are: none, `gzip` and `deflate`.```pythonfrom jina import Flowf = Flow().config_gateway(compression='gzip').add(...)```Note that this setting is only effective the internal communication of the Flow.One can also specify the compression between client and gateway {ref}`as described here`.## Get environment informationGateway provides an endpoint that exposes environment information where it runs.It is a dict-like structure with the following keys:- `jina`: A dictionary containing information about the system and the versions of several packages including jina package itself- `envs`: A dictionary containing all the values if set of the {ref}`environment variables used in Jina-serve `### Use gRPCTo see how this works, first instantiate a Flow with an Executor exposed to a specific port and block it for serving:```pythonfrom jina import Flowwith Flow().config_gateway(protocol=['grpc'], port=12345) as f: f.block()```Then, you can use [grpcurl](https://github.com/fullstorydev/grpcurl) sending status check request to the Gateway.```shelldocker pull fullstorydev/grpcurl:latestdocker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina.JinaInfoRPC/_status```The error-free output below signifies a correctly running Gateway:```json{ "jina": { "architecture": "######", "ci-vendor": "######", "docarray": "######", "grpcio": "######", "jina": "######", "jina-proto": "######", "jina-vcs-tag": "######", "platform": "######", "platform-release": "######", "platform-version": "######", "processor": "######", "proto-backend": "######", "protobuf": "######", "python": "######", "pyyaml": "######", "session-id": "######", "uid": "######", "uptime": "######" }, "envs": { "JINA_AUTH_TOKEN": "(unset)", "JINA_DEFAULT_HOST": "(unset)", 
"JINA_DEFAULT_TIMEOUT_CTRL": "(unset)", "JINA_DEPLOYMENT_NAME": "(unset)", "JINA_DISABLE_HEALTHCHECK_LOGS": "(unset)", "JINA_DISABLE_UVLOOP": "(unset)", "JINA_EARLY_STOP": "(unset)", "JINA_FULL_CLI": "(unset)", "JINA_GATEWAY_IMAGE": "(unset)", "JINA_GRPC_RECV_BYTES": "(unset)", "JINA_GRPC_SEND_BYTES": "(unset)", "JINA_HUBBLE_REGISTRY": "(unset)", "JINA_HUB_NO_IMAGE_REBUILD": "(unset)", "JINA_LOCKS_ROOT": "(unset)", "JINA_LOG_CONFIG": "(unset)", "JINA_LOG_LEVEL": "(unset)", "JINA_LOG_NO_COLOR": "(unset)", "JINA_MP_START_METHOD": "(unset)", "JINA_RANDOM_PORT_MAX": "(unset)", "JINA_RANDOM_PORT_MIN": "(unset)" }}``````{tip}You can also use it to check Executor status, as Executor's communication protocol is gRPC.```(gateway-grpc-server-options)=### Configure Gateway gRPC optionsThe {class}`~jina.Gateway` supports the `grpc_server_options` parameter which allows more customization of the **gRPC**server. The `grpc_server_options` parameter accepts a dictionary of **gRPC** configuration options which will beused to overwrite the default options. The **gRPC** channel used for server to server communication can also becustomized using the `grpc_channel_options` parameter.The default **gRPC** options are:```('grpc.max_send_message_length', -1),('grpc.max_receive_message_length', -1),('grpc.keepalive_time_ms', 9999),# send keepalive ping every 9 second, default is 2 hours.('grpc.keepalive_timeout_ms', 4999),# keepalive ping time out after 4 seconds, default is 20 seconds('grpc.keepalive_permit_without_calls', True),# allow keepalive pings when there's no gRPC calls('grpc.http1.max_pings_without_data', 0),# allow unlimited amount of keepalive pings without data('grpc.http1.min_time_between_pings_ms', 10000),# allow grpc pings from client every 9 seconds('grpc.http1.min_ping_interval_without_data_ms', 5000),# allow grpc pings from client without data every 4 seconds```Refer to the [channel_arguments](https://grpc.github.io/grpc/python/glossary.html#term-channel_arguments) section forthe full list of available **gRPC** options.```{hint}:class: seealsoRefer to the {ref}`Configure gRPC Client options ` section for configuring the `Client` **gRPC** channel options.Refer to the {ref}`Configure Executor gRPC options ` section for configuring the `Executor` **gRPC** options.```### Use HTTP/WebSocketWhen using HTTP or WebSocket as the Gateway protocol, you can use curl to target the `/status` endpoint and get the Jina-serveinfo.```shellcurl http://localhost:12345/status``````json{ "jina": { "jina": "######", "docarray": "######", "jina-proto": "######", "jina-vcs-tag": "(unset)", "protobuf": "######", "proto-backend": "######", "grpcio": "######", "pyyaml": "######", "python": "######", "platform": "######", "platform-release": "######", "platform-version": "######", "architecture": "######", "processor": "######", "uid": "######", "session-id": "######", "uptime": "######", "ci-vendor": "(unset)" }, "envs": { "JINA_AUTH_TOKEN": "(unset)", "JINA_DEFAULT_HOST": "(unset)", "JINA_DEFAULT_TIMEOUT_CTRL": "(unset)", "JINA_DEPLOYMENT_NAME": "(unset)", "JINA_DISABLE_UVLOOP": "(unset)", "JINA_EARLY_STOP": "(unset)", "JINA_FULL_CLI": "(unset)", "JINA_GATEWAY_IMAGE": "(unset)", "JINA_GRPC_RECV_BYTES": "(unset)", "JINA_GRPC_SEND_BYTES": "(unset)", "JINA_HUBBLE_REGISTRY": "(unset)", "JINA_HUB_NO_IMAGE_REBUILD": "(unset)", "JINA_LOG_CONFIG": "(unset)", "JINA_LOG_LEVEL": "(unset)", "JINA_LOG_NO_COLOR": "(unset)", "JINA_MP_START_METHOD": "(unset)", "JINA_RANDOM_PORT_MAX": "(unset)", "JINA_RANDOM_PORT_MIN": "(unset)", 
"JINA_DISABLE_HEALTHCHECK_LOGS": "(unset)", "JINA_LOCKS_ROOT": "(unset)" }}```(gateway-logging-configuration)=## Custom logging configurationThe {ref}`Custom logging configuration ` section describes customizing the logging configuration for all entities of the `Flow`.The `Gateway` logging can also be individually configured using a custom `logging.json.yml` file as in the below example. The custom logging file`logging.json.yml` is described in more detail in the {ref}`Custom logging configuration ` section.````{tab} Python```pythonfrom jina import Flowf = Flow().config_gateway(log_config='./logging.json.yml')```````````{tab} YAML```yamljtype: Flowgateway: log_config: './logging.json.yml'```````## See also- {ref}`Access the Flow with the Client `- {ref}`Deployment with Kubernetes `- {ref}`Deployment with Docker Compose ````{toctree}:hidden:health-checkrate-limitcustomize-http-endpointscustomizationyaml-spec``` --- # Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/gateway/rate-limit.md(prefetch)=# Rate LimitRequests always reach to the Flow as fast as possible. If a client sends their request faster than the {class}`~jina.Flow` can process them, this can put a high load on the Flow, which may cause out of memory issues.At Gateway, you can control the number of in flight requests **per Client** with the `prefetch` argument. Setting `prefetch=2` lets the API accept only 2 requests per client in parallel, hence limiting the load of the Flow.By default `prefetch=1000`. To disable it you can set it to 0.```{code-block} python---emphasize-lines: 8, 10---def requests_generator(): while True: yield Document(...)class MyExecutor(Executor): @requests def foo(self, **kwargs): slow_operation()# Makes sure only 2 requests reach the Executor at a time.with Flow().config_gateway(prefetch=2).add(uses=MyExecutor) as f: f.post(on='/', inputs=requests_generator)``````{danger}When working with very slow executors and a big amount of data, you must set `prefetch` to some small number to prevent out of memory problems. If you are unsure, always set `prefetch=1`.```````{tab} Python```pythonfrom jina import Flowf = Flow().config_gateway(protocol='http', prefetch=10)```````````{tab} YAML```yamljtype: Flowwith: protocol: 'http' prefetch: 10```````## Set timeoutsYou can set timeouts for sending requests to the {class}`~jina.Executor`s within a {class}`~jina.Flow` by passing the `timeout_send` parameter. The timeout is specified in milliseconds. By default, it is `None` and the timeout is disabled.If you use timeouts, you may also need to set the {ref}`prefetch ` option in the Flow. Otherwise, requests may queue up at an Executor and eventually time out.```{code-block} pythonwith Flow().config_gateway(timeout_send=1000) as f: f.post(on='/', inputs=[Document()])```The example above limits every request to the Executors in the Flow to a timeout of 1 second. --- # Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/gateway/yaml-spec.md(gateway-yaml-spec)=# {octicon}`file-code` YAML specificationThis page outlines the specification for Gateway.Gateway config is nested under the `gateway` section of a Flow YAML. 
---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/gateway/yaml-spec.md

(gateway-yaml-spec)=
# {octicon}`file-code` YAML specification

This page outlines the specification for the Gateway.

Gateway config is nested under the `gateway` section of a Flow YAML. For example:

```{code-block} yaml
---
emphasize-lines: 3-4
---
jtype: Flow
version: '1'
gateway:
  protocol: http
```

This defines a Gateway that uses the HTTP protocol.

```{warning}
It is also possible to define a Gateway configuration directly under the top-level `with` field, but this is not recommended.
```

## Fields

The following fields are defined for the Gateway and can be set under the `gateway` section (or the `with` section) of a Flow YAML.

```{include} ../flow/gateway-args.md
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/concepts/serving/index.md

(serving)=
# {fas}`gears` Serving

As seen in the {ref}`architecture overview `, Jina-serve is organized in different layers. The Serving layer is composed of concepts that allow developers to write their logic to be served by the objects in the {ref}`orchestration ` layer.

Two objects belong to this family:

- Executor ({class}`~jina.Executor`): serves your logic based on [docarray](https://docs.docarray.org/) data structures.
- Gateway ({class}`~jina.Gateway`): directs all the traffic when multiple Executors are combined inside a Flow.

```{toctree}
:hidden:

executor/index
gateway/index
```

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/docarray-support.md

(docarray-support)=
# DocArray support

Jina-serve depends heavily on DocArray to provide the data that is processed inside Jina-serve Executors and sent by our Clients.

Recently, DocArray was heavily refactored for version 0.30. Starting from that version, DocArray usage has changed drastically; however, Jina-serve works seamlessly and automatically with either version of DocArray. Jina-serve automatically detects the installed docarray version and uses the corresponding methods and APIs. However, developers must take into account that some APIs and usages have changed, especially when it comes to developing Executors.

The new version makes the dataclass feature of DocArray<0.30 a first-class citizen and, for this purpose, is built on top of [Pydantic](https://pydantic-docs.helpmanual.io/). An important shift is that the new DocArray adapts to users' data, whereas DocArray<0.30 forced users to adapt to the Document schema.

## Document schema

At the heart of DocArray>=0.30 is a new schema that is more flexible and expressive than the original DocArray schema. You can refer to the [DocArray README](https://github.com/docarray/docarray) for more details. Note that the names of the data structures also changed in the new version of DocArray; the snippets below illustrate the renaming.
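A minimal sketch of the renaming (`Document` becomes `BaseDoc`, `DocumentArray` becomes `DocList`), constructing the basic containers in each version:

````{tab} docarray>=0.30
```python
from docarray import BaseDoc, DocList
from docarray.documents import TextDoc

# A DocList is a typed, list-like container of BaseDoc instances
docs = DocList[TextDoc]([TextDoc(text='hello') for _ in range(3)])
```
````

````{tab} docarray<0.30
```python
from docarray import Document, DocumentArray

# A DocumentArray is a list-like container of Documents with a fixed schema
docs = DocumentArray([Document(text='hello') for _ in range(3)])
```
````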
On the Jina-serve side, this flexibility extends to every Executor, where you can now customize input and output schemas:

- With DocArray<0.30, a Document has a fixed schema for input and output.
- With DocArray>=0.30 (the version currently used by default in Jina-serve), an Executor defines its own input and output schemas. It also provides several predefined schemas that you can use out of the box.

## Executor API

To reflect the change in DocArray>=0.30, the Executor API supports schema definition. The design is inspired by [FastAPI](https://fastapi.tiangolo.com/). The main difference is that with `docarray<0.30` there is only a single [Document](https://docarray.org/legacy-docs/fundamentals/document/) with a fixed schema, whereas with `docarray>=0.30` users need to define their own `Document` by subclassing [BaseDoc](https://docs.docarray.org/user_guide/representing/first_step/) or using any of the [predefined Document types](https://docs.docarray.org/data_types/first_steps/) provided.

````{tab} docarray>=0.30
```{code-block} python
from jina import Executor, requests
from docarray import DocList, BaseDoc
from docarray.documents import ImageDoc
from docarray.typing import AnyTensor

import numpy as np


class InputDoc(BaseDoc):
    img: ImageDoc


class OutputDoc(BaseDoc):
    embedding: AnyTensor


class MyExec(Executor):
    @requests(on='/bar')
    def bar(self, docs: DocList[InputDoc], **kwargs) -> DocList[OutputDoc]:
        docs_return = DocList[OutputDoc](
            [OutputDoc(embedding=np.zeros((100, 1))) for _ in range(len(docs))]
        )
        return docs_return
```
````

````{tab} docarray<0.30
```{code-block} python
from jina import Executor, requests
from docarray import Document, DocumentArray

import numpy as np


class MyExec(Executor):
    @requests(on='/bar')
    def bar(self, docs: DocumentArray, **kwargs):
        docs_return = DocumentArray(
            [Document(embedding=np.zeros((100, 1))) for _ in range(len(docs))]
        )
        return docs_return
```
````

To ease the transition between the old and the new `docarray` versions, there is [`LegacyDocument`](https://docs.docarray.org/API_reference/documents/documents/#docarray.documents.legacy.LegacyDocument), a predefined Document that aims to provide the same data type as the original `Document` in `docarray<0.30`.

## Client API

On the client side, the big change is that when using `docarray>=0.30`, you specify the schema that you expect the Deployment or Flow to return. You can pass the return type by using the `return_type` parameter in the `client.post` method:

````{tab} docarray>=0.30
```{code-block} python
from jina import Client
from docarray import DocList, BaseDoc
from docarray.documents import ImageDoc
from docarray.typing import AnyTensor


class InputDoc(BaseDoc):
    img: ImageDoc


class OutputDoc(BaseDoc):
    embedding: AnyTensor


c = Client(host='')
c.post(
    '/',
    DocList[InputDoc]([InputDoc(img=ImageDoc()) for _ in range(10)]),
    return_type=DocList[OutputDoc],
)
```
````

````{tab} docarray<0.30
```{code-block} python
from jina import Client
from docarray import DocumentArray, Document

c = Client(host='')
c.post('/', DocumentArray([Document() for _ in range(10)]))
```
````

## See also

- [DocArray>=0.30](https://docs.docarray.org/) docs
- [DocArray<0.30](https://docarray.org/legacy-docs/) docs
- [Pydantic](https://pydantic-docs.helpmanual.io/) documentation for more details on schema definition

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/envs/index.md

(jina-env-vars)=
# {octicon}`list-unordered` Environment Variables

Jina-serve uses environment variables to determine different behaviours.
To see all supported environment variables and their current values, run:

```bash
jina -vf
```

If you use containerized Executors (including {ref}`Kubernetes ` and {ref}`Docker Compose `), you can pass separate environment variables to each Executor in the following way:

````{tab} Flow YAML
```yaml
jtype: Flow
with: {}
executors:
- name: executor0
  port: 49583
  env:
    JINA_LOG_LEVEL: DEBUG
    MYSECRET: ${{ ENV.MYSECRET }}
- name: executor1
  port: 62156
  env:
    JINA_LOG_LEVEL: INFO
    CUDA_VISIBLE_DEVICES: 1
```
````

````{tab} Deployment YAML
```yaml
jtype: Deployment
with:
  name: executor0
  port: 49583
  env:
    JINA_LOG_LEVEL: DEBUG
    MYSECRET: ${{ ENV.MYSECRET }}
```
````

````{tab} Python API
```python
from jina import Flow
import os

secret = os.environ['MYSECRET']

f = (
    Flow()
    .add(env={'JINA_LOG_LEVEL': 'DEBUG', 'MYSECRET': secret})
    .add(env={'JINA_LOG_LEVEL': 'INFO', 'CUDA_VISIBLE_DEVICES': 1})
)
f.save_config("envflow.yml")
```
````
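Inside the Executor, these variables are ordinary process environment variables, so you can read them with `os.environ`. A minimal sketch (the `EnvAwareExecutor` name is hypothetical, and `MYSECRET` follows the example above):

```python
import os

from jina import Executor, requests


class EnvAwareExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Variables passed via `env:` are visible to the Executor process
        self.log_level = os.environ.get('JINA_LOG_LEVEL', 'INFO')
        self.secret = os.environ.get('MYSECRET')

    @requests
    def foo(self, docs, **kwargs):
        # The values read at init time are available on every request
        ...
```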
The following environment variables are used internally in Jina-serve:

| Environment variable | Description |
|---|---|
| `JINA_AUTH_TOKEN` | Authentication token of Jina Cloud |
| `JINA_DEFAULT_HOST` | Default host where the server is exposed |
| `JINA_DEFAULT_TIMEOUT_CTRL` | Default timeout used by the Flow to check the readiness of Executors |
| `JINA_DEPLOYMENT_NAME` | Name of the deployment, used by the Head Runtime in Kubernetes to connect to different deployments |
| `JINA_DISABLE_UVLOOP` | If set, Jina-serve does not use the uvloop event loop for concurrent execution |
| `JINA_FULL_CLI` | If set, all CLI options are shown in help |
| `JINA_GATEWAY_IMAGE` | Used when exporting a Flow to Kubernetes or Docker Compose to override the default gateway image |
| `JINA_GRPC_RECV_BYTES` | Set by the gRPC service to keep track of received bytes |
| `JINA_GRPC_SEND_BYTES` | Set by the gRPC service to keep track of sent bytes |
| `JINA_K8S_ACCESS_MODES` | Configures the access modes for the `PersistentVolumeClaim` attached to a `StatefulSet`, when creating a `StatefulSet` in Kubernetes for an Executor using volumes. Defaults to `'["ReadWriteOnce"]'` |
| `JINA_K8S_STORAGE_CAPACITY` | Configures the capacity for the `PersistentVolumeClaim` attached to a `StatefulSet`, when creating a `StatefulSet` in Kubernetes for an Executor using volumes. Defaults to `'10G'` |
| `JINA_K8S_STORAGE_CLASS_NAME` | Configures the storage class for the `PersistentVolumeClaim` attached to a `StatefulSet`, when creating a `StatefulSet` in Kubernetes for an Executor using volumes. Defaults to `'standard'` |
| `JINA_LOCKS_ROOT` | Root folder where file locks for concurrent Executor initialization are stored |
| `JINA_LOG_CONFIG` | Configuration used for the logger |
| `JINA_LOG_LEVEL` | Logging level used: INFO, DEBUG, WARNING |
| `JINA_LOG_NO_COLOR` | If set, disables color in the rich console |
| `JINA_MP_START_METHOD` | Sets the multiprocessing start method used by Jina-serve |
| `JINA_OPTOUT_TELEMETRY` | If set, disables telemetry |
| `JINA_RANDOM_PORT_MAX` | Maximum port number used when selecting random ports for Executors or the Gateway |
| `JINA_RANDOM_PORT_MIN` | Minimum port number used when selecting random ports for Executors or the Gateway |
| `JINA_STREAMER_ARGS` | Used to inject `GatewayStreamer` arguments into the host environment running a Gateway |

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/get-started/create-app.md

# {fas}`folder-plus` Create First Project

Let's build a toy application with Jina-serve. To start, use the Jina-serve CLI to make a new Deployment or Flow:

## Create a Deployment or Flow

A {ref}`Deployment ` lets you serve and scale a single model or microservice, whereas a {ref}`Flow ` lets you connect Deployments into a processing pipeline.

````{tab} Deployment
```bash
jina new hello-jina --type=deployment
```

This creates a new project folder called `hello-jina` with the following file structure:

```text
hello-jina/
    |- client.py
    |- deployment.yml
    |- executor1/
        |- config.yml
        |- executor.py
```

- `deployment.yml` is the configuration file for the Deployment.
- `executor1/` is where you write your {ref}`Executor ` code.
- `config.yml` is the configuration file for the Executor. It stores metadata for your Executor, as well as dependencies.
- `client.py` is the entrypoint of your Jina-serve project. You can run it via `python client.py`.

There are some other files like `README.md` and `requirements.txt` that provide extra metadata about the Executor. More information {ref}`can be found here`.

Now run it and observe the output of the server and client:

## Launch Deployment

```shell
jina deployment --uses deployment.yml
```

```shell
──── 🎉 Deployment is ready to serve! ────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓   Protocol                     grpc  │
│  🏠     Local            0.0.0.0:54321  │
│  🔒   Private    xxx.xx.xxx.xxx:54321   │
│       Public    xx.xxx.xxx.xxx:54321    │
│  ⛓   Protocol                     http  │
│  🏠     Local            0.0.0.0:54322  │
│  🔒   Private    xxx.xx.xxx.xxx:54322   │
│       Public    xx.xxx.xxx.xxx:54322    │
╰──────────────────────────────────────────╯
╭─────────── 💎 HTTP extension ────────────╮
│  💬  Swagger UI      0.0.0.0:54322/docs  │
│  📚       Redoc     0.0.0.0:54322/redoc  │
╰──────────────────────────────────────────╯
```
````

````{tab} Flow
```bash
jina new hello-jina --type=flow
```

This creates a new project folder called `hello-jina` with the following file structure:

```text
hello-jina/
    |- client.py
    |- flow.yml
    |- executor1/
        |- config.yml
        |- executor.py
```

- `flow.yml` is the configuration file for the Flow.
- `executor1/` is where you write your {ref}`Executor ` code.
- `config.yml` is the configuration file for the Executor. It stores metadata for your Executor, as well as dependencies.
- `client.py` is the entrypoint of your Jina-serve project. You can run it via `python client.py`.

There are some other files like `README.md` and `requirements.txt` that provide extra metadata about the Executor. More information {ref}`can be found here`.

Now run it and observe the output of the server and client:

## Launch Flow

```shell
jina flow --uses flow.yml
```
```shell
──── 🎉 Flow is ready to serve! ────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓   Protocol                     grpc  │
│  🏠     Local            0.0.0.0:54321  │
│  🔒   Private    xxx.xx.xxx.xxx:54321   │
│       Public    xx.xxx.xxx.xxx:54321    │
│  ⛓   Protocol                     http  │
│  🏠     Local            0.0.0.0:54322  │
│  🔒   Private    xxx.xx.xxx.xxx:54322   │
│       Public    xx.xxx.xxx.xxx:54322    │
│  ⛓   Protocol                websocket  │
│  🏠     Local            0.0.0.0:54323  │
│  🔒   Private    xxx.xx.xxx.xxx:54323   │
│       Public    xx.xxx.xxx.xxx:54323    │
╰──────────────────────────────────────────╯
╭─────────── 💎 HTTP extension ────────────╮
│  💬  Swagger UI      0.0.0.0:54322/docs  │
│  📚       Redoc     0.0.0.0:54322/redoc  │
╰──────────────────────────────────────────╯
```
````

Deployments and Flows share many common ways of doing things. We'll go into those below.

## Connect with Client

The {ref}`client` lets you connect to your Deployment or Flow over gRPC, HTTP or WebSocket. {ref}`Third-party clients ` are available for non-Python languages.

```bash
python client.py
```

```shell
['hello, world!', 'goodbye, world!']
```

## Add logic

You can use any Python library in an Executor. For example, add `pytorch` to `executor1/requirements.txt` and crunch some numbers. In `executor.py`, add another endpoint `/crunch-numbers` as follows:

```{code-block} python
---
emphasize-lines: 12-15
---
import numpy as np
import torch
from jina import Executor, requests


class MyExecutor(Executor):
    @requests
    def foo(self, docs, **kwargs):
        docs[0].text = 'hello, world!'
        docs[1].text = 'goodbye, world!'

    @requests(on='/crunch-numbers')
    def bar(self, docs, **kwargs):
        for doc in docs:
            doc.tensor = torch.tensor(np.random.random([10, 2]))
```

Kill the last server with `Ctrl-C` and restart it with `jina deployment --uses deployment.yml` (or `jina flow --uses flow.yml`).

## Call the `/crunch-numbers` endpoint

Modify `client.py` to call the `/crunch-numbers` endpoint:

```python
from jina import Client
from docarray import DocList
from docarray.documents.legacy import LegacyDocument

if __name__ == '__main__':
    c = Client(port=54321)
    da = c.post(
        '/crunch-numbers',
        DocList[LegacyDocument]([LegacyDocument(), LegacyDocument()]),
        return_type=DocList[LegacyDocument],
    )
    print(da.tensor)
```

After you save that, you can run your new client:

```bash
python client.py
```

```text
tensor([[[0.9594, 0.9373],
         [0.4729, 0.2012],
         [0.7907, 0.3546],
         [0.6961, 0.7463],
         [0.3487, 0.7837],
         [0.7825, 0.0556],
         [0.3296, 0.2153],
         [0.2207, 0.0220],
         [0.9547, 0.9519],
         [0.6703, 0.4601]],

        [[0.9684, 0.6781],
         [0.7906, 0.8454],
         [0.2136, 0.9147],
         [0.3999, 0.7443],
         [0.2564, 0.0629],
         [0.4713, 0.1018],
         [0.3626, 0.0963],
         [0.7562, 0.2183],
         [0.9239, 0.3294],
         [0.2457, 0.9189]]], dtype=torch.float64)
```

## Deploy to cloud

JCloud offers free CPU and GPU instances to host Jina-serve projects.

```{admonition} Deployments on JCloud
:class: important
At present, JCloud is only available for Flows. We are currently working on supporting Deployments.
```

```bash
jina auth login
```

Log in with your GitHub, Google or Email account:

```bash
jina cloud flow deploy ./
```

```{figure} deploy-jcloud-ongoing.png
```

Deploying a Flow to the cloud is fully automatic and takes a few minutes. After it is done, you should see the following message in the terminal:

```text
╭────────────── 🎉 Flow is available! ──────────────╮
│                                                   │
│   ID            1655d050ad                        │
│   Endpoint(s)   grpcs://1655d050ad.wolf.jina.ai   │
│                                                   │
╰───────────────────────────────────────────────────╯
```
Now change the Client's code to use the deployed endpoint shown above:

```{code-block} python
---
emphasize-lines: 6
---
from jina import Client
from docarray import DocList
from docarray.documents.legacy import LegacyDocument

if __name__ == '__main__':
    c = Client(host='grpcs://1655d050ad.wolf.jina.ai')
    da = c.post(
        '/crunch-numbers',
        DocList[LegacyDocument]([LegacyDocument(), LegacyDocument()]),
    )
    print(da.tensor)
```

```{tip}
The very first request can be a bit slow because the server is starting up.
```

```text
tensor([[[0.4254, 0.4305],
         [0.6200, 0.5783],
         [0.7989, 0.8742],
         [0.1324, 0.7228],
         [0.1274, 0.6538],
         [0.1533, 0.7543],
         [0.3025, 0.7702],
         [0.6938, 0.9289],
         [0.5222, 0.7280],
         [0.7298, 0.4923]],

        [[0.9747, 0.5026],
         [0.6438, 0.4007],
         [0.0899, 0.8635],
         [0.3142, 0.4142],
         [0.4447, 0.2540],
         [0.1109, 0.6260],
         [0.3850, 0.9894],
         [0.0845, 0.7538],
         [0.1444, 0.5136],
         [0.3368, 0.6162]]], dtype=torch.float64)
```

## Delete the deployed project

Don't forget to delete a Flow if you're not using it any more:

```bash
jina cloud flow remove 1655d050ad
```

```text
Successfully removed Flow 1655d050ad.
```

You've just finished your first toy Jina-serve project, congratulations! You can now start your own project.

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/get-started/install/apple-silicon-m1-m2.md

# On Apple Silicon

If you own a macOS device with an Apple Silicon M1/M2 chip, you can run Jina-serve **natively** on it (instead of running under Rosetta) and enjoy up to 10x faster performance. This chapter summarizes how to install Jina-serve.

## Check terminal and device

To ensure you are using the right terminal, run:

```bash
uname -m
```

It should return:

```text
arm64
```

## Install Homebrew

`brew` is a package manager for macOS. If you have already installed it, confirm that it is actually installed for Apple Silicon, not for Rosetta. To check, run:

```bash
which brew
```

```text
/opt/homebrew/bin/brew
```

If it's installed under `/usr/local/` instead of `/opt/homebrew/`, your `brew` is installed for Rosetta, not for Apple Silicon, and you need to [reinstall it](https://apple.stackexchange.com/a/410829).

```{danger}
Reinstalling `brew` can be a destructive operation. Ensure you have backed up your data before proceeding.
```

To (re)install brew, run:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

You can check whether the output contains `/opt/homebrew` to ensure you are installing for Apple Silicon.

## Install Python

Python also has to be installed for Apple Silicon. It may be installed for Rosetta without you being aware of it. To confirm, run:

```python
import platform

platform.machine()
```

This should output:

```text
'arm64'
```

If not, you are using Python under Rosetta, and you need to install Python for Apple Silicon with `brew`:

```bash
brew install python3
```

As of August 2022, this installs Python 3.10 natively for Apple Silicon.

Make a note of where `python` and `pip` are installed. In this example, they are installed to `/opt/homebrew/bin/python3` and `/opt/homebrew/opt/python@3.10/libexec/bin/pip` respectively.

## Install dependency wheels

There are some core dependencies that Jina-serve needs to run whose wheels are not available on PyPI, but which are available via `brew`.
To install them, run:

```bash
brew install protobuf numpy
```

## Install Jina-serve

Now we can install Jina-serve via `pip`. Ensure you use the correct `pip`:

```bash
/opt/homebrew/opt/python@3.10/libexec/bin/pip install jina
```

`grpcio` has to build its wheel from source, so this will take some time.

Note: if the previous step fails, adding the environment variables below might solve the problem:

```bash
export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
```

After all the dependencies are installed, you can run the Jina-serve CLI and check the system information:

```bash
jina -vf
```

```{code-block} text
---
emphasize-lines: 13-15
---
- jina 3.7.14
- docarray 0.15.4
- jcloud 0.0.35
- jina-hubble-sdk 0.16.1
- jina-proto 0.1.13
- protobuf 3.20.1
- proto-backend python
- grpcio 1.47.0
- pyyaml 6.0
- python 3.10.6
- platform Darwin
- platform-release 21.6.0
- platform-version Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:28 PDT 2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T8110
- architecture arm64
- processor arm
- uid 94731629138370
- session-id 49497356-254e-11ed-9624-56286d1a91c2
- uptime 2022-08-26T16:49:28.279723
- ci-vendor (unset)
* JINA_DEFAULT_HOST (unset)
* JINA_DEFAULT_TIMEOUT_CTRL (unset)
* JINA_DEPLOYMENT_NAME (unset)
* JINA_DISABLE_UVLOOP (unset)
* JINA_EARLY_STOP (unset)
* JINA_FULL_CLI (unset)
* JINA_GATEWAY_IMAGE (unset)
* JINA_GRPC_RECV_BYTES (unset)
* JINA_GRPC_SEND_BYTES (unset)
* JINA_HUB_NO_IMAGE_REBUILD (unset)
* JINA_LOG_CONFIG (unset)
* JINA_LOG_LEVEL (unset)
* JINA_LOG_NO_COLOR (unset)
* JINA_MP_START_METHOD (unset)
* JINA_OPTOUT_TELEMETRY (unset)
* JINA_RANDOM_PORT_MAX (unset)
* JINA_RANDOM_PORT_MIN (unset)
```

Congratulations! You have successfully installed Jina-serve on Apple Silicon.

````{tip}
To install MPS-enabled PyTorch, run:
```bash
/opt/homebrew/opt/python@3.10/libexec/bin/pip install -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
````

---
# Source: https://github.com/jina-ai/jina/blob/master/docs/get-started/install/docker.md

# Via Docker Image

Our universal Docker image is ready to use on linux/amd64 and linux/arm64. The Docker image name always starts with `jinaai/jina`, followed by a tag composed of three parts:

```text
jinaai/jina:{version}{python_version}{extra}
```

- `{version}`: The version of Jina-serve. Possible values:
  - `latest`: the last release;
  - `master`: the master branch of the `jina-ai/jina` repository;
  - `x.y.z`: the release of a particular version;
  - `x.y`: the alias to the last `x.y.z` patch release, i.e. `x.y` = `x.y.max(z)`;
- `{python_version}`: The Python version of the image. Possible values:
  - ` `, `-py37`: Python 3.7;
  - `-py38`: Python 3.8;
  - `-py39`: Python 3.9;
- `{extra}`: The extra dependencies installed along with Jina-serve. Possible values:
  - ` `: Jina-serve is installed inside the image with minimum dependencies (`pip install jina`);
  - `-perf`: Jina-serve is installed inside the image via `pip install jina`. It includes all performance dependencies;
  - `-standard`: Jina-serve is installed inside the image via `pip install jina`. It includes all recommended dependencies;
  - `-devel`: Jina-serve is installed inside the image via `pip install "jina[devel]"`.
It includes `standard` plus some extra dependencies;Examples:- `jinaai/jina:0.9.6`: the `0.9.6` release with Python 3.7 and the entrypoint of `jina`.- `jinaai/jina:latest`: the latest release with Python 3.7 and the entrypoint of `jina`- `jinaai/jina:master`: the master with Python 3.7 and the entrypoint of `jina`## Image alias and updates| Event | Updated images | Aliases || --- | --- | --- || On Master Merge | `jinaai/jina:master{python_version}{extra}` | || On `x.y.z` release | `jinaai/jina:x.y.z{python_version}{extra}` | `jinaai/jina:latest{python_version}{extra}`, `jinaai/jina:x.y{python_version}{extra}` |12 images are built, i.e. taking the combination of: - `{python_version} = ["-py37", "-py38", "-py39"]` - `{extra} = ["", "-devel", "-standard", "-perf"]`## Image size on different tags```{warning}[Due to a known bug in shields.io/Docker Hub API](https://github.com/badges/shields/issues/7583), the following badge may show "invalid" status randomly.```|Image Size|| ---||![](https://img.shields.io/docker/image-size/jinaai/jina/latest?label=jinaai%2Fjina%3Alatest&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-py38?label=jinaai%2Fjina%3Alatest-py38&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-py39?label=jinaai%2Fjina%3Alatest-py39&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-devel?label=jinaai%2Fjina%3Alatest-devel&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-perf?label=jinaai%2Fjina%3Alatest-perf&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-standard?label=jinaai%2Fjina%3Alatest-standard&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-py38-devel?label=jinaai%2Fjina%3Alatest-py38-devel&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-py38-perf?label=jinaai%2Fjina%3Alatest-py38-perf&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-py38-standard?label=jinaai%2Fjina%3Alatest-py38-standard&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-py39-devel?label=jinaai%2Fjina%3Alatest-py39-devel&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-py39-perf?label=jinaai%2Fjina%3Alatest-py39-perf&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/latest-py39-standard?label=jinaai%2Fjina%3Alatest-py39-standard&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master?label=jinaai%2Fjina%3Amaster&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-py38?label=jinaai%2Fjina%3Amaster-py38&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-py39?label=jinaai%2Fjina%3Amaster-py39&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-devel?label=jinaai%2Fjina%3Amaster-devel&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-perf?label=jinaai%2Fjina%3Amaster-perf&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-standard?label=jinaai%2Fjina%3Amaster-standard&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-py38-devel?label=jinaai%2Fjina%3Amaster-py38-devel&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-py38-perf?label=jinaai%2Fjina%3Amaster-py38-perf&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-py38-standard?label=jinaai%2Fjina%3Amaster-py38-standard&lo
go=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-py39-devel?label=jinaai%2Fjina%3Amaster-py39-devel&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-py39-perf?label=jinaai%2Fjina%3Amaster-py39-perf&logo=docker)||![](https://img.shields.io/docker/image-size/jinaai/jina/master-py39-standard?label=jinaai%2Fjina%3Amaster-py39-standard&logo=docker)| --- # Source: https://github.com/jina-ai/jina/blob/master/docs/get-started/install/index.md```{include} ../install.md``````{toctree}:hidden:dockerapple-silicon-m1-m2windowstroubleshooting``` --- # Source: https://github.com/jina-ai/jina/blob/master/docs/get-started/install/troubleshooting.md# TroubleshootingThis article helps you to solve the installation problems of Jina-serve.## On Linux/Mac, building wheels takes long timeThe normal installation of Jina-serve takes 10 seconds. If yours takes longer than this, then it is likely you unnecessarily built wheels from scratch.Every upstream dependency of Jina-serve has pre-built wheels exhaustively for x86/arm64, macos/Linux and Python 3.7/3.8/3.9, including `numpy`, `protobuf`, `grpcio` etc. This means when you install Jina-serve, your `pip` should directly leverage the pre-built wheels instead of building them from scratch locally. For example, you should expect the install log to contain `-cp38-cp38-macosx_10_15_x86_64.whl` when installing Jina-serve on macOS with Python 3.8.If you find you are building wheels during installation (see an example below), then it is a sign that you are installing Jina-serve **wrongly**.```textCollecting numpy==2.0.* Downloading numpy-2.0.18.tar.gz (801 kB) |████████████████████████████████| 801 kB 1.1 MB/sBuilding wheels for collected packages: numpy Building wheel for numpy (setup.py) ... done Created wheel for numpy ... numpy-2.0.18-cp38-cp38-macosx_10_15_x86_64.whl```### Solution: update your `pip`It could simply be that your local `pip` is too old. Updating it should solve the problem:```bashpip install -U pip```### If not, then...Then you are likely installing Jina-serve on a less-supported system/architecture. For example, on native Mac M1, Alpine Linux, or Raspberry Pi 2/3 (armv6/7).## On Windows with `conda`Unfortunately, `conda install` is not supported on Windows. You can either do `pip install jina` natively on Windows, or use `pip/conda install` under WSL2.## Upgrading from Jina-serve 2.x to 3.xIf you upgraded an existing Jina-serve installation from 2.x to 3.x you may see the following error message:```textOSError: `docarray` dependency is not installed correctly, please reinstall with `pip install -U --force-reinstall docarray````This can be fixed by reinstalling the `docarray` package manually:```bashpip install -U --force-reinstall docarray```To avoid this issue in the first place, we recommend installing Jina-serve in a new virtual environment instead of upgrading from an old installation. --- # Source: https://github.com/jina-ai/jina/blob/master/docs/get-started/install/windows.md(jina-on-windows)=# On WindowsYou can install and use Jina-serve on Windows.However, Jina-serve is built keeping *nix-based platforms in mind, and the upstream libraries that Jina-serve depends on also follow the similar ideology. Hence, there are some caveats when running Jina-serve on Windows. [If you face additional issues, please let us know.](https://github.com/jina-ai/jina/issues/)```{caution}There can be a significant performance impact while running Jina on Windows. 
You may not want to use it in production.``````{tip}Alternatively, you can use the Windows Subsystem for Linux for better compatibility. Check the official guide [here](https://docs.microsoft.com/en-us/windows/wsl/install).Make sure you install WSL**2**.Once done, you can install Jina as on a native *nix platform.```## Known issues### `multiprocessing spawn`Jina-serve relies heavily on `multiprocessing` to enable scaling and distribution. Windows only supports [spawn start method for multiprocessing](https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods), which has a several caveats.{ref}`Please follow the guidelines here.`### Compatibility of Executors in the HubWe've added preliminary support for using Executors listed in the Hub portal. Note that, these Executors are based on *nix OS and might not be compatible to run natively on Windows. Containers that are built on Windows are not yet supported.```{seealso}[Install Docker Desktop on Windows](https://docs.docker.com/desktop/windows/install/)```### `UnicodeEncodeError` on Jina-serve CLI```UnicodeEncodeError: 'charmap' codec can't encode character '\u25ae' in position : character maps to ```Set environment variable `PYTHONIOENCODING='utf-8'` before starting your Python script. --- # Source: https://github.com/jina-ai/jina/blob/master/docs/get-started/install.md(install)=# {octicon}`desktop-download` InstallJina-serve comes with multiple installation options, enabling different feature sets.Standard install enables all major features of Jina-serve and is the recommended installation for most users.````{tab} via PyPI```shellpip install -U jina```````````{tab} via Conda```shellconda install jina -c conda-forge```````````{tab} via Docker```shelldocker run jinaai/jina:latest```````## More install optionsVersion identifiers [are explained here](https://github.com/jina-ai/jina/blob/master/RELEASE.md).### MinimumMinimum install enables basic features of Jina-serve, but without support for HTTP, WebSocket, Docker and Hub.Minimum install is often used when building and deploying an Executor.````{tab} via PyPI```shellJINA_PIP_INSTALL_CORE=1 pip install jina```````````{tab} via Conda```shellconda install jina-core -c conda-forge```````````{tab} via Docker```shelldocker run jinaai/jina:latest```````### Minimum but more performantSame as minimum install, but also install `uvloop` and `lz4`.````{tab} via PyPI```shellJINA_PIP_INSTALL_PERF=1 pip install jina```````````{tab} via Conda```shellconda install jina-perf -c conda-forge```````````{tab} via Docker```shelldocker run jinaai/jina:latest-perf```````### Full development dependenciesThis installs additional dependencies, useful for developing Jina-serve itself. This includes Pytest, CI components etc.````{tab} via PyPI```shellpip install "jina[devel]"```````````{tab} via Docker```shelldocker run jinaai/jina:latest-devel```````### PrereleasePrerelease is the version always synced with the `master` branch of Jina-serve's GitHub repository.````{tab} via PyPI```shellpip install --pre jina```````````{tab} via Docker```shelldocker run jinaai/jina:master```````## Autocomplete commands on Bash, Zsh and FishAfter installing Jina via `pip`, you should be able to use your shell's autocomplete feature while using Jina's CLI. For example, typing `jina` then hitting your Tab key will provide the following suggestions:```bashjina--help --version --version-full check client flow gateway hello pod ping deployment hub```The autocomplete is context-aware. 
It also works when you type a second-level argument:```bashjina hub--help new pull push```Currently, the feature is enabled automatically on Bash, Zsh and Fish. It requires you to have a standard shell path as follows:| Shell | Configuration file path || --- | --- || Bash | `~/.bashrc` || Zsh | `~/.zshrc` || Fish | `~/.config/fish/config.fish` | --- # Source: https://github.com/jina-ai/jina/blob/master/docs/index.md# Welcome to Jina-serve!```{admonition} Survey:class: tipTake our **[user experience survey](https://10sw1tcpld4.typeform.com/to/EGAEReM7?utm_source=doc&utm_medium=github&utm_campaign=user%20experience&utm_term=feb2023&utm_content=survey)** to let us know your thoughts and help shape the future of Jina!``````{include} ../README.md:start-after: :end-before: ```## InstallMake sure that you have Python 3.7+ installed on Linux/macOS/{ref}`Windows `.````{tab} via PyPI```shellpip install -U jina```````````{tab} via Conda```shellconda install jina -c conda-forge```````(build-ai-services)=(build-a-pipeline)=## Getting StartedJina-serve supports developers in building AI services and pipelines:````{tab} Build AI Services```{include} ../README.md:start-after: :end-before: ```````````{tab} Build Pipelines```{include} ../README.md:start-after: :end-before: ```````## Next steps:::::{grid} 2:gutter: 3::::{grid-item-card} {octicon}`cross-reference;1.5em` Learn DocArray API:link: https://docarray.docs.orgDocArray is the foundational data structure of Jina. Before starting Jina, first learn DocArray to quickly build a PoC.::::::::{grid-item-card} {octicon}`gear;1.5em` Learn Executor:link: concepts/serving/executor/index:link-type: doc{term}`Executor` is a Python class that can serve logic using `Documents`.::::::::{grid-item-card} {octicon}`workflow;1.5em` Learn Deployment:link: concepts/orchestration/deployment:link-type: doc{term}`Deployment` serves an Executor as a scalable service making it available to receive `Documents` using `gRPC` or `HTTP`.::::::::{grid-item-card} {octicon}`workflow;1.5em` Learn Flow:link: concepts/orchestration/flow:link-type: doc{term}`Flow` orchestrates Executors using different Deployments into a processing pipeline to accomplish a task.::::::::{grid-item-card} {octicon}`cross-reference;1.5em` Learn Gateway:link: concepts/serving/gateway/indexThe Gateway is a microservice that serves as the entrypoint of a {term}`Flow`. 
It exposes multiple protocols for external communications and routes all internal traffic.::::::::{grid-item-card} {octicon}`package-dependents;1.5em` Explore Executor Hub:link: concepts/executor/hub/index:link-type: doc:class-card: color-gradient-card-1Executor Hub allows you to containerize, share, explore and make Executors ready for the cloud.::::::::{grid-item-card} {octicon}`cpu;1.5em` Deploy a Flow to Cloud:link: concepts/jcloud/index:link-type: doc:class-card: color-gradient-card-2Jina AI Cloud is the MLOps platform for hosting Jina-serve projects.:::::::::```{include} ../README.md:start-after: :end-before: ``````{toctree}:caption: Get Started:hidden:get-started/install/indexget-started/create-app``````{toctree}:caption: Concepts:hidden:concepts/preliminaries/indexconcepts/serving/indexconcepts/orchestration/indexconcepts/client/index``````{toctree}:caption: Cloud Native:hidden:cloud-nativeness/k8scloud-nativeness/docker-composecloud-nativeness/opentelemetryjina-ai-cloud/index``````{toctree}:caption: Developer Reference:hidden::maxdepth: 1api-rstcli/indexyaml-specenvs/indextelemetryproto/docsdocarray-support``````{toctree}:caption: Tutorials:hidden:tutorials/deploy-modeltutorials/gpu-executortutorials/deploy-pipelinetutorials/llm-serve``````{toctree}:caption: Legacy Support:hidden::maxdepth: 1Jina 2 Documentation ```---{ref}`genindex` | {ref}`modindex` --- # Source: https://github.com/jina-ai/jina/blob/master/docs/jina-ai-cloud/index.md# {octicon}`beaker` Jina AI Cloud:::::{grid} 2:gutter: 3::::{grid-item-card} {octicon}`package-dependents;1.5em` Explore Executor Hub:link: ../concepts/serving/executor/hub/index:link-type: docExecutor Hub is an Executor marketplace that allows you to share, explore and test Executors.::::::::{grid-item-card} {octicon}`cpu;1.5em` Deploy a Flow to JCloud:link: ../concepts/jcloud/index:link-type: docJCloud is a cost-efficient hosting platform specifically designed for Jina-serve projects.:::::::::Jina AI Cloud is the **portal** and **single entrypoint** to manage **all** your Jina AI resources, including:- Data - [docarray](https://docs.docarray.org/user_guide/storing/doc_store/store_jac/) - [Finetuner artifacts](https://finetuner.jina.ai/walkthrough/save-model/#save-artifact)- [Executors](../concepts/serving/executor/index.md)- [Flows](../concepts/orchestration/flow.md)- [Apps](https://now.jina.ai)_Manage_ in this context means: CRUD, access control, personal access tokens, and subscription.```{tip}Are you ready to unlock the power of AI with Jina AI Cloud? Take a look at our [pricing options](https://cloud.jina.ai/pricing) now!``````{toctree}:hidden:login../concepts/serving/executor/hub/index../concepts/jcloud/index``` --- # Source: https://github.com/jina-ai/jina/blob/master/docs/jina-ai-cloud/login.md# Login & Token ManagementTo use Jina AI Cloud, you need to log in, either via a GitHub or Google account. This section describes how to log in Jina AI Cloud and manage the personal access token. 
You can do it via webpage, CLI or Python API.## via WebpageVisit [https://jina.ai](https://jina.ai) and click on the "login" button.### Login```{figure} login-1.png```After log in you can see your name and avatar in the top-right corner.```{figure} login-2.png```### Token ManagementYou can follow the GUI to create/delete personal access tokens for your Jina-serve applications.```{figure} pat.png```To use a token, set it as the environment variable `JINA_AUTH_TOKEN`.## via CLI### Login```shelljina auth login```This will open browser automatically and login via 3rd party. Token will be saved locally.### LogoutIf there is a valid token locally, this will disable that token and remove it from local config.```shelljina auth logout```### Token Management#### Create a new PAT```shelljina auth token create -e ```To use a token, set it as the environment variable `JINA_AUTH_TOKEN`.#### List PATs```shelljina auth token list```#### Delete PAT```shelljina auth token delete ```## via Python APIInstalled along with Jina-serve, you can leverage the `hubble` package to manage login from Python### Login```pythonimport hubble# Log in via browser or PAT. The token is saved locally.# In Jupyter/Google Colab, interactive login is used automatically.# To disable this feature, run `hubble.login(interactive=False)`.hubble.login()```### Check login status```pythonimport hubbleif hubble.is_logged_in(): print('yeah')else: print('no')```### Get a personal access tokenNotice that the token you got from this function is always valid. If the token is invalid or expired, the result is `None`.```pythonimport hubblehubble.get_token()```If you are using inside an interactive environment, i.e. user can input via stdin:```pythonimport hubblehubble.get_token(interactive=True)```Mark a function as login required,```pythonimport hubble@hubble.login_requireddef foo(): pass```### Logout```pythonimport hubble# If there is a valid token locally,# this will disable that token and remove it from local config.hubble.logout()```### Token managementAfter calling `hubble.login()`, you can use the client:```pythonimport hubbleclient = hubble.Client(max_retries=None, jsonify=True)# Get current user information.response = client.get_user_info()# Create a new personal access token for longer expiration period.response = client.create_personal_access_token(name='my-pat', expiration_days=30)# Query all personal access tokens.response = client.list_personal_access_tokens()```### Artifact management```pythonimport hubbleimport ioclient = hubble.Client(max_retries=None, jsonify=True)# Upload artifact to Hubble Artifact Storage by providing path.response = client.upload_artifact(f='~/Documents/my-model.onnx', is_public=False)# Upload artifact to Hubble Artifact Storage by providing `io.BytesIO`response = client.upload_artifact( f=io.BytesIO(b"some initial binary data: \x00\x01"), is_public=False)# Get current artifact information.response = client.get_artifact_info(id='my-artifact-id')# Download artifact to local directory.response = client.download_artifact(id='my-artifact-id', f='my-local-filepath')# Download artifact as an io.BytesIO objectresponse = client.download_artifact(id='my-artifact-id', f=io.BytesIO())# Get list of artifacts.response = client.list_artifacts(filter={'metaData.foo': 'bar'}, sort={'type': -1})# Delete the artifact.response = client.delete_artifact(id='my-artifact-id')```### Error handling```pythonimport hubbleclient = hubble.Client()try: client.get_user_info()except hubble.excepts.AuthenticationRequiredError: print('Please 
login first.')except Exception: print('Unknown error')``` --- # Source: https://github.com/jina-ai/jina/blob/master/docs/proto/docs.md# Protocol Documentation## Table of Contents- [docarray.proto](#docarray-proto) - [DocumentArrayProto](#docarray-DocumentArrayProto)- [jina.proto](#jina-proto) - [DataRequestListProto](#jina-DataRequestListProto) - [DataRequestProto](#jina-DataRequestProto) - [DataRequestProto.DataContentProto](#jina-DataRequestProto-DataContentProto) - [DataRequestProtoWoData](#jina-DataRequestProtoWoData) - [EndpointsProto](#jina-EndpointsProto) - [HeaderProto](#jina-HeaderProto) - [JinaInfoProto](#jina-JinaInfoProto) - [JinaInfoProto.EnvsEntry](#jina-JinaInfoProto-EnvsEntry) - [JinaInfoProto.JinaEntry](#jina-JinaInfoProto-JinaEntry) - [RelatedEntity](#jina-RelatedEntity) - [RouteProto](#jina-RouteProto) - [StatusProto](#jina-StatusProto) - [StatusProto.ExceptionProto](#jina-StatusProto-ExceptionProto) - [StatusProto.StatusCode](#jina-StatusProto-StatusCode) - [JinaDataRequestRPC](#jina-JinaDataRequestRPC) - [JinaDiscoverEndpointsRPC](#jina-JinaDiscoverEndpointsRPC) - [JinaGatewayDryRunRPC](#jina-JinaGatewayDryRunRPC) - [JinaInfoRPC](#jina-JinaInfoRPC) - [JinaRPC](#jina-JinaRPC) - [JinaSingleDataRequestRPC](#jina-JinaSingleDataRequestRPC)- [Scalar Value Types](#scalar-value-types)


## docarray.proto

### DocumentArrayProto

This file is just a placeholder for the DocumentArray coming from the jina._docarray dependency.


## jina.proto### DataRequestListProtoRepresents a list of data requestsThis should be replaced by streaming| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || requests | [DataRequestProto](#jina-DataRequestProto) | repeated | requests in this list |### DataRequestProtoRepresents a DataRequest| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || header | [HeaderProto](#jina-HeaderProto) | | header contains meta info defined by the user || parameters | [google.protobuf.Struct](#google-protobuf-Struct) | | extra kwargs that will be used in executor || routes | [RouteProto](#jina-RouteProto) | repeated | status info on every routes || data | [DataRequestProto.DataContentProto](#jina-DataRequestProto-DataContentProto) | | container for docs and groundtruths |### DataRequestProto.DataContentProto| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || docs | [docarray.DocumentArrayProto](#docarray-DocumentArrayProto) | | the docs in this request || docs_bytes | [bytes](#bytes) | | the docs in this request as bytes |### DataRequestProtoWoData| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || header | [HeaderProto](#jina-HeaderProto) | | header contains meta info defined by the user || parameters | [google.protobuf.Struct](#google-protobuf-Struct) | | extra kwargs that will be used in executor || routes | [RouteProto](#jina-RouteProto) | repeated | status info on every routes |### EndpointsProtoRepresents the set of Endpoints exposed by an Executor| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || endpoints | [string](#string) | repeated | list of endpoints exposed by an Executor |### HeaderProtoRepresents a Header.- The header's content will be defined by the user request.- It will be copied to the envelope.header- In-flow operations will modify the envelope.header- While returning, copy envelope.header back to request.header| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || request_id | [string](#string) | | the unique ID of this request. 
Multiple requests with the same ID will be gathered || status | [StatusProto](#jina-StatusProto) | | status info || exec_endpoint | [string](#string) | optional | the endpoint specified by `@requests(on='/abc')` || target_executor | [string](#string) | optional | if set, the request is targeted to certain executor, regex strings || timeout | [uint32](#uint32) | optional | epoch time in seconds after which the request should be dropped |### JinaInfoProto| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || jina | [JinaInfoProto.JinaEntry](#jina-JinaInfoProto-JinaEntry) | repeated | information about the system running and package version information including jina || envs | [JinaInfoProto.EnvsEntry](#jina-JinaInfoProto-EnvsEntry) | repeated | the environment variable setting |### JinaInfoProto.EnvsEntry| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || key | [string](#string) | | || value | [string](#string) | | |### JinaInfoProto.JinaEntry| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || key | [string](#string) | | || value | [string](#string) | | |### RelatedEntityRepresents an entity (like an ExecutorRuntime)| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || id | [string](#string) | | unique id of the entity, like the name of a pod || address | [string](#string) | | address of the entity, could be an IP address, domain name etc, does not include port || port | [uint32](#uint32) | | port this entity is listening on || shard_id | [uint32](#uint32) | optional | the id of the shard it belongs to, if it is a shard |### RouteProtoRepresents a the route paths of this message as perceived by the Gatewaystart_time is set when the Gateway sends a message to a Podend_time is set when the Gateway receives a message from a Podthus end_time - start_time includes Executor computation, runtime overhead, serialization and network| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || executor | [string](#string) | | the name of the BasePod || start_time | [google.protobuf.Timestamp](#google-protobuf-Timestamp) | | time when the Gateway starts sending to the Pod || end_time | [google.protobuf.Timestamp](#google-protobuf-Timestamp) | | time when the Gateway received it from the Pod || status | [StatusProto](#jina-StatusProto) | | the status of the execution |### StatusProtoRepresents a Status| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || code | [StatusProto.StatusCode](#jina-StatusProto-StatusCode) | | status code || description | [string](#string) | | error description of the very first exception || exception | [StatusProto.ExceptionProto](#jina-StatusProto-ExceptionProto) | | the details of the error |### StatusProto.ExceptionProto| Field | Type | Label | Description || ----- | ---- | ----- | ----------- || name | [string](#string) | | the class name of the exception || args | [string](#string) | repeated | the list of arguments given to the exception constructor. 
|| stacks | [string](#string) | repeated | the exception traceback stacks || executor | [string](#string) | | the name of the executor bind to that Executor (if applicable) |### StatusProto.StatusCode| Name | Number | Description || ---- | ------ | ----------- || SUCCESS | 0 | success || ERROR | 1 | error |### JinaDataRequestRPCjina gRPC service for DataRequests.| Method Name | Request Type | Response Type | Description || ----------- | ------------ | ------------- | ------------|| process_data | [DataRequestListProto](#jina-DataRequestListProto) | [DataRequestProto](#jina-DataRequestProto) | Used for passing DataRequests to the Executors |### JinaDiscoverEndpointsRPCjina gRPC service to expose Endpoints from Executors.| Method Name | Request Type | Response Type | Description || ----------- | ------------ | ------------- | ------------|| endpoint_discovery | [.google.protobuf.Empty](#google-protobuf-Empty) | [EndpointsProto](#jina-EndpointsProto) | |### JinaGatewayDryRunRPCjina gRPC service to expose Endpoints from Executors.| Method Name | Request Type | Response Type | Description || ----------- | ------------ | ------------- | ------------|| dry_run | [.google.protobuf.Empty](#google-protobuf-Empty) | [StatusProto](#jina-StatusProto) | |### JinaInfoRPCjina gRPC service to expose information about running jina version and environment.| Method Name | Request Type | Response Type | Description || ----------- | ------------ | ------------- | ------------|| _status | [.google.protobuf.Empty](#google-protobuf-Empty) | [JinaInfoProto](#jina-JinaInfoProto) | |### JinaRPCjina streaming gRPC service.| Method Name | Request Type | Response Type | Description || ----------- | ------------ | ------------- | ------------|| Call | [DataRequestProto](#jina-DataRequestProto) stream | [DataRequestProto](#jina-DataRequestProto) stream | Pass in a Request and a filled Request with matches will be returned. |### JinaSingleDataRequestRPCjina gRPC service for DataRequests.This is used to send requests to Executors when a list of requests is not needed| Method Name | Request Type | Response Type | Description || ----------- | ------------ | ------------- | ------------|| process_single_data | [DataRequestProto](#jina-DataRequestProto) | [DataRequestProto](#jina-DataRequestProto) | Used for passing DataRequests to the Executors |## Scalar Value Types| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby || ----------- | ----- | --- | ---- | ------ | -- | -- | --- | ---- || double | | double | double | float | float64 | double | float | Float || float | | float | float | float | float32 | float | float | Float || int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) || int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum || uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) || uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) || sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. 
## Scalar Value Types

| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| ----------- | ----- | --- | ---- | ------ | -- | -- | --- | ---- |
| double | | double | double | float | float64 | double | float | Float |
| float | | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool | | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/proto/index.md

To update the `jina` Protobuf definitions:

```bash
docker run -v $(pwd)/jina/:/jina/ jinaai/protogen
```

```{include} docs.md
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/telemetry.md

# {fas}`tower-cell` Telemetry

```{warning}
To opt out of telemetry, set `JINA_OPTOUT_TELEMETRY=1` as an environment variable.
```

Telemetry is the process of collecting data about the usage of a system. This data can be used to improve the system by understanding how it is being used and which areas need improvement.

Jina AI uses telemetry to collect data about how Jina-serve is being used. This data then informs improvements to the software. For example, if we see that many users have trouble with a certain feature, we can make that feature easier to use.

Telemetry is important for Jina-serve because it lets the team understand how the software is being used and which areas need improvement. Without telemetry, Jina-serve would not be able to improve as quickly or as effectively.
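If you prefer to opt out from inside a script rather than in your shell profile, a minimal sketch (assuming the variable is read when Jina-serve starts up) is to set the environment variable before importing anything from `jina`:

```python
import os

# Must be set before Jina-serve reads its configuration at startup
os.environ['JINA_OPTOUT_TELEMETRY'] = '1'

from jina import Flow  # imported only after opting out
```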
The data collected includes:

- Jina-serve and its dependencies' versions;
- A hashed unique user identifier;
- A hashed unique session identifier;
- Boolean events: start of a Flow, Gateway, Runtime, Client.

## Example payload

Here is an example payload when running the following code:

```python
from jina import Flow

with Flow().add() as f:
    pass
```

```python
{
    'architecture': 'x86_64',
    'ci-vendor': '(unset)',
    'docarray': '0.15.2',
    'event': 'WorkerRuntime.start',
    'grpcio': '1.46.3',
    'jina': '3.7.13',
    'jina-proto': '0.1.13',
    'platform': 'Darwin',
    'platform-release': '21.6.0',
    'platform-version': 'Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:28 PDT '
                        '2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T8110',
    'processor': 'i386',
    'proto-backend': 'cpp',
    'protobuf': '3.20.1',
    'python': '3.7.9',
    'pyyaml': '6.0',
    'session-id': 'da9d4ade-2171-11ed-8713-56286d1a91c2',
    'uid': 94731629138370,
    'uptime': '2022-08-21T18:53:59.681842',
}
{
    'architecture': 'x86_64',
    'ci-vendor': '(unset)',
    'docarray': '0.15.2',
    'event': 'GRPCGatewayRuntime.start',
    'grpcio': '1.46.3',
    'jina': '3.7.13',
    'jina-proto': '0.1.13',
    'platform': 'Darwin',
    'platform-release': '21.6.0',
    'platform-version': 'Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:28 PDT '
                        '2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T8110',
    'processor': 'i386',
    'proto-backend': 'cpp',
    'protobuf': '3.20.1',
    'python': '3.7.9',
    'pyyaml': '6.0',
    'session-id': 'da9fc390-2171-11ed-8713-56286d1a91c2',
    'uid': 94731629138370,
    'uptime': '2022-08-21T18:53:59.681842',
}
{
    'architecture': 'x86_64',
    'ci-vendor': '(unset)',
    'docarray': '0.15.2',
    'event': 'BaseExecutor.start',
    'grpcio': '1.46.3',
    'jina': '3.7.13',
    'jina-proto': '0.1.13',
    'platform': 'Darwin',
    'platform-release': '21.6.0',
    'platform-version': 'Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:28 PDT '
                        '2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T8110',
    'processor': 'i386',
    'proto-backend': 'cpp',
    'protobuf': '3.20.1',
    'python': '3.7.9',
    'pyyaml': '6.0',
    'session-id': 'daa02f1a-2171-11ed-8713-56286d1a91c2',
    'uid': 94731629138370,
    'uptime': '2022-08-21T18:53:59.681842',
}
{
    'architecture': 'x86_64',
    'ci-vendor': '(unset)',
    'docarray': '0.15.2',
    'event': 'Flow.start',
    'grpcio': '1.46.3',
    'jina': '3.7.13',
    'jina-proto': '0.1.13',
    'platform': 'Darwin',
    'platform-release': '21.6.0',
    'platform-version': 'Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:28 PDT '
                        '2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T8110',
    'processor': 'i386',
    'proto-backend': 'cpp',
    'protobuf': '3.20.1',
    'python': '3.7.9',
    'pyyaml': '6.0',
    'session-id': 'db4c0092-2171-11ed-8713-56286d1a91c2',
    'uid': 94731629138370,
    'uptime': '2022-08-21T18:53:59.681842',
}
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/tutorials/before-you-start.md

(before-start)=
# Before you start

Before you jump into any tutorials, we recommend you do the following:

## Understand Jina-serve

Read through an {ref}`introduction to Jina concepts ` to understand the basic components used in the tutorials.

## Work in a virtual environment

We highly recommend you work in [a virtual environment](https://docs.python.org/3/library/venv.html) to prevent conflicts in package versions. This applies not just to Jina-serve, but to Python as a whole.

## Install Jina-serve

For most purposes, you can install Jina-serve with:

```shell
pip install jina
```

For more installation options, see {ref}`our installation guide `.

## Python vs YAML

Jina-serve supports YAML in many circumstances for easier deployment.
For more information, see our {ref}`guide on coding in Python and YAML in Jina `.

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/tutorials/deploy-model.md

# Deploy a model

```{admonition} Before you start...
:class: note
Please check our {ref}`"Before you start" guide ` to go over a few preliminary topics.
```

```{admonition} This tutorial was written for Jina 3.14
:class: warning
It will *probably* still work for later versions. If you have trouble, please ask on [our Discord](https://discord.jina.ai).
```

## Introduction

In this tutorial we'll build a fast, reliable and scalable gRPC-based AI service. In Jina-serve we call this an {class}`~jina.Executor`. Our Executor will use [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) to generate images from a given text prompt. We'll then use a {class}`~jina.Deployment` to serve it.

![](images/deployment.png)

```{admonition} Note
:class: note
A Deployment serves just one Executor. To use multiple Executors, read our {ref}`tutorial on building a pipeline `.
```

```{admonition} Run this tutorial in a notebook
:class: tip
You can also run this code interactively in [Colab](https://colab.research.google.com/github/jina-ai/jina/blob/master/.github/getting-started/notebook.ipynb#scrollTo=0l-lkmz4H-jW).
```

## Understand: Executors and Deployments

- All data that goes into and out of Jina-serve is in the form of [Documents](https://docs.docarray.org/user_guide/representing/first_step/) inside a [DocList](https://docs.docarray.org/user_guide/representing/array/) from the [DocArray](https://docs.docarray.org/) package.
- An {ref}`Executor ` is a self-contained gRPC microservice that performs a task on Documents. This can be very simple (like capitalizing the entire text of a Document) or much more complex (like generating vector embeddings for a given piece of content).
- A {ref}`Deployment ` lets you serve your Executor, scale it up with replicas, and allows users to send and receive requests.

When you build a model or service in Jina-serve, it's always in the form of an Executor. An Executor is a Python class that transforms and processes Documents, and it can go far beyond image generation: encoding text or images into vectors, OCR, extracting tables from PDFs, and much more.
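Since everything below builds on these IO types, here is a tiny, hedged sketch of how Documents and DocLists behave (the `Greeting` type is made up purely for illustration):

```python
from docarray import BaseDoc, DocList


class Greeting(BaseDoc):  # hypothetical document type, just for this sketch
    text: str


docs = DocList[Greeting]([Greeting(text='hello'), Greeting(text='world')])
print(docs.text)     # ['hello', 'world']: column-wise access across the DocList
print(docs[0].text)  # 'hello': ordinary per-Document access
```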
## Install prerequisites

In this example we need to install:

- The [Jina-serve framework](https://jina.ai/serve/) itself
- The dependencies of the specific model we want to serve and deploy

```shell
pip install jina
pip install diffusers
```

## Executor: Implement logic

Let's implement the service's logic in `text_to_image.py`. Don't worry too much about understanding this code right now; we'll go through it below!

```python
import numpy as np

from jina import Executor, requests
from docarray import BaseDoc, DocList
from docarray.documents import ImageDoc


class ImagePrompt(BaseDoc):
    text: str


class TextToImage(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        from diffusers import StableDiffusionPipeline
        import torch

        self.pipe = StableDiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
        ).to("cuda")

    @requests
    def generate_image(
        self, docs: DocList[ImagePrompt], **kwargs
    ) -> DocList[ImageDoc]:
        # The pipeline returns images in PIL format (https://pillow.readthedocs.io/en/stable/)
        images = self.pipe(docs.text).images
        # Wrap each generated image in an ImageDoc and return them
        return DocList[ImageDoc]([ImageDoc(tensor=np.array(img)) for img in images])
```

### Imports

```python
from docarray import DocList, BaseDoc
```

[Documents](https://docs.docarray.org/user_guide/representing/first_step/) and [DocList](https://docs.docarray.org/user_guide/representing/array/) (from the DocArray package) are Jina-serve's native IO format.

```python
from jina import Executor, requests
```

Jina-serve's Executor class and requests decorator; we'll jump into these in the next section.

```python
import numpy as np
```

In our case, [NumPy](https://numpy.org/) is specific to this Executor only. We won't really cover it in this article, since we want to keep this a general overview (and there's plenty of information about NumPy out there already).

### Document types

We then import or create the data types our Executor works on. In this case, it receives `ImagePrompt` documents and outputs `ImageDoc` documents.

```python
from docarray import BaseDoc
from docarray.documents import ImageDoc


class ImagePrompt(BaseDoc):
    text: str
```

### Executor class

```python
class TextToImage(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        import torch
        from diffusers import StableDiffusionPipeline

        self.pipe = StableDiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
        ).to("cuda")
```

All Executors are created from Jina-serve's Executor class. User-definable parameters are {ref}`arguments defined in the __init__() method `, and state (like `self.pipe`) is set up there as well.

### Requests decorator

```python
@requests
def generate_image(
    self, docs: DocList[ImagePrompt], **kwargs
) -> DocList[ImageDoc]:
    # The pipeline returns images in PIL format (https://pillow.readthedocs.io/en/stable/)
    images = self.pipe(docs.text).images
    return DocList[ImageDoc]([ImageDoc(tensor=np.array(img)) for img in images])
```

Any Executor method decorated with `@requests` can be called via an {ref}`endpoint ` when the Executor is run or deployed. Since we're using a bare `@requests` (rather than, say, `@requests(on='/foo')`), the `generate_image()` method is called as the default fallback handler for any endpoint.
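To make that routing concrete, here is a minimal, hedged sketch contrasting a named endpoint with the bare fallback (the class, endpoint and document names are illustrative, not part of this tutorial):

```python
from docarray import BaseDoc, DocList
from jina import Executor, requests


class Prompt(BaseDoc):  # hypothetical document type for this sketch
    text: str = ''


class EndpointDemo(Executor):
    @requests(on='/generate')
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Prompt]:
        # Called only for requests posted to the '/generate' endpoint
        return docs

    @requests
    def fallback(self, docs: DocList[Prompt], **kwargs) -> DocList[Prompt]:
        # Called for requests to any endpoint without a dedicated handler
        return docs
```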
## Deployment: Deploy the Executor

With a Deployment you can run and scale up your Executor, adding sharding, replicas and dynamic batching.

![](images/deployment.png)

We can deploy our Executor with either the Python API or YAML:

````{tab} Python
In `deployment.py`:

```python
from jina import Deployment

dep = Deployment(uses=TextToImage, timeout_ready=-1)

with dep:
    dep.block()
```

And then run `python deployment.py` from the CLI.
````

````{tab} YAML
In `deployment.yml`:

```yaml
jtype: Deployment
with:
  uses: TextToImage
  py_modules:
    - text_to_image.py  # name of the module containing your Executor
  timeout_ready: -1
```

And run the YAML Deployment with the CLI: `jina deployment --uses deployment.yml`
````

You'll then see the following output:

```text
─────────────────────── 🎉 Deployment is ready to serve! ───────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓   Protocol                      GRPC  │
│  🏠  Local              0.0.0.0:12345    │
│  🔒  Private        172.28.0.12:12345    │
│  🌍  Public       35.230.97.208:12345    │
╰──────────────────────────────────────────╯
```

```{admonition} Running in a notebook
:class: note
In a notebook, you can't use `deployment.block()` and then make requests with the client. Please refer to the Colab link above for reproducible Jupyter Notebook code snippets.
```

## Client: Send and receive requests to your service

Use {class}`~jina.Client` to make requests to the service. As before, we use Documents as our basic IO format. We'll use the text prompt `rainbow unicorn butterfly kitten`. In `client.py`:

```python
from jina import Client
from docarray import BaseDoc, DocList
from docarray.documents import ImageDoc


class ImagePrompt(BaseDoc):
    text: str


image_prompt = ImagePrompt(text='rainbow unicorn butterfly kitten')

client = Client(port=12345)  # use the port from the output above
response = client.post(
    on='/',
    inputs=DocList[ImagePrompt]([image_prompt]),
    return_type=DocList[ImageDoc],
)
response[0].display()
```

In a different terminal to your Deployment, run `python client.py` to generate an image from the `rainbow unicorn butterfly kitten` text prompt:

![](images/rainbow_kitten.png)
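As a small, hedged variation appended to `client.py` (reusing `client`, `ImagePrompt` and `ImageDoc` from the snippet above), you can batch several prompts into one request; the response keeps one output Document per input prompt, in order:

```python
prompts = DocList[ImagePrompt](
    [ImagePrompt(text=p) for p in ('a red panda', 'a tall ship at dawn')]
)
images = client.post(on='/', inputs=prompts, return_type=DocList[ImageDoc])
for doc in images:
    doc.display()  # one generated image per prompt
```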
## Scale up the microservice

```{admonition} Python vs YAML
:class: info
For the rest of this tutorial we'll stick to using {ref}`YAML `. This separates our code from our Deployment logic.
```

Jina comes with scalability features out of the box, like replicas, shards and dynamic batching. This lets you easily increase your application's throughput.

Let's edit our Deployment and scale it with {ref}`replicas ` and {ref}`dynamic batching ` to:

- Create two replicas, with a {ref}`GPU ` assigned to each.
- Enable dynamic batching to process incoming parallel requests to the same model.

![](images/replicas.png)

Here's the updated YAML:

```{code-block} yaml
---
emphasize-lines: 5-11
---
jtype: Deployment
with:
  timeout_ready: -1
  uses: jinaai://jina-ai/TextToImage
  env:
    CUDA_VISIBLE_DEVICES: RR
  replicas: 2
  uses_dynamic_batching:  # configure dynamic batching
    /default:
      preferred_batch_size: 10
      timeout: 200
```

As you can see, we've added GPU support (via `CUDA_VISIBLE_DEVICES`), two replicas (each assigned a GPU) and dynamic batching, which lets requests accumulate and be batched together before being sent to the Executor.

Assuming your machine has two GPUs, the scaled Deployment YAML gives better throughput than the normal Deployment.

Thanks to the YAML syntax, you can inject deployment configurations regardless of Executor code. Of course, all of this is possible via the Python API too.
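For reference, here is a hedged sketch of the same scaled Deployment through the Python API; the keyword arguments mirror the YAML keys above, but verify them against your installed version's documentation:

```python
from jina import Deployment

dep = Deployment(
    uses='jinaai://jina-ai/TextToImage',
    timeout_ready=-1,
    env={'CUDA_VISIBLE_DEVICES': 'RR'},
    replicas=2,
    # Assumed to accept the same nested structure as the YAML key
    uses_dynamic_batching={'/default': {'preferred_batch_size': 10, 'timeout': 200}},
)

with dep:
    dep.block()
```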
---

# Source: https://github.com/jina-ai/jina/blob/master/docs/tutorials/gpu-executor.md

(gpu-executor)=
# Build a GPU Executor

This document shows you how to use an {class}`~jina.Executor` on a GPU, both locally and in a Docker container. You will also learn how to use a GPU with pre-built Hub Executors.

Using a GPU significantly speeds up encoding for most deep learning models, reducing response latency by anything from 5 to 100 times, depending on the model and inputs used.

```{admonition} Important
:class: caution
This tutorial assumes familiarity with basic Jina concepts, such as Document, [Executor](../concepts/executor/index), and [Deployment](../concepts/deployment/index). Some knowledge of [Executor Hub](../concepts/executor/hub/index) is also needed for the last part of the tutorial.
```

## Jina-serve and GPUs in a nutshell

For a thorough walkthrough of using GPU resources in your code, check the full tutorial in the {ref}`next section `. If you already know how to use your GPU, just proceed as you usually would in your machine learning framework of choice. Jina-serve lets you use GPUs as you would in a Python script or Docker container, without imposing additional requirements or configuration.

Here's a minimal working example, written in PyTorch:

```python
import torch
from typing import Optional

from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor
from jina import Executor, requests


class MyDoc(BaseDoc):
    text: str = ''
    embedding: Optional[AnyTensor[5]] = None


class MyGPUExec(Executor):
    def __init__(self, device: str = 'cpu', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.device = device

    @requests
    def encode(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        with torch.inference_mode():
            # Generate random embeddings on the configured device
            embeddings = torch.rand((len(docs), 5), device=self.device)
            docs.embedding = embeddings
            embedding_device = 'GPU' if embeddings.is_cuda else 'CPU'
            docs.text = [f'Embeddings calculated on {embedding_device}']
```

````{tab} Use with CPU

```python
from typing import Optional

from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor
from jina import Deployment

dep = Deployment(uses=MyGPUExec, uses_with={'device': 'cpu'})

docs = DocList[MyDoc]([MyDoc()])

with dep:
    docs = dep.post(on='/encode', inputs=docs, return_type=DocList[MyDoc])

print(f'Document embedding: {docs.embedding}')
print(docs.text)
```

```shell
 Deployment@80[I]:🎉 Deployment is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:49618
	🔒 Private network:	172.28.0.2:49618
	🌐 Public address:	34.67.105.220:49618
Document embedding: tensor([[0.1769, 0.1557, 0.9266, 0.8655, 0.6291]])
['Embeddings calculated on CPU']
```
````

````{tab} Use with GPU

```python
from typing import Optional

from docarray import DocList, BaseDoc
from docarray.typing import AnyTensor
from jina import Deployment

dep = Deployment(uses=MyGPUExec, uses_with={'device': 'cuda'})

docs = DocList[MyDoc]([MyDoc()])

with dep:
    docs = dep.post(on='/encode', inputs=docs, return_type=DocList[MyDoc])

print(f'Document embedding: {docs.embedding}')
print(docs.text)
```

```shell
 Deployment@80[I]:🎉 Deployment is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:56276
	🔒 Private network:	172.28.0.2:56276
	🌐 Public address:	34.67.105.220:56276
Document embedding: tensor([[0.6888, 0.8646, 0.0422, 0.8501, 0.4016]])
['Embeddings calculated on GPU']
```
````
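Before trying the GPU tab, a quick hedged sanity check (plain PyTorch, nothing Jina-specific) confirms that a CUDA device is actually visible to your environment:

```python
import torch

print(torch.cuda.is_available())  # True if a usable GPU and driver are present
print(torch.cuda.device_count())  # number of visible CUDA devices
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. 'Tesla T4'
```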
Just like that, your code runs on GPU, inside a Deployment. Next, we go through a more fleshed-out example in detail, using a language model to embed the text in our Documents, all on GPU and thus blazingly fast.

(gpu-prerequisites)=
## Prerequisites

For this tutorial, you need to work on a machine with an NVIDIA graphics card. If you don't have such a machine at home, you can use various free cloud platforms (like Google Colab or Kaggle kernels).

Also ensure you have a recent version of the [NVIDIA drivers](https://www.nvidia.com/Download/index.aspx) installed. You don't need to install CUDA for this tutorial, but note that, depending on the deep learning framework you use, it might be required for local execution.

For the Docker part of the tutorial you also need [Docker](https://docs.docker.com/get-docker/) and [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed.

To run Python scripts you need a virtual environment (for example, [venv](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment) or [conda](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-environments)); install Jina-serve inside it using:

```bash
pip install jina
```

## Setting up the Executor

```{admonition} Executor Hub
:class: hint
Let's create an Executor using `jina hub new`. This creates your Executor locally and privately, and makes it quick and easy to run your Executor inside a Docker container, or (if you so choose) to publish it to Executor Hub later.
```

We'll create a simple sentence encoder, and start by creating the Executor "skeleton" using Jina-serve's CLI:

```bash
jina hub new
```

When prompted, name your Executor `SentenceEncoder`, and accept the default folder; this creates a `SentenceEncoder/` folder inside your current directory, which will be our working directory for this tutorial.

For many questions you can accept the default options. However:

- Select `y` when prompted for advanced configuration.
- Select `y` when prompted to create a `Dockerfile`.

In the end, you are greeted with suggested next steps:
```text
🎉 Next steps

Congrats! You have successfully created an Executor! Here are the next steps:

1. Check out the generated Executor:
     cd /home/ubuntu/SentenceEncoder
     ls

2. Understand the folder structure:
     config.yml        The YAML config file of the Executor. You can define
                       __init__ arguments here:
                         jtype: SentenceEncoder
                         with:
                           foo: 1
                           bar: hello
                         metas:
                           py_modules:
                             - executor.py
     Dockerfile        Describes how this Executor will be built.
     executor.py       The main logic file of the Executor.
     manifest.yml      Metadata for the Executor, for better appeal on Executor Hub:
                         name      Human-readable title of the Executor
                         desc      Human-readable description of the Executor
                         url       URL to find more information on the Executor (e.g. GitHub)
                         keywords  Keywords that help users find the Executor
     README.md         A usage guide of the Executor.
     requirements.txt  The Python dependencies of the Executor.

3. Share it on Executor Hub:
     jina hub push /home/ubuntu/SentenceEncoder
```
Now let's move to the newly created Executor directory:

```bash
cd SentenceEncoder
```

Continue by specifying our requirements in `requirements.txt`:

```text
sentence-transformers==2.0.0
```

And install them using:

```bash
pip install -r requirements.txt
```

```{admonition} Do I need to install CUDA?
:class: hint
All machine learning frameworks rely on CUDA to run on a GPU. However, whether you need CUDA installed on your system depends on the framework you use. In this tutorial we use PyTorch, which already includes the necessary CUDA binaries in its distribution. Other frameworks, such as TensorFlow, require you to install CUDA yourself.
```

```{admonition} Install only what you need
:class: hint
In this example we install the GPU-enabled version of PyTorch, which is the default version when installing from PyPI. However, if you know that you only need to use your Executor on CPU, you can save a lot of space (hundreds of MBs, or even GBs) by installing CPU-only versions of your requirements. This translates into faster startup times when using Docker containers.

In our case, we could change the `requirements.txt` file to install a CPU-only version of PyTorch:

:::text
-f https://download.pytorch.org/whl/torch_stable.html
sentence-transformers
torch==1.9.0+cpu
:::
```

Now let's fill the `executor.py` file with the actual Executor code:

```{code-block} python
---
emphasize-lines: 23-24
---
from typing import Optional

import torch
from docarray import BaseDoc, DocList
from docarray.typing import AnyTensor
from jina import Executor, requests
from sentence_transformers import SentenceTransformer


class MyDoc(BaseDoc):
    text: str = ''
    embedding: Optional[AnyTensor] = None


class SentenceEncoder(Executor):
    """A simple sentence encoder that can be run on a CPU or a GPU

    :param device: The pytorch device that the model is on, e.g. 'cpu', 'cuda', 'cuda:1'
    """

    def __init__(self, device: str = 'cpu', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = SentenceTransformer('all-MiniLM-L6-v2', device=device)
        self.model.to(device)  # Move the model to the device

    @requests
    def encode(self, docs: DocList[MyDoc], **kwargs) -> DocList[MyDoc]:
        """Add text-based embeddings to all documents"""
        with torch.inference_mode():
            embeddings = self.model.encode(docs.text, batch_size=32)
            docs.embedding = embeddings
```

Here, all the device-specific magic happens on the two highlighted lines: when we create the `SentenceEncoder` instance we pass it the device, and then we move the PyTorch model to that device. These are exactly the same steps you would use in a standalone Python script.
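As a hedged smoke test (assuming `executor.py` is importable from your working directory and the model download succeeds), you can exercise the class directly, without any Deployment, since an Executor is an ordinary Python class:

```python
from docarray import DocList

from executor import MyDoc, SentenceEncoder

encoder = SentenceEncoder(device='cpu')
docs = DocList[MyDoc]([MyDoc(text='hello world')])
encoder.encode(docs)
print(docs[0].embedding.shape)  # (384,) for all-MiniLM-L6-v2
```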
To see how to pass the Executor the device it should use, let's create another file, `main.py`, which demonstrates this encoder by encoding 10,000 text documents:

```python
from typing import Optional

from docarray import BaseDoc, DocList
from docarray.typing import AnyTensor
from jina import Deployment

from executor import SentenceEncoder


class MyDoc(BaseDoc):
    text: str = ''
    embedding: Optional[AnyTensor] = None


def generate_docs():
    for _ in range(10_000):
        yield MyDoc(
            text='Using a GPU allows you to significantly speed up encoding.'
        )


dep = Deployment(uses=SentenceEncoder, uses_with={'device': 'cpu'})

with dep:
    dep.post(
        on='/encode',
        inputs=generate_docs,
        show_progress=True,
        request_size=32,
        return_type=DocList[MyDoc],
    )
```

## Running on GPU and CPU locally

We can observe the speedup by running the same code on both the CPU and GPU. To toggle between the two, set the device type to `'cuda'`, and your GPU takes over the work:

```diff
- dep = Deployment(uses=SentenceEncoder, uses_with={'device': 'cpu'})
+ dep = Deployment(uses=SentenceEncoder, uses_with={'device': 'cuda'})
```

Then run the script:

```bash
python main.py
```

And compare the results:

````{tab} CPU

```shell
      executor0@26554[L]:ready and listening
        gateway@26554[L]:ready and listening
     Deployment@26554[I]:🎉 Deployment is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:56969
	🔒 Private network:	172.31.39.70:56969
	🌐 Public address:	52.59.231.246:56969
Working... ━━━━━━━━━━━━━━╸━━━━━━ 0:00:20 15.1 step/s 314 steps done in 20 seconds
```
````

````{tab} GPU

```shell
      executor0@21032[L]:ready and listening
        gateway@21032[L]:ready and listening
     Deployment@21032[I]:🎉 Deployment is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:54255
	🔒 Private network:	172.31.39.70:54255
	🌐 Public address:	52.59.231.246:54255
Working... ━━━━━━━━━━━━━━╸━━━━━━ 0:00:03 90.9 step/s 314 steps done in 3 seconds
```
````

Running this code on a `g4dn.xlarge` AWS instance with a single NVIDIA T4 GPU attached, embedding time drops from 20 s to 3 s by running on GPU. That's more than a **6x speedup!** And that's not even the best we can do: if we increase the batch size to max out the GPU's memory, we get even larger speedups. Such optimizations are beyond the scope of this tutorial, though.
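If you want to experiment with that yourself, a hedged starting point (illustrative numbers only) is to raise both the per-request Document count in `main.py` and the encoder's internal batch size in `executor.py`:

```python
# In main.py: send more Documents per request
dep.post(
    on='/encode',
    inputs=generate_docs,
    show_progress=True,
    request_size=128,  # up from 32; tune against your GPU memory
    return_type=DocList[MyDoc],
)
# In executor.py: raise batch_size=32 in self.model.encode(...) accordingly
```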
```{admonition} Note
:class: hint
You've probably noticed a delay (about 3 seconds) when creating the Deployment. This is because the weights of our model have to be transferred from CPU to GPU when we initialize the Executor. However, this only happens once in the lifetime of the Executor, so for most use cases we don't need to worry about it.
```

## Using GPU in a container

```{admonition} Using your GPU inside a container
:class: caution
For this part of the tutorial, you need to [install `nvidia-container-toolkit`](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
```

When you use your Executor in production you most likely want it in a Docker container, to provide proper environment isolation and to easily use it on any device. Using GPU-enabled Executors in this case is no harder than using them locally. We don't even need to modify the default `Dockerfile`.

```{admonition} Choosing the right base image
:class: hint
In our case we use the default `jinaai/jina:latest` base image. However, parallel to the comments about installing CUDA locally, you may need a different base image depending on your framework. If you need CUDA installed in the image, you usually have two options: either take `nvidia/cuda` as the base image, or take the official GPU-enabled image of your framework, for example `tensorflow/tensorflow:2.6.0-gpu`.
```

The other file we care about in this case is `config.yml`, and here the default version works as well. Let's build the Docker image:

```bash
docker build -t sentence-encoder .
```

You can run the container to check that everything is working well:

```bash
docker run sentence-encoder
```

Let's use the Docker version of our encoder with the GPU. If you've dealt with GPUs in containers before, you may remember that to use a GPU inside a container you need to pass the `--gpus all` option to the `docker run` command. Jina lets you do just that. We need to modify our `main.py` script to use a GPU-based containerized Executor:

```{code-block} python
---
emphasize-lines: 20-24
---
from typing import Optional

from docarray import BaseDoc, DocList
from docarray.typing import AnyTensor
from jina import Deployment


class MyDoc(BaseDoc):
    text: str = ''
    embedding: Optional[AnyTensor] = None


def generate_docs():
    for _ in range(10_000):
        yield MyDoc(
            text='Using a GPU allows you to significantly speed up encoding.'
        )


dep = Deployment(
    uses='docker://sentence-encoder',
    uses_with={'device': 'cuda'},
    gpus='all',
)

with dep:
    dep.post(
        on='/encode',
        inputs=generate_docs,
        show_progress=True,
        request_size=32,
        return_type=DocList[MyDoc],
    )
```

If we run this with `python main.py`, we get the same output as before, except that now we also see the output from the Docker container.

Every time we start the Executor, the Transformer model is downloaded again. To speed this up, we want the encoder to load the model from a file we have pre-downloaded to disk. We can do this with Docker volumes; Jina simply passes the argument on to the Docker container. Here's how we modify `main.py`:

```python
dep = Deployment(
    uses='docker://sentence-encoder',
    uses_with={'device': 'cuda'},
    gpus='all',
    # This has to be an absolute path; replace /home/ubuntu with your home directory
    volumes="/home/ubuntu/.cache:/root/.cache",
)
```

We mounted the `~/.cache` directory, because that's where pre-built transformer models are saved. But this could be any custom directory, depending on the Python package you are using and how you specify the model loading path.

Run `python main.py` again and you can see that no downloading happens inside the container, and encoding starts faster.

## Using GPU with Hub Executors

We've now seen how to use a GPU with our Executor locally and inside a Docker container. What about Executors from Executor Hub: is there any difference?

Nope! Better still, many Executors on Executor Hub already come with a GPU-enabled version pre-built, usually under the `gpu` tag (see [Executor Hub tags](hub_tags)). Let's modify our example to use the pre-built `TransformerTorchEncoder` from Executor Hub:

```diff
dep = Deployment(
-    uses='docker://sentence-encoder',
+    uses='jinaai+docker://jina-ai/TransformerTorchEncoder:latest-gpu',
    uses_with={'device': 'cuda'},
    gpus='all',
    # This has to be an absolute path; replace /home/ubuntu with your home directory
    volumes="/home/ubuntu/.cache:/root/.cache",
)
```

The first time you run the script, downloading the Docker image takes some time: GPU images are large! But after that, everything works just as it did with the local Docker image, out of the box.

```{admonition} Important
:class: caution
When using GPU encoders from Executor Hub, always use `jinaai+docker://`, not `jinaai://`. As discussed above, these encoders may need CUDA (or other system dependencies) installed, and installing that properly can be tricky. For that reason, use the Docker images, which already come with all these dependencies pre-installed.
```

## Conclusion

Let's recap this tutorial:

1. Using Executors on a GPU locally is no different from using a GPU in a standalone script: you pass the device you want your Executor to use at initialization.
2. To use an Executor on a GPU inside a Docker container, pass `gpus='all'`.
3. Use volumes (bind mounts) so you don't have to download large files each time you start the Executor.
4. To use a GPU with Executors from Executor Hub, just use the Executor with the `gpu` tag.

When you start building your own Executor, check what system requirements (CUDA and similar) are needed, and install them locally (and in the `Dockerfile`) accordingly.

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/tutorials/index.md

# Tutorials

```{toctree}
deploy-model
deploy-pipeline
llm-serve
```

---

{ref}`genindex` | {ref}`modindex`

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/tutorials/llm-serve.md

# Build a Streaming API for a Large Language Model

```{include} ../../README.md
:start-after:
:end-before:
```

## Service Schemas

```{include} ../../README.md
:start-after:
:end-before:
```

```{admonition} Note
:class: note
Thanks to DocArray's flexibility, you can implement very flexible services. For instance, you can use Tensor types to efficiently stream token logits back to the client and implement complex token-sampling strategies on the client side.
```

## Service initialization

```{include} ../../README.md
:start-after:
:end-before:
```

## Implement the streaming endpoint

```{include} ../../README.md
:start-after:
:end-before:
```

## Serve and send requests

```{include} ../../README.md
:start-after:
:end-before:
```

---

# Source: https://github.com/jina-ai/jina/blob/master/docs/yaml-spec.md

(yaml-spec)=
# {octicon}`file-code` YAML Specification

YAML is widely used in Jina-serve to define Executors and Flows. This page helps you quickly navigate the different YAML specifications.

## Executor-level YAML

Executor-level YAML is placed inside the Executor directory, as part of the Executor file structure.

:::::{grid} 2
:gutter: 3

::::{grid-item-card} Executor YAML
:link: concepts/serving/executor/yaml-spec
:link-type: doc

Define the arguments of `__init__`, Python module dependencies and other settings of an Executor.
::::

:::::

## Flow-level YAML

Flow-level YAML is placed inside the Flow directory, as part of the Flow file structure. It defines the Executors used in the Flow, the Gateway, and the JCloud hosting specifications.

:::::{grid} 2
:gutter: 3

::::{grid-item-card} Flow YAML
:link: concepts/orchestration/flow/yaml-spec
:link-type: doc

Define the Executors, the topology and the Gateway settings of a Flow.
::::

::::{grid-item-card} Gateway YAML
:link: concepts/serving/gateway/yaml-spec
:link-type: doc

Define the protocol, TLS, authentication and other settings of a Gateway.
+++
The Gateway specification is nested under the Flow YAML via the `with:` keyword.
::::

::::{grid-item-card} JCloud YAML
:link: concepts/jcloud/yaml-spec
:link-type: doc

Define the resources and autoscaling settings on Jina-serve Cloud.
+++
The JCloud specification is nested under the Flow YAML via the `jcloud:` keyword.
::::

:::::