# Unstructured

> ## Documentation Index

---

# Source: https://docs.unstructured.io/support/access.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Access to services

On Februrary 25, 2026, Unstructured stopped providing access to its services in specific locales, to
customers enrolled in Unstructured's **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** account plans.
Unstructured made this decision after careful review, determining that the compliance and regulatory
requirements associated with continuing to operate in these locales were no longer sustainable relative to
the locales' usage and revenue levels at the time of the decision.

The list of affected locales is as follows:

* Andorra
* Angola
* Armenia
* Bangladesh
* Bermuda
* Brazil
* Chile
* China
* Colombia
* Congo
* Guatemala
* Hong Kong
* Iceland
* Iraq
* Israel
* Jersey
* Jordan
* Kazakhstan
* Lebanon
* Malaysia
* Mexico
* Moldova, Republic of
* Morocco
* Nepal
* New Zealand
* Nigeria
* Pakistan
* Peru
* Philippines
* Puerto Rico
* Russian Federation
* Saudi Arabia
* Serbia
* Somalia
* Sri Lanka
* Switzerland
* Taiwan, Province of China
* Thailand
* Turkey
* Uganda
* Ukraine
* United Arab Emirates
* Uruguay
* Vietnam
* Zambia

After February 25, 2026, organizations with large-scale needs may be able to continue accessing Unstructured through a
signed Unstructured **Business** account agreement, covering **Business SaaS**, **dedicated instance** , or **in-VPC** deployments
that addresses the applicable compliance and regulatory requirements. If you are interested
in exploring this option, contact Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/airtable.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Airtable

Connect Airtable to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your data and store structured outputs locally on your filesystem.

The requirements are as follows.

* An [Airtable](https://www.airtable.com/) account. [Create a free Airtable account](https://airtable.com/signup).

* An Airtable personal access token. [Create a personal access token](https://support.airtable.com/docs/creating-and-using-api-keys-and-access-tokens).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/c3gvIWAAcTY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The ID of the Airtable base to access. [Create a base](https://www.airtable.com/guides/build/create-a-base). [Get a base's ID](https://support.airtable.com/docs/finding-airtable-ids#finding-base-url-ids).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/rU347YVMm2M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The ID of the table to access in the base. [Create a table](https://www.airtable.com/guides/build/create-a-table). [Get a table's ID](https://support.airtable.com/docs/finding-airtable-ids#finding-base-url-ids).

* The ID of the view to access in the table. [Create a view](https://www.airtable.com/guides/build/create-custom-views-of-data). [Get a view's ID](https://support.airtable.com/docs/finding-airtable-ids#finding-base-url-ids).

By default, Unstructured processes all tables from all bases within an Airtable organization. You can limit the
tables that Unstructured ingests data from within Airtable by specifying a list of Airtable paths.
An Airtable path uses the following structure: `base_id/table_id(optional)/view_id(optional)`

For example, given the following example Airtable URL:

```text  theme={null}
https://airtable.com/appr9nKeXLAtg6bgn/tblZ8uT1GY7NLbWit/viwDcpzf9dP0Gqz5J
```

* The base's ID is `appr9nKeXLAtg6bgn`. The base's path is `appr9nKeXLAtg6bgn`.
* The table's ID is `tblZ8uT1GY7NLbWit`. The table's path is `appr9nKeXLAtg6bgn/tblZ8uT1GY7NLbWit`.
* The view's ID is `viwDcpzf9dP0Gqz5J`. The view's path is `appr9nKeXLAtg6bgn/tblZ8uT1GY7NLbWit/viwDcpzf9dP0Gqz5J`.

You can call the Airtable API to get lists of available IDs for Airtable bases, tables, and views in bulk, as follows:

* [Base IDs](https://airtable.com/developers/web/api/list-bases)
* [Table and view IDs](https://airtable.com/developers/web/api/get-base-schema)
* [Base, table, and view IDs](https://pyairtable.readthedocs.io/en/latest/metadata.html)

The Airtable connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[airtable]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

These environment variables:

* `AIRTABLE_TOKEN` - The Airtable personal access token, represented by `--personal-access-token` (CLI) or `personal_access_token` (Python).
* `AIRTABLE_PATHS` - The list of Airtable paths to process, represented by `--list-of-paths` (CLI) or `list_of_paths` (Python).

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  unstructured-ingest \
    airtable \
      --personal-access-token $AIRTABLE_PERSONAL_ACCESS_TOKEN \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --num-processes 2 \
      --reprocess \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.airtable import (
      AirtableIndexerConfig,
      AirtableDownloaderConfig,
      AirtableConnectionConfig,
      AirtableAccessConfig
  )

  from unstructured_ingest.processes.connectors.local import LocalUploaderConfig
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig

  # Chunking is optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=AirtableIndexerConfig(list_of_paths=[os.getenv("AIRTABLE_PATHS")]),
          downloader_config=AirtableDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
          source_connection_config=AirtableConnectionConfig(
              access_config=AirtableAccessConfig(
                  personal_access_token=os.getenv("AIRTABLE_TOKEN")
              )
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/support/issues/api-connector-secrets.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Drive source connectors cannot be created by using the API

## Issue

When you try to use the [Unstructured API's workflow operations](/api-reference/workflow/overview) to
[create a source connector](/api-reference/workflow/overview#create-a-source-connector)
that connects [Google Drive](/api-reference/workflow/sources/google-drive) to Unstructured,
an error similar to the following occurs, and the source connector is not created:

```text  theme={null}
API error occurred: Status 412. Body: {
    "errors": [
        {
            "field": "access_config.service_account_key",
            "message": "Field is sensitive and must be wrapped in as a secret reference or new secret value"
        }
    ]
}
```

## Cause

This error occurs because the contents of the caller's Google Drive service account's `credentials.json` file (or its
equivalent JSON string representation) must be specially encrypted and formatted before these contents are sent to Unstructured, due to the contents' size.
However, the Unstructured API's workflow operations do not support this special encryption and formatting process.

## Solution

Use the [Unstructured user interface (UI)](/ui/overview) to [create a Google Drive source connector](/ui/sources/google-drive) instead, as
the Unstructured UI does support this special encryption and formatting process.

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/ui/account/api-key-url.md

# Source: https://docs.unstructured.io/api-reference/troubleshooting/api-key-url.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Troubleshooting Unstructured API keys and URLs

## Issue

When you run script or code to call an Unstructured API, you might get one of the following warnings or errors:

```
UserWarning: If intending to use the paid API, please define `server_url` in your request.
```

```
ERROR: Server responded with 404 - {"detail":"Not Found"}
```

You might also get the following related warnings or errors in your logs:

```
SDKError: API error occurred: Status 401
{"detail":"API key is malformed, please type the API key correctly in the header."}
```

```
API error occurred: Status 404
{"detail":"Not Found"}
```

## Possible causes

1. The Unstructured API key, API URL, or both are missing or malformed in your script or code.
2. The API key, API URL, or both are not present in your current session.
3. The API key is no longer valid, or the API key and API URL combination is not valid.

## Suggested solutions

For the API URL, this URL was provided to you when your Unstructured account was created.
If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

<Note>
  The default URL for the Unstructured API's workflow operations is `https://platform.unstructuredapp.io/api/v1`.

  However, you should always use the URLs that were provided to you when your Unstructured account was created.
</Note>

1. If you are not already signed in, sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

2. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account, click **API Keys** on the sidebar.<br />

   <Note>
     For other types of **Business** accounts, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

For the API URL, note the value of the Unstructured API URL that you want to call. To access these API URLs:

1. If you are not already signed in, sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

2. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to get the API URL for.
   </Note>

3. Note the API URL for the Unstructured API's workflow operations, labelled as **Unstructured API Endpoint**.

If you still believe you have the correct API URL and API key, try the following steps:

1. Check your script or code to see where the API URL and API key are specified, as follows:

   **For a POST request**

   * Check the target URL. Make sure it is set to the right API URL.
   * Check the value of the `unstructured-api-key` header. Make sure it is set to the right API key, or an environment variable representing the right API key.

   **For the Unstructured Python SDK**

   * Check the value of the `server_url` parameter. Make sure it is set to the right API URL, or an environment variable representing the right API URL.
   * Check the value of the `api_key_auth` parameter. Make sure it is set to the right API key, or an environment variable representing the right API key.

   <Note>
     If you are using the Unstructured Python SDK to call the Unstructured API's workflow operations, you should not need to set the value of the
     `server_url` parameter if you are using these operations' default API URL. However, you might want to try setting this value anyway to see if it otherwise resolves your issue.
   </Note>

   **For the Unstructured JavaScript/TypeScript SDK**

   * Check the value of the `UnstructuredClient` object's `serverURL` parameter. Make sure it is set to the right API URL, or an environment variable representing the right API URL.
   * Check the value of the `UnstructuredClient` object's `security: { apiKeyAuth }` parameter. Make sure it is set to the right API key, or an environment variable representing the right API key.

2. If you follow the recommended approach to use environment variables to represent the API key and API URL instead of hard-coding them into your script or code, check your session to see if these environment variables have been loaded, for example:

   * Run the command for your operating system to list all environment variables. Make sure the API key and API URL are listed and are set to the right ones. Make sure the names and cases of these environment variables match the ones in your script or code exactly.
   * If you use an integrated development environment (IDE) to run your script or code, check the IDE's documentation for how to specify and load environment variables at run time or debug time. Make sure the names and cases of these environment variables match the ones in your script or code exactly.

3. If you are still getting this issue, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

---

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/api-parameters.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Partition Endpoint parameters

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

The Unstructured Partition Endpoint provides parameters to customize the processing of documents. These parameters include:

The only required parameter is `files` -  the file you wish to process.

[//]: # "TODO: In the future we should automatically generate this table to make sure it's up to date"

| POST, Python                               | JavaScript/TypeScript                    | Description                                                                                                                                                                                                                                                                                                                                            |
| ------------------------------------------ | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `files` (*shared.Files*)                   | `files` (*File*, *Blob*, *shared.Files*) | The file to process.                                                                                                                                                                                                                                                                                                                                   |
| `chunking_strategy` (*str*)                | `chunkingStrategy` (*string*)            | Use one of the supported strategies to chunk the returned elements after partitioning. When no chunking strategy is specified, no chunking is performed and any other chunking parameters provided are ignored. Supported strategies: `basic`, `by_title`, `by_page`, and `by_similarity`. [Learn more](/api-reference/legacy-api/partition/chunking). |
| `content_type` (*str*)                     | `contentType` (*string*)                 | A hint to Unstructured about the content type to use (such as `text/markdown`), when there are problems processing a specific file. This value is a MIME type in the format `type/subtype`. For available MIME types, see [model.py](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/file_utils/model.py).                      |
| `coordinates` (*bool*)                     | `coordinates` (*boolean*)                | True to return bounding box coordinates for each element extracted with OCR. Default: false. [Learn more](/api-reference/legacy-api/partition/examples#saving-bounding-box-coordinates).                                                                                                                                                               |
| `encoding` (*str*)                         | `encoding` (*string*)                    | The encoding method used to decode the text input. Default: `utf-8`.                                                                                                                                                                                                                                                                                   |
| `extract_image_block_types` (*List\[str]*) | `extractImageBlockTypes` (*string\[]*)   | The types of elements to extract, for use in extracting image blocks as Base64 encoded data stored in element metadata fields, for example: `["Image","Table"]`. Supported filetypes are image and PDF. [Learn more](/api-reference/legacy-api/partition/extract-image-block-types).                                                                   |
| `gz_uncompressed_content_type` (*str*)     | `gzUncompressedContentType` (*string*)   | If file is gzipped, use this content type after unzipping. Example: `application/pdf`                                                                                                                                                                                                                                                                  |
| `hi_res_model_name` (*str*)                | `hiResModelName`  (*string*)             | The name of the inference model used when strategy is `hi_res`. Options are `layout_v1.1.0` and `yolox`. Default: `layout_v1.1.0`. [Learn more](/api-reference/legacy-api/partition/examples#changing-partition-strategy-for-a-pdf).                                                                                                                   |
| `include_page_breaks` (*bool*)             | `includePageBreaks` (*boolean*)          | True for the output to include page breaks if the filetype supports it. Default: false.                                                                                                                                                                                                                                                                |
| `languages` (*List\[str]*)                 | `languages` (*string\[]*)                | The languages present in the document, for use in partitioning and OCR. [View the list of available languages](https://github.com/tesseract-ocr/tessdata). [Learn more](/api-reference/legacy-api/partition/examples#specifying-the-language-of-a-document-for-better-ocr-results).                                                                    |
| `output_format` (*str*)                    | `outputFormat` (*string*)                | The format of the response. Supported formats are `application/json` and `text/csv`. Default: `application/json`.                                                                                                                                                                                                                                      |
| `pdf_infer_table_structure` (*bool*)       | `pdfInferTableStructure` (*boolean*)     | **Deprecated!** Use `skip_infer_table_types` instead. If true and `strategy` is `hi_res`, any `Table` elements extracted from a PDF will include an additional metadata field, `text_as_html`, where the value (string) is a just a transformation of the data into an HTML table.                                                                     |
| `skip_infer_table_types` (*List\[str]*)    | `skipInferTableTypes` (*string\[]*)      | The document types that you want to skip table extraction for. Default: `[]`.                                                                                                                                                                                                                                                                          |
| `starting_page_number` (*int*)             | `startingPageNumber` (*number*)          | The page number to be be assigned to the first page in the document. This information will be included in elements' metadata and can be be especially useful when partitioning a document that is part of a larger document.                                                                                                                           |
| `strategy` (*str*)                         | `strategy` (*string*)                    | The strategy to use for partitioning PDF and image files. Options are `auto`, `vlm`, `hi_res`, `fast`, and `ocr_only`. Default: `auto`. [Learn more](/api-reference/legacy-api/partition/partitioning).                                                                                                                                                |
| `unique_element_ids` (*bool*)              | `uniqueElementIds` (*boolean*)           | True to assign UUIDs to element IDs, which guarantees their uniqueness (useful when using them as primary keys in database). Otherwise a SHA-256 of the element's text is used. Default: false.                                                                                                                                                        |
| `vlm_model` (*str*)                        | (Not yet available)                      | Applies only when `strategy` is `vlm`. The name of the vision language model (VLM) provider to use for partitioning. `vlm_model_provider` must also be specified. For a list of allowed values, see the end of this article.                                                                                                                           |
| `vlm_model_provider` (*str*)               | (Not yet available)                      | Applies only when `strategy` is `vlm`. The name of the vision language model (VLM) to use for partitioning. `vlm_model` must also be specified. For a list of allowed values, see the end of this article.                                                                                                                                             |
| `xml_keep_tags` (*bool*)                   | `xmlKeepTags` (*boolean*)                | True to retain the XML tags in the output. Otherwise it will just extract the text from within the tags. Only applies to XML documents.                                                                                                                                                                                                                |

The following parameters only apply when a chunking strategy is specified. Otherwise, they are ignored. [Learn more](/api-reference/legacy-api/partition/chunking).

| POST, Python                     | JavaScript/TypeScript             | Description                                                                                                                                                                                                                                                                                                                                                                                    |
| -------------------------------- | --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `combine_under_n_chars` (*int*)  | `combineUnderNChars` (*number*)   | Applies only when the chunking strategy is set to `by_title`. Use this parameter to combines small chunks until the combined chunk reaches a length of `n` characters. This can mitigate the appearance of small chunks created by short paragraphs, not intended as section headings, being identified as `Title` elements in certain documents. Default: the same value as `max_characters`. |
| `include_orig_elements` (*bool*) | `includeOrigElements` (*boolean*) | True (the default) to have the elements that are used to form a chunk appear in `.metadata.orig_elements` for that chunk.                                                                                                                                                                                                                                                                      |
| `max_characters` (*int*)         | `maxCharacters` (*number*)        | Cut off new sections after reaching a length of `n` characters. (This is a hard maximum.) Default: 500.                                                                                                                                                                                                                                                                                        |
| `multipage_sections` (*bool*)    | `multipageSections` (*boolean*)   | Applies only when the chunking strategy is set to `by_title`. Determines if a chunk can include elements from more than one page. Default: true.                                                                                                                                                                                                                                               |
| `new_after_n_chars` (*int*)      | `newAfterNChars` (*number*)       | Applies only when the chunking strategy is specified. Cuts off new sections after reaching a length of `n` characters. (This is a soft maximum.) Default: 1500.                                                                                                                                                                                                                                |
| `overlap` (*int*)                | `overlap` (*number*)              | A prefix of this many trailing characters from the prior text-split chunk is applied to second and later chunks formed from oversized elements by text-splitting. Default: none.                                                                                                                                                                                                               |
| `overlap_all` (*bool*)           | `overlapAll` (*boolean*)          | True to have an overlap also applied to "normal" chunks formed by combining whole elements. Use with caution, as this can introduce noise into otherwise clean semantic units. Default: none.                                                                                                                                                                                                  |
| `similarity_threshold` (*float*) | `similarityThreshold` (*number*)  | Applies only when the chunking strategy is set to `by_similarity`. The minimum similarity text in consecutive elements must have to be included in the same chunk. Must be between 0.0 and 1.0, exclusive (0.01 to 0.99, inclusive). Default: 0.5.                                                                                                                                             |

The following parameters are specific to the Python and JavaScript/TypeScript clients and are not sent to the server. [Learn more](/api-reference/legacy-api/partition/sdk-python#page-splitting).

| POST, Python                          | JavaScript/TypeScript                 | Description                                                                                                                                                                                                                                          |
| ------------------------------------- | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `split_pdf_page` (*bool*)             | `splitPdfPage` (*boolean*)            | True to split the PDF file client-side. [Learn more](/api-reference/legacy-api/partition/sdk-python#page-splitting).                                                                                                                                 |
| `split_pdf_allow_failed` (*bool*)     | `splitPdfAllowFailed` (*boolean*)     | When `true`, a failed split request will not stop the processing of the rest of the document. The affected page range will be ignored in the results. When `false`, a failed split request will cause the entire document to fail. Default: `false`. |
| `split_pdf_concurrency_level` (*int*) | `splitPdfConcurrencyLevel` (*number*) | The number of split files to be sent concurrently. Default: 5. Maximum: 15.                                                                                                                                                                          |
| `split_pdf_page_range` (*List\[int]*) | `splitPdfPageRange` (*number\[]*)     | A list of 2 integers within the range `[1, length_of_pdf]`. When pdf splitting is enabled, this will send only the specified page range to the API.                                                                                                  |

Need help getting started? Check out the [Examples page](/api-reference/legacy-api/partition/examples) for some inspiration.

Allowed values for `vlm_model_provider` and `vlm_model` pairs include the following:

<Warning>
  The following models are no longer available as of the following dates:

  * For `anthropic`, `claude-3-5-sonnet-20241022`: October 22, 2025
  * For `bedrock`, `us.anthropic.claude-3-5-sonnet-20241022-v2:0`: October 22, 2025

  Unstructured recommends the following actions:

  * For new workflows, do not use any of these models.
  * For any workflow that uses any of these models, update that workflow as soon as possible to use a different model.

  Workflows that attempt to use any of these models on or after its associated date will return errors.
</Warning>

| `vlm_model_provider` | `vlm_model`                                  |
| -------------------- | -------------------------------------------- |
| `anthropic`          | `claude-sonnet-4-20250514`                   |
| `bedrock`            | `us.amazon.nova-lite-v1:0`                   |
| `bedrock`            | `us.amazon.nova-pro-v1:0`                    |
| `bedrock`            | `us.anthropic.claude-3-haiku-20240307-v1:0`  |
| `bedrock`            | `us.anthropic.claude-3-opus-20240229-v1:0`   |
| `bedrock`            | `us.anthropic.claude-3-sonnet-20240229-v1:0` |
| `bedrock`            | `us.anthropic.claude-sonnet-4-20250514-v1:0` |
| `bedrock`            | `us.meta.llama3-2-11b-instruct-v1:0`         |
| `bedrock`            | `us.meta.llama3-2-90b-instruct-v1:0`         |
| `openai`             | `gpt-4o`                                     |
| `openai`             | `gpt-5-mini-2025-08-07`                      |
| `vertexai`           | `gemini-2.0-flash-001`                       |

---

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/api-validation-errors.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Endpoint errors

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

For the [Unstructured Python SDK](/api-reference/legacy-api/partition/sdk-python),
the [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) returns errors primarily through
the `UnstructuredClientError` class (the base class for all errors raised by the Unstructured Python SDK) and
the `HTTPValidationError` class (inherited from `UnstructuredClientError`). Less common errors are returned through the following classes:

* `httpx.RequestError`, the base class for request errors.
* `httpx.ConnectError`, for HTTP connection request errors.
* `httpx.TimeoutException`, for HTTP request timeout errors.
* `ServerError` (inherited from `UnstructuredClientError`), for server-side errors.
* `ResponseValidationError` (inherited from `UnstructuredClientError`), for type mismatches between the response data and the expected Pydantic model.

Each of the preceding classes has the following members:

| Member         | Type             | Description                                                        |
| -------------- | ---------------- | ------------------------------------------------------------------ |
| `message`      | `str`            | The eror message.                                                  |
| `status_code`  | `int`            | The HTTP response status code, for example `401`.                  |
| `headers`      | `httpx.Headers`  | A collection of HTTP response headers.                             |
| `body`         | `str`            | The HTTP body. This can be an empty string if no body is returned. |
| `raw_response` | `httpx.Response` | The raw HTTP response.                                             |

The following example shows how to handle the preceding errors. In this example,
a required Unstructured API key is intentionally commented out of the code, so that a
`401 Unauthorized` error is intentionally thrown.

```python  theme={null}
import os
import json

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import PartitionRequest
from unstructured_client.models.shared import (
    PartitionParameters,
    Files,
    Strategy
)
from unstructured_client.models.errors import (
    UnstructuredClientError,
    HTTPValidationError
)
from unstructured_client.models.errors.servererror import ServerError
from unstructured_client.models.errors.responsevalidationerror import ResponseValidationError
import httpx

try:
    client = UnstructuredClient(
        # For example, intentionally leave out the API key to intentionally throw an error.
        # api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    filename = "PATH_TO_INPUT_FILE"

    request = PartitionRequest(
        partition_parameters=PartitionParameters(
            files=Files(
                content=open(filename, "rb"),
                file_name=filename,
            ),
            strategy=Strategy.VLM,
            vlm_model="gpt-4o",
            vlm_model_provider="openai",
            languages=['eng'],
            split_pdf_page=True,            # If True, splits the PDF file into smaller chunks of pages.
            # split_pdf_allow_failed=True,  # If True, the partitioning continues even if some pages fail.
            split_pdf_concurrency_level=15  # Set the number of concurrent request to the maximum value: 15.
        ),
    )

    response = client.general.partition(
        request=request
    )
    element_dicts = [element for element in response.elements]
    
    # Print the processed data's first element only.
    print(element_dicts[0])

    # Write the processed data to a local file.
    json_elements = json.dumps(element_dicts, indent=2)

    with open("PATH_TO_OUTPUT_FILE", "w") as file:
        file.write(json_elements)

except HTTPValidationError as e:
    print("Validation error (HTTP 422):", e)
except ServerError as e:
    print("Server error (HTTP 5XX):", e)
except ResponseValidationError as e:
    print("Response validation/type mismatch:", e)
except UnstructuredClientError as e:
    # This catches any other UnstructuredClientError not already caught above.
    # This and all of the other error classes in this example expose the following members:
    print("Other Unstructured client error:")
    print(f"Message:      {e.message}")
    print(f"Status code:  {e.status_code}")
    print(f"Body:         {e.body}")
    print(f"Raw response: {e.raw_response}")
    print(f"Headers:")

    for header in e.headers.raw:
        key = header[0].decode('utf-8')
        value = header[1].decode('utf-8')
        print(f"              {key}: {value}")

except httpx.ConnectError as e:
    print("HTTP connection error:", e)
except httpx.TimeoutException as e:
    print("HTTP timeout error:", e)
except httpx.RequestError as e:
    # This catches catch-all network errors from HTTP not already caught above.
    print("Other HTTPX request error:", e)
except Exception as e:
    # Optional: this catches any other unforeseen errors.
    print("Unexpected error:", e)
```

The results of running the preceding code are similar to the following:

```text  theme={null}
Message:      API error occurred: Status 401. Body: {"detail":"API key is missing, please provide an API key in the header."}
Status code:  401
Body:         {"detail":"API key is missing, please provide an API key in the header."}
Raw response: <Response [401 Unauthorized]>
Headers:
              date: <date-and-time-of-the-error>
              server: <server-identifier>
              content-length: 73
              content-type: application/json
```

---

# Source: https://docs.unstructured.io/ui/destinations/astradb.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/astradb.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/astradb.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/astradb.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Astra DB

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Astra DB.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/PMs1iwL52aM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* An IBM Cloud account or DataStax account.

  * For an IBM Cloud account, [sign up](https://cloud.ibm.com/registration) for an IBMid, and then [sign in](https://accounts.datastax.com/session-service/v1/login) to DataStax with your IBMid.
  * For a DataStax account, [sign up](https://astra.datastax.com/signup) for a DataStax account, and then [sign in](https://accounts.datastax.com/session-service/v1/login) to DataStax with your DataStax account.

* An Astra DB database in the DataStax account. To create a database:

  a. After you sign in to DataStax, click **Create database**.<br />
  b. Click the **Serverless (vector)** tile, if it is not already selected.<br />
  c. For **Database name**, enter some unique name for the database.<br />
  d. Select a **Provider** and a **Region**, and then click **Create database**.<br />

  [Learn more](https://docs.datastax.com/en/astra-db-classic/databases/manage-create.html).

* An application token for the database. To create an application token:

  a. After you sign in to DataStax, in the list of databases, click the name of the target database.<br />
  b. On the **Overview** tab, under **Database Details**, in the **Application Tokens** tile, click **Generate Token**.<br />
  c. Enter some **Token description** and select and **Expiration** time period, and then click **Generate token**.<br />
  d. Save the application token that is displayed to a secure location, and then click **Close**.<br />

  [Learn more](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html).

* A keyspace in the database. To create a keyspace:

  a. After you sign in to DataStax, in the list of databases, click the name of the target database.<br />
  b. On the **Data Explorer** tab, in the **Keyspace** list, select **Create keyspace**.<br />
  c. Enter some **Keyspace name**, and then click **Add keyspace**.<br />

  [Learn more](https://docs.datastax.com/en/astra-db-serverless/databases/manage-keyspaces.html#keyspaces).

* A collection in the keyspace.

  For the [Unstructured UI](/ui/overview) and [Unstructured API](/api-reference/overview):

  * An existing collection is not required. At runtime, the collection behavior is as follows:

    * If an existing collection name is specified, and Unstructured generates embeddings,
      but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
      You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
    * If a collection name is not specified, Unstructured creates a new collection in your keyspace. If Unstructured generates embeddings,
      the new collection's name will be `u<short-workflow-id>_<short-embedding-model-name>_<number-of-dimensions>`.
      If Unstructured does not generate embeddings, the new collection's name will be `u<short-workflow-id`.

  For [Unstructured Ingest](/open-source/ingestion/overview):

  * For the source connector only, an existing collection is required.
  * For the destination connector only, an existing collection is not required. At runtime, the collection behavior is as follows:

    * If an existing collection name is specified, and Unstructured generates embeddings,
      but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
      You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
    * If a collection name is not specified, Unstructured creates a new collection in your keyspace. The new collection's name will be `unstructuredautocreated`.

  To create a collection yourself:

  a. After you sign in to DataStax, in the list of databases, click the name of the target database.<br />
  b. On the **Data Explorer** tab, in the **Keyspace** list, select the name of the target keyspace.<br />
  c. In the **Collections** list, select **Create collection**.<br />
  d. Enter some **Collection name**.<br />
  e. Turn on **Vector-enabled collection**, if it is not already turned on.<br />
  f. Choose a mode for **Embedding generation method**. See [Astra DB generated embeddings](#astra-db-generated-embeddings).<br />
  g. If you chose **Bring my own**, enter the number of dimensions for the embedding model that you plan to use.<br />
  h. For **Similarity metric**, select **Cosine**.<br />
  i. Click **Create collection**.<br />

  [Learn more](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).

To create an Astra DB destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="astradb",
                  config={
                      "token": "<token>",
                      "api_endpoint": "<api-endpoint>",
                      "collection_name": "<collection-name>",
                      "keyspace": "<keyspace>",
                      "batch_size": <batch-size>,
                      "flatten_metadata": <True|False>,
                      "binary_encode_vectors": <True|False>,
                      "enable_lexical_search": <True|False>,
                      "astra_generated_embeddings": <True|False>
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "astradb",
      "config": {
          "token": "<token>",
          "api_endpoint": "<api-endpoint>",
          "collection_name": "<collection-name>",
          "keyspace": "<keyspace>",
          "batch_size": <batch-size>,
          "flatten_metadata": "<true|false>",
          "binary_encode_vectors": "<true|false>",
          "enable_lexical_search": "<true|false>",
          "astra_generated_embeddings": "<true|false>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<token>` (*required*) - The application token for the database.
* `<api-endpoint>` (*required*) - The database's associated API endpoint.
* `<collection-name>` - The name of the collection in the keyspace. If no value is provided, see the beginning of this article for the behavior at run time.
* `<keyspace>` - The name of the keyspace in the collection. The default is `default_keyspace` if not otherwise specified.
* `<batch-size>` - The maximum number of records to send per batch. The default is `20` if not otherwise specified.
* `flatten_metadata` - Set to `true` to flatten the metadata into each record. Specifically, when flattened, the metadata key values are brought to the top level of the element, and the `metadata` key itself is removed. By default, the metadata is not flattened (`false`).
* `binary_encode_vectors` - Set to `false` to upload vectors as a list of numbers. By default, vectors are binary encoded before sending (`true`). This is more efficient, but it makes it harder to view and work with vectors in the UI.
* `enable_lexical_search` - Set to `true` to enable lexical and hybrid search. See [Lexical search](#lexical-search).
* `astra_generated_embeddings` - Set to `true` to use an Astra vectorize integration for embeddings. See [Astra DB generated embeddings](#astra-db-generated-embeddings).

## Lexical search

When **Enable Lexical Search** is enabled, document text will be inserted into the `$lexical` field for lexical and hybrid search capabilities.

<Note>
  Your collection must be configured for lexical search, and your database must be in one of the supported AWS regions. For more information, see [Find data with lexicographical matching](https://docs.datastax.com/en/astra-db-serverless/databases/lexical-search.html) in the Astra DB documentation. Otherwise, you will encounter the following error when running a job:

  ` The Collection <keyspace>.<collection> does not have Lexical feature enabled. (LEXICAL_NOT_ENABLED_FOR_COLLECTION) [with 0 inserted ids])`
</Note>

## Astra DB generated embeddings

When **Astra DB Generated Embeddings** is enabled:

* Your document text will be inserted into the `$vectorize` field.
* Astra DB will automatically populate the `$vector` field with embeddings.
* Your workflow **must not** include an embedder node.
* You **must** have an embedding provider configured for your collection.

When **Astra DB Generated Embeddings** is disabled (the default):

* Unstructured will insert embeddings directly into the `$vector` field
* Your workflow **must** include an embedder node.

For more information, see [Auto-generate embeddings with vectorize](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html) in the Astra DB documentation.

### Troubleshooting

You might encounter the following errors when running a job that uses this connector:

**Error**:

`Unable to vectorize data, embedding service not configured for the collection : <collection> (EMBEDDING_SERVICE_NOT_CONFIGURED) [with 0 inserted ids])`

You've enabled **Astra DB Generated Embeddings**, but your collection does not have an embedding provider. Either disable **Astra DB Generated Embeddings** in your connector settings, or recreate your collection with an embedding provider.

**Error**:

`Cannot use Unstructured embeddings and Astra-generated embeddings simultaneously. Please disable Astra generated embeddings or remove the Unstructured embedder.`

**or**:

`No vectors provided. Please enable an Unstructured embedding provider or configure Astra to generate embeddings.`

You must choose either Unstructured or Astra DB embeddings. Configuring neither, or both, is unsupported.

---

# Source: https://docs.unstructured.io/support/issues/authorization-permissions.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Authorization and permissions issues

## Issues

When you try to connect Unstructured to a specific source or destination, you get one of the following error types:

* `PermissionError`: For example, for Amazon S3, an `Access Denied` error.
* `ClientAuthenticationError`: For example, for Azure Blob Storage, an `AuthenticationFailed` or `Signature not valid` error.
* `AuthError`: For example, for Dropbox, an `expired_access_token` error.
* `ApiPermissionError`: For example, for Confluence, a `permission to view content` error.
* `ClientRequestException`: For example, for OneDrive, an `AccessDenied` error.
* `ValueError`: For example, for Outlook, an `invalid_grant` or `Conditional Access policies` error. For Google Drive, a `File not found related to auth` error.
* `UserError`: For example, for Box, an `Access denied - insufficient permission` error.
* `HttpResponseError`: For example, for Azure Blob Storage, an `AccountIsDisabled` error.

## Possible causes

Unstructured could not access the specified source or destination system due to invalid credentials, insufficient permissions, or restrictive policies. This error can still occur even though the related connector passed Unstructured's connection test.

Possible causes include:

* An incorrect API key, token, password, service account credential, tenant ID, or client ID was specified.
* The specified credentials or access token has expired.
* Insufficient permissions were granted; for example, read or list access is needed for indexing data, or write access is required for uploading data.
* The source's or destination's associated account is disabled or inactive.
* For Microsoft Entra ID, Conditional Access Policies are blocking the authentication flow.
* The specified credentials might be valid, but an incorrect configuration points to a resource that is not authorized for those credentials.

## Possible solutions

* **Verify the credentials**: Double-check all authentication details—keys, secrets, tokens, usernames, passwords, and IDs—for typos and accuracy.
* **Check the expiration**: Ensure that the specified token or key has not expired. Regenerate them if needed.
* **Review permissions**: Confirm that the credentials have the required permissions for the operation—for example, `s3:ListBucket` and `s3:GetObject` for Amazon S3 indexing; `Files.Read.All` and `Sites.Read.All` for OneDrive or SharePoint; write permissions for upload destinations—and grant the necessary roles and permissions in the source and destination systems.
* **Check the status of the account and resource**: Ensure that the user account or service principal is active, and the target resource—such as the Azure Storage Account—is enabled.
* **Check the Conditional Access Policies (for Azure Blob Storage)**: Review the Microsoft Entra ID Conditional Access Policies. They might be blocking non-interactive sign-ins or require specific compliance. Adjust the policies or exclude the Unstructured service principal if appropriate and secure.
* **Reconfigure the connector**: Delete and recreate the source or destination connector configuration in Unstructured with verified credentials.

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/api-reference/legacy-api/aws.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unstructured API on AWS

> Follow these steps to deploy the Unstructured API service into your AWS account.

<Warning>
  The Unstructured API on AWS is deprecated. It is no longer supported and is not being actively updated.
  Unstructured is now available on the AWS Marketplace as a private offering. To explore supported options
  for running Unstructured within your virtual private cloud (VPC), email Unstructured Sales at
  [sales@unstructured.io](mailto:sales@unstructured.io).

  This page is not being actively updated. It might contain out-of-date information. This page is provided for legacy reference purposes only.
</Warning>

<Warning>
  This article describes how to create several interrelated resources in your AWS account.
  Your AWS account will be charged on an ongoing basis for these resources, even if you are not actively using them.<br /><br />
  Manually stopping or terminating the associated Amazon EC2 instances alone will not reduce these ongoing charges.<br /><br />
  To stop accruing all related ongoing charges, you must delete all of the associated AWS resources.
  To do this, see [Manage related AWS account costs](#manage-related-aws-account-costs).
</Warning>

*Estimated time to complete: 30 minutes*

The requirements are as follows.

1. **An AWS account**:

   * If you have an existing account, log in: [https://aws.amazon.com/](https://aws.amazon.com) > **Sign In to the Console**.

   * If you do not have an existing account, create one: [https://aws.amazon.com/free](https://aws.amazon.com/free) > **Create a Free Account**.

2. **IAM permissions**: In a later step, AWS CloudFormation creates required infrastructure in your account. To learn how to create the associated role, see [Creating IAM Roles with AWS CloudFormation](https://blog.awsfundamentals.com/aws-iam-roles-with-aws-cloudformation#heading-creating-iam-roles-with-aws-cloudformation).

3. **SSH key pair**: For secure access to the Amazon EC2 instance that CloudFormation creates in a later step, create an SSH key pair. To learn how, see [Create a key pair for your Amazon EC2 instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-key-pairs.html).

## Part I: Setting up the Virtual Private Cloud (VPC)

*Note: If you have already configured a Virtual Private Cloud (VPC) for your organization that meets the requirements for deploying the Unstructured API, you may skip this part and proceed to the Part II. Ensure that your existing VPC setup includes the necessary subnets, internet gateway, and route tables as outlined in this guide.*

In Part I, you will construct a resilient and secure infrastructure within AWS by setting up a Virtual Private Cloud (VPC). Your VPC will encompass a dual-tiered subnet model consisting of both **public** and **private** subnets across multiple Availability Zones (AZs).

You will establish the foundational network structure for deploying the Unstructured API by creating two public subnets and one private subnet within your VPC. The public subnets will host resources that require direct access to the internet, such as a load balancer, enabling them to communicate with external users. The private subnet is designed for resources that should not be directly accessible from the internet, like EC2 Compute Engine.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Infrastructure_Diagram.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6dd574f63ada8a4ea2c1248ec4461536" alt="Infrastructure Diagram" data-og-width="2088" width="2088" data-og-height="1478" height="1478" data-path="img/api/Infrastructure_Diagram.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Infrastructure_Diagram.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=298b335bce5b6e1faa8121d412d1ac55 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Infrastructure_Diagram.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=965c6d0b2bcfe15d87695a51e1feb140 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Infrastructure_Diagram.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=b9cdd369734cc828e9f8943d1d07eb06 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Infrastructure_Diagram.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=bfb9153e118ba68dcb0c42cb4ca6825b 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Infrastructure_Diagram.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=0a5568932d0d0bc31bcb94e07f28b43a 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Infrastructure_Diagram.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=ebabedb483d15e2bdd969fdec8ae5bf0 2500w" />

1. **Access the VPC dashboard**:

   a.   In the AWS Management Console, in the top menu bar, click **Services > Networking & Content Delivery > VPC**.<br />

   b.   In the sidebar, click **Your VPCs**, and then click **Create VPC**.<br />

2. **Create the VPC**:

   a.   Select **VPC only**.<br />

   b.   Enter a **Name tag** for your VPC.<br />

   c.   Specify the **IPv4 CIDR block** (for example, `10.0.0.0/16`).<br />

   d.   You may leave **IPv6 CIDR block**, **Tenancy**, and **Tags** settings at their defaults.<br />

   e.   Click **Create VPC**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step2.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=ec11326c56aa24f0838eaaad3604d387" alt="create vpc" data-og-width="1340" width="1340" data-og-height="1698" height="1698" data-path="img/api/VPC_Step2.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step2.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=f317928d55d4fca07264c1f48fb35039 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step2.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=9bd7e325d615610ce83257e400694c74 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step2.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=cd06c656301599b621fd40e57237bd54 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step2.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=65f628ddceb01d144f3fb3d5d54f6e3a 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step2.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c9767dbf0e89f28b848e9d7b3ff080b6 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step2.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c2e552f6597dc824ed4820cac7595fe8 2500w" />

3. **Create the subnets**:

   a.   After creating the VPC, in the sidebar, click **Subnets**.<br />

   b.   Click **Create subnet**.<br />

   c.   In the **VPC ID** dropdown menu. select the VPC that you just created.<br />

   d.   For the first public subnet:<br />

   * Enter a **Subnet name**.

   * Select an **Availability Zone**.

   * Specify the **IPv4 CIDR block** (for example, `10.0.0.0/16`).

   * Specify the **IPv4 subnet CIDR block** (for example, `10.0.1.0/24`).

   * You may leave the **Tags** setting at its default.

   * Click **Add new subnet**. (Do not click **Create subnet** yet.)

   e.  Repeat the process for the second public subnet with a different **Availability Zone** and **IPv4 subnet CIDR block** (for example, `10.0.2.0/24`).<br />

   * *Note: Each subnet must reside entirely within one Availability Zone and cannot span zones. If you specify the same Availability Zone or IPv4 subnet CIDR block as the first public subnet, AWS CloudFormation might fail in a later step*.

   * To learn more, see [Subnet basics](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html#subnet-basics).

   * Click **Add new subnet**. (Do not click **Create subnet** yet.)

   f.  Repeat the process for the private subnet with a different **Availability Zone** and **IPv4 subnet CIDR block** (for example, `10.0.3.0/24`).<br />

   * *Note: Each subnet must reside entirely within one Availability Zone and cannot span zones. If you specify the same Availability Zone or IPv4 subnet CIDR block as the first or second public subnets, AWS CloudFormation might fail in a later step*.

   g.  Click **Create subnet**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step3.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=1baf235fd52b412b1a733a0d7b5788ff" alt="create subnet" data-og-width="1340" width="1340" data-og-height="1850" height="1850" data-path="img/api/VPC_Step3.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step3.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=bfe8f00995bd3642dbc5cf6c9f89d795 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step3.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=125f69689164792df79118888773acc7 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step3.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=dcbc06f6ec68c0b617fb07c52212fbe5 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step3.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2598e2744d4733a26e7077b04a7247e1 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step3.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=21faa5f03f0671ce3d986a6ccf264a8e 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step3.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=f0cfa2b855311b69d457425b052cbe91 2500w" />

4. **Create the internet gateway (for the public subnets)**:

   a.   In the sidebar, click **Internet gateways**.<br />

   b.   Click **Create internet gateway**, enter a **Name tag**, and click **Create internet gateway**.<br />

   c.   In the sidebar, click **Internet gateways** again.<br />

   d.   Click the **Internet gateway ID** for the internet gateway that you just created.<br />

   e.   Click **Actions > Attach to VPC**.<br />

   f.   In the **Available VPCs** dropdown list, select the VPC from *Step 2 - Create the VPC*.<br />

   g.   Click **Attach internet gateway**.<br />

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step4.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=7e167f41fbed50320e5c4a87777a66ec" alt="create internet gateway" data-og-width="1326" width="1326" data-og-height="944" height="944" data-path="img/api/VPC_Step4.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step4.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=af27fa1a7b0fdc2e240585d9ce128dee 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step4.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=4b7a5acf5d8e560842caf3946fa92621 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step4.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=69cd5ee12fe1437362de5e53590a2934 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step4.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=1419674cdc06b3ac8dbf95a05c9f76c9 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step4.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=de76c03570d7a886f1696685a00038e0 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step4.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=73a9829da8f72bd480d28ca079858f25 2500w" />

5. **Set up route tables (for the public subnets)**:

   AWS automatically created a default route table in *Step 3 - Create the subnets*. To tailor your network architecture, you will create a new route table specifically for your public subnets, which will include a route to the internet gateway from *Step 4 - Create the internet gateway (for the public subnets)*.

   a.   In the sidebar, click *Route tables*.

   b.   Click **Create route table**.

   c.   Enter a **Name**.

   d.   Select the **VPC** from *Step 2 - Create the VPC*.

   e.   Click **Create route table**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step5.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=9d5ddaee0513a9e596cb0a820fb5adae" alt="create route table" data-og-width="1326" width="1326" data-og-height="1062" height="1062" data-path="img/api/VPC_Step5.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step5.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c18deda746ff5d2dcf219147220985a5 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step5.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=914bbad7f28cbe6e866ec6e4d399463c 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step5.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=909f440ff51c8c64d055630d796e745d 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step5.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=426d21437b649615a21a29c66a4ee404 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step5.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6041c172b3e28e5200c5f937097224e7 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step5.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2d6bed52dc666962d931dc06ee0a3a0e 2500w" />

6. **Associate public subnets to the route table and internet gateway**:

   a.  Connect the **public subnets** to the **route table** from *Step 5 - Set up route tables (for the public subnets)*:<br />

   * In the sidebar, click **Subnets**.

   * Select the first public subnet from *Step 3 - Create the subnets*.

   * Click **Actions > Edit route table association**.

   * In the **Route table ID** dropdown list, select the route table from *Step 5 - Set up route tables (for the public subnets)*, and then click **Save**.

   * Repeat the process for the second public subnet.

   b.   Now, you’ll ensure that the two public subnets can access the internet by connecting the route table to the internet gateway:<br />

   * In the sidebar, click **Route tables**.

   * Select the route table from *Step 5 - Set up route tables (for the public subnets)*.

   * Click **Actions > Edit routes**.

   * Click **Add route**, in the destination box, enter `0.0.0.0/0`, which represents all IP addresses. In the **Target** dropdown list, select **Internet Gateway**, and select the internet gateway from *Step 4 - Create the internet gateway (for the public subnets)*.

   * Click **Save changes** to establish the route, granting internet access to the first and second public subnets at the same time.

   c.  For the **private subnet**:

   * In the sidebar, click **Subnets**.

   * Select the private subnet from *Step 3 - Create the subnets*.

   * Click **Actions > Edit route table association**.

   * In the **Route table ID** dropdown list, select the main route table, or create and then select a new route table without a route to the internet gateway.

   * Click **Save**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step6.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=44b90b3777420721b5d502228dd4ed8c" alt="connect public subnet to route table" data-og-width="1334" width="1334" data-og-height="834" height="834" data-path="img/api/VPC_Step6.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step6.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=d68aafa51b86619a77128cf8a3c02db1 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step6.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=ecda8e5c7956357209d9e2e2c8747c0a 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step6.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=180e69db53924751b2af2cea456f6354 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step6.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=b59a23a24f4a778f8d13ff993ce9fc41 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step6.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8e65504ccdc66c72108328bf6b9a77bf 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step6.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5c3a7a558c997cef4fd937f1e362e1bd 2500w" /> <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step7.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=160ebfaf0730506745e011ee1a7d2c56" alt="edit routes" data-og-width="3284" width="3284" data-og-height="744" height="744" data-path="img/api/VPC_Step7.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step7.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6a57cfa0ce475043af725aa3c49fb8c7 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step7.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6c569a7a22b1b4b327d21e289d5d7666 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step7.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=a39262104d7fc2b18a439f5d95f19577 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step7.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=93da0d4e428ae09a3611028d089e84b8 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step7.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=dc7626b09cac4c0aa0e31ee4eb75ecd2 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step7.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=62ca818f5b757b4ffb5f087bc4a4ebaa 2500w" />

7. **Inspect the VPC resource map**:

   You can check the configurations from the resource maps on the VPC details dashboard by clicking **Your VPCs** in the sidebar, clicking the **VPC ID** for your VPC, and then clicking the **Resource map** tab.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step8.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=72bb4ba6007ff4308ce76b1b6690bc7e" alt="VPC Resource Maps" data-og-width="1501" width="1501" data-og-height="935" height="935" data-path="img/api/VPC_Step8.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step8.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=9cbb420954d0cb0a890b3b25430a79ca 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step8.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=e8a7bd344403f486374023b5d2a6d73c 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step8.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=dbe6bf2217039fa4d8fa34e62bbc77d4 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step8.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=283356cefe4b05f9a32e7e442066ff64 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step8.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=7d8539b851a0e240ff61c7d65d40a246 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/VPC_Step8.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8ab95cea25499be809892ca3fea736c0 2500w" />

## Part II: Deploying the Unstructured API from the AWS Marketplace

8. **Go to the Unstructured API page on AWS Marketplace**:

   a.   Leaving the VPC dashboard from Part I open, in a separate web browser tab, go to the [Unstructured API](http://aws.amazon.com/marketplace/pp/prodview-fuvslrofyuato) product page in the AWS Marketplace.

   b.   Click **Continue to Subscribe**.

   c.   Review the terms and conditions.

   d.   Click **Continue to Configuration**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step8.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=f917128bf235ad278165a22371a1db01" alt="Unstructured API on AWS Marketplace" data-og-width="2688" width="2688" data-og-height="1722" height="1722" data-path="img/api/Marketplace_Step8.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step8.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=842ddd37d49dc03570a72206bcca829d 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step8.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2ac18dc33fb9f575bae3380c6d71a71e 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step8.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=e9204ef02addcadce8dad701982a1bc5 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step8.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=53ee50a48bfce72634a2d80b315ac75e 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step8.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=515565fa94d448d535d01c440dbf4b8d 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step8.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c7fe3e74bc133216f5348ad854620059 2500w" />

9. **Configure the CloudFormation template**:

   a.   In the **Fulfillment option** dropdown list, select **CloudFormation Template**.

   b.   For **Fulfillment option** and **Software version**, leave the default `UnstructuredAPI` template and software version.

   c.   In the **Region** dropdown list, select the Region that corresponds to the VPC from Part I.

   * *Note: You must select the same Region where you set up the VPC in Part I. To find the Region, on the VPC dashboard tab from Part I that you left open, with your VPC displayed, find the VPC's Region name next to your username in the top navigation bar.*

   d.   Click **Continue to Launch**.

   e.   In the **Choose Action** dropdown list, select **Launch CloudFormation**.

   f.   Click **Launch**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step9.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=a4ecd07c1df8e06747d3e88e2f31a614" alt="CloudFormation Configuration" data-og-width="1678" width="1678" data-og-height="1728" height="1728" data-path="img/api/Marketplace_Step9.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step9.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2ab7c0fef6e88588249b553d0de27d9d 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step9.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8d055fd66d48af28c9c74aa2387e6a3b 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step9.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=1c29476ac110afa9cc2152e9ab429ece 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step9.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=f2586baa3040283c00cbeb8d9cec2dd0 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step9.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=301da88a3d88a3395e8aa1b145ea4b58 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step9.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=7c8f0d181bc33616d212f9fdae382afb 2500w" />

10. **Create the CloudFormation stack**:

After you click **Launch**, the **Create stack** page appears in CloudFormation.

**Step 1: Create the stack**

a.   Leave **Choose an existing template** selected.

b.   Leave **Amazon S3 URL** selected and the default **Amazon S3 URL** value unchanged.

c.   Click **Next**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10a.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=df9dbb66b39813cdc1fca56e6a27f040" alt="Create Stack" data-og-width="2782" width="2782" data-og-height="1196" height="1196" data-path="img/api/Marketplace_Step10a.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10a.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=321e94978b34a415ac1127726c2ee6a9 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10a.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=44fd78cdd9fbd15141908ae13bdc8945 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10a.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6083acef9241f3608e7ffb95ea9e04b5 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10a.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6d25e0a24a0343712edb289dcddf7620 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10a.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=b239a366f69e2de22d6d4ab5aaffd29b 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10a.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=86e60b6d8f4150cc9321429c3746ecd9 2500w" />

**Step 2: Specify the stack's details**

a.   Enter some unique **Stack name**.

b.   In the **Parameters** section, in the **InstanceType** drop-down list, select **m5.xlarge**.

c.   In the **KeyName** drop-down list, select the name of the SSH key pair from the beginning of this article.

d.   In the **LoadBalancerScheme** dropdown list, select **internet-facing**.

e.   For **SSHLocation**, enter `0.0.0.0/0`, but only if you allow public access on the internet.

* **Note**: It is generally recommended to limit SSH access to a specific IP range for enhanced security. This can be done by setting the `SSHLocation` to the IP address or range associated with your organization. Please consult your IT department or VPN vendor to obtain the correct IP information for these settings.

* AWS provides `AWS Client VPN`, which is a managed client-based VPN service that enables secure access AWS resources and resources in your on-premises network. To learn more, see [Getting started with AWS Client VPN](https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/cvpn-getting-started.html).

f.   In the **Subnets** dropdown multiselect list, select the two public subnets and the private subnet from Part I.

g.   In the **VPC** dropdown list, select the VPC from Part I.

h.   You can leave the default values for all of the other **Parameters** fields.

i.   Click **Next**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10b.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=78c899b28a5c867d8210213152c6206f" alt="Specify stack details" data-og-width="2376" width="2376" data-og-height="1848" height="1848" data-path="img/api/Marketplace_Step10b.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10b.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=49b31ba13ddc4f5326fae932910c6b3c 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10b.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=848f8934e3059ae1d2f597ef20a18c26 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10b.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=226a869b90bff97afa9cb7270ee21312 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10b.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c817a66c3e77ddd7ba0762a6d3b87f0e 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10b.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=814d5d0253759e0bbf4532f7c98a6daf 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10b.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=d4c9a1a0ca146e2e8d526992b0516872 2500w" />

**Step 3: Configure the stack's options**

a.   You can leave the default values, or specify any non-default stack options.

b.   Click **Next**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10c.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=deeac8275caba0cffda00601572eb8be" alt="Specify stack options" data-og-width="2376" width="2376" data-og-height="1848" height="1848" data-path="img/api/Marketplace_Step10c.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10c.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=7f3f892a5a9538d49cc0afbd9d1915d8 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10c.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=4442dda77bbf14c604cdebf56ebfb156 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10c.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=0487a07fd6d5a9a7e5e7bfc099e4b930 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10c.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=dab2677ebd390441404be78552185fe5 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10c.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=ece043912024317b8aa5bf67947d0f34 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10c.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=a55831e84fe155c737ed1297d2ed8d31 2500w" />

**Step 4: Review**

a.   Review the stack's settings.

b.   Click **Submit**.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10d.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=d520dd22e3b3a4654da461aa05164108" alt="Review stack" data-og-width="2376" width="2376" data-og-height="1848" height="1848" data-path="img/api/Marketplace_Step10d.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10d.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2ebcddc1d77fce816981e0f2c25256ad 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10d.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=77348d02e1944e0c3e0011042d002bfb 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10d.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=3ee3a4bee2a82c7f714271f892b3bda9 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10d.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=3293532d31e83c80084c9b3fb16e8d20 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10d.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=a38b5acd675d92d868b778a10977ac69 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step10d.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=1a8d8a2ffb236820a50dc75532ca23cd 2500w" />

11. **Get the Unstructured API endpoint**:

a.   The CloudFormation details page for the stack appears. If you do not see it, on the sidebar, click **Stacks**, and then click the name of your stack.

b.   Check the status of the CloudFormation stack. A successful deployment will show a **CREATE\_COMPLETE** value for the **Status** field on the **Stack Info** tab on this stack's details page. The deployment can take several minutes.

c.   After a successful deployment, click the **Resources** tab on this stack's details page. Then click the **Physical ID** link next to **ApplicationLoadBalancer** on this tab.

d.   On the **EC2 > Load balancers > (Load balancer ID)** page that appears, copy the **DNS Name** value, which is shown as an **(A Record)** and ends with `.elb.amazonaws.com`.

* Note: You will use this **DNS Name** to replace the `<application-load-balancer-dns-name>` for the following healthcheck and data processing steps.

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step11.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=36a0ab8636ec5da6af4ee945ae59ccb3" alt="Unstructured API Endpoint" data-og-width="2786" width="2786" data-og-height="1816" height="1816" data-path="img/api/Marketplace_Step11.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step11.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=63fa26993e9d658059abb1de5737b43b 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step11.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=d8e58c781f0fadfa9f120f945102fa0f 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step11.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=e2987dcbab24f68f4f84bef4dc7b5d00 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step11.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=433f799b0edda669ee82ede0cf9e0c95 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step11.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=964ee0622f5d7b5ed90aedec731eb655 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Marketplace_Step11.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=1bf43e7bac6e04b3f671bb4440e3342b 2500w" />

## Healthcheck

Perform a health check by running this [curl](https://curl.se/) command from a terminal on your local machine, replacing `<application-load-balancer-dns-name>` with your application load balancer's DNS name. This health check can take several minutes:

```bash  theme={null}
curl http://<application-load-balancer-dns-name>/healthcheck

```

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/healthcheck.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=087561beffb043745c3fde43ee00c282" alt="Healthcheck" data-og-width="1666" width="1666" data-og-height="64" height="64" data-path="img/api/healthcheck.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/healthcheck.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=84b1c0ae5b9d1379ce2795247d61676a 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/healthcheck.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=ffc65ca016d60c572cf0fab8e112c1a2 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/healthcheck.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c4e63c9251131905af60de1d8de82b34 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/healthcheck.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c3d8efd6d28c0bb171ece27cc4fd2748 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/healthcheck.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=46581ba8675f08b14e0bfa77110cda2e 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/healthcheck.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=9e6c8fcee24c9d6b6d1b04febcff06de 2500w" />

## Data processing

For example, run one of the following, setting the following environment variables to make your code more portable:

* Set `UNSTRUCTURED_API_URL` to `http://`, followed by your load balancer's DNS name, followed by `/general/v0/general`.

  <Info>You can now use this value (`http://`, followed by your load balancer's DNS name, followed by `/general/v0/general`) in place of
  calling the [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) URL as described elsewhere in the Unstructured API documentation.</Info>

* Set `LOCAL_FILE_INPUT_DIR` to the path on your local machine to the files for the Unstructured API to process. If you do not have any input files available, you can download any of the ones from the [example-docs](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/example-docs) folder in GitHub.

* Set `LOCAL_FILE_OUTPUT_DIR` to the path on your local machine for Unstructured API to send the processed output in JSON format:

<AccordionGroup>
  <Accordion title="Ingest CLI">
    You must first [install the Unstructured Ingest CLI](/open-source/ingestion/overview#unstructured-ingest-cli).

    Because you are calling a private API and therefore do not need an Unstructured API key, you can omit the command-line option `--api-key` Or, for better code portability, it is recommended that you first set the environment variable `UNSTRUCTURED_API_KEY` to an empty string and then include the command-line option `--api-key`.

    ```bash CLI theme={null}
    unstructured-ingest \
      local \
        --input-path $LOCAL_FILE_INPUT_DIR \
        --output-dir $LOCAL_FILE_OUTPUT_DIR \
        --partition-by-api \
        --api-key $UNSTRUCTURED_API_KEY \
        --partition-endpoint $UNSTRUCTURED_API_URL \
        --strategy hi_res \
        --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
    ```
  </Accordion>

  <Accordion title="Ingest Python library">
    You must first [install the Unstructured Ingest Python library](/open-source/ingestion/python-ingest).

    Because you are calling a private API and therefore do not need an Unstructured API key, you can omit the parameter `api_key`. Or, for better code portability, it is recommended that you first set the environment variable `UNSTRUCTURED_API_KEY` to an empty string and then include the parameter `api_key`.

    ```python Python Ingest theme={null}
    import os

    from unstructured_ingest.pipeline.pipeline import Pipeline
    from unstructured_ingest.interfaces import ProcessorConfig
    from unstructured_ingest.processes.connectors.local import (
        LocalIndexerConfig,
        LocalDownloaderConfig,
        LocalConnectionConfig,
        LocalUploaderConfig
    )
    from unstructured_ingest.processes.partitioner import PartitionerConfig

    if __name__ == "__main__":
        Pipeline.from_configs(
            context=ProcessorConfig(),
            indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
            downloader_config=LocalDownloaderConfig(),
            source_connection_config=LocalConnectionConfig(),
            partitioner_config=PartitionerConfig(
                partition_by_api=True,
                api_key=os.getenv("UNSTRUCTURED_API_KEY"),
                partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
                strategy="hi_res",
                additional_partition_args={
                    "split_pdf_page": True,
                    "split_pdf_allow_failed": True,
                    "split_pdf_concurrency_level": 15
                }
            ),
            uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
        ).run()
    ```
  </Accordion>
</AccordionGroup>

## Accessing the hosting EC2 instance

If you need to access the Amazon EC2 instance that hosts the Unstructured API, do the following:

1. In the CloudFormation console, open the details page for the stack from Part II. If you do not see it, on the CloudFormation console's sidebar, click **Stacks**, and then click the name of your stack.

2. Click the **Resources** tab on this stack's details page. Then click the **Physical ID** link next to **EC2TargetGroup** on this tab.

3. On the **EC2 > Target groups > (CloudFormation stack name)** page that appears, on the **Targets** tab, click the **Instance ID** link.

4. In the list of instances that appears, click the **Instance ID** link.

5. Click **Connect**, and then follow any of the on-screen options to access the EC2 instance.

## Manage related AWS account costs

After you run the CloudFormation stack that you created in Part II, charges will begin accruing to your AWS account on an ongoing basis for related AWS resources.
The amounts of these charges vary based on where these resource are located, which resources are covered by AWS Free Tier offerings, the extent to
which you customize these resources' settings, how much you use these resources, and other factors. Stopping or terminating
the related Amazon EC2 instances alone will not eliminate these ongoing charges.

To stop these charges from accruing,
[delete the CloudFormation stack](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html)
that you created and ran in Part II. This stops and deletes all of the related AWS resources.

Before you delete the stack, note the following:

* You should click the **Resources** tab on the stack's details page to be aware of the associated resources that will be deleted.
* You should note any resource dependencies, resources with deletion protection or termination protection enabled, or nested stacks
  that might prevent stack deletion, and resolve these issues that could prevent stack deletion. To find these kinds of issues:

  * On the **Template** tab on the stack's details page, look for occurences of the `DependsOn` attribute, which are set to the name
    of the resource dependency.
  * On the **Template** tab on the stack's details page, look for occurences of the `DeletionPolicy` attribute set to `Retain` or the
    `UpdateReplacePolicy` attribute set to `Retain`. The associated resources have deletion protection enabled.
  * On the **Stack info** tab on the stack's details page, look for the **Termination protection** field. If it is set to **Activated**,
    termination protection is enabled.
  * On the **Resources** tab on the stack's details page, look for resources with their **Type** set to `AWS::CloudFormation::Stack`. These indicate nested stacks.

After you delete the stack, you should check your [AWS Billing and Cost Management dashboard](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/billinginfo.html) to confirm that associated charges are no longer accruing.

---

# Source: https://docs.unstructured.io/ui/destinations/azure-ai-search.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/azure-ai-search.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/azure-ai-search.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Azure AI Search

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Azure AI Search.

The requirements are as follows.

The following video shows how to fulfill the minimum set of Azure AI Search requirements:

<iframe width="560" height="315" src="https://www.youtube.com/embed/6ZjU5OupWE8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

Here are some more details about these requirements:

* The endpoint and API key for Azure AI Search. [Create an endpoint and API key](https://learn.microsoft.com/azure/search/search-create-service-portal).
* The name of the index in Azure AI Search. [Create an index](https://learn.microsoft.com/rest/api/searchservice/create-index).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/WY8h8Gtyo7o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  The Azure AI Search index that you use must have an index schema that is compatible with the schema of the documents
  that Unstructured produces for you. Unstructured cannot provide a schema that is guaranteed to work in all
  circumstances. This is because these schemas will vary based on your source files' types; how you
  want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

  You can adapt the following index schema example for your own needs. Be sure to replace `<number-of-dimensions>`
  (in three locations in the following example) with the number of dimensions of the embedding model you are using:

  ```json  theme={null}
  {
    "name": "elements-index",
    "fields": [
      {
        "name": "id",
        "type": "Edm.String",
        "key": true
      },
      {
        "name": "record_id",
        "type": "Edm.String",
        "filterable": true
      },
      {
        "name": "element_id",
        "type": "Edm.String"
      },
      {
        "name": "text",
        "type": "Edm.String",
        "searchable": true
      },
      {
        "name": "type",
        "type": "Edm.String"
      },
      {
        "name": "metadata",
        "type": "Edm.ComplexType",
        "fields": [
          {
            "name": "orig_elements",
            "type": "Edm.String"
          },
          {
            "name": "category_depth",
            "type": "Edm.Int32"
          },
          {
            "name": "parent_id",
            "type": "Edm.String"
          },
          {
            "name": "attached_to_filename",
            "type": "Edm.String"
          },
          {
            "name": "filetype",
            "type": "Edm.String"
          },
          {
            "name": "last_modified",
            "type": "Edm.DateTimeOffset"
          },
          {
            "name": "is_continuation",
            "type": "Edm.Boolean"
          },
          {
            "name": "file_directory",
            "type": "Edm.String"
          },
          {
            "name": "filename",
            "type": "Edm.String"
          },
          {
            "name": "data_source",
            "type": "Edm.ComplexType",
            "fields": [
              {
                "name": "url",
                "type": "Edm.String"
              },
              {
                "name": "version",
                "type": "Edm.String"
              },
              {
                "name": "date_created",
                "type": "Edm.DateTimeOffset"
              },
              {
                "name": "date_modified",
                "type": "Edm.DateTimeOffset"
              },
              {
                "name": "date_processed",
                "type": "Edm.DateTimeOffset"
              },
              {
                "name": "permissions_data",
                "type": "Edm.String"
              },
              {
                "name": "record_locator",
                "type": "Edm.String"
              }
            ]
          },
          {
            "name": "coordinates",
            "type": "Edm.ComplexType",
            "fields": [
              {
                "name": "system",
                "type": "Edm.String"
              },
              {
                "name": "layout_width",
                "type": "Edm.Double"
              },
              {
                "name": "layout_height",
                "type": "Edm.Double"
              },
              {
                "name": "points",
                "type": "Edm.String"
              }
            ]
          },
          {
            "name": "languages",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "page_number",
            "type": "Edm.String"
          },
          {
            "name": "links",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "page_name",
            "type": "Edm.String"
          },
          {
            "name": "link_urls",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "link_texts",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "sent_from",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "sent_to",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "subject",
            "type": "Edm.String"
          },
          {
            "name": "section",
            "type": "Edm.String"
          },
          {
            "name": "header_footer_type",
            "type": "Edm.String"
          },
          {
            "name": "emphasized_text_contents",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "emphasized_text_tags",
            "type": "Collection(Edm.String)"
          },
          {
            "name": "text_as_html",
            "type": "Edm.String"
          },
          {
            "name": "regex_metadata",
            "type": "Edm.String"
          },
          {
            "name": "detection_class_prob",
            "type": "Edm.Double"
          }
        ]
      },
      {
        "name": "embeddings",
        "type": "Collection(Edm.Single)",
        "dimensions": <number-of-dimensions>,
        "vectorSearchProfile": "embeddings-config-profile"
      }
    ],
    "vectorSearch": {
      "algorithms": [
        {
          "name": "hnsw-<number-of-dimensions>",
          "kind": "hnsw",
          "hnswParameters": {
            "m": 4,
            "efConstruction": 400,
            "efSearch": 500,
            "metric": "cosine"
          }
        }
      ],
      "profiles": [
        {
          "name": "embeddings-config-profile",
          "algorithm": "hnsw-<number-of-dimensions>"
        }
      ]
    },
    "semantic": {
      "configurations": [
        {
          "name": "default-semantic-config",
          "prioritizedFields": {
            "titleField": null,
            "prioritizedContentFields": [
              { "fieldName": "text" }
            ],
            "prioritizedKeywordsFields": []
          }
        }
      ]
    }
  }
  ```

  See also:

  * [Search indexes in Azure AI Search](https://learn.microsoft.com/azure/search/search-what-is-an-index)
  * [Schema of a search index](https://learn.microsoft.com/azure/search/search-what-is-an-index#schema-of-a-search-index)
  * [Example index schema](https://learn.microsoft.com/rest/api/searchservice/create-index#examples)
  * [Unstructured document elements and metadata](/api-reference/legacy-api/partition/document-elements)

To create an Azure AI Search destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="azure_ai_search",
                  config={
                      "endpoint": "<endpoint>",
                      "index": "<index>",
                      "key": "<azure-ai-search-key>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "azure_ai_search",
      "config": {
          "endpoint": "<endpoint>",
          "index": "<index>",
          "azure_ai_search_key": "<azure-ai-search-key>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (required) - A unique name for this connector.
* `<endpoint>` (required) - The endpoint URL for Azure AI Search.
* `<index>` (required) - The name of the index for Azure AI Search.
* `<azure-ai-search-key>` (required) - The API key for Azure AI Search.

---

# Source: https://docs.unstructured.io/ui/sources/azure-blob-storage.md

# Source: https://docs.unstructured.io/ui/destinations/azure-blob-storage.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/azure-blob-storage.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/azure-blob-storage.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Azure Blob Storage

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Azure Blob Storage.

The requirements are as follows.

The following video shows how to fulfill the minimum set of Azure Storage account requirements:

<iframe width="560" height="315" src="https://www.youtube.com/embed/Vl3KCphlh9Y" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

<Note>
  If you are generating an SAS token as shown in the preceding video, be sure to set the following permissions:

  * **Read** and **List** for reading from the container only.
  * **Write** and **List** for writing to the container only.
  * **Read**, **Write**, and **List** for both reading from and writing to the container.
</Note>

Here are some more details about these requirements:

* An Azure account. To create one, [learn how](https://azure.microsoft.com/pricing/purchase-options/azure-account).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/2bQ6WiJ1ncA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* An Azure Storage account, and a container within that account. [Create a storage account](https://learn.microsoft.com/azure/storage/common/storage-account-create). [Create a container](https://learn.microsoft.com/azure/storage/blobs/blob-containers-portal).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/AhuNgBafmUo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  <iframe width="560" height="315" src="https://www.youtube.com/embed/xmndjYnGvcs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The Azure Storage remote URL, using the format `az://<container-name>/<path/to/file/or/folder/in/container/as/needed>`

  For example, if your container is named `my-container`, and there is a folder in the container named `my-folder`, the
  Azure Storage remote URL would be `az://my-container/my-folder/`.

* An SAS token (recommended), access key, or connection string for the Azure Storage account.  [Create an SAS token (recommended)](https://learn.microsoft.com/azure/ai-services/translator/document-translation/how-to-guides/create-sas-tokens). [Get an access key](https://learn.microsoft.com/azure/storage/common/storage-account-keys-manage#view-account-access-keys). [Get a connection string](https://learn.microsoft.com/azure/storage/common/storage-configure-connection-string#configure-a-connection-string-for-an-azure-storage-account).

  Create an SAS token (recommended):

  <iframe width="560" height="315" src="https://www.youtube.com/embed/X6cmJ2IbVzo?start=240&end=370" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  Get an access key or connection string:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/muMmcwVfFqs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

To create an Azure Blob Storage destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="azure",
                  config={
                      "remote_url": "az://<container-name>/<path/to/file/or/folder>",
                      
                      # For SAS token authentication:
                      # "account_name": "<account-name>",
                      # "sas_token": "<sas-token>",

                      # For account key authentication:
                      # "account_name": "<account-name>",
                      # "account_key": "<account-key>",

                      # For connection string authentication:
                      # "connection_string": "<connection-string>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "azure",
      "config": {
          "remote_url": "az://<container-name>/<path/to/file/or/folder>",
          
          # For SAS token authentication:
          "account_name": "<account-name>",
          "sas_token": "<sas-token>"

          # For account key authentication:
          "account_name": "<account-name>",
          "account_key": "<account-key>"

          # For connection string authentication:
          "connection_string": "<connection-string>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.

* `az://<container-name>/<path/to/file/or/folder>`  (*required*) - The Azure Storage remote URL, with the format `az://<container-name>/<path/to/file/or/folder/in/container/as/needed>`

  For example, if your container is named `my-container`, and there is a folder in the container named `my-folder`, the
  Azure Storage remote URL would be `az://my-container/my-folder/`.

* `<account-name>`  (*required* for SAS token authentication and account key authentication) - The Azure Storage account name.

* `<sas-token>` - For SAS token authentication, the SAS token for the Azure Storage account (*required*).

* `<account-key>` - For account key authentication, the key for the Azure Storage account (*required*).

* `<connection-string>` - For connection string authentication, the connection string for the Azure Storage account (*required*).

* For `recursive` (source connector only), set to `true` to recursively access files from subfolders within the container. The default is `false` if not otherwise specified.

---

# Source: https://docs.unstructured.io/examplecode/tools/azure-storage-events.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Azure Blob Storage event triggers

You can use Azure Blob Storage events, such as adding new files to—or updating existing files within—Azure Blob Storage containers, to automatically run Unstructured ETL+ workflows
that rely on those containers as sources. This enables a no-touch approach to having Unstructured automatically process new and updated files in Azure Blob Storage containers as they are added or updated.

This example shows how to automate this process by adding a custom [Azure Function](https://learn.microsoft.com/azure/azure-functions/functions-overview) app to your Azure account. This function app runs
a function whenever a new or updated file is detected in the specified Azure Blob Storage container. This function then calls the [Unstructured API's workflow operations](/api-reference/workflow/overview) to automatically run the
specified corresponding Unstructured ETL+ workflow within your Unstructured account.

<Note>
  This example uses a custom Azure function that you create and maintain.
  Any issues with file detection, timing, or function invocation could be related to your custom function,
  rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom
  function's invocation traces first for any informational and error messages.
</Note>

## Requirements

To use this example, you will need the following:

* An Unstructured account, and an Unstructured API key for your account, as follows:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

* The Unstructured API's workflow operations URL for your account, as follows:

  1. In the Unstructured UI, click **API Keys** on the sidebar.<br />
  2. Note the value of the **Unstructured API's workflow operations** field.

## Step 1: Create an Azure Function App

1. Sign in to your [Azure portal](https://portal.azure.com).

2. Click **+ Create a resource**.

   If **Function App** is not visible, in **Search services and marketplace** field, enter **Function App**.

3. Next to **Function App**, click **Create** or **Create > Function App**.

4. Under **Select a hosting option**, select the radio button next to **Consumption** to create an app that is most compatible with JavaScript.

5. Click **Select**.

6. On the **Basics** tab, set the following function app settings:

   | Setting               | Suggested value           | Description                                                                                                                                                                                                                                                                                                                                                                                                                  |
   | --------------------- | ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
   | **Subscription**      | Your subscription         | The Azure subscription within which to create your new function app.                                                                                                                                                                                                                                                                                                                                                         |
   | **Resource Group**    | **Create new**            | After you click **Create new**, enter some name for the new resource group within which to create your new function app. You should create a new resource group because there are known limitations when creating new function apps in an existing resource group. [Learn more](https://learn.microsoft.com/azure/azure-functions/functions-scale#limitations-for-creating-new-function-apps-in-an-existing-resource-group). |
   | **Function App name** | Some globally unique name | Some name that identifies your new function app. Valid characters are `a`-`z` (case insensitive), `0`-`9`, and `-`.                                                                                                                                                                                                                                                                                                          |
   | **Operating System**  | **Windows**               | Choose the operating system for your function app. This example uses Windows.                                                                                                                                                                                                                                                                                                                                                |
   | **Runtime stack**     | **Node.js**               | Choose a runtime that supports your favorite function programming language. This example uses JavaScript (Node.js).                                                                                                                                                                                                                                                                                                          |
   | **Version**           | **20 LTS**                | Choose the version of your selected runtime. This example uses Node.js 20 LTS.                                                                                                                                                                                                                                                                                                                                               |
   | **Region**            | Your preferred region     | Select a region that's near you or near other services that your function can access.                                                                                                                                                                                                                                                                                                                                        |

7. Click **Review + create**.

8. Click **Create**, and wait for the deployment to complete.

9. After the deployment is complete, click **Go to resource**.

## Step 2: Create a function

1. With the function app open from the previous step, on the sidebar, click **Overview**.

2. On the **Functions** tab, under **Create in Azure portal**, click **Create function**.

3. For **Select a template**, select **Azure Blob Storage trigger**, and then click **Next**.

4. For **Template details**, review the following values:

   | Setting                        | Suggested value            | Description                                                                                                |
   | ------------------------------ | -------------------------- | ---------------------------------------------------------------------------------------------------------- |
   | **Function name**              | `BlobTrigger1`             | The name of the function to create. You can leave the default function name.                               |
   | **Path**                       | `samples-workitems/{name}` | The path to the Azure Blob Storage account that the function will monitor. You can leave the default path. |
   | **Storage account connection** | `AzureWebJobsStorage`      | You can leave the default storage account connection name.                                                 |

   <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/tools/azure-function.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=5fe593588f90a99a0ba61e64e2e1f742" alt="Azure Function template details" data-og-width="830" width="830" data-og-height="313" height="313" data-path="img/tools/azure-function.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/tools/azure-function.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=3f0319ba494461255fec5fa0832c0c36 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/tools/azure-function.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=498ba2dc225d477a0df4a7cd673220ea 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/tools/azure-function.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=7f7a53d35a057716fb1a880e9a08be13 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/tools/azure-function.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=114738587b0f0f269a73af22808cba7a 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/tools/azure-function.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=950092b5d2b49ff46b953f4baa988830 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/tools/azure-function.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ab5240ba6ab12f1fd48269de75ed76d7 2500w" />

5. Click **Create**. The function is created, and the **Code + Test** page appears.

## Step 3: Customize the function for your workflow

1. With the **Code + Test** page open from the previous step, on the **Code + Test** tab, replace the content of the `index.js` file with the following code:

   ```javascript  theme={null}
   module.exports = async function (context, myBlob) {
       context.log("JavaScript blob trigger function processed blob \n Blob:", context.bindingData.blobTrigger, "\n Blob Size:", myBlob.length, "Bytes");

       const apiKey = process.env.UNSTRUCTURED_API_KEY;
       const apiUrl = process.env.UNSTRUCTURED_API_URL;
       const headers = {
           "accept": "application/json",
           "unstructured-api-key": apiKey
       };

       try {
           const response = await fetch(apiUrl, {
               method: "POST",
               headers: headers
           });

           const data = await response.json();
           context.log("POST response:", data);
       } catch (error) {
           context.log.error("Error calling external API:", error);
       }
   };
   ```

2. Click **Save**.

3. In the navigation breadcrumb toward the top of the page, click your function app's name. The function app's settings page appears.

4. In the sidebar, expand **Settings**, and then click **Environment variables**.

5. Click **+ Add**.

6. For **Name**, enter `UNSTRUCTURED_API_URL`.

7. For **Value**, enter your `<unstructured-api-url>/workflows/<workflow-id>/run`, and replace the following placeholders:

   * Replace `<unstructured-api-url>` with your Unstructured API's workflow operations value.
   * Replace `<workflow-id>` with the ID of your Unstructured workflow. For now, because the workflow does not yet exist, enter some fictitious value, such as `1234567890`. You will
     update this value later in Step 6 after you create the workflow.

   The **Value** should now look similar to the following:

   ```text  theme={null}
   https://platform.unstructuredapp.io/api/v1/workflows/1234567890/run
   ```

8. Click **Apply**.

9. Click **+ Add** again.

10. For **Name**, enter `UNSTRUCTURED_API_KEY`.

11. For **Value**, enter your Unstructured API key value.

12. Click **Apply**.

13. Click **Apply** again, and then click **Confirm**.

## Step 4: Create the Azure Storage container

1. With the function app's settings page open from the previous step, in the sidebar, click **Overview**.
2. Expand **Essentials**.
3. Next to **Resource group**, click the resource group link. The resource group's settings page appears.
4. In the sidebar, click **Overview**.
5. On the **Resources** tab, click the link next to **Storage account**. The storage account's settings page appears.
6. In the sidebar, click **Overview**.
7. On the **Properties** tab, click **Blob service**.
8. Click **Add container**.
9. For **Name**, enter `samples-workitems`. This name must match the container name that you specified earlier in the **Path** field (not including `/{name}`) in Step 2.
10. Click **Create**.

## Step 5: Create the Unstructured workflow

1. Create an Azure Blob Storage source connector in your Unstructured account. [Learn how](/ui/sources/azure-blob-storage). This source connector must reference
   the Azure Storage container that you created earlier in Step 4.
2. Create a new—or identify an existing—[destination connector](/ui/destinations/overview) in your Unstructured account.
3. Create a workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).

## Step 6: Add the workflow's ID to the function's environment variables

1. Note the ID of the workflow that you created earlier in Step 5.
2. In the Azure portal, with the storage account's settings page open from Step 4, in the navigation breadcrumb toward the top of the page, click your resource group's name. The resource group's settings page appears.
3. On the **Resources** tab, click the link next to **Function App**. The function app's settings page appears.
4. In the sidebar, expand **Settings**, and then click **Environment variables**.
5. Click `UNSTRUCTURED_API_URL`.
6. Click the eyball (**Reveal password**) icon.
7. Replace the fictitious workflow ID from earlier in Step 3 with the ID of the workflow that you created earlier in Step 5.
8. Click **Apply**.
9. Click **Apply** again, and then click **Confirm**.

## Step 7: Trigger the function

1. With the function app’s settings page open from the previous step, in the navigation breadcrumb toward the top of the page, click your resource group's name. The resource group's settings page appears.
2. On the **Resources** tab, click the link next to **Storage account**. The storage account's settings page appears.
3. In the sidebar, click **Overview**.
4. On the **Properties** tab, click **Blob service**.
5. Click the **samples-workitems** link.
6. Click **Upload**, and follow the on-screen instructions to upload a file to the container.

   <Note>
     If you are unable to upload a file to the container, click **Access Control (IAM)** in the sidebar and add an appropriate role assignment that
     enables uploading files to the container, such as **Storage Blob Data Owner**. [Learn how](https://learn.microsoft.com/azure/storage/blobs/assign-azure-role-data-access?tabs=portal)
   </Note>

## Step 8: View trigger results

1. In the Unstructured user interface for your account, click **Jobs** on the sidebar.
2. In the list of jobs, click the newly running job for your workflow.
3. After the job status shows **Finished**, go to your destination location to see the results.

## Step 9 (Optional): Delete the Azure resource group

If you are done with this example and do not want to keep the resource group in your account, you can permanently delete it as follows:

1. In the Azure portal, with the storage account's settings page open from Step 7, in the navigation breadcrumb toward the top of the page, click your resource group's name. The resource group's settings page appears.
2. Click **Delete resource group**.
3. Enter the resource group's name, and then click **Delete**. The resource group, along with the function app, storage account, and other related resources, are permanently deleted.

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/azure.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/azure.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/azure.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unstructured API on Azure

<Warning>
  The Unstructured API on Azure is deprecated. It is no longer supported and is not being actively updated.
  Unstructured is now available on the Azure Marketplace as a private offering. To explore supported options
  for running Unstructured within your virtual private cloud (VPC), email Unstructured Sales at
  [sales@unstructured.io](mailto:sales@unstructured.io).

  This page is not being actively updated. It might contain out-of-date information. This page is provided for legacy reference purposes only.
</Warning>

Follow these steps to deploy the Unstructured API service into your Azure account.

<Warning>
  This article describes how to create several interrelated resources in your Azure account.
  Your Azure account will be charged on an ongoing basis for these resources, even if you are not actively using them.<br /><br />
  Manually shutting down the associated Azure virtual machine when you are not using it can help reduce—but not fully eliminate—these ongoing charges.<br /><br />
  To stop accruing all related ongoing charges, you must delete all of the associated Azure resources.
</Warning>

<Steps>
  <Step title=" Log in to the Azure Portal">
    Go to [https://portal.azure.com](https://portal.azure.com/).
  </Step>

  <Step title="Access the Azure Marketplace">
    Go to the [Unstructured Data Preprocessing - Customer Hosted API](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/unstructured1691024866136.customer_api_v1?tab=Overview/) offering in the Azure Marketplace.

        <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step2.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=0a3e2db8d53ba51c958e4dc949c20cde" alt="Azure Marketplace" data-og-width="2670" width="2670" data-og-height="1296" height="1296" data-path="img/api/Azure_Step2.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step2.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6f7efa9ae6767b75f68218b5ef1dd630 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step2.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=92be6010eba9a5abb39e2f0e1b7dabb3 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step2.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c5f8c0739566729b61e5b4dffef94bfd 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step2.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5f21c0952314e92224c85a642cd7c7a8 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step2.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c77ba9877dac4f4862b4cfbd9cefabab 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step2.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=1ff31f700973c0fef520e8a143b94afe 2500w" />
  </Step>

  <Step title="Start the deployment process">
    1. Click **Get It Now** and fill out the form.
    2. Read the terms and click **Continue**.
    3. Click **Create**.

        <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step3.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=d489b4c86ae06d70addcf1ade2439ba9" alt="Deployment Process" data-og-width="2086" width="2086" data-og-height="1562" height="1562" data-path="img/api/Azure_Step3.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step3.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6afb5a0dbb4972113d8137ed5f04a235 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step3.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=ef15d7b5b7d2945759ce5df7a390a0f8 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step3.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=a2f9cc48988aae622c7c530e01ad0505 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step3.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=eaae11ec129c0c660f7959319373cded 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step3.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=e7398c4fc3d4ee8f08f09053e648805d 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step3.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=be7b308bf5bdcbabe592747ff6e9e8ea 2500w" />
  </Step>

  <Step title=" Configure the deployment options">
    1. On the **Create a virtual machine** page, click the **Basics** tab.

    2. In the **Project details** section, select an existing **Subscription**, and select an **Resource group** from the dropdown menus, or create a new resource group by clicking **Create new**.

       <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4a.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5943b437891c592174aa209b92ac5840" alt="project details" data-og-width="1562" width="1562" data-og-height="816" height="816" data-path="img/api/Azure_Step4a.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4a.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5ee661e0af2e74732ecb4a15f56d3bce 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4a.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=584704082e7ca54b307091fad318537e 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4a.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=ea2c8dd370abfb5c9774a3c773b3fef8 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4a.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2245c81dc4bd6c6f567daee3fb984bef 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4a.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=f09b2124260373d8fabe20ed3fda58dc 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4a.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=0a535c05663df89a279b09323427f580 2500w" />

    3. In the **Instance details** section, enter a name in the **Virtual machine name** field. Note this name, as you will need it later steps.

    4. Select a **Region** from the dropdown menu.

    5. For **Image**, select **Unstructured Customer Hosted API Hourly - x64 Gen2** (*default*).

    6. For **Size**, select a VM size from the dropdown menu, or leave the default VM size selection. To learn more, see [Azure VM comparisons](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/).

       <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4b.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=4ed604cb032b6bfe3cd3c0d41a547221" alt="instance details" data-og-width="1562" width="1562" data-og-height="1088" height="1088" data-path="img/api/Azure_Step4b.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4b.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=d0bbe497563bd738597647efba43bb5b 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4b.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=65e1dbaaf3c7f2c28c7d87d5cc71bf56 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4b.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=99405467dd8c0c8eadeee00396eccc91 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4b.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6cf9db66b928d3174d3ff1abe575431b 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4b.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=be5827cfe4daadb904e6b27243319d85 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4b.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=885fa48808543212624710cc54df3daf 2500w" />

    7. In the **Administrator account** section, for **Authentication type**, select **SSH public key** or **Password**.

    8. Enter the credential settings, depending on the authentication type.

       <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4c.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=d76f2bdbce10adcd7fd94c2e971ace8b" alt="administrator account" data-og-width="1562" width="1562" data-og-height="604" height="604" data-path="img/api/Azure_Step4c.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4c.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2871192d25075ea4e3479a652eb1f140 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4c.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5f6d30999cde5a64039a4b7dbf81ce1b 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4c.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8fcb9a18f9ed804c30fed47585245905 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4c.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=0ce6e4402a2fc7502919a23c9e31f417 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4c.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=e1580b551d20425d32545c87730f88e2 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step4c.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=bd783d79ce8a37a72366075fd5ec9451 2500w" />

    <Note>Do not click **Review + create** yet. You must still set up the load balancer.</Note>
  </Step>

  <Step title="Set up the load balancer">
    1. Click the **Networking** tab.

    2. In the **Networking interface** section, fill out the following fields:

       * **Virtual network**: Click **Create new**, or select an existing virtual network from the dropdown menu. To learn more, see [Quickstart: Use the Azure portal to create a virtual network](https://learn.microsoft.com/en-us/azure/virtual-network/quick-create-portal).
       * **Subnet**: Click **Manage subnet configuration**, or select a new or existing subnet from the dropdown menu. To learn more, see [Add, change, or delete a virtual network subnet](https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-manage-subnet?tabs=azure-portal).
       * **Configure network security group**: Click **Create new**, or select an existing security group from the dropdown menu. To learn more, see [Create, change, or delete a network security group](https://learn.microsoft.com/en-us/azure/virtual-network/manage-network-security-group?tabs=network-security-group-portal).

    3. In the **Load balancing** section, fill out the following fields:

       * **Load balancing options**: Select **Azure load balancer**.
       * **Select a load balancer**: Click **Create a load balancer** and fill out the following fields in the pop-up window, or select an existing load balancer from the dropdown menu and note this name as you will need it in later steps:

         * Enter a **Load balancer name**. Note this name, as you will need it in later steps.
         * For **Type**, select **Public** or **Internal**.
         * For **Protocol**, select **TCP** or **UDP**.
         * Set both **Port** and **Backend port** to **80**.

       <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step5.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=7e843686ab229bfd84d905a8f41d4195" alt="load balancer" data-og-width="3042" width="3042" data-og-height="1884" height="1884" data-path="img/api/Azure_Step5.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step5.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8be78fe7320fd044bd4c9208d5172ffb 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step5.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8300291aeb7945cb8f5a87daaa102170 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step5.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=3e925f0e53476ad26e4cba0b107e1bc3 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step5.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=0bb7d0e0097077fb6543438a5428fc39 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step5.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=b73c81bc599d002965cb9c1a4e55a89e 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step5.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=615b6419b759380c3658b5580b908cbc 2500w" />

    4. Click **Create**.
  </Step>

  <Step title="Finalize and deploy">
    1. Click **Review + create**.
    2. Wait for validation.
    3. Click **Create**.

       <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step6.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=6c24eb1cf3cb32ad7340ffe936c487a6" alt="deployment" data-og-width="3042" width="3042" data-og-height="1360" height="1360" data-path="img/api/Azure_Step6.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step6.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=60fd1a912cdedb678eb3bf9f0797fd48 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step6.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=3f1ca1d5f45959eff0449c15aac6b4ea 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step6.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=e1f1e55d7bfc02ec8fcf5fca9b440ed6 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step6.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=a8496d0b651e5c3d107b0b030eb79570 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step6.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=b4ea2d4a4ec3ead4b534f0bd7999a7d7 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step6.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5fb85b5b9f2593cee7c559f8568cfd29 2500w" />
  </Step>

  <Step title="Post-deployment: additional load balancer configuration">
    1. Go to your load balancer: in the Azure portal's **Search resources, services and docs** box, enter and then select **Load balancers**.
    2. Search for and open the new load balancer that you created earlier, or the existing load balancer that you chose earlier.
    3. Make any necessary settings updates to your new or existing load balancer, based on the recommended configurations in the [Load balancer network settings](#load-balancer-network-settings) section, later on this page.
  </Step>

  <Step title="Post-deployment: get the deployed endpoint URL">
    1. Go to your virtual machine: in the Azure portal's **Search resources, services and docs** box, enter and then select **Virtual machines**.

    2. Search for and open the new virtual machine that you created earlier, using the name that you entered earlier.

    3. On the **Overview** tab, under **Properties**, note the **Public IP address** for the **Load balancer**.

       <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step7.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=d01980746fd32ef48d0a7ac9bc360ae9" alt="retrieve public ip" data-og-width="3042" width="3042" data-og-height="1894" height="1894" data-path="img/api/Azure_Step7.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step7.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=1be2cd3b01cd086484e311588bf1e7fe 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step7.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=49dc20f0ae42f96449e64a662861f449 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step7.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5dc1916895b911d8993026db72a1be56 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step7.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=364ed4678cc489c24b643a4e76f6c026 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step7.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=606ddbda10ff1262dba4539460f2eaa2 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/Azure_Step7.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=ed63b518db3c4e90947b7b573809940a 2500w" />

    4. The deployed endpoint URL is **http\://\<load-balancer-public-IP-address>/general/v0/general**. Note this endpoint URL, as you will need it later to call the Unstructured API.
  </Step>

  <Step title="Post-deployment: set API environment variables">
    Note the API environment variables in the [API environment variables](#api-environment-variables) section, later on this page. If you need to set any of these in the Docker container on the virtual machine, do the following:

    1. If the virtual machine is not already running from earlier, click the **Start** icon.

    2. After the virtual machine starts, click the **Connect** icon, and then click **Connect** from the drop-down list.

    3. Follow the on-screen directions for one of the available options to connect to the virtual machine and display a connected terminal.

    4. Stop the running container in the virtual machine, so that you can restart it later with the environment variables set: In the connected terminal, run the following command: `sudo docker container ls`.

    5. Note the `CONTAINER ID` value for the running container.

    6. Run the following command, replacing `<CONTAINER ID>` with the `CONTAINER ID` value:

       ```bash  theme={null}
       sudo docker container rm --force <CONTAINER ID>
       ```

    7. Now run the container again, setting the environment variables at the same time: Run the following command: `sudo docker image ls`.

    8. Note the `REPOSITORY` and `TAG` value for the Docker image.

    9. Run the following command, replacing `<REPOSITORY>` and `<TAG>` with the `REPOSITORY` and `TAG` values for the Docker image, and replacing
       `<VAR1>=<value1>`, `<VAR2>=<value2>` and so on with the environment variable name and value pairs:

       ```bash  theme={null}
       sudo docker run -d --restart unless-stopped \
       -p 80:5000 \
       -e <VAR1>=<value1> -e <VAR2>=<value2> -e <VAR3>=<value3> \
       <REPOSITORY>:<TAG>
       ```

    10. Verify that the environment variables were set correctly: Run the following command:

        ```bash  theme={null}
        sudo docker container ls
        ```

    11. Note the `CONTAINER ID` value for the running container.

    12. Run the following command, replacing `<CONTAINER ID>` with the `CONTAINER ID` value:

    ```bash  theme={null}
    sudo docker exec <CONTAINER ID> bash -c 'printenv'
    ```

    14. The environment variables should be in the list that appears.

    <Note>To help manage your overall costs, you should click the **Stop** icon whenever you are not using this virtual machine to call the Unstructured API.</Note>
  </Step>

  <Step title="Call the Unstructured API">
    You can now use the running virtual machine to call the Unstructured API. For example, run one of the following, setting the following environment variables to make your code more portable:

    * Set `UNSTRUCTURED_API_URL` to `http://`, followed by your load balancer's public IP address, followed by `/general/v0/general`.
    * Set `LOCAL_FILE_INPUT_DIR` to the path on your local machine to the files for the Unstructured API to process. If you do not have any input files available, you can download any of the ones from the [example-docs](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/example-docs) folder in GitHub.
    * Set `LOCAL_FILE_OUTPUT_DIR` to the path on your local machine for Unstructured API to send the processed output in JSON format.
  </Step>
</Steps>

<AccordionGroup>
  <Accordion title="Ingest CLI">
    You must first [install the Unstructured Ingest CLI](/open-source/ingestion/overview#unstructured-ingest-cli).

    Because you are calling a private API and therefore do not need an Unstructured API key, you can omit the command-line option `--api-key` Or, for better code portability, it is recommended that you first set the environment variable `UNSTRUCTURED_API_KEY` to an empty string and then include the command-line option `--api-key`.

    ```bash CLI theme={null}
    unstructured-ingest \
      local \
        --input-path $LOCAL_FILE_INPUT_DIR \
        --output-dir $LOCAL_FILE_OUTPUT_DIR \
        --partition-by-api \
        --api-key $UNSTRUCTURED_API_KEY \
        --partition-endpoint $UNSTRUCTURED_API_URL \
        --strategy hi_res \
        --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
    ```
  </Accordion>

  <Accordion title="Ingest Python library">
    You must first [install the Unstructured Ingest Python library](/open-source/ingestion/python-ingest).

    Because you are calling a private API and therefore do not need an Unstructured API key, you can omit the parameter `api_key`. Or, for better code portability, it is recommended that you first set the environment variable `UNSTRUCTURED_API_KEY` to an empty string and then include the parameter `api_key`.

    ```python Python Ingest theme={null}
    import os

    from unstructured_ingest.pipeline.pipeline import Pipeline
    from unstructured_ingest.interfaces import ProcessorConfig
    from unstructured_ingest.processes.connectors.local import (
        LocalIndexerConfig,
        LocalDownloaderConfig,
        LocalConnectionConfig,
        LocalUploaderConfig
    )
    from unstructured_ingest.processes.partitioner import PartitionerConfig

    if __name__ == "__main__":
        Pipeline.from_configs(
            context=ProcessorConfig(),
            indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
            downloader_config=LocalDownloaderConfig(),
            source_connection_config=LocalConnectionConfig(),
            partitioner_config=PartitionerConfig(
                partition_by_api=True,
                api_key=os.getenv("UNSTRUCTURED_API_KEY"),
                partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
                strategy="hi_res",
                additional_partition_args={
                    "split_pdf_page": True,
                    "split_pdf_allow_failed": True,
                    "split_pdf_concurrency_level": 15
                }
            ),
            uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
        ).run()
    ```
  </Accordion>
</AccordionGroup>

<Note>To help manage your overall costs, you should stop running the associated virtual machine whenever you are not using it to call the Unstructured API.</Note>

## Load balancer network settings

Unstructured recommends the following load balancer settings, which you should set on your deployment's load balancer soon after you finalize and deploy it.

On the load balancer's **Overview** tab in the Azure portal:

* **SKU**: **Standard**

On the load balancer's **Settings** tab in the Azure portal:

* **Frontend IP configuration**: Private IP

* **Backend pools**: VMSS

* **Health probes**:

  * **Protocol**: **HTTP**, or **HTTPS** (this requires setting up a reverse proxy on the VMSS set to do TLS termination)
  * **Port**: `80` or `443` (this can be any port that the backend VMs are listening on)
  * **Path**: `/healthcheck`
  * **Interval (seconds)**: `5`

* **Load balancing rules**:

  * **Protocol**: **TCP**
  * **Port**: `443` for HTTPS, or `80` for HTTP
  * **Backend port**: `443` for HTTPS, or `80` for HTTP
  * **Idle timeout (minutes)**: `60`
  * **Enable TCP Reset** box: Checked

* **Inbound NAT rules**:

  * **Frontend Port**: `443` for HTTPS, or `80` for HTTP
  * **Backend port**: `443` for HTTPS, or `80` for HTTP
  * **Protocol**: **TCP**
  * **Enable TCP Reset** box: Checked
  * **Idle timeout (minutes)**: `60`

## API environment variables

Unstructured supports the following environment variables, which you can set in the Docker image on the virtual machine, as needed:

* `ALLOW_ORIGINS`: CORS-allowed origins.
* `UNSTRUCTURED_ALLOWED_MIMETYPE`: The list of allowed MIME types, if you want to limit the file types that can be processed.
* `UNSTRUCTURED_API_KEY`: The default Unstructured API key to use.
* `UNSTRUCTURED_MEMORY_FREE_MINIMUM_MB`: The minimum amount of free memory in MB to allow for processing a file. If this memory is too low, the server will return a `503` error.
* `UNSTRUCTURED_PDF_HI_RES_MAX_PAGES`: The maximum number of pages in a PDF file that the Unstructured API will not reject, if the `hi_res` strategy is used. The default is `300`.
* `UNSTRUCTURED_REDIRECT_ROOT_URL`: If this is set, redirect a `GET` request to the Unstructured API to use this URL instead.

---

# Source: https://docs.unstructured.io/ui/account/billing.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Account billing

## Add, view, or change payment details

<Note>
  The following information applies only to Unstructured **Pay-As-You-Go** accounts.

  To add, view, or change payment details for an Unstructured **Business** account, contact your
  Unstructured Account Executive (AE). If you do not know who your assigned AE is,
  email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
</Note>

1. If you are not already signed in, sign in to your Unstructured account.
2. On the sidebar, click your user icon, and then click **Account Settings**.
3. In the top navigation bar, click the **Billing** tab.
4. Click **Manage Payment**; follow the on-screen instructions to add, view, or change your payment details; and then click **Save card**.

## View billing details

<Note>
  The following information applies only to Unstructured **Pay-As-You-Go** accounts.

  To view billing details for an Unstructured **Business** account, contact your
  Unstructured Account Executive (AE). If you do not know who your assigned AE is,
  email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
</Note>

1. If you are not already signed in, sign in to your Unstructured account.
2. On the sidebar, click your user icon, and then click **Account Settings**.
3. In the top navigation bar, click the **Billing** tab. Your account's billing details are displayed.

## How is pay-per-page billing calculated?

Unstructured calculates a page as follows:

* For these file types, a page is a page, slide, or image: `.pdf`, `.pptx`, and `.tiff`.
* For `.docx` files that have page metadata, Unstructured calculates the number of pages based on that metadata.
* For all other file types, Unstructured calculates the number of pages as the file's size divided by 100 KB.
* For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.

The **Billing** tab shows your account's current billing rate per page.

To complete usage-related tasks in your account, see [Account usage](/ui/account/usage).

---

# Source: https://docs.unstructured.io/ui/sources/box.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/box.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/box.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/box.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Box

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Box.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/Gkniu1sfz1o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

1. Access to the [Developer Console](https://app.box.com/developers/console) from your [Box enterprise account](https://account.box.com/signup/enterprise-plan) or [Box developer account](https://account.box.com/signup/developer).

2. A Box Custom App in your Box account, set up to use **Server Authentication (with JWT)**. See [Setup with JWT](https://developer.box.com/guides/authentication/jwt/jwt-setup/).

3. The appropriate application scopes and advanced features set up for the Box Custom App, as follows:

   * In the Box Custom App, on the **Configuration** tab, under **Application Scopes**, check the box titled **Write all files and folders stored in Box**.
   * In the Box Custom App, on the **Configuration** tab, under **Advanced Features**, check the box titled **Make API calls using the as-user header**.
   * Click **Save Changes** before continuing.

4. Authorization by a Box Admin in your Box account for the Box Custom App. See [App Authorization](https://developer.box.com/guides/authentication/jwt/jwt-setup/#app-authorization).

5. Access by your Box account's source or target [folder](https://app.box.com/folder/0) to your Box Custom App, as follows:

   * In the Box Custom App, on the **General Settings** tab, copy the **Service Account ID** (which takes the form `AutomationUser_<your-app-service-id>_<a-random-string@boxdevedition.com`).
   * **Share** your Box account's target folder with the copied service account's email address as a **Co-owner** or **Editor**.
   * Note the remote URL to the target folder, which takes the format `box://<path/to/folder/in/account>`.

6. The private key configuration JSON file for the Box Custom App, or a string that contains this file's contents.

   * To download this file, in the Box Custom App, on the **Configuration** tab, under **Add and Manage Public Keys**, click **Generate a Public/Private Keypair**. Store the downloaded private key configuration JSON file in a secure location.
   * To ensure maximum compatibility across Unstructured service offerings, you should give the private key configuration JSON file information to Unstructured as
     a single-line string that contains the contents of the downloaded private key configuration JSON file (and not the file itself).
     To print this single-line string, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt.
     In this command, replace `<path-to-downloaded-key-file>` with the path to the private key configuration JSON file that you downloaded by following the preceding instructions.

     * For macOS or Linux:

       ```text  theme={null}
       tr -d '\n' < <path-to-downloaded-key-file>
       ```

     * For Windows:

       ```text  theme={null}
       (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
       ```

To create an Box source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="box",
                  config={
                      "remote_url": "<remote-url>",
                      "recursive": <True|False>,
                      "box_app_config": "<box-app-config>"
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "box",
      "config": {
          "remote_url": "<remote-url>",
          "recursive": <true|false>,
          "box_app_config": "<box-app-config>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*): A unique name for this connector.
* `<remote-url>` (*required*): The URL to the target Box folder. This URL must take the format `box://<path/to/folder/in/account>`.
* For `recursive`, set to `true` to recursively access files from subfolders within the taget Box folder. The default is `false` if not otherwise specified.
* `<box-app-config>` (*required*): The contents of the private key configuration JSON file for the Box Custom App with access to the target Box folder. This file's contents must be provided as a single string.

## Learn more

* <Icon icon="blog" />  [Finding Needles in a Haystack: PII Detection at Scale with Unstructured, Box, and Elasticsearch](https://unstructured.io/blog/finding-needles-in-a-haystack-pii-detection-at-scale-with-unstructured-box-and-elasticsearch)

---

# Source: https://docs.unstructured.io/api-reference/jobs/cancel-job.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Cancel Job

> Cancel the specified job.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json post /api/v1/jobs/{job_id}/cancel
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/jobs/{job_id}/cancel:
    post:
      tags:
        - jobs
      summary: Cancel Job
      description: Cancel the specified job.
      operationId: cancel_job
      parameters:
        - name: job_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Job Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/support/issues/cannot-locate-credentials.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# S3 destination connector - Unable to locate credentials

## Issue

When you try to run a workflow in Unstructured, and the workflow contains an Amazon S3 destination connector,
you get an error that contains text similar to one of the following, and the workflow fails to run:

```text  theme={null}
500: Error in uploader - [NoCredentialsError] Unable to locate credentials 
```

```text  theme={null}
failed to invoke plugin: Unable to locate credentials
```

## Possible causes

* Unstructured tried to save its processed data to the specific Amazon S3 bucket, but Unstructured could not find the necessary
  security keys or other credentials to get access to that bucket.
* The specified AWS Access Key ID and Secret Access Key are missing from your S3 destination connector configuration.
* The specified credentials are incorrect, have been revoked, or have expired.
* The S3 destination connector is configured to use a temporary IAM role, but that role does not have permission to write to the target bucket.

## Possible solutions

* Double-check that your workflow's Amazon S3 destination connector has the correct, valid credentials specified.
* Confirm that the specified credentials are still present and valid in your AWS account.
* If your organization uses IAM roles, contact your platform administrator to ensure the role has `s3:PutObject` permissions for the bucket you are trying to use.

## Additional resources

* Always copy and paste security keys and other credentials carefully.
* If possible, use your organization's secrets manager to handle security keys and other credentials automatically.
* To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/chroma.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Chroma

Batch process all your records to store structured outputs in a Chroma account.

The requirements are as follows.

* A Chroma server. You can connect to a [hosted Chroma Cloud server](https://trychroma.com/signup) or [deploy your own server](https://docs.trychroma.com/deployment).
* The Chroma server's hostname or IP address, and the server's port number. For Chroma Cloud, the host is `api.trychroma.com`, and the port number is `8000`.
* If you are using local Chroma, the [path to the folder where you store your database](https://docs.trychroma.com/docs/run-chroma/persistent-client).
* The name of the tenant that you want to access on the server.
* The name of the database that you want to access in the tenant.
* The name of the collection that you want to access in the database.

The Chroma connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[chroma]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `CHROMA_HOST` - The , represented by `--host` (CLI) or `host` (Python). Chroma Cloud uses `api.trychroma.com`.
* `CHROMA_PORT` - The , represented by `--port` (CLI) or `port` (Python). Chroma Cloud uses port `8000`.
* `CHROMA_TENANT` - The name of the tenant that you want to access on the Chroma server, represented by `--tenant` (CLI) or `tenant` (Python).
* `CHROMA_DATABASE` - The name of the database that you want to access in the tenant, represented by `--database` (CLI) or `database` (Python).
* `CHROMA_COLLECTION` - The name of the collection that you want to access in the database, represented by `--collection-name` (CLI) or `collection_name` (Python).

Additional settings include:

* `--path` (CLI) or `path` (Python): The location where Chroma is persisted if you are not connecting through HTTP.
* `--settings` (CLI) or `settings` (Python): A dictionary of settings to communicate with the Chroma server, for example: `'{"persist_directory":"./chroma-persist"}'`.
* `--headers` (CLI) or `headers` (Python): A dictionary of headers to send to the Chroma server, for example: `'{"Authorization":"Basic()"}'`.
* `--ssl` (CLI) or `ssl` (Python): True to use SSL for the connection.

In order to use Chroma Cloud, you should have this configuration:

* `host=api.trychroma.com`
* `port=8000`
* Get your `tenant` ID from the [Chroma dashboard](https://trychroma.com).
* `headers={'x-chroma-token': 'YOUR_API_KEY'}`
* `ssl` should be enabled.

Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --chunk-elements \
      --embedding-provider huggingface \
      --num-processes 2 \
      --verbose \
      --work-dir $WORK_DIR \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --strategy hi_res \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    chroma \
      --host $CHROMA_HOST \
      --port $CHROMA_PORT \
      --collection-name $CHROMA_COLLECTION \
      --tenant $CHROMA_TENANT \
      --database $CHROMA_DATABASE \
      --settings "{\"persist_directory\":\"./chroma-persist\"}" \
      --headers "{\"Authorization\":\"Basic()\"}" \
      --batch-size 80 \
      --ssl
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.chroma import (
      ChromaAccessConfig,
      ChromaConnectionConfig,
      ChromaUploadStagerConfig,
      ChromaUploaderConfig
  )
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalConnectionConfig,
      LocalDownloaderConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":

      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=ChromaConnectionConfig(
              access_config=ChromaAccessConfig(
                  settings={"persist_directory":"./chroma-persist"},
                  headers={"Authorization":"Basic()"}
              ),
              host=os.getenv("CHROMA_HOST"),
              port=os.getenv("CHROMA_PORT"),
              tenant=os.getenv("CHROMA_TENANT"),
              database=os.getenv("CHROMA_DATABASE"),
              collection_name=os.getenv("CHROMA_COLLECTION")
          ),
          stager_config=ChromaUploadStagerConfig(),
          uploader_config=ChromaUploaderConfig()
      ).run()

  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/open-source/ingestion/ingest-configuration/chunking-configuration.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Chunking configuration

A common chunking configuration is a critical element in the data processing pipeline, particularly when creating embeddings and populating vector databases with the results. This configuration defines the parameters governing the segmentation of text into meaningful chunks, whether at the document, paragraph, or sentence level. It plays a pivotal role in determining the size and structure of these chunks, ensuring that they align with the specific requirements of downstream tasks, such as embedding generation and vector database population. By carefully configuring chunking parameters, users can optimize the granularity of data segments, ultimately contributing to more cohesive and contextually rich results. This is crucial for tasks like natural language processing and text analysis, as well as for the efficient storage and retrieval of embeddings in vector databases, enhancing the quality and relevance of the results.

## Configs

* `chunk_api_key`: If `chunk_by_api` is set to `True`, requests that are sent to the Unstructured API will use this Unstructured API key to make authenticated calls.

* `chunk_by_api`: Default: `False`. If set to `True`, uses Unstructured to run chunking. If set to `False`, runs chunking locally.

* `chunk_combine_text_under_n_chars`: Combine consecutive chunks when the first does not exceed length `n` and the second will fit without exceeding the hard-maximum length. Only operative for the `by_title` chunking strategy.

* `chunk_include_orig_elements`: `True` when chunking to add the original elements consolidated to form each chunk to `.metadata.orig_elements` on that chunk.

* `chunk_max_characters`: Default: `500`. The hard-maximum chunk length. No chunk will exceed this length. An oversized element will be divided by text-splitting to fit this window.

* `chunk_multipage_selections`: `True` to ignore page boundaries when chunking such that elements from two different pages can appear in the same chunk. Only operative for the `by_title` chunking strategy.

* `chunk_new_after_n_chars`: The soft-maximum chunk length. Another element will not be added to a chunk of `n` length even when it would fit without exceeding the hard-maximum length.

* `chunk_overlap`: Default: `0`. Prefix each chunk's text with the last overlap of `n` characters from the prior chunk. Only applies to oversized chunks divided by text-splitting. To apply overlap to non-oversized chunks, use `chunk_overlap_all`.

* `chunk_overlap_all`: Applies overlap to chunks formed from whole elements as well as those formed by text-splitting oversized elements. The overlap length is taken from the `chunk_overlap` value.

* `chunking_endpoint`: If `chunk_by_api` is set to `True`, chunking requests are sent to this Unstructured API URL. By default, this URL is the legacy Unstructured Partition Endpoint URL: `https://api.unstructuredapp.io/general/v0/general`. However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

* `chunking_strategy`: One of `basic` or `by_title`. When omitted, no chunking is performed. The `basic` strategy maximally fills each chunk with whole elements, up the specified size limits as specified by `max_characters` and `new_after_n_chars`. A single element that exceeds this length is divided into two or more chunks using text-splitting. A `Table` element is never combined with any other element and appears as a chunk of its own or as a sequence of `TableChunk` elements splitting is required. The `by_title` behaviors are the same except that section and optionally page boundaries are respected such that two consecutive elements from different sections appear in separate chunks.

<Note>Chunking will fail if you set both `partition_by_api` to `False` and `chunking_strategy` to `by_page` or `by_similarity`. However, the rest of your data processing pipeline should be unaffected by this setting.</Note>

---

# Source: https://docs.unstructured.io/ui/chunking.md

# Source: https://docs.unstructured.io/open-source/core-functionality/chunking.md

# Source: https://docs.unstructured.io/open-source/best-practices/chunking.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/chunking.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Chunking strategies

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

Chunking functions use metadata and document elements detected with partition functions to split a document into
appropriately-sized chunks for uses cases such as retrieval-augmented generation (RAG).

If you are familiar with chunking methods that split long text documents into smaller chunks, you'll notice that
Unstructured methods slightly differ, since the partitioning step already divides an entire document into its structural elements.

Individual elements will only be split if they exceed the desired maximum chunk size. Two or more consecutive text elements
that will together fit within `max_characters` will be combined. After chunking, you will only have elements of the
following types:

* `CompositeElement`: Any text element will become a `CompositeElement` after chunking. A composite element can be a
  combination of two or more original text elements that together fit within the maximum chunk size. It can also be a single
  element that doesn't leave room in the chunk for any others but fits by itself. Or it can be a fragment of an original
  text element that was too big to fit in one chunk and required splitting.
* `Table`:  A table element is not combined with other elements and if it fits within `max_characters` it will remain as is.
* `TableChunk`: large tables that exceed `max_characters` chunk size are split into special `TableChunk` elements.

### "basic" chunking strategy

* The basic strategy combines sequential elements to maximally fill each chunk while respecting both the specified `max_characters` (hard-max) and `new_after_n_chars` (soft-max) option values.

* A single element that by itself exceeds the hard-max is isolated (never combined with another element) and then divided into two or more chunks using text-splitting.

* A `Table` element is always isolated and never combined with another element. A `Table` can be oversized, like any other text element, and in that case is divided into two or more `TableChunk` elements using text-splitting.

* If specified, `overlap` is applied between chunks formed by splitting oversized elements and is also applied between other chunks when `overlap_all` is `True`.

### "by\_title" chunking strategy

The `by_title` chunking strategy preserves section boundaries and optionally page boundaries as well. “Preserving” here means that a single chunk will never contain text that occurred in two different sections. When a new section starts, the existing chunk is closed and a new one started, even if the next element would fit in the prior chunk.

In addition to the behaviors of the `basic` strategy above, the `by_title` strategy has the following behaviors:

* **Detect section headings.** A `Title` element is considered to start a new section. When a `Title` element is encountered, the prior chunk is closed and a new chunk started, even if the `Title` element would fit in the prior chunk.

* **Respect page boundaries.** Page boundaries can optionally also be respected using the `multipage_sections` argument. This defaults to `True` meaning that a page break does *not* start a new chunk. Setting this to `False` will separate elements that occur on different pages into distinct chunks.

* **Combine small sections.** In certain documents, partitioning may identify a list-item or other short paragraph as a `Title` element even though it does not serve as a section heading. This can produce chunks substantially smaller than desired. This behavior can be mitigated using the `combine_text_under_n_chars` argument. This defaults to the same value as `max_characters` such that sequential small sections are combined to maximally fill the chunking window. Setting this to `0` will disable section combining.

### "by\_page" chunking strategy

Only available in the Unstructured UI and API.

The `by_page` chunking strategy ensures the content from different pages do not end up in the same chunk.
When a new page is detected, the existing chunk is completed and a new one is started, even if the next element would fit in the
prior chunk.

### "by\_similarity" chunking strategy

Only available in Unstructured API and Platform.

The `by_similarity` chunking strategy employs the `sentence-transformers/multi-qa-mpnet-base-dot-v1` embedding model to
identify topically similar sequential elements and combine them into chunks.

As with other strategies, chunks will never exceed the hard-maximum chunk size set by `max_characters`. For this reason,
not all elements that share a topic will necessarily appear in the same chunk. However, with this strategy you can
guarantee that two elements with low similarity will not be combined in a single chunk.

You can control the level of topic similarity you require for elements to have by setting the `similarity_threshold` parameter.
`similarity_threshold` expects a value between 0.0 and 1.0 specifying the minimum similarity text in consecutive elements
must have to be included in the same chunk. The default is 0.5.

###

## Learn more

<Icon icon="blog" />  [Chunking for RAG: best practices](https://unstructured.io/blog/chunking-for-rag-best-practices)

---

# Source: https://docs.unstructured.io/open-source/core-functionality/cleaning.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Cleaning

> As part of data preparation for an NLP model, it’s common to need to clean up your data prior to passing it into the model. If there’s unwanted content in your output, for example, it could impact the quality of your NLP model. To help with this, the `unstructured` library includes cleaning functions to help users sanitize output before sending it to downstream applications.

Some cleaning functions apply automatically. In the example in the **Partition** section, the output `Philadelphia Eaglesâ\x80\x99 victory` automatically gets converted to `Philadelphia Eagles' victory` in `partition_html` using the `replace_unicode_quotes` cleaning function. You can see how that works in the code snippet below:

```python  theme={null}
from unstructured.cleaners.core import replace_unicode_quotes

replace_unicode_quotes("Philadelphia Eaglesâ\x80\x99 victory")

```

Document elements in `unstructured` include an `apply` method that allow you to apply the text cleaning to the document element without instantiating a new element. The `apply` method expects a callable that takes a string as input and produces another string as output. In the example below, we invoke the `replace_unicode_quotes` cleaning function using the `apply` method.

```python  theme={null}
from unstructured.documents.elements import Text

element = Text("Philadelphia Eaglesâ\x80\x99 victory")
element.apply(replace_unicode_quotes)
print(element)

```

Since a cleaning function is just a `str -> str` function, users can also easily include their own cleaning functions for custom data preparation tasks. In the example below, we remove citations from a section of text.

```python  theme={null}
import re

remove_citations = lambda text: re.sub("\[\d{1,3}\]", "", text)

element = Text("[1] Geolocated combat footage has confirmed Russian gains in the Dvorichne area northwest of Svatove.")
element.apply(remove_citations)
print(element)

```

See below for a full list of cleaning functions in the `unstructured` library.

## `bytes_string_to_string`

Converts an output string that looks like a byte string to a string using the specified encoding. This happens sometimes in `partition_html` when there is a character like an emoji that isn’t expected by the HTML parser. In that case, the encoded bytes get processed.

Examples:

```python  theme={null}
from unstructured.cleaners.core import bytes_string_to_string

text = "Hello ð\x9f\x98\x80"
# The output should be "Hello 😀"
bytes_string_to_string(text, encoding="utf-8")

```

```python  theme={null}
from unstructured.cleaners.core import bytes_string_to_string
from unstructured.partition.html import partition_html

text = """\n<html charset="utf-8"><p>Hello 😀</p></html>"""
elements = partition_html(text=text)
elements[0].apply(bytes_string_to_string)
# The output should be "Hello 😀"
elements[0].text

```

For more information about the `bytes_string_to_string` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `clean`

Cleans a section of text with options including removing bullets, extra whitespace, dashes and trailing punctuation. Optionally, you can choose to lowercase the output.

Options:

* Applies `clean_bullets` if `bullets=True`.

* Applies `clean_extra_whitespace` if `extra_whitespace=True`.

* Applies `clean_dashes` if `dashes=True`.

* Applies `clean_trailing_punctuation` if `trailing_punctuation=True`.

* Lowercases the output if `lowercase=True`.

Examples:

```python  theme={null}
from unstructured.cleaners.core import clean

# Returns "an excellent point!"
clean("● An excellent point!", bullets=True, lowercase=True)

# Returns "ITEM 1A: RISK FACTORS"
clean("ITEM 1A:     RISK-FACTORS", extra_whitespace=True, dashes=True)

```

For more information about the `clean` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `clean_bullets`

Removes bullets from the beginning of text. Bullets that do not appear at the beginning of the text are not removed.

Examples:

```python  theme={null}
from unstructured.cleaners.core import clean_bullets

# Returns "An excellent point!"
clean_bullets("● An excellent point!")

# Returns "I love Morse Code! ●●●"
clean_bullets("I love Morse Code! ●●●")

```

For more information about the `clean_bullets` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `clean_dashes`

Removes dashes from a section of text. Also handles special characters such as `\u2013`.

Examples:

```python  theme={null}
from unstructured.cleaners.core import clean_dashes

# Returns "ITEM 1A: RISK FACTORS"
clean_dashes("ITEM 1A: RISK-FACTORS\u2013")

```

For more information about the `clean_dashes` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `clean_non_ascii_chars`

Removes non-ascii characters from a string.

Examples:

```python  theme={null}
from unstructured.cleaners.core import clean_non_ascii_chars

text = "\x88This text contains ®non-ascii characters!●"

# Returns "This text contains non-ascii characters!"
clean_non_ascii_chars(text)

```

For more information about the `clean_non_ascii_chars` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `clean_ordered_bullets`

Remove alphanumeric bullets from the beginning of text up to three “sub-section” levels.

Examples:

```python  theme={null}
from unstructured.cleaners.core import clean_ordered_bullets

# Returns "This is a very important point"
clean_ordered_bullets("1.1 This is a very important point")

# Returns "This is a very important point ●"
clean_ordered_bullets("a.b This is a very important point ●")

```

For more information about the `clean_ordered_bullets` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `clean_postfix`

Removes the postfix from a string if they match a specified pattern.

Options:

* Ignores case if `ignore_case` is set to `True`. The default is `False`.

* Strips trailing whitespace is `strip` is set to `True`. The default is `True`.

Examples:

```python  theme={null}
from unstructured.cleaners.core import clean_postfix

text = "The end! END"

# Returns "The end!"
clean_postfix(text, r"(END|STOP)", ignore_case=True)

```

For more information about the `clean_postfix` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `clean_prefix`

Removes the prefix from a string if they match a specified pattern.

Options:

* Ignores case if `ignore_case` is set to `True`. The default is `False`.

* Strips leading whitespace is `strip` is set to `True`. The default is `True`.

Examples:

```python  theme={null}
from unstructured.cleaners.core import clean_prefix

text = "SUMMARY: This is the best summary of all time!"

# Returns "This is the best summary of all time!"
clean_prefix(text, r"(SUMMARY|DESCRIPTION):", ignore_case=True)

```

For more information about the `clean_prefix` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `clean_trailing_punctuation`

Removes trailing punctuation from a section of text.

Examples:

```python  theme={null}
from unstructured.cleaners.core import clean_trailing_punctuation

# Returns "ITEM 1A: RISK FACTORS"
clean_trailing_punctuation("ITEM 1A: RISK FACTORS.")

```

For more information about the `clean_trailing_punctuation` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `group_broken_paragraphs`

Groups together paragraphs that are broken up with line breaks for visual or formatting purposes. This is common in `.txt` files. By default, `group_broken_paragraphs` groups together lines split by `\n`. You can change that behavior with the `line_split` kwarg. The function considers `\n\n` to be a paragraph break by default. You can change that behavior with the `paragraph_split` kwarg.

Examples:

```python  theme={null}
from unstructured.cleaners.core import group_broken_paragraphs

text = """The big brown fox
was walking down the lane.

At the end of the lane, the
fox met a bear."""

group_broken_paragraphs(text)

```

```python  theme={null}
import re
from unstructured.cleaners.core import group_broken_paragraphs

para_split_re = re.compile(r"(\s*\n\s*){3}")

text = """The big brown fox

was walking down the lane.


At the end of the lane, the

fox met a bear."""

group_broken_paragraphs(text, paragraph_split=para_split_re)

```

For more information about the `group_broken_paragraphs` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `remove_punctuation`

Removes ASCII and unicode punctuation from a string.

Examples:

```python  theme={null}
from unstructured.cleaners.core import remove_punctuation

# Returns "A lovely quote"
remove_punctuation("“A lovely quote!”")

```

For more information about the `remove_punctuation` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `replace_unicode_quotes`

Replaces unicode quote characters such as `\x91` in strings.

Examples:

```
from unstructured.cleaners.core import replace_unicode_quotes

# Returns "“A lovely quote!”"
replace_unicode_characters("\x93A lovely quote!\x94")

# Returns ""‘A lovely quote!’"
replace_unicode_characters("\x91A lovely quote!\x92")

```

For more information about the `replace_unicode_quotes` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `translate_text`

The `translate_text` cleaning functions translates text between languages. `translate_text` uses the [Helsinki NLP MT models](https://huggingface.co/Helsinki-NLP) from `transformers` for machine translation. Works for Russian, Chinese, Arabic, and many other languages.

Parameters:

* `text`: the input string to translate.

* `source_lang`: the two letter language code for the source language of the text. If `source_lang` is not specified, the language will be detected using `langdetect`.

* `target_lang`: the two letter language code for the target language for translation. Defaults to `"en"`.

Examples:

```python  theme={null}
from unstructured.cleaners.translate import translate_text

# Output is "I'm a Berliner!"
translate_text("Ich bin ein Berliner!")

# Output is "I can also translate Russian!"
translate_text("Я тоже можно переводать русский язык!", "ru", "en")

```

For more information about the `translate_text` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/translate.py).

---

# Source: https://docs.unstructured.io/support/issues/configuration-resource.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Configuration and resource issues

## Issues

When you try to connect Unstructured to a specific source or destination, you get one of the following error types:

* `FileNotFoundError`: For example, for Amazon S3, a `path not found` error.
* `ClientRequestException`: For example, for OneDrive, an `itemNotFound` error.
* `ValueError`: For example, for Google Drive, a `File not found` error. For Amazon S3, an `Invalid endpoint` error.
* `UserError`: For example, for Azure Blob Storage, a `DeploymentNotFound` error.
* `ParamValidationError`: For example, for Amazon S3, an `Invalid bucket name` error.
* `EndpointResolutionError`: For example, for Amazon S3, a `Custom endpoint not valid URI` error.
* `ProgrammingError`: For example, for Snowflake, a `No active warehouse selected` error.
* `UnboundLocalError`: For example, for SharePoint, a `cannot access 'site_drive_item'` error.
* `HTTPError`: For example, for Confluence, a `404 Not Found` error for a specific page or attachment URL.
* `KeyError`: For example, for Jira, an error containing the word 'total'. For Amazon S3, an error containing the word `Key`.

## Possible causes

* Unstructured is configured to interact with a resource—such as a file, path, deployment, endpoint, or database object—that doesn't exist,
  is misnamed, or the configuration itself is invalid.
* There is a typo in a bucket name, folder path, file ID, deployment name, hostname, site path, or database name.
* The specified resource has been deleted or moved.
* An endpoint URL is incorrectly formatted, for example, is missing `https://` or contains invalid characters.
* An Amazon S3 bucket name is not formatted correctly.
* A required configuration is missing in the source or destination connector, for example, there is no active Snowflake warehouse specified.
* An Azure OpenAI deployment name is mismatched, failed, or does not exist.
* An invalid URL is specified for an attachment or a link within a source document.
* There is a specific configuration issue with a connector, for example, the specified SharePoint path does not lead to a valid drive.

## Possible solutions

* **Verify names and paths**: Carefully check all configured names, IDs, and paths—such as for buckets, folders, files, sites, deployments, and endpoint URLs—for accuracy. Ensure they exist in the source and destination system. Case sensitivity often matters.
* **Check formatting**: Ensure that URLs, bucket names, and other parameters adhere to the required format.
* **Verify that the resource exists**: Confirm that the target file, folder, deployment, or other resource exists and has not been moved or deleted.
* **Check configuration dependencies**: Ensure the necessary configurations are set in the source or destination, for example, select and start a Snowflake warehouse by running the `USE WAREHOUSE` command first.
* For **Azure OpenAI**: Double-check that the Deployment Name matches a successful deployment in your Azure portal.
* For **Confluence**: If a `404` error occurs during download, check if the page or attachment link is valid within Confluence itself.
* For **SharePoint**: Verify the Site Path leads to a valid location containing document libraries.
* **Reconfigure the connector**: Review and, as needed, correct any misconfigured settings in the source or destination connector.

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/ui/sources/confluence.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/confluence.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/confluence.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Confluence

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Confluence.

The requirements are as follows.

* A [Confluence Cloud account](https://www.atlassian.com/software/confluence/pricing) or
  [Confluence Data Center installation](https://confluence.atlassian.com/doc/installing-confluence-data-center-203603.html).

* The site URL for your [Confluence Cloud account](https://community.atlassian.com/t5/Confluence-questions/confluence-cloud-url/qaq-p/1157148) or
  [Confluence Data Center installation](https://confluence.atlassian.com/confkb/how-to-find-your-site-url-to-set-up-the-confluence-data-center-and-server-mobile-app-938025792.html).

* A user in your [Confluence Cloud account](https://confluence.atlassian.com/cloud/invite-edit-and-remove-users-744721624.html) or
  [Confluence Data Center installation](https://confluence.atlassian.com/doc/add-and-invite-users-138313.html).

* The user must have the correct permissions in your
  [Conflunce Cloud account](https://support.atlassian.com/confluence-cloud/docs/what-are-confluence-cloud-permissions-and-restrictions/) or
  [Confluence Data Center installation](https://confluence.atlassian.com/doc/permissions-and-restrictions-139557.html) to
  access the target spaces and pages.

* One of the following:

  * For Confluence Cloud or Confluence Data Center, the target user's name or email address, and password.
    [Change a Confluence Cloud user's password](https://support.atlassian.com/confluence-cloud/docs/change-your-confluence-password/).
    [Change a Confluence Data Center user's password](https://confluence.atlassian.com/doc/change-your-password-139416.html).
  * For Confluence Cloud only, the target user's name or email address, and API token.
    [Create an API token](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).
  * For Confluence Data Center only, the target user's personal access token (PAT).
    [Create a PAT](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html).

* Optionally, the keys (not display names) of the specific [spaces](https://support.atlassian.com/confluence-cloud/docs/navigate-spaces/) in the Confluence instance to access. To get a space's key,
  which is different from a space's display name, open the space in your web browser and look at the URL. The space's key is the part of the URL after `spaces/` but before the next `/` after that.

The following video provides related setup information for Confluence Cloud:

<iframe width="560" height="315" src="https://www.youtube.com/embed/3PsFJkcIotI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

## Document permissions metadata

The source connector outputs any permissions information that it can find in the source location about the processed source documents and associates that information with each
corresponding element that is generated. This permissions information is output into the `permissions_data` field, which is within the
`data_source` field under the element's `metadata` field. This information lists the users or groups, if any, that have
permissions to read, update, or delete the element's associated source document.

The following example shows what the output looks like. Ellipses indicate content that has been omitted from this example for brevity.

```json  theme={null}
[
    {
        "...": "...",
        "metadata": {
            "...": "...",
            "data_source": {
                "...": "...",
                "permissions_data": [
                    {
                        "read": {
                            "users": [
                                "11111:11111111-1111-1111-1111-111111111111"
                            ],
                            "groups": [
                                "22222222-2222-2222-2222-22222222",
                                "33333333-3333-3333-3333-33333333"
                            ]
                        }
                    },
                    {
                        "update": {
                            "users": [
                                "44444:44444444-4444-4444-4444-44444444",
                                "55555:55555555-5555-5555-5555-55555555"
                            ],
                            "groups": [
                                "66666666-6666-6666-6666-66666666",
                            ]
                        }
                    },
                    {
                        "delete": {
                            "users": [
                                "77777:77777777-7777-7777-7777-77777777"
                            ],
                            "groups": [
                                "88888888-8888-8888-8888-88888888"
                            ]
                        }
                    }
                ],
                "...": "..."
            }
        }
    }
]
```

To look up information about a particular Confluence user, use the user's ID (also known as their *account ID*) from the preceding output to call the
[GET /wiki/rest/api/user](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-users/#api-wiki-rest-api-user-get)
operation in the Confluence REST API.

To look up information about a particular Confluence group, use the group's ID from the preceding output to call the
[GET /wiki/rest/api/group/by-id](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-group/#api-wiki-rest-api-group-by-id-get)
operation in the Confluence REST API.

To create a Confluence source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="confluence",
                  config={
                      "url": "<url>",
                      "max_num_of_spaces": <max-num-of-spaces>,
                      "max_num_of_docs_from_each_space": <max-num-of-docs-from-each-space>,
                      "spaces": ["<space-key>", "<space-key>"],
                      "extract_images": <True|False>,
                      "extract_files": <True|False>,

                      # For API token authentication:
                      # "username": "<username>",
                      # "token": "<api-token>",
                      # "cloud": <True|False>,

                      # For personal access token (PAT) authentication:
                      # "token": "<personal-access-token>",
                      # "cloud": False,

                      # For password authentication:
                      # "username": "<username>",
                      # "password": "<password>",
                      # "cloud": <True|False>
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "confluence",
      "config": {
          "url": "<url>",
          "max_num_of_spaces": <max-num-of-spaces>,
          "max_num_of_docs_from_each_space": <max-num-of-docs-from-each-space>,
          "spaces": ["<space-key>", "<space-key>"],
          "extract_images": "<true|false>",
          "extract_files": "<true|false>",

          # For API token authentication:

          "username": "<username>",
          "token": "<api-token>",
          "cloud": "<true|false>"

          # For personal access token (PAT) authentication:

          "token": "<personal-access-token>",
          "cloud": "false"

          # For password authentication:

          "username": "<username>",
          "password": "<password>",
          "cloud": "<true|false>"        
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<url>` (*required*) - The URL to the target Confluence Cloud instance.
* `<max-num-of-spaces>` - The maximum number of Confluence spaces to access within the Confluence Cloud instance. The default is `500` unless otherwise specified.
* `<max-num-of-docs-from-each-space>` - The maximum number of documents to access within each space. The default is `150` unless otherwise specified.
* `spaces` is an array of strings, with each `<space-key>` specifying the key (not display name) of a space to access, for example: `["luke","paul"]`. By default, if no space keys are specified, and the `<max-num-of-spaces>` is exceeded for the instance, be aware that you might get unexpected results.
* `extract_images` - Set to `true` to download images and replace the HTML content with Base64-encoded images. The default is `false` if not otherwise specified.
* `extract_files` - Set to `true` to download any embedded files in pages. The default is `false` if not otherwise specified.

For username and API token authentication:

* `<username>` - The name or email address of the target user.
* `<api-token>` - The user's API token value.
* For `cloud`, `true` if you are using Confluence Cloud. The default is `false` if not otherwise specified.

For personal access token (PAT) authentication:

* `<personal-access-token>` - The target user's PAT value.
* `cloud` should always be `false`.

For username and password authentication:

* `<username>` - The name or email address of the target user.
* `<password>` - The user's password.
* For `cloud`, `true` if you are using Confluence Cloud. The default is `false` if not otherwise specified.

---

# Source: https://docs.unstructured.io/ui/connectors.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Supported connectors

Unstructured supports connecting to the following source and destination types.

```mermaid  theme={null}
  flowchart LR
    Sources-->Unstructured-->Destinations
```

## Sources

* [Azure Blob Storage](/ui/sources/azure-blob-storage)
* [Box](/ui/sources/box)
* [Confluence](/ui/sources/confluence)
* [Couchbase](/ui/sources/couchbase)
* [Databricks Volumes](/ui/sources/databricks-volumes)
* [Dropbox](/ui/sources/dropbox)
* [Elasticsearch](/ui/sources/elasticsearch)
* [Google Cloud Storage](/ui/sources/google-cloud)
* [Google Drive](/ui/sources/google-drive)
* [Jira](/ui/sources/jira)
* [Kafka](/ui/sources/kafka)
* [MongoDB](/ui/sources/mongodb)
* [OneDrive](/ui/sources/onedrive)
* [OpenSearch](/ui/sources/opensearch)
* [Outlook](/ui/sources/outlook)
* [PostgreSQL](/ui/sources/postgresql)
* [S3](/ui/sources/s3)
* [Salesforce](/ui/sources/salesforce)
* [SharePoint](/ui/sources/sharepoint)
* [Slack](/ui/sources/slack)
* [Snowflake](/ui/sources/snowflake)
* [Teradata](/ui/sources/teradata-sql)
* [Zendesk](/ui/sources/zendesk)

If your source is not listed here, you might still be able to connect Unstructured to it through scripts or code by using the
[Unstructured Ingest CLI](/open-source/ingestion/overview#unstructured-ingest-cli) or the
[Unstructured Ingest Python library](/open-source/ingestion/python-ingest).
[Learn more](/open-source/ingestion/source-connectors/overview).

## Destinations

* [Astra DB](/ui/destinations/astradb)
* [Azure AI Search](/ui/destinations/azure-ai-search)
* [Azure Blob Storage](/ui/destinations/azure-blob-storage)
* [Couchbase](/ui/destinations/couchbase)
* [Databricks Volumes](/ui/destinations/databricks-volumes)
* [Delta Tables in Amazon S3](/ui/destinations/delta-table)
* [Delta Tables in Databricks](/ui/destinations/databricks-delta-table)
* [Elasticsearch](/ui/destinations/elasticsearch)
* [Google Cloud Storage](/ui/destinations/google-cloud)
* [IBM watsonx.data](/ui/destinations/ibm-watsonxdata)
* [Kafka](/ui/destinations/kafka)
* [Milvus](/ui/destinations/milvus)
* [MotherDuck](/ui/destinations/motherduck)
* [MongoDB](/ui/destinations/mongodb)
* [Neo4j](/ui/destinations/neo4j)
* [OneDrive](/ui/destinations/onedrive)
* [OpenSearch](/ui/destinations/opensearch)
* [Pinecone](/ui/destinations/pinecone)
* [PostgreSQL](/ui/destinations/postgresql)
* [Qdrant](/ui/destinations/qdrant)
* [Redis](/ui/destinations/redis)
* [S3](/ui/destinations/s3)
* [Snowflake](/ui/destinations/snowflake)
* [Teradata](/ui/destinations/teradata-sql)
* [Weaviate](/ui/destinations/weaviate)

If your destination is not listed here, you might still be able to connect Unstructured to it through scripts or code by using the
[Unstructured Ingest CLI](/open-source/ingestion/overview#unstructured-ingest-cli) or the
[Unstructured Ingest Python library](/open-source/ingestion/python-ingest).
[Learn more](/open-source/ingestion/destination-connectors/overview).

---

# Source: https://docs.unstructured.io/ui/sources/couchbase.md

# Source: https://docs.unstructured.io/ui/destinations/couchbase.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/couchbase.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/couchbase.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/couchbase.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/couchbase.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Couchbase

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Couchbase.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Couchbase Capella clusters are supported.
* For [Unstructured Ingest](/open-source/ingestion/overview), Couchbase Capella clusters and local Couchbase server deployments are supported.

<iframe width="560" height="315" src="https://www.youtube.com/embed/9-RIBmIdi70" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

For Couchbase Capella, you will need:

* A [Couchbase Capella account](https://docs.couchbase.com/cloud/get-started/create-account.html#sign-up-free-tier).
* A [Couchbase Capella cluster](https://docs.couchbase.com/cloud/get-started/create-account.html#getting-started).
* A [bucket](https://docs.couchbase.com/cloud/clusters/data-service/manage-buckets.html#add-bucket),
  [scope](https://docs.couchbase.com/cloud/clusters/data-service/scopes-collections.html#create-scope),
  and [collection](https://docs.couchbase.com/cloud/clusters/data-service/scopes-collections.html#create-collection)
  on the cluster.
* The cluster's [public connection string](https://docs.couchbase.com/cloud/get-started/connect.html#connect-from-sdk-cbsh-cli-or-ide).
* The [cluster access name (username) and secret (password)](https://docs.couchbase.com/cloud/clusters/manage-database-users.html#create-database-credentials).
* [Incoming IP address allowance](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) for the cluster.

  To get Unstructured's IP address ranges, go to
  [https://assets.p6m.u10d.net/publicitems/ip-prefixes.json](https://assets.p6m.u10d.net/publicitems/ip-prefixes.json)
  and allow all of the `ip_prefix` fields' values that are listed.

  <Note>These IP address ranges are subject to change. You can always find the latest ones in the preceding file.</Note>

For a local Couchbase server, you will need:

* [Installation of a local Couchbase server](https://docs.couchbase.com/server/current/getting-started/start-here.html).
* [Connection details](https://docs.couchbase.com/server/current/guides/connect.html) to the local Couchbase server.

To learn more about how to set up a Couchbase cluster and play with data, refer to this [tutorial](https://developer.couchbase.com/tutorial-quickstart-flask-python).

To create a Couchbase destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="couchbase",
                  config={
                      "username": "<username>",
                      "bucket": "<bucket>",
                      "connection_string": "<connection-string>",
                      "scope": "<scope>",
                      "collection": "<collection>",
                      "password": "<password>",
                      "batch_size": <batch-size>
                  }
              )
          )
      )

      print(response.destination_connector_information)
  # ...
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "couchbase",
      "config": {
          "username": "<username>",
          "bucket": "<bucket>",
          "connection_string": "<connection-string>",
          "scope": "<scope>",
          "collection": "<collection>",
          "password": "<password>",
          "batch_size": <batch-size>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<username>` (*required*) - The username for the Couchbase server.
* `<bucket>` (*required*) - The name of the bucket in the Couchbase server.
* `<connection-string>` (*required*) - The connection string for the Couchbase server.
* `<scope>` - The name of the scope in the bucket. The default is `_default` if not otherwise specified.
* `<collection>` - The name of the collection in the scope. The default is `_default` if not otherwise specified.
* `<password>` (*required*) - The password for the Couchbase server.
* `<batch-size>` - The maximum number of records to transmit per batch. The default is `50` if not otherwise specified.
* `<collection-id>` (source connector only) - The name of the collection field that contains the document ID. The default is `id` if not otherwise specified.

## Learn more

* <Icon icon="blog" />  [Couchbase Integration in Unstructured Platform](https://unstructured.io/blog/couchbase-integration-in-unstructured-platform)

---

# Source: https://docs.unstructured.io/api-reference/destinations/create-destination-connection-check.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Create destination connection check

> Initiate a connection check for the destination connector


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json post /api/v1/destinations/{destination_id}/connection-check
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/destinations/{destination_id}/connection-check:
    post:
      tags:
        - destinations
      summary: Create destination connection check
      description: Initiate a connection check for the destination connector
      operationId: create_connection_check_destinations
      parameters:
        - name: destination_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Destination Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '202':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DagNodeConnectionCheck'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    DagNodeConnectionCheck:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        status:
          $ref: '#/components/schemas/ConnectionCheckStatus'
        reason:
          anyOf:
            - type: string
            - type: 'null'
          title: Reason
        created_at:
          type: string
          format: date-time
          title: Created At
        reported_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Reported At
      type: object
      required:
        - id
        - status
        - created_at
      title: DagNodeConnectionCheck
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ConnectionCheckStatus:
      type: string
      enum:
        - SCHEDULED
        - SUCCESS
        - FAILURE
      title: ConnectionCheckStatus
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/destinations/create-destination-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Create destination connector

> Create a new destination connector using the provided configuration and name.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json post /api/v1/destinations/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/destinations/:
    post:
      tags:
        - destinations
      summary: Create destination connector
      description: >-
        Create a new destination connector using the provided configuration and
        name.
      operationId: create_destination
      parameters:
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateDestinationConnector'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DestinationConnectorInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    CreateDestinationConnector:
      properties:
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/DestinationConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/AstraDBConnectorConfigInput'
            - $ref: '#/components/schemas/AzureAISearchConnectorConfigInput'
            - $ref: '#/components/schemas/CouchbaseDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfigInput'
            - $ref: >-
                #/components/schemas/DatabricksVDTDestinationConnectorConfigInput
            - $ref: '#/components/schemas/DeltaTableConnectorConfigInput'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfigInput'
            - $ref: '#/components/schemas/GCSDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/KafkaCloudDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/MilvusDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/MongoDBConnectorConfigInput'
            - $ref: '#/components/schemas/Neo4jDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/OneDriveDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/OpenSearchConnectorConfigInput'
            - $ref: '#/components/schemas/PineconeDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/PostgresDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/RedisDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/QdrantCloudDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/S3DestinationConnectorConfigInput'
            - $ref: '#/components/schemas/SnowflakeDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/TeradataDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/WeaviateDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/IBMWatsonxS3DestinationConnectorConfigInput'
            - additionalProperties: true
              type: object
          title: Config
      type: object
      required:
        - name
        - type
        - config
      title: CreateDestinationConnector
    DestinationConnectorInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/DestinationConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureDestinationConnectorConfig'
            - $ref: '#/components/schemas/AstraDBConnectorConfig'
            - $ref: '#/components/schemas/AzureAISearchConnectorConfig'
            - $ref: '#/components/schemas/CouchbaseDestinationConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVDTDestinationConnectorConfig'
            - $ref: '#/components/schemas/DeltaTableConnectorConfig'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfig'
            - $ref: '#/components/schemas/GCSDestinationConnectorConfig'
            - $ref: '#/components/schemas/KafkaCloudDestinationConnectorConfig'
            - $ref: '#/components/schemas/MilvusDestinationConnectorConfig'
            - $ref: '#/components/schemas/MongoDBConnectorConfig'
            - $ref: '#/components/schemas/Neo4jDestinationConnectorConfig'
            - $ref: '#/components/schemas/OneDriveDestinationConnectorConfig'
            - $ref: '#/components/schemas/OpenSearchConnectorConfig'
            - $ref: '#/components/schemas/PineconeDestinationConnectorConfig'
            - $ref: '#/components/schemas/PostgresDestinationConnectorConfig'
            - $ref: '#/components/schemas/RedisDestinationConnectorConfig'
            - $ref: '#/components/schemas/QdrantCloudDestinationConnectorConfig'
            - $ref: '#/components/schemas/S3DestinationConnectorConfig'
            - $ref: '#/components/schemas/SnowflakeDestinationConnectorConfig'
            - $ref: '#/components/schemas/TeradataDestinationConnectorConfig'
            - $ref: '#/components/schemas/WeaviateDestinationConnectorConfig'
            - $ref: '#/components/schemas/IBMWatsonxS3DestinationConnectorConfig'
            - additionalProperties: true
              type: object
          title: Config
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
      type: object
      required:
        - id
        - name
        - type
        - config
        - created_at
      title: DestinationConnectorInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    DestinationConnectorType:
      type: string
      enum:
        - azure
        - astradb
        - azure_ai_search
        - couchbase
        - databricks_volumes
        - databricks_volume_delta_tables
        - delta_table
        - elasticsearch
        - gcs
        - kafka-cloud
        - milvus
        - mongodb
        - motherduck
        - neo4j
        - onedrive
        - opensearch
        - pinecone
        - postgres
        - redis
        - qdrant-cloud
        - s3
        - snowflake
        - teradata
        - weaviate-cloud
        - ibm_watsonx_s3
      title: DestinationConnectorType
    AzureDestinationConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^(az:\/\/|abfs:\/\/)
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
      type: object
      required:
        - remote_url
      title: AzureDestinationConnectorConfigInput
    AstraDBConnectorConfigInput:
      properties:
        collection_name:
          type: string
          pattern: ^\w+$
          title: Collection Name
        keyspace:
          anyOf:
            - type: string
            - type: 'null'
          title: Keyspace
          default: default_keyspace
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 20
        binary_encode_vectors:
          type: boolean
          title: Binary Encode Vectors
          default: true
        api_endpoint:
          type: string
          title: Api Endpoint
        token:
          type: string
          title: Token
        flatten_metadata:
          type: boolean
          title: Flatten Metadata
          default: false
      additionalProperties: true
      type: object
      required:
        - collection_name
        - api_endpoint
        - token
      title: AstraDBConnectorConfigInput
    AzureAISearchConnectorConfigInput:
      properties:
        endpoint:
          type: string
          pattern: ^https:\/\/[a-z0-9-]+
          title: Endpoint
        index:
          type: string
          title: Index
        key:
          type: string
          title: Key
      type: object
      required:
        - endpoint
        - index
        - key
      title: AzureAISearchConnectorConfigInput
    CouchbaseDestinationConnectorConfigInput:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
      title: CouchbaseDestinationConnectorConfigInput
    DatabricksVolumesConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfigInput
    DatabricksVDTDestinationConnectorConfigInput:
      properties:
        server_hostname:
          type: string
          title: Server Hostname
        http_path:
          type: string
          title: Http Path
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        client_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Id
        client_secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Secret
        catalog:
          type: string
          title: Catalog
        database:
          type: string
          title: Database
          default: default
        table_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Table Name
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          anyOf:
            - type: string
            - type: 'null'
          title: Volume Path
      type: object
      required:
        - server_hostname
        - http_path
        - catalog
        - volume
      title: DatabricksVDTDestinationConnectorConfigInput
    DeltaTableConnectorConfigInput:
      properties:
        aws_access_key_id:
          type: string
          title: Aws Access Key Id
        aws_secret_access_key:
          type: string
          title: Aws Secret Access Key
        aws_region:
          type: string
          title: Aws Region
        table_uri:
          type: string
          pattern: ^s3:\/\/
          title: Table Uri
      type: object
      required:
        - aws_access_key_id
        - aws_secret_access_key
        - aws_region
        - table_uri
      title: DeltaTableConnectorConfigInput
    ElasticsearchConnectorConfigInput:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfigInput
    GCSDestinationConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^(gs|gcs):\/\/
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
      type: object
      required:
        - remote_url
        - service_account_key
      title: GCSDestinationConnectorConfigInput
    KafkaCloudDestinationConnectorConfigInput:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          minimum: 1
          title: Port
          default: 9092
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
      type: object
      required:
        - bootstrap_servers
        - topic
        - kafka_api_key
        - secret
      title: KafkaCloudDestinationConnectorConfigInput
    MilvusDestinationConnectorConfigInput:
      properties:
        uri:
          type: string
          pattern: ^https?:\/\/
          title: Uri
        user:
          anyOf:
            - type: string
            - type: 'null'
          title: User
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        db_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Db Name
        collection_name:
          type: string
          title: Collection Name
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - uri
        - collection_name
        - record_id_key
      title: MilvusDestinationConnectorConfigInput
    MongoDBConnectorConfigInput:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          pattern: >-
            ^mongodb(?:\+srv)?:\/\/(?:([^:@/\s]+)(?::([^@/\s]*))?@)?([^/\s:,]+)(?::(\d+))?(?:,([^/\s:,]+)(?::(\d+))?)*(?:\/([^/\s?]*))?(?:\?([^/\s]*))?$
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfigInput
    Neo4jDestinationConnectorConfigInput:
      properties:
        uri:
          type: string
          title: Uri
        database:
          type: string
          title: Database
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
      type: object
      required:
        - uri
        - database
        - username
        - password
      title: Neo4jDestinationConnectorConfigInput
    OneDriveDestinationConnectorConfigInput:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        remote_url:
          type: string
          pattern: ^onedrive:\/\/
          title: Remote Url
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - remote_url
      title: OneDriveDestinationConnectorConfigInput
    OpenSearchConnectorConfigInput:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfigInput
      description: Input model for creating/updating OpenSearch connectors via API.
    PineconeDestinationConnectorConfigInput:
      properties:
        index_name:
          type: string
          title: Index Name
        api_key:
          type: string
          title: Api Key
        namespace:
          type: string
          title: Namespace
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
      type: object
      required:
        - index_name
        - api_key
        - namespace
      title: PineconeDestinationConnectorConfigInput
    PostgresDestinationConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
      title: PostgresDestinationConnectorConfigInput
    RedisDestinationConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 6379
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        uri:
          anyOf:
            - type: string
            - type: 'null'
          title: Uri
        database:
          type: integer
          minimum: 0
          title: Database
          default: 0
        ssl:
          type: boolean
          title: Ssl
          default: true
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
      type: object
      required:
        - host
      title: RedisDestinationConnectorConfigInput
    QdrantCloudDestinationConnectorConfigInput:
      properties:
        url:
          type: string
          title: Url
        api_key:
          type: string
          title: Api Key
        collection_name:
          type: string
          title: Collection Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
      type: object
      required:
        - url
        - api_key
        - collection_name
      title: QdrantCloudDestinationConnectorConfigInput
    S3DestinationConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^s3:\/\/
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
          default: false
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
      type: object
      required:
        - remote_url
      title: S3DestinationConnectorConfigInput
    SnowflakeDestinationConnectorConfigInput:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
      title: SnowflakeDestinationConnectorConfigInput
    TeradataDestinationConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - host
        - user
        - password
      title: TeradataDestinationConnectorConfigInput
    WeaviateDestinationConnectorConfigInput:
      properties:
        cluster_url:
          type: string
          title: Cluster Url
        api_key:
          type: string
          title: Api Key
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
      type: object
      required:
        - cluster_url
        - api_key
      title: WeaviateDestinationConnectorConfigInput
    IBMWatsonxS3DestinationConnectorConfigInput:
      properties:
        iam_api_key:
          type: string
          title: Iam Api Key
        access_key_id:
          type: string
          title: Access Key Id
        secret_access_key:
          type: string
          title: Secret Access Key
        iceberg_endpoint:
          type: string
          title: Iceberg Endpoint
        object_storage_endpoint:
          type: string
          title: Object Storage Endpoint
        object_storage_region:
          type: string
          title: Object Storage Region
        catalog:
          type: string
          title: Catalog
        max_retries_connection:
          type: integer
          maximum: 100
          minimum: 2
          title: Max Retries Connection
          description: Maximum number of retries in case of a connection error
          default: 10
        namespace:
          type: string
          title: Namespace
        table:
          type: string
          title: Table
        max_retries:
          type: integer
          maximum: 500
          minimum: 2
          title: Max Retries
          description: Maximum number of retries to upload data
          default: 50
        record_id_key:
          type: string
          title: Record Id Key
          description: Searchable key to find entries for the same record on previous runs
          default: record_id
      type: object
      required:
        - iam_api_key
        - access_key_id
        - secret_access_key
        - iceberg_endpoint
        - object_storage_endpoint
        - object_storage_region
        - catalog
        - namespace
        - table
      title: IBMWatsonxS3DestinationConnectorConfigInput
    AzureDestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
      type: object
      required:
        - remote_url
      title: AzureDestinationConnectorConfig
    AstraDBConnectorConfig:
      properties:
        collection_name:
          type: string
          title: Collection Name
        keyspace:
          anyOf:
            - type: string
            - type: 'null'
          title: Keyspace
        batch_size:
          type: integer
          title: Batch Size
        binary_encode_vectors:
          type: boolean
          title: Binary Encode Vectors
          default: true
        api_endpoint:
          type: string
          title: Api Endpoint
        token:
          type: string
          title: Token
      additionalProperties: true
      type: object
      required:
        - collection_name
        - batch_size
        - api_endpoint
        - token
      title: AstraDBConnectorConfig
    AzureAISearchConnectorConfig:
      properties:
        endpoint:
          type: string
          title: Endpoint
        index:
          type: string
          title: Index
        key:
          type: string
          title: Key
      type: object
      required:
        - endpoint
        - index
        - key
      title: AzureAISearchConnectorConfig
    CouchbaseDestinationConnectorConfig:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
      title: CouchbaseDestinationConnectorConfig
    DatabricksVolumesConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfig
    DatabricksVDTDestinationConnectorConfig:
      properties:
        server_hostname:
          type: string
          title: Server Hostname
        http_path:
          type: string
          title: Http Path
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        client_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Id
        client_secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Secret
        catalog:
          type: string
          title: Catalog
        database:
          type: string
          title: Database
          default: default
        table_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Table Name
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          anyOf:
            - type: string
            - type: 'null'
          title: Volume Path
      type: object
      required:
        - server_hostname
        - http_path
        - catalog
        - volume
      title: DatabricksVDTDestinationConnectorConfig
    DeltaTableConnectorConfig:
      properties:
        aws_access_key_id:
          type: string
          title: Aws Access Key Id
        aws_secret_access_key:
          type: string
          title: Aws Secret Access Key
        aws_region:
          type: string
          title: Aws Region
        table_uri:
          type: string
          title: Table Uri
      type: object
      required:
        - aws_access_key_id
        - aws_secret_access_key
        - aws_region
        - table_uri
      title: DeltaTableConnectorConfig
    ElasticsearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfig
    GCSDestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
      type: object
      required:
        - remote_url
        - service_account_key
      title: GCSDestinationConnectorConfig
    KafkaCloudDestinationConnectorConfig:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          title: Port
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - bootstrap_servers
        - port
        - topic
        - kafka_api_key
        - secret
        - batch_size
      title: KafkaCloudDestinationConnectorConfig
    MilvusDestinationConnectorConfig:
      properties:
        uri:
          type: string
          title: Uri
        user:
          anyOf:
            - type: string
            - type: 'null'
          title: User
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        db_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Db Name
        collection_name:
          type: string
          title: Collection Name
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - uri
        - collection_name
        - record_id_key
      title: MilvusDestinationConnectorConfig
    MongoDBConnectorConfig:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfig
    Neo4jDestinationConnectorConfig:
      properties:
        uri:
          type: string
          title: Uri
        database:
          type: string
          title: Database
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - uri
        - database
        - username
        - password
        - batch_size
      title: Neo4jDestinationConnectorConfig
    OneDriveDestinationConnectorConfig:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        remote_url:
          type: string
          title: Remote Url
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - remote_url
      title: OneDriveDestinationConnectorConfig
    OpenSearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfig
      description: >-
        OpenSearch connector configuration.


        OpenSearch is a fork of Elasticsearch with similar functionality.

        Supports two authentication methods:

        1. Basic auth: username + password

        2. AWS IAM auth: aws_access_key_id + aws_secret_access_key (+ optional
        aws_session_token)


        For AWS OpenSearch Service or OpenSearch Serverless, region and service
        type

        are auto-detected from the host URL.
    PineconeDestinationConnectorConfig:
      properties:
        index_name:
          type: string
          title: Index Name
        api_key:
          type: string
          title: Api Key
        namespace:
          type: string
          title: Namespace
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - index_name
        - api_key
        - namespace
        - batch_size
      title: PineconeDestinationConnectorConfig
    PostgresDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
      title: PostgresDestinationConnectorConfig
    RedisDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        port:
          type: integer
          title: Port
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        uri:
          anyOf:
            - type: string
            - type: 'null'
          title: Uri
        database:
          type: integer
          title: Database
        ssl:
          type: boolean
          title: Ssl
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - port
        - database
        - ssl
        - batch_size
      title: RedisDestinationConnectorConfig
    QdrantCloudDestinationConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        api_key:
          type: string
          title: Api Key
        collection_name:
          type: string
          title: Collection Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - url
        - api_key
        - collection_name
        - batch_size
      title: QdrantCloudDestinationConnectorConfig
    S3DestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
      type: object
      required:
        - remote_url
        - anonymous
      title: S3DestinationConnectorConfig
    SnowflakeDestinationConnectorConfig:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
      title: SnowflakeDestinationConnectorConfig
    TeradataDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - host
        - user
        - password
      title: TeradataDestinationConnectorConfig
    WeaviateDestinationConnectorConfig:
      properties:
        cluster_url:
          type: string
          title: Cluster Url
        api_key:
          type: string
          title: Api Key
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
      type: object
      required:
        - cluster_url
        - api_key
      title: WeaviateDestinationConnectorConfig
    IBMWatsonxS3DestinationConnectorConfig:
      properties:
        iam_api_key:
          type: string
          title: Iam Api Key
        access_key_id:
          type: string
          title: Access Key Id
        secret_access_key:
          type: string
          title: Secret Access Key
        iceberg_endpoint:
          type: string
          title: Iceberg Endpoint
        object_storage_endpoint:
          type: string
          title: Object Storage Endpoint
        object_storage_region:
          type: string
          title: Object Storage Region
        catalog:
          type: string
          title: Catalog
        max_retries_connection:
          type: integer
          title: Max Retries Connection
        namespace:
          type: string
          title: Namespace
        table:
          type: string
          title: Table
        max_retries:
          type: integer
          title: Max Retries
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - iam_api_key
        - access_key_id
        - secret_access_key
        - iceberg_endpoint
        - object_storage_endpoint
        - object_storage_region
        - catalog
        - max_retries_connection
        - namespace
        - table
        - max_retries
        - record_id_key
      title: IBMWatsonxS3DestinationConnectorConfig
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/jobs/create-job.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Create Job

> Create a new on-demand job using either a template (with persistent job optimization) or custom DAG.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json post /api/v1/jobs/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/jobs/:
    post:
      tags:
        - jobs
      summary: Create Job
      description: >-
        Create a new on-demand job using either a template (with persistent job
        optimization) or custom DAG.
      operationId: create_job
      parameters:
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/Body_create_job'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    Body_create_job:
      properties:
        input_files:
          anyOf:
            - items:
                type: string
                format: binary
              type: array
            - type: 'null'
          title: Input Files
        request_data:
          type: string
          title: Request Data
      type: object
      required:
        - request_data
      title: Body_create_job
    JobInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        workflow_id:
          type: string
          format: uuid
          title: Workflow Id
        workflow_name:
          type: string
          title: Workflow Name
        status:
          $ref: '#/components/schemas/JobStatus'
        created_at:
          type: string
          format: date-time
          title: Created At
        runtime:
          anyOf:
            - type: string
              format: duration
            - type: 'null'
          title: Runtime
        input_file_ids:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Input File Ids
        output_node_files:
          anyOf:
            - items:
                $ref: '#/components/schemas/NodeFileMetadata'
              type: array
            - type: 'null'
          title: Output Node Files
        job_type:
          $ref: '#/components/schemas/WorkflowJobType'
          default: ephemeral
      type: object
      required:
        - id
        - workflow_id
        - workflow_name
        - status
        - created_at
      title: JobInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    JobStatus:
      type: string
      enum:
        - SCHEDULED
        - IN_PROGRESS
        - COMPLETED
        - STOPPED
        - FAILED
      title: JobStatus
    NodeFileMetadata:
      properties:
        node_id:
          type: string
          format: uuid
          title: Node Id
        file_id:
          type: string
          title: File Id
        node_type:
          type: string
          title: Node Type
        node_subtype:
          type: string
          title: Node Subtype
      type: object
      required:
        - node_id
        - file_id
        - node_type
        - node_subtype
      title: NodeFileMetadata
    WorkflowJobType:
      type: string
      enum:
        - ephemeral
        - persistent
        - scheduled
        - template
      title: WorkflowJobType
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/sources/create-source-connection-check.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Create source connection check

> Initiates a connection check for the specified source connector.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json post /api/v1/sources/{source_id}/connection-check
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/sources/{source_id}/connection-check:
    post:
      tags:
        - sources
      summary: Create source connection check
      description: Initiates a connection check for the specified source connector.
      operationId: create_connection_check_sources
      parameters:
        - name: source_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Source Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '202':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DagNodeConnectionCheck'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    DagNodeConnectionCheck:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        status:
          $ref: '#/components/schemas/ConnectionCheckStatus'
        reason:
          anyOf:
            - type: string
            - type: 'null'
          title: Reason
        created_at:
          type: string
          format: date-time
          title: Created At
        reported_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Reported At
      type: object
      required:
        - id
        - status
        - created_at
      title: DagNodeConnectionCheck
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ConnectionCheckStatus:
      type: string
      enum:
        - SCHEDULED
        - SUCCESS
        - FAILURE
      title: ConnectionCheckStatus
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/sources/create-source-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Create source connector

> Create a new source connector using the provided configuration and name.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json post /api/v1/sources/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/sources/:
    post:
      tags:
        - sources
      summary: Create source connector
      description: Create a new source connector using the provided configuration and name.
      operationId: create_source
      parameters:
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateSourceConnector'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SourceConnectorInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    CreateSourceConnector:
      properties:
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/SourceConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureSourceConnectorConfigInput'
            - $ref: '#/components/schemas/BoxSourceConnectorConfigInput'
            - $ref: '#/components/schemas/ConfluenceSourceConnectorConfigInput'
            - $ref: '#/components/schemas/CouchbaseSourceConnectorConfigInput'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfigInput'
            - $ref: '#/components/schemas/DropboxSourceConnectorConfigInput'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfigInput'
            - $ref: '#/components/schemas/GCSSourceConnectorConfigInput'
            - $ref: '#/components/schemas/GoogleDriveSourceConnectorConfigInput'
            - $ref: '#/components/schemas/KafkaCloudSourceConnectorConfigInput'
            - $ref: '#/components/schemas/MongoDBConnectorConfigInput'
            - $ref: '#/components/schemas/OneDriveSourceConnectorConfigInput'
            - $ref: '#/components/schemas/OpenSearchConnectorConfigInput'
            - $ref: '#/components/schemas/OutlookSourceConnectorConfigInput'
            - $ref: '#/components/schemas/PostgresSourceConnectorConfigInput'
            - $ref: '#/components/schemas/S3SourceConnectorConfigInput'
            - $ref: '#/components/schemas/SalesforceSourceConnectorConfigInput'
            - $ref: '#/components/schemas/SharePointSourceConnectorConfigInput'
            - $ref: '#/components/schemas/SnowflakeSourceConnectorConfigInput'
            - $ref: '#/components/schemas/TeradataSourceConnectorConfigInput'
            - $ref: '#/components/schemas/JiraSourceConnectorConfigInput'
            - $ref: '#/components/schemas/ZendeskSourceConnectorConfigInput'
            - additionalProperties: true
              type: object
          title: Config
      type: object
      required:
        - name
        - type
        - config
      title: CreateSourceConnector
    SourceConnectorInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/SourceConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureSourceConnectorConfig'
            - $ref: '#/components/schemas/BoxSourceConnectorConfig'
            - $ref: '#/components/schemas/ConfluenceSourceConnectorConfig'
            - $ref: '#/components/schemas/CouchbaseSourceConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfig'
            - $ref: '#/components/schemas/DropboxSourceConnectorConfig'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfig'
            - $ref: '#/components/schemas/GCSSourceConnectorConfig'
            - $ref: '#/components/schemas/GoogleDriveSourceConnectorConfig'
            - $ref: '#/components/schemas/KafkaCloudSourceConnectorConfig'
            - $ref: '#/components/schemas/MongoDBConnectorConfig'
            - $ref: '#/components/schemas/OneDriveSourceConnectorConfig'
            - $ref: '#/components/schemas/OpenSearchConnectorConfig'
            - $ref: '#/components/schemas/OutlookSourceConnectorConfig'
            - $ref: '#/components/schemas/PostgresSourceConnectorConfig'
            - $ref: '#/components/schemas/S3SourceConnectorConfig'
            - $ref: '#/components/schemas/SalesforceSourceConnectorConfig'
            - $ref: '#/components/schemas/SharePointSourceConnectorConfig'
            - $ref: '#/components/schemas/SnowflakeSourceConnectorConfig'
            - $ref: '#/components/schemas/TeradataSourceConnectorConfig'
            - $ref: '#/components/schemas/JiraSourceConnectorConfig'
            - $ref: '#/components/schemas/ZendeskSourceConnectorConfig'
            - additionalProperties: true
              type: object
          title: Config
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
      type: object
      required:
        - id
        - name
        - type
        - config
        - created_at
      title: SourceConnectorInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    SourceConnectorType:
      type: string
      enum:
        - azure
        - box
        - confluence
        - couchbase
        - databricks_volumes
        - dropbox
        - elasticsearch
        - gcs
        - google_drive
        - kafka-cloud
        - mongodb
        - onedrive
        - opensearch
        - outlook
        - postgres
        - s3
        - salesforce
        - sharepoint
        - slack
        - snowflake
        - teradata
        - jira
        - zendesk
      title: SourceConnectorType
    AzureSourceConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^(az:\/\/|abfs:\/\/)
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - remote_url
      title: AzureSourceConnectorConfigInput
    BoxSourceConnectorConfigInput:
      properties:
        box_app_config:
          type: string
          title: Box App Config
        remote_url:
          type: string
          pattern: ^box:\/\/
          title: Remote Url
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - box_app_config
        - remote_url
      title: BoxSourceConnectorConfigInput
    ConfluenceSourceConnectorConfigInput:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        api_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Token
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          type: boolean
          title: Cloud
          default: false
        extract_images:
          type: boolean
          title: Extract Images
          default: false
        extract_files:
          type: boolean
          title: Extract Files
          default: false
        max_num_of_spaces:
          type: integer
          minimum: 1
          title: Max Num Of Spaces
          default: 500
        max_num_of_docs_from_each_space:
          type: integer
          minimum: 1
          title: Max Num Of Docs From Each Space
          default: 150
        spaces:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Spaces
      type: object
      required:
        - url
        - username
      title: ConfluenceSourceConnectorConfigInput
    CouchbaseSourceConnectorConfigInput:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        collection_id:
          type: string
          title: Collection Id
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
        - collection_id
      title: CouchbaseSourceConnectorConfigInput
    DatabricksVolumesConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfigInput
    DropboxSourceConnectorConfigInput:
      properties:
        token:
          type: string
          title: Token
        remote_url:
          type: string
          pattern: ^dropbox:\/\/
          title: Remote Url
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - token
        - remote_url
      title: DropboxSourceConnectorConfigInput
    ElasticsearchConnectorConfigInput:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfigInput
    GCSSourceConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^(gs|gcs):\/\/
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - remote_url
        - service_account_key
      title: GCSSourceConnectorConfigInput
    GoogleDriveSourceConnectorConfigInput:
      properties:
        drive_id:
          type: string
          title: Drive Id
        service_account_key:
          anyOf:
            - $ref: '#/components/schemas/SecretReference'
            - type: string
          title: Service Account Key
        extensions:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Extensions
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - drive_id
        - service_account_key
      title: GoogleDriveSourceConnectorConfigInput
    KafkaCloudSourceConnectorConfigInput:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          minimum: 1
          title: Port
          default: 9092
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        num_messages_to_consume:
          type: integer
          minimum: 1
          title: Num Messages To Consume
          default: 100
      type: object
      required:
        - bootstrap_servers
        - topic
        - kafka_api_key
        - secret
      title: KafkaCloudSourceConnectorConfigInput
    MongoDBConnectorConfigInput:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          pattern: >-
            ^mongodb(?:\+srv)?:\/\/(?:([^:@/\s]+)(?::([^@/\s]*))?@)?([^/\s:,]+)(?::(\d+))?(?:,([^/\s:,]+)(?::(\d+))?)*(?:\/([^/\s?]*))?(?:\?([^/\s]*))?$
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfigInput
    OneDriveSourceConnectorConfigInput:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
          default: false
        path:
          type: string
          title: Path
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - path
      title: OneDriveSourceConnectorConfigInput
    OpenSearchConnectorConfigInput:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfigInput
      description: Input model for creating/updating OpenSearch connectors via API.
    OutlookSourceConnectorConfigInput:
      properties:
        authority_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Authority Url
        tenant:
          anyOf:
            - type: string
            - type: 'null'
          title: Tenant
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        outlook_folders:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Outlook Folders
        recursive:
          type: boolean
          title: Recursive
          default: false
        user_email:
          type: string
          title: User Email
      type: object
      required:
        - client_id
        - client_cred
        - user_email
      title: OutlookSourceConnectorConfigInput
    PostgresSourceConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
        id_column:
          type: string
          title: Id Column
          default: id
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
      title: PostgresSourceConnectorConfigInput
    S3SourceConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^s3:\/\/
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
          default: false
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - remote_url
      title: S3SourceConnectorConfigInput
    SalesforceSourceConnectorConfigInput:
      properties:
        username:
          type: string
          title: Username
        consumer_key:
          type: string
          title: Consumer Key
        private_key:
          type: string
          title: Private Key
        categories:
          items:
            type: string
          type: array
          title: Categories
      type: object
      required:
        - username
        - consumer_key
        - private_key
        - categories
      title: SalesforceSourceConnectorConfigInput
    SharePointSourceConnectorConfigInput:
      properties:
        site:
          type: string
          pattern: >-
            ^https:\/\/[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+(\/(sites|teams)\/[a-zA-Z0-9-_]+)?$
          title: Site
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
          default: https://login.microsoftonline.com
        user_pname:
          type: string
          title: User Pname
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
          default: false
        path:
          anyOf:
            - type: string
            - type: 'null'
          title: Path
      type: object
      required:
        - site
        - tenant
        - user_pname
        - client_id
        - client_cred
      title: SharePointSourceConnectorConfigInput
    SnowflakeSourceConnectorConfigInput:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
        - table_name
        - id_column
      title: SnowflakeSourceConnectorConfigInput
    TeradataSourceConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        id_column:
          type: string
          title: Id Column
          default: id
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - user
        - password
        - table_name
      title: TeradataSourceConnectorConfigInput
    JiraSourceConnectorConfigInput:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Cloud
          default: false
        projects:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Projects
        boards:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Boards
        issues:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Issues
        status_filters:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Status Filters
        download_attachments:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Download Attachments
          default: false
      type: object
      required:
        - url
        - username
      title: JiraSourceConnectorConfigInput
    ZendeskSourceConnectorConfigInput:
      properties:
        subdomain:
          type: string
          title: Subdomain
        email:
          type: string
          title: Email
        api_token:
          type: string
          title: Api Token
        item_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Item Type
          default: tickets
        batch_size:
          anyOf:
            - type: integer
            - type: 'null'
          title: Batch Size
          default: 2
      type: object
      required:
        - subdomain
        - email
        - api_token
      title: ZendeskSourceConnectorConfigInput
    AzureSourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - recursive
      title: AzureSourceConnectorConfig
    BoxSourceConnectorConfig:
      properties:
        box_app_config:
          type: string
          title: Box App Config
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - box_app_config
        - recursive
      title: BoxSourceConnectorConfig
    ConfluenceSourceConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        api_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Token
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          type: boolean
          title: Cloud
        extract_images:
          type: boolean
          title: Extract Images
          default: false
        extract_files:
          type: boolean
          title: Extract Files
          default: false
        max_num_of_spaces:
          type: integer
          title: Max Num Of Spaces
        max_num_of_docs_from_each_space:
          type: integer
          title: Max Num Of Docs From Each Space
        spaces:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Spaces
      type: object
      required:
        - url
        - username
        - cloud
        - max_num_of_spaces
        - max_num_of_docs_from_each_space
        - spaces
      title: ConfluenceSourceConnectorConfig
    CouchbaseSourceConnectorConfig:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        collection_id:
          type: string
          title: Collection Id
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
        - collection_id
      title: CouchbaseSourceConnectorConfig
    DatabricksVolumesConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfig
    DropboxSourceConnectorConfig:
      properties:
        token:
          type: string
          title: Token
        remote_url:
          type: string
          title: Remote Url
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - token
        - remote_url
        - recursive
      title: DropboxSourceConnectorConfig
    ElasticsearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfig
    GCSSourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - service_account_key
        - recursive
      title: GCSSourceConnectorConfig
    GoogleDriveSourceConnectorConfig:
      properties:
        drive_id:
          type: string
          title: Drive Id
        service_account_key:
          anyOf:
            - $ref: '#/components/schemas/SecretReference'
            - type: string
          title: Service Account Key
        extensions:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Extensions
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - drive_id
        - service_account_key
        - recursive
      title: GoogleDriveSourceConnectorConfig
    KafkaCloudSourceConnectorConfig:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          title: Port
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        num_messages_to_consume:
          type: integer
          title: Num Messages To Consume
      type: object
      required:
        - bootstrap_servers
        - port
        - topic
        - kafka_api_key
        - secret
        - num_messages_to_consume
      title: KafkaCloudSourceConnectorConfig
    MongoDBConnectorConfig:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfig
    OneDriveSourceConnectorConfig:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
        path:
          type: string
          title: Path
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - recursive
        - path
      title: OneDriveSourceConnectorConfig
    OpenSearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfig
      description: >-
        OpenSearch connector configuration.


        OpenSearch is a fork of Elasticsearch with similar functionality.

        Supports two authentication methods:

        1. Basic auth: username + password

        2. AWS IAM auth: aws_access_key_id + aws_secret_access_key (+ optional
        aws_session_token)


        For AWS OpenSearch Service or OpenSearch Serverless, region and service
        type

        are auto-detected from the host URL.
    OutlookSourceConnectorConfig:
      properties:
        authority_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Authority Url
        tenant:
          anyOf:
            - type: string
            - type: 'null'
          title: Tenant
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        outlook_folders:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Outlook Folders
        recursive:
          type: boolean
          title: Recursive
        user_email:
          type: string
          title: User Email
      type: object
      required:
        - client_id
        - client_cred
        - recursive
        - user_email
      title: OutlookSourceConnectorConfig
    PostgresSourceConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
        - id_column
      title: PostgresSourceConnectorConfig
    S3SourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - anonymous
        - recursive
      title: S3SourceConnectorConfig
    SalesforceSourceConnectorConfig:
      properties:
        username:
          type: string
          title: Username
        consumer_key:
          type: string
          title: Consumer Key
        private_key:
          type: string
          title: Private Key
        categories:
          items:
            type: string
          type: array
          title: Categories
      type: object
      required:
        - username
        - consumer_key
        - private_key
        - categories
      title: SalesforceSourceConnectorConfig
    SharePointSourceConnectorConfig:
      properties:
        site:
          type: string
          title: Site
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
          default: https://login.microsoftonline.com
        user_pname:
          type: string
          title: User Pname
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
        path:
          anyOf:
            - type: string
            - type: 'null'
          title: Path
      type: object
      required:
        - site
        - tenant
        - user_pname
        - client_id
        - client_cred
        - recursive
      title: SharePointSourceConnectorConfig
    SnowflakeSourceConnectorConfig:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
        - table_name
        - id_column
      title: SnowflakeSourceConnectorConfig
    TeradataSourceConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        id_column:
          type: string
          title: Id Column
          default: id
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - user
        - password
        - table_name
      title: TeradataSourceConnectorConfig
    JiraSourceConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Cloud
          default: false
        projects:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Projects
        boards:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Boards
        issues:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Issues
        status_filters:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Status Filters
        download_attachments:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Download Attachments
          default: false
      type: object
      required:
        - url
        - username
      title: JiraSourceConnectorConfig
    ZendeskSourceConnectorConfig:
      properties:
        subdomain:
          type: string
          title: Subdomain
        email:
          type: string
          title: Email
        api_token:
          type: string
          title: Api Token
        item_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Item Type
          default: tickets
        batch_size:
          anyOf:
            - type: integer
            - type: 'null'
          title: Batch Size
          default: 2
      type: object
      required:
        - subdomain
        - email
        - api_token
      title: ZendeskSourceConnectorConfig
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    SecretReference:
      properties:
        id:
          type: string
          title: Id
        type:
          $ref: '#/components/schemas/EncryptionType'
          default: rsa
      type: object
      required:
        - id
      title: SecretReference
    EncryptionType:
      type: string
      enum:
        - rsa
        - rsa_aes
      title: EncryptionType

````

---

# Source: https://docs.unstructured.io/api-reference/workflows/create-workflow.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Create Workflow

> Create a new workflow, either custom or auto, and configure its settings.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json post /api/v1/workflows/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/workflows/:
    post:
      tags:
        - workflows
      summary: Create Workflow
      description: >-
        Create a new workflow, either custom or auto, and configure its
        settings.
      operationId: create_workflow
      parameters:
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateWorkflow'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/WorkflowInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    CreateWorkflow:
      properties:
        name:
          type: string
          title: Name
        source_id:
          anyOf:
            - type: string
              format: uuid
            - type: 'null'
          title: Source Id
        destination_id:
          anyOf:
            - type: string
              format: uuid
            - type: 'null'
          title: Destination Id
        workflow_type:
          $ref: '#/components/schemas/WorkflowType'
        workflow_nodes:
          anyOf:
            - items:
                $ref: '#/components/schemas/WorkflowNode'
              type: array
            - type: 'null'
          title: Workflow Nodes
        template_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Template Id
        schedule:
          anyOf:
            - type: string
              enum:
                - every 15 minutes
                - every hour
                - every 2 hours
                - every 4 hours
                - every 6 hours
                - every 8 hours
                - every 10 hours
                - every 12 hours
                - daily
                - weekly
                - monthly
            - type: 'null'
          title: Schedule
        reprocess_all:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Reprocess All
          default: false
      type: object
      required:
        - name
        - workflow_type
      title: CreateWorkflow
    WorkflowInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        sources:
          items:
            type: string
            format: uuid
          type: array
          title: Sources
        destinations:
          items:
            type: string
            format: uuid
          type: array
          title: Destinations
        workflow_type:
          anyOf:
            - $ref: '#/components/schemas/WorkflowType'
            - type: 'null'
        workflow_nodes:
          items:
            $ref: '#/components/schemas/WorkflowNode'
          type: array
          title: Workflow Nodes
        schedule:
          anyOf:
            - $ref: '#/components/schemas/WorkflowSchedule'
            - type: 'null'
        status:
          $ref: '#/components/schemas/WorkflowState'
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
        reprocess_all:
          type: boolean
          title: Reprocess All
          default: false
      type: object
      required:
        - id
        - name
        - sources
        - destinations
        - workflow_nodes
        - status
        - created_at
      title: WorkflowInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    WorkflowType:
      type: string
      enum:
        - basic
        - advanced
        - platinum
        - custom
      title: WorkflowType
    WorkflowNode:
      properties:
        id:
          anyOf:
            - type: string
              format: uuid
            - type: 'null'
          title: Id
        name:
          type: string
          title: Name
        type:
          type: string
          title: Type
        subtype:
          type: string
          title: Subtype
        settings:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Settings
      type: object
      required:
        - name
        - type
        - subtype
      title: WorkflowNode
    WorkflowSchedule:
      properties:
        crontab_entries:
          items:
            $ref: '#/components/schemas/CronTabEntry'
          type: array
          title: Crontab Entries
      type: object
      required:
        - crontab_entries
      title: WorkflowSchedule
      examples:
        - crontab_entries:
            - cron_expression: 0 0 * * *
    WorkflowState:
      type: string
      enum:
        - active
        - inactive
      title: WorkflowState
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    CronTabEntry:
      properties:
        cron_expression:
          type: string
          title: Cron Expression
      type: object
      required:
        - cron_expression
      title: CronTabEntry

````

---

# Source: https://docs.unstructured.io/examplecode/tools/crewai.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# CrewAI

[CrewAI](https://www.crewai.com/) is a popular framework for building AI agents and multi-agent workflows.

This article provides a hands-on, step-by-step walkthrough that uses [CrewAI open source](https://www.crewai.com/open-source),
along with the [Unstructured API's workflow operations MCP Server](/examplecode/tools/mcp) and Python, to
build a multi-agent workflow. This multi-agent workflow uses the MCP server to call various functions within the
[Unstructured API's workflow operations](/api-reference/workflow/overview) to build Unstructured ETL+ workflows from connector creation all
the way through to workflow job completion. This walkthrough uses an Amazon S3 bucket as both the workflow's source and destination. However, you can modify this multi-agent workflow
later to use a different S3 bucket or even different sources and destinations, to have a collection of AI agents quickly build multiple Unstructured ETL+ workflows
on your behalf with varying configurations.

## Requirements

To complete this walkthrough, you must first have:

* An Unstructured account and an Unstructured API key for that account.
* An Anthropic account and an Anthropic API key for that account.
* A properly configured Amazon S3 bucket with bucket access credentials.
* A Firecrawl account and a Firecrawl API key for that account.
* Python and the CrewAI open-source toolchain installed on your local development machine.

The following sections describe how to get these requirements.

### Unstructured account and API key

Before you begin, you must have an Unstructured account and an Unstructured API key, as follows:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

### Anthropic account and API key

This walkthrough uses a Claude Opus 3 model from Anthropic. So, before you begin, you must also have an Anthropic account and an Anthropic API key for that account.
[Sign in to or create your Anthropic account](https://console.anthropic.com/login). After you sign in to your
Anthropic account, [get your Anthropic API key](https://console.anthropic.com/settings/keys).

### Amazon S3 bucket and access credentials

This walkthrough uses an Amazon S3 bucket as both the workflow's source and destination. So, before you begin, you must also have a properly configured Amazon S3 bucket, with bucket access credentials
consisting of an access key ID and a secret access key for the AWS IAM user that has access to the bucket.
Follow the S3 connector instructions to
[create and configure the bucket and get the bucket access credentials](/ui/sources/s3) if you do not already have this all set up. (In these
instructions, do not follow the directions to use the Unstructured UI to create the S3 source connector. CrewAI will do this for you
later automatically.)

This walkthrough expects two folders to exist within the bucket, as follows:

* An `input` folder, which contains the files to process. This `input` folder must contain at least one file to process.
  If you do not have any files available to upload into the `input` folder, you can get some from the [example-docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs) folder in the Unstructured open source library repository on GitHub.
* An `output` folder, which will contain the processed files' data from Unstructured after CrewAI runs the workflow.
  This `output` folder should be empty for now.

### Firecrawl account and API key

This walkthrough uses Firecrawl to monitor job statuses. So, before you begin, you must also have a Firecrawl account and a Firecrawl API key for that account.
[Sign in to or create your Firecrawl account](https://www.firecrawl.dev/signin/signup). After you sign in to your Firecrawl account, [get your Firecrawl API key](https://www.firecrawl.dev/app/api-keys).

### Python and CrewAI open-source toolchain and project setup

Before you can start coding on your local machine, you must install Python, and you should also install a Python package and project manager to manage your project's code dependencies.
This walkthrough uses the popular Python package and project manager [uv](https://docs.astral.sh/uv/) (although `uv` is not required to use CrewAI or the Unstructured API's workflow operations MCP Server).

<Steps>
  <Step title="Install uv">
    To install `uv`, run one of the following commands, depending on your operating system:

    <Tabs>
      <Tab title="macOS, Linux">
        To use `curl` with `sh`:

        ```bash  theme={null}
        curl -fsSL https://get.uv.dev | bash
        ```

        To use `wget` with `sh` instead:

        ```bash  theme={null}
        wget -qO- https://astral.sh/uv/install.sh | sh
        ```
      </Tab>

      <Tab title="Windows">
        To use PowerShell with `irm` to download the script and run it with `iex`:

        ```powershell  theme={null}
        powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
        ```
      </Tab>
    </Tabs>

    If you need to install `uv` by using other approaches such as PyPI, Homebrew, or WinGet,
    see [Installing uv](https://docs.astral.sh/uv/getting-started/installation/).
  </Step>

  <Step title="Install CrewAI open source">
    Use `uv` to install the CrewAI open-source toolchain, by running the following command:

    ```bash  theme={null}
    uv tool install crewai
    ```
  </Step>

  <Step title="Install Python">
    <Note>
      CrewAI open source works only with Python 3.10, 3.11, and 3.12.
    </Note>

    `uv` will detect and use Python if you already have it installed.
    To view a list of installed Python versions, run the following command:

    ```bash  theme={null}
    uv python list
    ```

    If, however, you do not already have Python installed, you can install a version of Python for use with `uv`
    by running the following command. For example, this command installs Python 3.12 for use with `uv`:

    ```bash  theme={null}
    uv python install 3.12
    ```
  </Step>
</Steps>

## Build and run the CrewAIproject

You are now ready to start coding.

<Steps>
  <Step title="Create the project directory">
    Switch to the directory on your local development machine where you want to create the project directory for this walkthrough.
    This example creates a project directory named `crewai_unstructured_demo` within your current working directory and then
    switches to this new project directory:

    ```bash  theme={null}
    mkdir crewai_unstructured_demo
    cd crewai_unstructured_demo
    ```
  </Step>

  <Step title="Intiialize the project">
    From within the new project directory, use `uv` to initialize the project by running the following command:

    ```bash  theme={null}
    uv init
    ```
  </Step>

  <Step title="Create a venv virtual environment">
    To isolate and manage your project's code dependencies, you should create a virtual environment. This walkthrough uses
    the popular Python virtual environment manager [venv](https://docs.python.org/3/library/venv.html) (although `venv` is not required to use CrewAI or the Unstructured API's workflow operations MCP Server).
    From the root of your project directory, use `uv` to create a virtual environment with `venv` by running the following command:

    ```bash  theme={null}
    # Create the virtual environment by using the current Python version:
    uv venv

    # Or, if you want to use a specific Python version:
    uv venv --python 3.12
    ```
  </Step>

  <Step title="Activate the virtual environment">
    To activate the `venv` virtual environment, run one of the following commands from the root of your project directory:

    <Tabs>
      <Tab title="macOS, Linux">
        * For `bash` or `zsh`, run `source .venv/bin/activate`
        * For `fish`, run `source .venv/bin/activate.fish`
        * For `csh` or `tcsh`, run `source .venv/bin/activate.csh`
        * For `pwsh`, run `.venv/bin/Activate.ps1`
      </Tab>

      <Tab title="Windows">
        * For `cmd.exe`, run `.venv\Scripts\activate.bat`
        * For `PowerShell`, run `.venv\Scripts\Activate.ps1`
      </Tab>
    </Tabs>

    If you need to deactivate the virtual environment at any time, run the following command:

    ```bash  theme={null}
    deactivate
    ```
  </Step>

  <Step title="Get the Unstructured API's workflow operations MCP Server's source code">
    The Unstructured API's workflow operations MCP Server is a Python package that provides an MCP server for the Unstructured API's workflow operations.
    To get the MCP server's source code, run the following command from the root of your project directory:

    ```bash  theme={null}
    git clone https://github.com/Unstructured-IO/UNS-MCP.git
    ```
  </Step>

  <Step title="Install the Unstructured API's workflow operations MCP Server's code dependencies">
    From the root of your project directory, switch to the cloned Unstructured API's workflow operations MCP Server's source directory, and then use `uv` to install the MCP server's code dependencies, by running the following commands:

    ```bash  theme={null}
    cd UNS-MCP
    uv sync
    ```
  </Step>

  <Step title="Install the CrewAI project's code dependencies">
    Switch back to the CrewAI project's root directory, and then use `uv` to install the CrewAI project's dependencies, by running the following commands:

    ```bash  theme={null}
    cd ..
    uv add "crewai>=0.11.0" "crewai-tools[mcp]>=0.0.5" "pydantic>=2.11.0" python-dotenv
    ```
  </Step>

  <Step title="Add the CrewAI project's source code">
    In the `main.py` file in the CrewAI project's root directory, replace that file's contents with the following Python code. This code defines a set of
    CrewAI-compatible [agents](https://docs.crewai.com/concepts/agents) and [tasks](https://docs.crewai.com/concepts/tasks) that make up a
    multi-agent [crew](https://docs.crewai.com/concepts/crews). This code then uses the crew's agents to run the tasks in order to build an Unstructured ETL+ workflows
    from connector creation all the way through to workflow job completion:

    ```python  theme={null}
    import os
    from typing import Optional
    from crewai import LLM, Agent, Crew, Task
    from crewai_tools import MCPServerAdapter
    from dotenv import load_dotenv
    from pydantic import BaseModel, Field
    from rich.pretty import pprint

    load_dotenv()

    class SourceConfigurationResult(BaseModel):
        source_id: Optional[str] = Field(
            default=None,
            description="The ID of the configured data source.",
        )
        source_type: Optional[str] = Field(
            default=None,
            description="The type of data source configured.",
        )
        source_name: Optional[str] = Field(
            default=None,
            description="The name of the configured data source.",
        )
        source_config: Optional[dict] = Field(
            default=None,
            description="The configuration details of the data source.",
        )

    class DestinationConfigurationResult(BaseModel):
        destination_id: Optional[str] = Field(
            default=None,
            description="The ID of the configured data destination.",
        )
        destination_type: Optional[str] = Field(
            default=None,
            description="The type of data destination configured.",
        )
        destination_name: Optional[str] = Field(
            default=None,
            description="The name of the configured data destination.",
        )
        destination_config: Optional[dict] = Field(
            default=None,
            description="The configuration details of the data destination.",
        )

    class WorkflowConfigurationResult(BaseModel):
        workflow_id: Optional[str] = Field(
            default=None,
            description="The ID of the created workflow.",
        )
        workflow_name: Optional[str] = Field(
            default=None,   
            description="The name of the created workflow.",
        )
        workflow_type: Optional[str] = Field(
            default=None,
            description="The type of workflow created.",
        )
        workflow_config: Optional[dict] = Field(
            default=None,
            description="The configuration details of the workflow.",
        )

    class JobConfigurationResult(BaseModel):
        job_id: Optional[str] = Field(
            default=None,
            description="The ID of the created job.",
        )

    class JobStatusResult(BaseModel):
        job_status: Optional[str] = Field(
            default=None,
            description="The status of the job.",
        )

    def main():

        with MCPServerAdapter(
            {"url": os.getenv("MCP_SERVER_URL", "http://127.0.0.1:8080/sse")},
        ) as tools:
            llm = LLM(model="anthropic/claude-3-opus-20240229", temperature=0.7, max_tokens=4096)

            source_agent = Agent(
                role="Source Configuration Specialist",
                goal="Configure and manage data sources for the MCP server",
                backstory="""You are an expert in data source configuration and management.
                You specialize in setting up and configuring various types of data sources
                including AWS S3, Google Drive, and other storage systems. You ensure
                proper configuration and validation of data sources.""",
                tools=tools,
                llm=llm,
                verbose=True,
            )

            destination_agent = Agent(
                role="Destination Configuration Specialist",
                goal="Configure and manage data destinations for the MCP server",
                backstory="""You are an expert in data destination configuration and management.
                You specialize in setting up and configuring various types of data destinations
                including AWS S3, Google Drive, and other storage systems. You ensure
                proper configuration and validation of data destinations.""",
                tools=tools,
                llm=llm,
                verbose=True,
            )

            workflow_agent = Agent(
                role="Workflow Configuration Specialist",
                goal="Configure and manage workflows for the MCP server",
                backstory="""You are an expert in workflow configuration and management.
                You specialize in setting up and configuring various types of workflows
                including AWS S3, Google Drive, and other storage systems. You ensure
                proper configuration and validation of workflows.""",
                tools=tools,
                llm=llm,
                verbose=True,
            )

            run_workflow_agent = Agent(
                role="Workflow Run Specialist",
                goal="Run workflows for the MCP server",
                backstory="""You are an expert in workflow run management.
                You specialize in running various types of workflows
                including AWS S3, Google Drive, and other storage systems. You ensure
                proper running of workflows.""",
                tools=tools,
                llm=llm,
                verbose=True,
            )

            job_agent = Agent(
                role="Job Management Specialist",
                goal="Manage jobs for the MCP server",
                backstory="""You are an expert in job management.
                You specialize in monitoring  various types of jobs
                proper monitoring of jobs.""",
                tools=tools,
                llm=llm,
                verbose=True,
            )

            source_task = Task(
                description=f"""Configure an S3 source with the following specifications:
                - Name: {os.getenv("S3_SOURCE_CONNECTOR_NAME")}
                - URI: {os.getenv("S3_SOURCE_URI")}
                - Recursive: true
                Ensure the source is properly configured and return the configuration details.""",
                agent=source_agent,
                expected_output="""A result containing:
                - source_id: The ID of the configured source
                - source_config: The configuration details
                - source_name: The name of the configured source
                - source_type: The type of data source configured
                """,
                output_pydantic=SourceConfigurationResult,
            )

            destination_task = Task(
                description=f"""Configure an S3 destination with the following specifications:
                - Name: {os.getenv("S3_DESTINATION_CONNECTOR_NAME")}
                - URI: {os.getenv("S3_DESTINATION_URI")}
                Ensure the destination is properly configured and return the configuration details.""",
                agent=destination_agent,
                expected_output="""A result containing:
                - destination_id: The ID of the configured destination
                - destination_config: The configuration details
                - destination_name: The name of the configured destination
                - destination_type: The type of data destination configured
                """,
                output_pydantic=DestinationConfigurationResult,
            )

            workflow_task = Task(
                description=f"""Create a workflow that uses the High Res partitioner and the following specifications:
                - Name: {os.getenv("WORKFLOW_NAME")}
                - Type: custom
                Use the source and destination IDs from the previous tasks to create the workflow.
                Ensure the workflow is properly configured to connect these endpoints.""",
                agent=workflow_agent,
                expected_output="""A result containing:
                - workflow_id: The ID of the created workflow
                - workflow_config: The configuration details
                - workflow_name: The name of the created workflow
                - workflow_type: The type of workflow created
                """,
                output_pydantic=WorkflowConfigurationResult,
                context=[source_task, destination_task],
            )

            run_workflow_task = Task(
                description=f"""Run a workflow.
                Use the workflow ID from the previous task to run the workflow.""",
                agent=run_workflow_agent,
                expected_output="""A result containing:
                - job_id: The ID of the running workflow
                """,
                output_pydantic=JobConfigurationResult,
                context=[workflow_task],
            )

            job_task = Task(
                description=f"""Monitor a job.
                Use the job ID from the previous task.
                Keep monitoring the job every 15 seconds until it is completed.
                Each time you check, report the current job status.
                When the job status is completed, stop reporting and return the final job status.""",
                agent=job_agent,
                expected_output="""A result containing:
                - job_status: The status of the job
                """,
                output_pydantic=JobStatusResult,
                context=[run_workflow_task],
            ) 

            crew = Crew(agents=[
                                   source_agent,
                                   destination_agent,
                                   workflow_agent,
                                   run_workflow_agent,
                                   job_agent
                                ], 
                        tasks=[
                                   source_task, 
                                   destination_task, 
                                   workflow_task, 
                                   run_workflow_task,
                                   job_task
                              ], 
                        verbose=True
                )

            result = crew.kickoff()
            pprint("Source Task Result:")
            pprint(result.tasks_output[0].pydantic)
            pprint("Destination Task Result:")
            pprint(result.tasks_output[1].pydantic)
            pprint("Workflow Task Result:")
            pprint(result.tasks_output[2].pydantic)
            pprint("Run Workflow Task Result:")
            pprint(result.tasks_output[3].pydantic)
            pprint("Job Task Result:")
            pprint(result.tasks_output[4].pydantic)

    if __name__ == "__main__":
        main()
    ```

    The preceding code does the following:

    1. Imports the necessary library modules for the rest of the code to use.
    2. Loads the environment variables that the code relies on from a `.env` file, which you will create in the next step.
    3. Defines the [Pydantic](https://docs.pydantic.dev/)-formatted models for the expected output of each task. These models format the tasks' output for consistent presentation.
    4. Defines the agents and tasks for the crew. These agents use their related tasks to create a source connector,
       a destination connector, and a workflow that uses these connectors, and then runs the newly created workflow as a job and reports the job's status.
    5. After the crew is finished, the results of each task are printed.
  </Step>

  <Step title="Create the .env file">
    Create an `.env` file in the root of your CrewAI project directory, and then add the following environment variables to the file:

    * `UNSTRUCTURED_API_KEY` - Your Unstructured API key.
    * `ANTHROPIC_API_KEY` - Your Anthropic API key.
    * `FIRECRAWL_API_KEY` - Your Firecrawl API key.
    * `AWS_KEY` - The AWS access key ID for the AWS IAM user that has access to the S3 bucket.
    * `AWS_SECRET` - The IAM user's AWS secret access key.
    * `S3_SOURCE_CONNECTOR_NAME` - Some display name for the S3 source connector.
    * `S3_SOURCE_URI` - The URI of the `input` folder in the S3 bucket.
    * `S3_DESTINATION_CONNECTOR_NAME` - The name of the S3 destination connector.
    * `S3_DESTINATION_URI` - The URI of the `output` folder in the S3 bucket.
    * `WORKFLOW_NAME` - Some display name for the workflow.

    For example, your `.env` file might look like this:

    ```bash  theme={null}
    - UNSTRUCTURED_API_KEY=7F1...<redacted>hZV4
    - ANTHROPIC_API_KEY=sk-ant-api03-09b...<redacted>...wAA
    - FIRECRAWL_API_KEY=fc-af337...<redacted>...329
    - AWS_KEY=AKI...<redacted>...3DX
    - AWS_SECRET=end...<redacted>...Xxb
    - S3_SOURCE_CONNECTOR_NAME=My-MCP-S3-Source
    - S3_SOURCE_URI=s3://my-bucket/input
    - S3_DESTINATION_CONNECTOR_NAME=My-MCP-S3-Destination
    - S3_DESTINATION_URI=s3://my-bucket/output
    - WORKFLOW_NAME=My-MCP-S3-To-S3-Workflow
    ```
  </Step>

  <Step title="Unstructured API's workflow operations MCP Server">
    From the root of your project directory, switch to the Unstructured API's workflow operations MCP Server's source directory, and then use `make` to run the MCP server locally, by running the following commands:

    ```bash  theme={null}
    cd UNS-MCP
    make sse-server
    ```

    <Note>If you do not have `make` available, see your operating system's documentation for installation instructions.</Note>

    The MCP server will start running at `http://127.0.0.1:8080/sse`.

    You must leave the MCP server running in your terminal or command prompt window while you run the CrewAI project.
  </Step>

  <Step title="Run the CrewAI project">
    1. In a separate terminal or command prompt window, from the root of your project directory, run the CrewAI project by running the following command:

       ```bash  theme={null}
       uv run main.py
       ```

    2. The CrewAI project will run and create an Unstructured ETL+ workflow. You can see the crew's progress in the terminal or command prompt window where you ran the `uv run main.py` command.

    3. The crew's agents will create a source connector, a destination connector, and a workflow that uses these connectors.

    4. The crew's agents will then run the workflow as a job and report on the job's status.

    5. After the job is completed, the crew will report final information about the tasks that were completed, for example:

       ```bash  theme={null}
       'Source Task Result:'
       SourceConfigurationResult(
       │   source_id='c26...<redacted>...83b',
       │   source_type='s3',
       │   source_name='My-MCP-S3-Source',
       │   source_config={
       │   │   'anonymous': False,
       │   │   'recursive': True,
       │   │   'remote_url': 's3://my-bucket/input',
       │   │   'endpoint_url': None,
       │   │   'key': '**********',
       │   │   'secret': '**********',
       │   │   'token': None
       │   }
       )
       'Destination Task Result:'
       DestinationConfigurationResult(
       │   destination_id='d57...<redacted>...ea4',
       │   destination_type='s3',
       │   destination_name='My-MCP-S3-Destination',
       │   destination_config={
       │   │   'anonymous': False,
       │   │   'remote_url': 's3://my-bucket/output',
       │   │   'endpoint_url': None,
       │   │   'key': '**********',
       │   │   'secret': '**********',
       │   │   'token': None
       │   }
       )
       'Workflow Task Result:'
       WorkflowConfigurationResult(
       │   workflow_id='ae5...<redacted>...cae',
       │   workflow_name='My-MCP-S3-To-S3-Workflow',
       │   workflow_type='custom',
       │   workflow_config={
       │   │   'name': 'My-MCP-S3-To-S3-Workflow',
       │   │   'workflow_type': 'custom',
       │   │   'source_id': 'c26...<redacted>...83b',
       │   │   'destination_id': 'd57...<redacted>...ea4',
       │   │   'workflow_nodes': [
       │   │   │   {
       │   │   │   │   'name': 'Partitioner',
       │   │   │   │   'type': 'partition',
       │   │   │   │   'subtype': 'unstructured_api',
       │   │   │   │   'settings': {
       │   │   │   │   │   'strategy': 'hi_res',
       │   │   │   │   │   'include_page_breaks': False,
       │   │   │   │   │   'pdf_infer_table_structure': False,
       │   │   │   │   │   'exclude_elements': None,
       │   │   │   │   │   'xml_keep_tags': False,
       │   │   │   │   │   'encoding': 'utf-8',
       │   │   │   │   │   'ocr_languages': ['eng'],
       │   │   │   │   │   'extract_image_block_types': ['image', 'table']
       │   │   │   │   }
       │   │   │   }
       │   │   ]
       │   }
       )
       'Run Workflow Task Result:'
       JobConfigurationResult(job_id='c3a5...<redacted>...bfa')
       'Job Task Result:'
       JobStatusResult(job_status='Completed')
       ```

    6. You can sign in to your Unstructured account to see the results:

       * On the sidebar, click **Connectors** to see the source and destination connectors.
       * On the sidebar, click **Workflows** to see the workflow.
       * On the sidebar, click **Jobs** to see the job.

    7. If you do not want to keep the MCP server running, you can stop it by pressing `Ctrl+C` in the terminal or command prompt window where you ran the `make sse-server` command.
  </Step>
</Steps>

## Learn more

* [Model Context Protocol (MCP) Hands-On Walkthrough for the Unstructured API's workflow operations](/examplecode/tools/mcp)
* [Unstructured API's workflow operations Reference](/api-reference/workflow/overview)
* [CrewAI Documentation](https://docs.crewai.com/)

---

# Source: https://docs.unstructured.io/support/issues/data-format-schema-validation.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Data format, schema, and validation issues

## Issues

When Unstructured tries to process or transfer data, you get one of the following error types:

* `ValidationError`: For example, `S3ConnectionConfig`, `PluginResponse`, or `filedata_meta input type` errors.
* `ClientResponseError`: For example, an `HTTP 422 Unprocessable Entity` error.
* `ListConversionException`: For example, for Pinecone, an `Expected list, got None` error.
* `ServerOperationError`: For example, for Databricks, an `UNRESOLVED_COLUMN` error.
* `WeaviateDeleteManyError`: For example, a `no such prop with name 'record_id'` error.
* `CollectionInsertManyException`: For example, for Astra DB, a `vector dimension mismatch` error.
* `TypeError`: For example, for Pinecone, a `PineconeUploader missing argument` error.
* `UserError`: For example, a wrappingschema validation failure.

## Possible causes

* The data being processed or transferred doesn't match the expected structure, format, or validation rules.
* The data provided in configuration or during processing is not valid.
* The data being sent to a destination doesn't match the destination's schema—for example, the data has missing fields, wrong data types, or an incorrect number of vector dimensions.
* Incorrect configuration values were provided—for example, a non-string token was provided where a string was expected instead.
* There is a schema mismatch between the data generated by Unstructured and the destination schema—for example, missing columns or properties, or wrong data types.
* The specified embedding model generates vectors of a different dimension than the destination index or collection is configured for.
* The data was generated in an unexpected format.

## Possible solutions

* **Verify the data configuration**: Double-check configuration parameters against the relevant documentation for the correct data format and type.
* **Verify the destination schema**: Ensure that the schema—including columns, properties, types, and vector dimensions—in the destination's system matches the data being sent. You might need to update or recreate the destination schema.
* **Check the data fields**: Check if fields such as `record_id`, `element_id`, `text`, `embeddings`, and `metadata` are expected and present.
* **Verify the embedding dimensions**: Confirm that the specified embedding model produces vectors of the dimension expected by the destination collection or index.
* **Contact Unstructured Support**: For `HTTP 422 Unprocessable Entity` errors where the cause isn't clear, `ListConversionException` errors, `TypeError` errors, or persistent schema mismatch issues after verification, [contact Unstructured Support](/support/request).

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/support/how-to/data-privacy.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Delete or export personal data stored by Unstructured

You can request that Unstructured delete the personal data that Unstructured stores in its systems about you, export that
personal data and send a copy of it to you, or both.

To make a personal data deletion or export request, fill out and submit the
[Data Privacy Request](https://na1.hs-data-privacy.com/request/UYsLQfVEBgaaIvrZb74_EA) webform.

After Unstructured receives your request, within one business day, they will delete the personal data that Unstructured stores in its
systems about you or export that personal data. Unstructured will then send an email to the address listed on the webform—this email will
confirm the deletion or provide a copy of the exported
personal data, respectively. If you have not received this email within 3 business days, you can email Unstructured Support at
[support@unstructured.io](mailto:support@unstructured.io) to check on the status of your request.

If you have any questions about this process or need help filling out or submitting the webform, email Unstructured Support at
[support@unstructured.io](mailto:support@unstructured.io).

---

# Source: https://docs.unstructured.io/ui/destinations/databricks-delta-table.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/databricks-delta-table.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/databricks-delta-table.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Delta Tables in Databricks

<Tip>
  This article covers connecting Unstructured to Delta Tables in Databricks.

  For information about connecting Unstructured to Delta Tables in Amazon S3 instead, see
  [Delta Tables in Amazon S3](/api-reference/workflow/destinations/delta-table).

  For information about connecting Unstructured to Databricks Volumes instead, see
  [Databricks Volumes](/api-reference/workflow/destinations/databricks-volumes).
</Tip>

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to a Delta Table in Databricks.

The requirements are as follows.

* A Databricks account on [AWS](https://docs.databricks.com/getting-started/free-trial.html),
  [Azure](https://learn.microsoft.com/azure/databricks/getting-started/), or
  [GCP](https://docs.gcp.databricks.com/getting-started/index.html).

* A workspace within the Datbricks account for [AWS](https://docs.databricks.com/admin/workspace/index.html),
  [Azure](https://learn.microsoft.com/azure/databricks/admin/workspace/), or
  [GCP](https://docs.gcp.databricks.com/admin/workspace/index.html).

* One of the following compute resources within the workspace:

  * A SQL warehouse for [AWS](https://docs.databricks.com/compute/sql-warehouse/create.html),
    [Azure](https://learn.microsoft.com/azure/databricks/compute/sql-warehouse/create), or
    [GCP](https://docs.gcp.databricks.com/compute/sql-warehouse/create.html).

    The following video shows how to create a SQL warehouse if you do not already have one available, get its **Server Hostname** and **HTTP Path** values, and set permissions for someone other than the warehouse's owner to use it:

    <iframe width="560" height="315" src="https://www.youtube.com/embed/N-Aw9-U3_fE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * An all-purpose cluster for [AWS](https://docs.databricks.com/compute/use-compute.html),
    [Azure](https://learn.microsoft.com/azure/databricks/compute/use-compute), or
    [GCP](https://docs.gcp.databricks.com/compute/use-compute.html).

    The following video shows how to create an all-purpose cluster if you do not already have one available, get its **Server Hostname** and **HTTP Path** values, and set permissions for someone other than the cluster's owner to use it:

    <iframe width="560" height="315" src="https://www.youtube.com/embed/apgibaelVY0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The SQL warehouse's or cluster's **Server Hostname** and **HTTP Path** values for [AWS](https://docs.databricks.com/integrations/compute-details.html),
  [Azure](https://learn.microsoft.com/azure/databricks/integrations/compute-details), or
  [GCP](https://docs.gcp.databricks.com/integrations/compute-details.html).

* Unity Catalog enabled in the workspace for [AWS](https://docs.databricks.com/data-governance/unity-catalog/get-started.html),
  [Azure](https://learn.microsoft.com/azure/databricks/data-governance/unity-catalog/get-started), or
  [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/get-started.html).

* Within Unity Catalog:

  * A catalog
    for [AWS](https://docs.databricks.com/catalogs/create-catalog.html),
    [Azure](https://learn.microsoft.com/azure/databricks/catalogs/create-catalog), or
    [GCP](https://docs.gcp.databricks.com/catalogs/create-catalog.html).
  * A schema (formerly known as a database)
    for [AWS](https://docs.databricks.com/schemas/create-schema.html),
    [Azure](https://learn.microsoft.com/azure/databricks/schemas/create-schema), or
    [GCP](https://docs.gcp.databricks.com/schemas/create-schema.html)
    within that catalog,
  * A table
    for [AWS](https://docs.databricks.com/tables/managed.html),
    [Azure](https://learn.microsoft.com/azure/databricks/tables/managed), or
    [GCP](https://docs.gcp.databricks.com/tables/managed.html)
    within that schema (formerly known as a database).

    You can have the connector attempt to create a table for you automatically at run time. To do this, in the connector settings as described later in this article,
    do one of the following:

    * Specify the name of the table that you want the connector to attempt to create within the specified catalog and schema (formerly known as a database).
    * Leave the table name blank. The connector will attempt to create a table within the specified catalog and schema (formerly known as a database).
      For the [Unstructured UI](/ui/overview) and [Unstructured API](/api-reference/overview), the table is named `u<short-workflow-id>`.
      For the [Unstructured Ingest CLI and Ingest Python library](/open-source/ingestion/overview), the table is named `unstructuredautocreated`.

    The connector will attempt to create the table on behalf of the related Databricks workspace user or Databricks managed service principal that is referenced in the connector settings, as described later in these requirements.
    If successful, the table's owner is set as the related Databricks workspace user or Databricks managed service principal. The owner will have all Unity Catalog
    privileges on the table by default. No other Databricks workspace users or Databricks managed service principals will have any privileges on the table by default.

    <Warning>
      If the table's parent schema (formerly known as a database) is not owned by the same Databricks workspace user or Databricks managed service principal that is
      referenced in the connector settings, then you should grant the new table's owner the `CREATE TABLE` privilege on that parent schema (formerly known as a database)
      before the connector attempts to create the table. Otherwise, table creation could fail.
    </Warning>

    <Note>
      Using dashes (`-`) in the names of catalogs, schemas (formerly known as databases), and tables might cause isolated issues with the connector. It is
      recommended to use underscores (`_`) instead of dashes in the names of catalogs, schemas, and tables.
    </Note>

  The following video shows how to create a catalog, schema (formerly known as a database), and a table in Unity Catalog if you do not already have them available, and set privileges for someone other than their owner to use them:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/ffNnq-6bpd4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  If you want to use an existing table or create one yourself beforehand, this table must contain at minimum the following column names and their data types:

  ```text  theme={null}
  CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<table_name> (
      id STRING NOT NULL PRIMARY KEY,
      record_id STRING NOT NULL,
      element_id STRING NOT NULL,
      text STRING,
      embeddings ARRAY<FLOAT>,
      type STRING,
      metadata VARIANT
  );
  ```

  <Info>
    In Databricks, a table's *schema* is different than a *schema* (formerly known as a database) in a catalog-schema object relationship in Unity Catalog.
  </Info>

* Within Unity Catalog, a volume
  for [AWS](https://docs.databricks.com/volumes/utility-commands.html),
  [Azure](https://learn.microsoft.com/azure/databricks/volumes/utility-commands),
  or [GCP](https://docs.gcp.databricks.com/volumes/utility-commands.html). The volume can be in the same
  schema (formerly known as a database) as the table, or the volume and table can be in separate schemas. In either case, both of these
  schemas must share the same parent catalog.

  <Note>
    Using dashes (`-`) in the names of volumes might cause isolated issues with the connector. It is
    recommended to use underscores (`_`) instead of dashes in the names of volumes.
  </Note>

  The following video shows how to create a catalog, schema (formerly known as a database), and a volume in Unity Catalog if you do not already have them available, and set privileges for someone other than their owner to use them:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/yF9DJphhQQc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For Databricks managed service principal authentication (using Databricks OAuth M2M) to the workspace:

  * A Databricks managed service principal.
    This service principal must have the appropriate access permissions to the catalog, schema (formerly known as a database), table, volume, and cluster or SQL warehouse.
  * The service principal's **UUID** (or **Client ID** or **Application ID**) value.
  * The OAuth **Secret** value for the service principal.

  To get this information, see Steps 1-3 of the instructions for [AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html),
  [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-m2m), or
  [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html).

  <Note>
    For Azure Databricks, this connector only supports Databricks managed service principals for authentication.
    Microsoft Entra ID managed service principals are not supported.
  </Note>

  The following video shows how to create a Databricks managed service principal if you do not already have one available:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/wBmqv5DaA1E" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For Databricks personal access token authentication to the workspace, the
  Databricks personal access token value for
  [AWS](https://docs.databricks.com/dev-tools/auth/pat.html#databricks-personal-access-tokens-for-workspace-users),
  [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/pat#azure-databricks-personal-access-tokens-for-workspace-users), or
  [GCP](https://docs.gcp.databricks.com/dev-tools/auth/pat.html#databricks-personal-access-tokens-for-workspace-users).
  This token must be for the workspace user who
  has the appropriate access permissions to the catalog, schema (formerly known as a database), table, volume, and cluster or SQL warehouse,

  The following video shows how to create a Databricks personal access token if you do not already have one available:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/OzEU2miAS6I" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The Databricks workspace user or Databricks managed service principal must have the following *minimum* set of permissions and privileges to write to an
  existing volume or table in Unity Catalog. If the owner of these is that Databricks workspace user or Databricks managed service principal, then
  they will have all necessary permissions and privileges by default. If the owner is someone else, then the following permissions and privileges must be
  explicitly granted to them before using the connector:

  * To use an all-purpose cluster for access, `Can Restart` permission on that cluster. Learn how to check and set cluster permissions for
    [AWS](https://docs.databricks.com/compute/clusters-manage.html#compute-permissions),
    [Azure](https://learn.microsoft.com/azure/databricks/compute/clusters-manage#cluster-level-permissions), or
    [GCP](https://docs.gcp.databricks.com/compute/clusters-manage.html#compute-permissions).

  * To use a SQL warehouse for access, `Can use` permission on that SQL warehouse. Learn how to check and set SQL warehouse permissions for
    [AWS](https://docs.databricks.com/compute/sql-warehouse/create.html#manage-a-sql-warehouse),
    [Azure](https://learn.microsoft.com/azure/databricks/compute/sql-warehouse/create#manage), or
    [GCP](https://docs.gcp.databricks.com/compute/sql-warehouse/create.html#manage-a-sql-warehouse).

  * To access a Unity Catalog volume, the following privileges:

    * `USE CATALOG` on the volume's parent catalog in Unity Catalog.
    * `USE SCHEMA` on the volume's parent schema (formerly known as a database) in Unity Catalog.
    * `READ VOLUME` and `WRITE VOLUME` on the volume.

    Learn how to check and set Unity Catalog privileges for
    [AWS](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges),
    [Azure](https://learn.microsoft.com/azure/databricks/data-governance/unity-catalog/manage-privileges/#grant), or
    [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges).

  * To access a Unity Catalog table, the following privileges:

    * `USE CATALOG` on the table's parent catalog in Unity Catalog.
    * `USE SCHEMA` on the table's parent schema (formerly known as a database) in Unity Catalog.
    * To create a new table, `CREATE TABLE` on the table's parent schema (formerly known as a database) in Unity Catalog.
    * If the table already exists, `MODIFY` and `SELECT` on the table.

    Learn how to check and set Unity Catalog privileges for
    [AWS](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges),
    [Azure](https://learn.microsoft.com/azure/databricks/data-governance/unity-catalog/manage-privileges/#grant), or
    [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges).

To create a Delta Tables in Databricks destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="databricks_volume_delta_tables",
                  config={
                      "server_hostname": "<server-hostname>",
                      "http_path": "<http-path>",
                      "token": "<token>",
                      "client_id": "<client-id>",
                      "client_secret": "<client-secret>",
                      "volume": "<volume>",
                      "catalog": "<catalog>",
                      "volume_path": "<volume_path>",
                      "schema": "<schema>",
                      "database": "<database>",
                      "table_name": "<table_name>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "databricks_volume_delta_tables",
      "config": {
          "server_hostname": "<server-hostname>",
          "http_path": "<http-path>",
          "token": "<token>",
          "client_id": "<client-id>",
          "client_secret": "<client-secret>",
          "volume": "<volume>",
          "catalog": "<catalog>",
          "volume_path": "<volume_path>",
          "schema": "<schema>",
          "database": "<database>",
          "table_name": "<table_name>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.

* `<server-hostname>` (*required*): The target Databricks cluster's or SQL warehouse's **Server Hostname** value.

* `<http-path>` (*required*): The cluster's or SQL warehouse's **HTTP Path** value.

* `<token>` (*required* for PAT authentication): For Databricks personal access token (PAT) authentication, the target Databricks user's PAT value.

* `<client-id>` and `<client-secret>` (*required* for OAuth authentication): For Databricks OAuth machine-to-machine (M2M) authentication, the Databricks managed service principal's **UUID** (or **Client ID** or **Application ID**) and OAuth **Secret** (client secret) values.

* `<catalog>` (*required*): The name of the catalog in Unity Catalog for the target volume and table in the Databricks workspace.

* `<database>`: The name of the schema (formerly known as a database) in Unity Catalog for the target table. The default is `default` if not otherwise specified.

  If the target table and volume are in the same schema (formerly known as a database), then `<database>` and `<schema>` will have the same values.

* `<table_name>`: The name of the target table in Unity Catalog.

  * If a table name is specified, but a table with that name does not exist within the specified schema (formerly known as a database), the connector attempts to create a table with that name within that schema.
  * If no table name is specified, the connector attempts to create a table named `u<short-workflow-id>` within the specified schema (formerly known as a database).

  See the beginning of this article for additional technical requirements before having the connector attempt to create a table.

* `<schema>`: The name of the schema  (formerly known as a database) in Unity Catalog for the target volume. The default is `default` if not otherwise specified.

  If the target volume and table are in the same schema (formerly known as a database), then `<schema>` and `<database>` will have the same values.

* `<volume>` (*required*): The name of the target volume in Unity Catalog.

* `<volume_path>`: Any target folder path inside of the volume to use instead of the volume's root. If not otherwise specified, processing occurs at the volume's root.

<Note>
  Using dashes (`-`) in the names of catalogs, schemas (formerly known as databases), tables, and volumes might cause isolated issues with the connector. It is
  recommended to use underscores (`_`) instead of dashes in the names of catalogs, schemas, tables, and volumes.
</Note>

## Learn more

* <Icon icon="blog" />  [Integration Highlight: Databricks Delta Tables](https://unstructured.io/blog/integration-highlight-databricks-delta-tables)

---

# Source: https://docs.unstructured.io/examplecode/tools/databricks-volumes-events.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Databricks Volumes event triggers

You can use Databricks Volumes events, such as uploading files to Databricks Volumes, to automatically run Unstructured ETL+ workflows
that rely on those Databricks Volumes as sources. This enables a no-touch approach to having Unstructured automatically process files as they are uploaded to Databricks Volumes.

This example shows how to automate this process by adding a custom job in Lakeflow Jobs for your Databricks workspace in
[AWS](https://docs.databricks.com/aws/jobs/), [Azure](https://learn.microsoft.com/azure/databricks/jobs/), or
[GCP](https://docs.databricks.com/gcp/jobs). This job runs
whenever a file upload event is detected in the specified Databricks Volume. This job uses a custom Databricks notebook to call the [Unstructured API's workflow operations](/api-reference/workflow/overview) to automatically run the
specified corresponding Unstructured ETL+ workflow within your Unstructured account.

<Note>
  This example uses a custom job in Lakeflow Jobs and a custom Databricks notebookthat you create and maintain.
  Any issues with file detection, timing, or job execution could be related to your custom job or notebook,
  rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom
  job's run logs first for any informational and error messages.
</Note>

## Requirements

To use this example, you will need the following:

* An Unstructured account, and an Unstructured API key for your account, as follows:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

* The Unstructured API's workflow operations URL for your account, as follows:

  1. In the Unstructured UI, click **API Keys** on the sidebar.<br />
  2. Note the value of the **Unstructured API's workflow operations** field.

* A Databricks Volumes source connector in your Unstructured account. [Learn how](/ui/sources/databricks-volumes).

* Some available [destination connector](/ui/destinations/overview) in your Unstructured account.

* A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).

## Step 1: Create a notebook to run the Unstructured workflow

1. Sign in to the Databricks workspace within your Databricks account for AWS, Azure, or GCP that
   corresponds to the workspace you specified for your Databricks Volumes source connector.

2. On the sidebar, click **+ New > Notebook**.

3. Click the notebook's title and change it to something more descriptive, such as `Unstructured Workflow Runner Notebook`.

4. In the notebook's first cell, add the following code:

   ```python  theme={null}
   !pip install requests
   ```

5. Click **Edit > Insert cell below**.

6. In this second cell, add the following code:

   ```python  theme={null}
   import requests, os

   url = '<unstructured-api-url>' + '/workflows/<workflow-id>/run'

   # Option 1 (Recommended): Get your Unstructured API key from Databricks Secrets.
   api_key = dbutils.secrets.get(scope="<scope>", key="<key>")

   # Option 2: Get your Unstructured API key from an environment variable stored on 
   # the notebook's attached compute resource.
   api_key = os.getenv("UNSTRUCTURED_API_KEY")

   headers = {
       'accept': 'application/json',
       'content-type': 'application/json',
       'unstructured-api-key': api_key
   }

   json_data = {}

   try:
       response = requests.post(url, headers=headers, json=json_data)
       response.raise_for_status()
       print(f'Status Code: {response.status_code}')
       print('Response:', response.json())
   except Exception as e:
       print('An error occurred:', e)
   ```

7. Replace the placeholders in this second cell as follows:

   * Replace `<unstructured-api-url>` with the value of the **Unstructured API's workflow operations** field earlier from the requirements.
   * Replace `<workflow-id>` with the ID of the workflow that you want to run.
   * For your Unstructured API key, do one of the following:

     * (Recommended) If you want to use Databricks Secrets, replace `<scope>` and `<key>` with the scope and key names for the existing secret that you have already created in Databricks Secrets.
       Learn how to work with Databricks Secrets for
       [AWS](https://docs.databricks.com/aws/security/secrets/#secrets-overview),
       [Azure](https://learn.microsoft.com/azure/databricks/security/secrets/#secrets-overview), or
       [GCP](https://docs.databricks.com/gcp/security/secrets#secrets-overview).
     * If you want to use environment variables on the attached compute resource, set the `UNSTRUCTURED_API_KEY` to your Unstructured API key value. Learn how for
       [AWS](https://docs.databricks.com/aws/compute/configure#environment-variables),
       [Azure](https://learn.microsoft.com/azure/databricks/compute/configure#environment-variables), or
       [GCP](https://docs.databricks.com/gcp/compute/configure#environment-variables).

## Step 2: Create a job in Lakeflow Jobs to run the notebook

1. With your Databricks workspace still open from the previous step, on the sidebar, click **Jobs & Pipelines**.

2. Expand **Create new**, and then click **Job**.

3. Click the job's title and change it to something more descriptive, such as `Unstructured Workflow Runner Job`.

4. On the **Tasks** tab, enter some **Task name** such as `Run_Unstructured_Workflow_Runner_Notebook`.

5. With **Notebook** selected for **Type**, and with **Workspace** selected for **Source**, use the **Path** dropdown to select the notebook you created in the previous step.

6. For **Cluster**, select the cluster you want to use to run the notebook.

7. Click **Create task**.

8. In the **Job details** pane, under **Schedules & Triggers**, click **Add trigger**.

9. For **Trigger type**, select **File arrival**.

10. For **Storage location**, enter the path to the volume to monitor or, if you are monitoring a folder within that volume, the path to the folder. To get this path, do the following:

    a. On the sidebar, click **Catalog**.<br />
    b. In the list of catalogs, expand the catalog that contains the volume you want to monitor.<br />
    c. In the list of schemas (formerly known as databases), expand the schema that contains the volume you want to monitor.<br />
    d. Expand **Volumes**.<br />
    e. Click the volume you want to monitor.<br />
    f. On the **Overview** tab, copy the path to the volume you want to monitor or, if you are monitoring a folder within that volume, click the path to the folder and then copy the path to that folder.<br />

11. Click **Save**.

## Step 3: Trigger the job

1. With your Databricks workspace still open from the previous step, on the sidebar, click **Catalog**.
2. In the list of catalogs, expand the catalog that contains the volume that is being monitored.
3. In the list of schemas (formerly known as databases), expand the schema that contains the volume that is being monitored.
4. Expand **Volumes**.
5. Click the volume that is being monitored or, if you are monitoring a folder within that volume, click the folder.
6. Click **Upload to this volume**, and follow the on-screen instructions to upload a file to the volume or folder that is being monitored.

## Step 4: View trigger results

1. With your Databricks workspace still open from the previous step, on the sidebar, click **Jobs & Pipelines**.
2. On the **Jobs & pipelines** tab, click the name of the job you created earlier in Step 2.
3. On the **Runs** tab, wait until the current job run shows a **Status** of **Succeeded**.
4. In the Unstructured user interface for your account, click **Jobs** on the sidebar.
5. In the list of jobs, click the newly running job for your workflow.
6. After the job status shows **Finished**, go to your destination location to see the results.

## Step 5 (Optional): Pause the trigger

To stop triggering the job, with your job in Lakeflow Jobs still open earlier from Step 4, in the **Job details** pane, under **Schedules & Triggers**, click **Pause**.

---

# Source: https://docs.unstructured.io/ui/sources/databricks-volumes.md

# Source: https://docs.unstructured.io/ui/destinations/databricks-volumes.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/databricks-volumes.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/databricks-volumes.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/databricks-volumes.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/databricks-volumes.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Databricks Volumes

<Tip>
  This article covers connecting Unstructured to Databricks Volumes.

  For information about connecting Unstructured to Delta Tables in Databricks instead, see
  [Delta Tables in Databricks](/api-reference/workflow/destinations/databricks-delta-table).
</Tip>

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Databricks Volumes.

The requirements are as follows.

* A Databricks account on [AWS](https://docs.databricks.com/getting-started/free-trial.html),
  [Azure](https://learn.microsoft.com/azure/databricks/getting-started/), or
  [GCP](https://docs.gcp.databricks.com/getting-started/index.html).

* A workspace within the Databricks account for [AWS](https://docs.databricks.com/admin/workspace/index.html),
  [Azure](https://learn.microsoft.com/azure/databricks/admin/workspace/), or
  [GCP](https://docs.gcp.databricks.com/admin/workspace/index.html).

* The workspace's URL. Get the workspace URL for
  [AWS](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids),
  [Azure](https://learn.microsoft.com/azure/databricks/workspace/workspace-details#workspace-instance-names-urls-and-ids),
  or [GCP](https://docs.gcp.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids).

  Examples:

  * AWS: `https://<workspace-id>.cloud.databricks.com`
  * Azure: `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`
  * GCP: `https://<workspace-id>.<random-number>.gcp.databricks.com`

  <Note>
    Do not add a trailing slash (`/`) to the workspace URL.
  </Note>

* The Databricks authentication details. For more information, see the documentation for
  [AWS](https://docs.databricks.com/dev-tools/auth/index.html),
  [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/),
  or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/index.html).

  For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), the following Databricks authentication types are supported:

  * Databricks OAuth machine-to-machine (M2M) authentication for\
    [AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-m2m), or
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html).

    You will need the **Client ID** (or **UUID** or **Application** ID) and OAuth **Secret** (client secret) values for the corresponding service principal.
    Note that for Azure, only Databricks managed service principals are supported. Microsoft Entra ID managed service principals are not supported.

    The following video shows how to create a Databricks managed service principal:

    <iframe width="560" height="315" src="https://www.youtube.com/embed/wBmqv5DaA1E" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * Databricks personal access token authentication for
    [AWS](https://docs.databricks.com/dev-tools/auth/pat.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/pat), or
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/pat.html).

    You will need the personal access token's value.

    The following video shows how to create a Databricks personal access token:

    <iframe width="560" height="315" src="https://www.youtube.com/embed/OzEU2miAS6I" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  For [Unstructured Ingest](/open-source/ingestion/overview), the following Databricks authentication types are supported:

  * For Databricks personal access token authentication for
    [AWS](https://docs.databricks.com/dev-tools/auth/pat.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/pat), or
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/pat.html): The personal access token's value.

    The following video shows how to create a Databricks personal access token:

    <iframe width="560" height="315" src="https://www.youtube.com/embed/OzEU2miAS6I" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * For username and password (basic) authentication ([AWS](https://docs.databricks.com/archive/dev-tools/basic.html) only): The user's name and password values.

  * For OAuth machine-to-machine (M2M) authentication ([AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-m2m), and
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html)): The client ID and OAuth secret values for the corresponding service principal.

  * For OAuth user-to-machine (U2M) authentication ([AWS](https://docs.databricks.com/dev-tools/auth/oauth-u2m.html),
    [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/oauth-u2m), and
    [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-u2m.html)): No additional values.

  * For Azure managed identities (formerly Managed Service Identities (MSI) authentication) ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-mi) only): The client ID value for the corresponding managed identity.

  * For Microsoft Entra ID service principal authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-sp) only): The tenant ID, client ID, and client secret values for the corresponding service principal.

  * For Azure CLI authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/azure-cli) only): No additional values.

  * For Microsoft Entra ID user authentication ([Azure](https://learn.microsoft.com/azure/databricks/dev-tools/user-aad-token) only): The Entra ID token for the corresponding Entra ID user.

  * For Google Cloud Platform credentials authentication ([GCP](https://docs.gcp.databricks.com/dev-tools/auth/gcp-creds.html) only): The local path to the corresponding Google Cloud service account's credentials file.

  * For Google Cloud Platform ID authentication ([GCP](https://docs.gcp.databricks.com/dev-tools/auth/gcp-id.html) only): The Google Cloud service account's email address.

* The name of the parent catalog in Unity Catalog for
  [AWS](https://docs.databricks.com/catalogs/create-catalog.html),
  [Azure](https://learn.microsoft.com/azure/databricks/catalogs/create-catalog), or
  [GCP](https://docs.gcp.databricks.com/catalogs/create-catalog.html) for the volume.

* The name of the parent schema (formerly known as a database) in Unity Catalog for
  [AWS](https://docs.databricks.com/schemas/create-schema.html),
  [Azure](https://learn.microsoft.com/azure/databricks/schemas/create-schema), or
  [GCP](https://docs.gcp.databricks.com/schemas/create-schema.html) for the volume.

* The name of the volume in Unity Catalog for [AWS](https://docs.databricks.com/tables/managed.html),
  [Azure](https://learn.microsoft.com/azure/databricks/tables/managed), or
  [GCP](https://docs.gcp.databricks.com/tables/managed.html), and optionally any path in that volume that you want to access directly, beginning with the volume's root.

* The Databricks workspace user or service principal must have the following *minimum* set of privileges to read from or write to the
  existing volume in Unity Catalog:

  * `USE CATALOG` on the volume's parent catalog in Unity Catalog.
  * `USE SCHEMA` on the volume's parent schema (formerly known as a database) in Unity Catalog.
  * `READ VOLUME` and `WRITE VOLUME` on the volume.

  The following videos show how to create and set privileges for a catalog, schema (formerly known as a database), and volume in Unity Catalog.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/yF9DJphhQQc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  Learn more about how to check and set Unity Catalog privileges for
  [AWS](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges),
  [Azure](https://learn.microsoft.com/azure/databricks/data-governance/unity-catalog/manage-privileges/#grant), or
  [GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges).

To create a Databricks Volumes destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="databricks_volumes",
                  config={
                      "host": "<host>",
                      "catalog": "<catalog>",
                      "schema": "<schema>",
                      "volume": "<volume>",
                      "volume_path": "<volume_path>",

                      # For Databricks OAuth machine-to-machine (M2M) authentication:
                      "client_secret": "<client-secret>",
                      "client_id": "<client-id>"

                      # For Databricks personal access token authentication:
                      "token": "<token>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "databricks_volumes",
      "config": {
          "host": "<host>",
          "catalog": "<catalog>",
          "schema": "<schema>",
          "volume": "<volume>",
          "volume_path": "<volume_path>",

          # For Databricks OAuth machine-to-machine (M2M) authentication:
          "client_secret": "<client-secret>",
          "client_id": "<client-id>"

          # For Databricks personal access token authentication:
          "token": "<token>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.

* `<host>` (*required*) - The Databricks workspace host URL.

  <Note>
    Do not add a trailing slash (`/`) to the workspace host URL.
  </Note>

* `<client-id>` (*required*) - For Databricks OAuth machine-to-machine (M2M) authentication,
  the **Client ID** (or **UUID** or **Application ID**) value for the Databricks managed service principal that has the appropriate privileges to the volume.

* `<client-secret>` (*required*) - For Databricks OAuth M2M authentication,
  the associated OAuth **Secret** value for the Databricks managed service principal that has the appropriate privileges to the volume.

* `<token>` (*required*) - For Databricks personal access token authentication, the personal access token's value.

* `<catalog>` (*required*) - The name of the catalog to use.

* `<schema>` - The name of the associated schema. If not specified, `default` is used.

* `<volume>` (*required*) - The name of the associated volume.

* `<volume_path>` - Any optional path to access within the volume.

---

# Source: https://docs.unstructured.io/business/aws/dedicated-instance-privatelink.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unstructured Business dedicated instance on Amazon Web Services (AWS) with PrivateLink - onboarding

<Note>
  The following information applies only to dedicated instance deployments with PrivateLink for [Unstructured Business](/business/overview) on AWS.

  For dedicated instance deployments of Unstructured Business to AWS *without* PrivateLink, contact your Unstructured sales representative,
  or email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

  For in-VPC deployments of Unstructured Business to AWS instead, see the [AWS in-VPC onboarding](/business/aws/onboard) checklist.
</Note>

After your organization has signed the **Business** account agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
deployment onboarding process. While you wait, you can get started by following the onboarding process:

1. Provide your Unstructured sales representative or technical enablement contact with the AWS Region name
   where you want the dedicated instance deployment to be created. Unstructured will begin creating the dedicated instance deployment in that AWS Region.

   <Note>
     If you do not know who your Unstructured sales representative or technical enablement contact is,
     email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io), and a member of the Unstructured sales or technical enablement teams
     will get back to you as soon as possible.
   </Note>

   Unstructured supports only the following AWS Regions for dedicated instance deployments:

   | Name             | Code                  |
   | ---------------- | --------------------- |
   | `us-east-1`      | US East (N. Virginia) |
   | `us-east-2`      | US East(Ohio)         |
   | `us-west-2`      | US West(Oregon)       |
   | `eu-west-1`      | Europoe (Ireland)     |
   | `eu-north-1`     | Europe (Stockholm)    |
   | `eu-central-1`   | Europe (Frankfurt)    |
   | `ap-southeast-2` | Asia Pacific (Sydney) |
   | `ap-south-1`     | Asia Pacific (Mumbai) |
   | `ap-northeast-2` | Asia Pacific (Seoul)  |

2. In your AWS account, create an endpoint service powered by AWS PrivateLink. Be sure when creating the endpoint service to use the same AWS Region name that you provided to Unstructured.
   [Learn how](https://docs.aws.amazon.com/vpc/latest/privatelink/create-endpoint-service.html). (In AWS terminology, this makes you the *service provider* and
   Unstructured the *service consumer*.)

3. As the service provider, you must provide your Unstructured sales representative or technical enablement contact (as the service consumer) with the name of your endpoint service
   (for example, `com.amazonaws.vpce.us-east-1.vpce-svc-0e123abc123198abc`) and the supported Availability Zones for that endpoint service, for the AWS Region
   name that you provided to Unstructured earlier. [Learn how](https://docs.aws.amazon.com/vpc/latest/privatelink/create-endpoint-service.html#share-endpoint-service).

4. As the service consumer, Unstructured will create a related endpoint service in its own AWS account in the same AWS Region as your AWS account. Unstructured will then send you an endpoint connection request to
   complete the connection between its endpoint service (as the service consumer) and your endpoint service (as the service provider).
   You must accept Unstructured's endpoint connection request as soon as possible, to avoid delays in the deployment process.
   [Learn how](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests).

## Questions? Need help?

If you have questions or need help as you go, contact your Unstructured sales representative or technical enablement contact. If you do not know who they are,
email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io), and a member of the Unstructured sales or technical enablement teams
will get back to you as soon as possible.

---

# Source: https://docs.unstructured.io/api-reference/destinations/delete-destination-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Delete destination connector

> Delete a specific destination connector by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json delete /api/v1/destinations/{destination_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/destinations/{destination_id}:
    delete:
      tags:
        - destinations
      summary: Delete destination connector
      description: Delete a specific destination connector by its ID.
      operationId: delete_destination
      parameters:
        - name: destination_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Destination Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/sources/delete-source-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Delete source connector

> Delete a specific source connector identified by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json delete /api/v1/sources/{source_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/sources/{source_id}:
    delete:
      tags:
        - sources
      summary: Delete source connector
      description: Delete a specific source connector identified by its ID.
      operationId: delete_source
      parameters:
        - name: source_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Source Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/workflows/delete-workflow.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Delete Workflow

> Delete a workflow by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json delete /api/v1/workflows/{workflow_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/workflows/{workflow_id}:
    delete:
      tags:
        - workflows
      summary: Delete Workflow
      description: Delete a workflow by its ID.
      operationId: delete_workflow
      parameters:
        - name: workflow_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Workflow Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/ui/destinations/delta-table.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/delta-table.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/delta-table.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/delta-table.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Delta Tables in Amazon S3

<Tip>
  This article covers connecting Unstructured to Delta Tables in Amazon S3. For information about
  connecting Unstructured to Delta Tables in Databricks instead, see
  [Delta Tables in Databricks](/api-reference/workflow/destinations/databricks-delta-table).
</Tip>

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to a Delta Table, stored in Amazon S3.

The requirements are as follows.

The following video shows how to fulfill the minimum set of Amazon S3 requirements to store Delta Tables:

<iframe width="560" height="315" src="https://www.youtube.com/embed/_W4565dcUGI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

<Warning>
  If you are experiencing S3 connector or workflow failures after adding a new S3 bucket or updating an existing S3 bucket,
  it could be due to S3 latency issues. You might need to wait up to a few hours before any related S3 connectors
  and workflows begin working without failures.

  Various Amazon S3 operations such as propagating DNS records for new buckets, updating bucket access policies and
  permissions, reusing bucket names after deletion, and using AWS Regions that are not geographically closer
  to your users or applications, can take a few minutes to hours to fully propagate across the Amazon network.
</Warning>

The preceding video does not show how to create an AWS account.

For more information about requirements, see the following:

* An AWS account. [Create an AWS account](https://aws.amazon.com/free).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/lIdh92JmWtg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* An S3 bucket. You can create an S3 bucket by using the S3 console, following the steps [in the S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html) or in the following video.
  Additional approaches that use AWS CloudFormation or the AWS CLI are in the how-to sections later on this page.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/e6w9LwZJFIA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For authenticated bucket write access or both, you should first
  [block all public access to the bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/configuring-block-public-access-bucket.html).

  After blocking all public access to the bucket, for read access, the authenticated AWS IAM user must have at minimum the permissions of `s3:ListBucket` and `s3:GetObject` for that bucket.
  For write access, the authenticated AWS IAM user must have at minimum the permission of `s3:PutObject` for that bucket. To grant permissions,
  attach the appropriate bucket policy to the bucket. See the policy examples later on this page, and [learn about bucket policies for S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html). These permissions remain in effect until the bucket policy is removed from the bucket.
  To apply a bucket policy by using the S3 console, follow the steps [in the S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/add-bucket-policy.html) or in the following video.
  Additional approaches that use AWS CloudFormation or the AWS CLI are in the how-to sections later on this page.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/y4SfQoJpipo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* Provide an AWS access key and secret access key for the authenticated AWS IAM user in the account.
  Create an AWS access key and secret access key by following the steps [in the IAM documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey) or in the following video.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/MoFTaGJE65Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* If the target files are in the root of the bucket, provide the path to the bucket, formatted as `protocol://bucket/` (for example, `s3://my-bucket/`).
  If the target files are in a folder, the path to the target folder in the S3 bucket, formatted as `protocol://bucket/path/to/folder/` (for example, `s3://my-bucket/my-folder/`).

* If the target files are in a folder, make sure the authenticated AWS IAM user has
  authenticated access to the folder as well. [See examples of authenticated folder access](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html#example-bucket-policies-folders).

## Add an access policy to an existing bucket

To use the Amazon S3 console to add an access policy that allows all authenticated AWS IAM users in the
corresponding AWS account to read and write to an existing S3 bucket, do the following.

<Info>Your organization might have stricter bucket policy requirements. Check with your AWS account
administrator if you are unsure.</Info>

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/).

2. Open the [Amazon S3 Console](https://console.aws.amazon.com/s3/home).

3. Browse to the existing bucket and open it.

4. Click the **Permissions** tab.

5. In the **Bucket policy** area, click **Edit**.

6. In the **Policy** text area, copy the following JSON-formatted policy.
   To change the following policy to restrict it to a specific user in the AWS account, change `root` to that
   specific username.

   In this policy, replace the following:

   * Replace `<my-account-id>` with your AWS account ID.
   * Replace `<my-bucket-name>` in two places with the name of your bucket.

   ```json  theme={null}
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "AllowAuthenticatedUsersInAccountReadWrite",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::<my-account-id>:root"
               },
               "Action": [
                   "s3:GetObject",
                   "s3:PutObject",
                   "s3:ListBucket",
                   "s3:DeleteObject"
               ],
               "Resource": [
                   "arn:aws:s3:::<my-bucket-name>",
                   "arn:aws:s3:::<my-bucket-name>/*"
               ],
               "Condition": {
                   "StringEquals": {
                       "aws:PrincipalType": "IAMUser"
                   }
               }
           }
       ]
   }
   ```

7. Click **Save changes**.

## Create a bucket with AWS CloudFormation

To use the AWS CloudFormation console to create an Amazon S3 bucket that allows all authenticated AWS IAM users
in the corresponding AWS account to read and write to the bucket, do the following.

<Info>Your organization might have stricter bucket policy requirements. Check with your AWS account
administrator if you are unsure.</Info>

1. Save the following YAML to a file on your local machine, for example `create-s3-bucket.yaml`. To change
   the following bucket policy to restrict it to a specific user in the AWS account, change `root` to that
   specific username.

   ```yaml  theme={null}
   AWSTemplateFormatVersion: '2010-09-09'
   Description: 'CloudFormation template to create an S3 bucket with specific permissions for account users.'

   Parameters:
     BucketName:
       Type: String
       Description: 'Name of the S3 bucket to create'

   Resources:
     MyS3Bucket:
       Type: 'AWS::S3::Bucket'
       Properties:
         BucketName: !Ref BucketName
         PublicAccessBlockConfiguration:
           BlockPublicAcls: true
           BlockPublicPolicy: false
           IgnorePublicAcls: true
           RestrictPublicBuckets: true

     BucketPolicy:
       Type: 'AWS::S3::BucketPolicy'
       Properties:
         Bucket: !Ref MyS3Bucket
         PolicyDocument:
           Version: '2012-10-17'
           Statement:
             - Sid: AllowAllAuthenticatedUsersInAccount
               Effect: Allow
               Principal:
                 AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
               Action:
                 - 's3:GetObject'
                 - 's3:PutObject'
                 - 's3:ListBucket'
                 - 's3:DeleteObject'
               Resource:
                 - !Sub 'arn:aws:s3:::${BucketName}'
                 - !Sub 'arn:aws:s3:::${BucketName}/*'

   Outputs:
     BucketName:
       Description: 'Name of the created S3 bucket'
       Value: !Ref MyS3Bucket
   ```

2. Sign in to the [AWS Management Console](https://console.aws.amazon.com/).

3. Open the [AWS CloudFormation Console](https://console.aws.amazon.com/cloudformation/home).

4. Click **Create stack > With new resources (standard)**.

5. On the **Create stack** page, with **Choose an existing template** already selected, select **Upload a template file**.

6. Click **Choose file**, and browse to and select the YAML file from your local machine.

7. Click **Next**.

8. Enter a unique **Stack name** and **BucketName**.

9. Click **Next** two times.

10. Click **Submit**.

11. Wait until the **Status** changes to **CREATE\_COMPLETE**.

12. After the bucket is created, you can delete the YAML file, if you want.

## Create a bucket with the AWS CLI

To use the AWS CLI to create an Amazon S3 bucket that allows all authenticated AWS IAM users in the
corresponding AWS account to read and write to the bucket, do the following.

<Info>Your organization might have stricter bucket policy requirements. Check with your AWS account
administrator if you are unsure.</Info>

1. [Install the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

2. [Set up the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html).

3. Copy the following script to a file on your local machine, for example a file named `create-s3-bucket.sh`.
   To change the following bucket policy to restrict it to a specific user in the AWS account, change `root` to that
   specific username.

   In this script, replace the following:

   * Replace `<my-account-id>` with your AWS account ID.
   * Replace `<my-unique-bucket-name>` with the name of your bucket.
   * Replace `<us-east-1>` with your AWS Region.

   ```bash  theme={null}
   #!/bin/bash

   # Set variables for the AWS account ID, Amazon S3 bucket name, and AWS Region.
   ACCOUNT_ID="<my-account-id>"
   BUCKET_NAME="<my-unique-bucket-name>"
   REGION="<us-east-1>"

   # Temporary filename for the bucket policy.
   # Do not change this variable.
   POLICY_FILE="bucket_policy.json"

   # Create the bucket.
   aws s3api create-bucket --bucket $BUCKET_NAME --region $REGION

   # Wait for the bucket to exist.
   echo "Waiting for bucket '$BUCKET_NAME' to be fully created..."
   aws s3api wait bucket-exists --bucket $BUCKET_NAME

   # Check if the wait command was successful.
   if [ $? -eq 0 ]; then
       echo "The bucket '$BUCKET_NAME' has been fully created."
   else
       echo "Error: Timed out waiting for bucket '$BUCKET_NAME' to be created."
       exit 1
   fi

   # Remove the "block public policy" bucket access setting.
   aws s3api put-public-access-block \
       --bucket $BUCKET_NAME \
       --public-access-block-configuration \
       '{"BlockPublicPolicy": false, "IgnorePublicAcls": false, "BlockPublicAcls": false, "RestrictPublicBuckets": false}'

   # Check if the operation was successful.
   if [ $? -eq 0 ]; then
       echo "The block public policy access setting was removed from '$BUCKET_NAME'."
   else
       echo "Error: Failed to remove the block public policy access setting from '$BUCKET_NAME'."
       exit 1
   fi

   # Create the bucket policy.
   cat << EOF > $POLICY_FILE
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "AllowAuthenticatedUsersInAccountReadWrite",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::$ACCOUNT_ID:root"
               },
               "Action": [
                   "s3:GetObject",
                   "s3:PutObject",
                   "s3:ListBucket",
                   "s3:DeleteObject"
               ],
               "Resource": [
                   "arn:aws:s3:::$BUCKET_NAME",
                   "arn:aws:s3:::$BUCKET_NAME/*"
               ],
               "Condition": {
                   "StringEquals": {
                       "aws:PrincipalType": "IAMUser"
                   }
               }
           }
       ]
   }
   EOF

   # Apply the bucket policy.
   aws s3api put-bucket-policy --bucket $BUCKET_NAME --policy file://$POLICY_FILE

   # Check if the policy application was successful.
   if [ $? -eq 0 ]; then
       echo "The bucket policy was applied to '$BUCKET_NAME'."
   else
       echo "Error: Failed to apply the bucket policy to '$BUCKET_NAME'."
       exit 1
   fi

   # Verify the applied policy.
   echo "Verifying the applied policy:"
   aws s3api get-bucket-policy --bucket $BUCKET_NAME --query Policy --output text

   # Remove the temporary bucket policy file.
   rm $POLICY_FILE
   ```

4. Run the script, for example:

   ```bash  theme={null}
   sh create-s3-bucket.sh
   ```

5. After the bucket is created, you can delete the script file, if you want.

## Delta table output format

A Delta table consists of Parquet files that contain data and a transaction log that stores metadata about the transactions.
[Learn more](https://delta-io.github.io/delta-rs/how-delta-lake-works/architecture-of-delta-table/).

The Delta Tables in Amazon S3 destination connector generates the following output within the specified path to the S3 bucket (or the specified folder within the bucket):

* Initially, one Parquet (`.parquet`) file per file in the source location. For example, for a file in the source location named `my-file.pdf`, an associated
  file with the extension `.parquet` is generated. Various kinds of file transactions can result in additional Parquet files being generated. These Parquet filenames are automatically generated by the Delta Lake engine and are not meant to be manually modified.
* A folder named `_delta_log` that contains metadata and change history about the `.parquet` files. As Parquet files are added to, changed, or removed from
  the specified bucket or folder path, the `_delta_log` folder is updated with any related metadata and change history details.

Together, this set of Parquet files and their associated `_delta_log` folder (and its contents) describe a single, versioned Delta table. Because of this, Unstructured recommends the following usage best practices:

* In the source location, each set of source files that is to be considered as a unit for change management purposes should be controlled by a unique, dedicated
  Delta Tables in S3 destination connector. This connector should reference a unique, dedicated output folder within the bucket. Having
  multiple workflows refer to different sets of source files, yet all share the same Delta table, could results in data loss or table corruption.
* Avoid directly modifying, adding, or deleting Parquet data files or the `_delta_log` folder within a Delta table's directory. This can lead to data loss or table corruption.
* If you need to copy or move a Delta table to a different location,
  you must move or copy its entire set of Parquet files and its associated `_delta_log` folder (and its contents) together as a unit.
  Note that the copied or moved Delta table will
  no longer be controlled by the original Delta Tables in S3 destination connector.

## Create the destination connector

To create a Delta Tables in Amazon S3 destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="delta_table",
                  config={
                      "aws_region": "<aws-region>",
                      "table_uri": "<table-uri>",
                      "aws_access_key_id": "<aws-access-key-id>",
                      "aws_secret_access_key": "<aws-secret-access-key>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "type": "delta_table",
      "name": "<name>",
      "config": {
          "aws_region": "<aws-region>",
          "table_uri": "<table-uri>",
          "aws_access_key_id": "<aws-access-key-id>",
          "aws_secret_access_key": "<aws-secret-access-key>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<aws-region>` (*required*) - The AWS Region identifier (for example, `us-east-1`) for the Amazon S3 bucket you want to store the Delta Table in.
* `<table-uri>` (*required*) - The URI of the Amazon S3 bucket you want to store the Delta Table in. This typically takes the format `s3://my-bucket/my-folder`.
* `<aws-access-key-id>` (*required*) - The AWS access key ID for the AWS IAM principal (such as an IAM user) that has the appropriate access to the S3 bucket.
* `<aws-secret-access-key>` (*required*) - The AWS secret access key for the corresponding AWS access key ID.

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/discord.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Discord

Connect Discord to your preprocessing pipeline, and use the Unstructured CLI or Python to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

* A Discord [access token](https://discord.com/developers/docs/topics/oauth2).
* The target Discord [channel IDs](https://support.discord.com/hc/articles/206346498-Where-can-I-find-my-User-Server-Message-ID#h_01HRSTXPS5FMK2A5SMVSX4JW4E).

The Discord connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[discord]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `DISCORD_ACCESS_TOKEN` - The Discord access token, represented by `--token` (CLI) or `token` (Python).

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  unstructured-ingest \
    discord \
      --token $DISCORD_ACCESS_TOKEN \
      --channels 475182341782896651,847950191108554762 \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.discord import (
      DiscordIndexerConfig,
      DiscordDownloaderConfig,
      DiscordConnectionConfig,
      DiscordAccessConfig
  )

  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  from unstructured_ingest.processes.connectors.local import LocalUploaderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=DiscordIndexerConfig(
              channels=[
                  "475182341782896651",
                  "847950191108554762"
              ]
          ),
          downloader_config=DiscordDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
          source_connection_config=DiscordConnectionConfig(
              access_config=DiscordAccessConfig(token=os.getenv("DISCORD_ACCESS_TOKEN"))
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/open-source/installation/docker-installation.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Docker installation

Follow these steps to run the Unstructured open source library inside a Docker container.

<Steps>
  <Step title="Install and run Docker">
    If you do not have Docker already installed and running, you can install and run a tool such as Docker Desktop, which is
    available for macOS, Windows, and Linux. Learn how to install and run:

    * [Docker Desktop on Mac](https://docs.docker.com/desktop/setup/install/mac-install/)
    * [Docker Desktop on Windows](https://docs.docker.com/desktop/setup/install/windows-install/)
    * [Docker Desktop on Linux](https://docs.docker.com/desktop/setup/install/linux/)
  </Step>

  <Step title="Pull the Unstructured Docker image">
    <Info>If you are an experienced Docker user, you plan to parse only a single type of data, and you want to accelerate the image-building process, you can [build your own Docker image](#building-your-own-docker-image) instead of pulling the latest prebuilt image.</Info>

    <Tabs>
      <Tab title="Docker Desktop UI">
        <Note>
          The following steps are for AMD64-based systems.

          If you are using an ARM64-based system (such as Apple Silicon), follow the instructions on the **Docker CLI** tab in this step instead.
        </Note>

        1. In your Docker Desktop UI's search box, enter `downloads.unstructured.io/unstructured-io/unstructured:latest`.
        2. On the **Images** tab, next to **unstructured-io/unstructured**, click **Pull**.

        To list the available images on your machine, in the sidebar, click **Images**.

        To remove this image from your machine at any time, click the trash can (**Delete**) icon next to the image in the
        list of available images.
      </Tab>

      <Tab title="Docker CLI">
        From your terminal or command prompt, run the following command.

        <Tip>If you have the Docker Desktop UI running, you can click the **Terminal** button in the UI's lower right corner to run a Docker CLI session from within the Docker Desktop UI.</Tip>

        For AMD64-based systems, run the following command:

        ```bash  theme={null}
        # The AMD64 platform is the default.
        docker pull downloads.unstructured.io/unstructured-io/unstructured:latest

        # Or, to explicitly specify the AMD64 platform:
        docker pull --platform=linux/amd64 downloads.unstructured.io/unstructured-io/unstructured:latest
        ```

        For ARM64-based systems (such as Apple Silicon), run the following command instead:

        ```bash  theme={null}
        docker pull --platform=linux/arm64 downloads.unstructured.io/unstructured-io/unstructured:latest
        ```

        To list the available images on your machine, run the following command:

        ```bash  theme={null}
        docker images
        ```

        To remove this image from your machine at any time, run the following command:

        ```bash  theme={null}
        docker rmi downloads.unstructured.io/unstructured-io/unstructured:latest
        ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="Create and run a container from the image">
    <Tabs>
      <Tab title="Docker Desktop UI">
        <Note>
          The following steps are for AMD64-based systems.

          If you are using an ARM64-based system (such as Apple Silicon), follow the instructions on the **Docker CLI** tab in this step instead.
        </Note>

        1. In the Docker Desktop UI's sidebar, click **Images**.
        2. Next to **unstructured-io/unstructured**, click the play (**Run**) icon.
        3. Expand **Optional settings**.
        4. For **Container name**, enter some name for your container, such as `unstructured`.
        5. In the sidebar, click **Containers**.
        6. Next to your container, click the play (**Start**) icon.
      </Tab>

      <Tab title="Docker CLI">
        For AMD64-based systems, run the following command, replacing `<container-name>` with some name for your container, such as `unstructured`:

        ```bash  theme={null}
        # The AMD64 platform is the default.
        docker run -dt --name <container-name> downloads.unstructured.io/unstructured-io/unstructured:latest

        # Or, to explicitly specify the AMD64 platform:
        docker run -dt --platform=linux/amd64 --name <container-name> downloads.unstructured.io/unstructured-io/unstructured:latest
        ```

        For ARM64-based systems (such as Apple Silicon), run the following command instead, replacing `<container-name>` with some name for your container, such as `unstructured`:

        ```bash  theme={null}
        docker run -dt --platform=linux/arm64 --name <container-name> downloads.unstructured.io/unstructured-io/unstructured:latest
        ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="Interact with the Unstructured open source library by running code inside the container">
    <Tabs>
      <Tab title="Docker Desktop UI">
        1. In the Docker Desktop UI, in the lower right corner, click the **Terminal** button.

        2. To start a terminal session inside the container, run the following command, replacing `<container-name>` with the name of your container, such as `unstructured`:

           ```bash  theme={null}
           docker exec -it <container-name> bash
           ```

        3. Run Unstructured open source library calls from inside the container. For example, start the Python interpreter:

           ```bash  theme={null}
           python
           ```

           And then run the following commands, one command at a time, to make calls to the Unstructured open source library.
           These calls process a PDF file in the `/app/example-docs/pdf` directory named `layout-parser-paper.pdf`. The
           processed data is written as a JSON file named `layout-parser-paper-output.json` in that same directory:

           ```bash  theme={null}
           >>> from unstructured.partition.pdf import partition_pdf
           >>> from unstructured.staging.base import elements_to_json
           >>> elements = partition_pdf(filename="/app/example-docs/pdf/layout-parser-paper.pdf")
           >>> elements_to_json(elements=elements, filename="/app/example-docs/pdf/layout-parser-paper-output.json")
           ```

           After the last call finishes running, exit the Python interpreter, and then print the contents of the JSON file to the terminal:

           ```bash  theme={null}
           >>> exit()

           cat ./example-docs/pdf/layout-parser-paper-output.json
           ```

        4. To exit the terminal session, run the following command, or press `Ctrl+D`:

           ```bash  theme={null}
           exit
           ```
      </Tab>

      <Tab title="Docker CLI">
        1. Run the following command, replacing `<container-name>` with the name of your container, such as `unstructured`:

           ```bash  theme={null}
           docker exec -it <container-name> bash
           ```

        2. Run Unstructured open source library calls from inside the container. For example, start the Python interpreter:

           ```bash  theme={null}
           python
           ```

           And then run the following commands, one command at a time, to make calls to the Unstructured open source library.
           These calls process a PDF file in the `/app/example-docs/pdf` directory named `layout-parser-paper.pdf`. The
           processed data is written as a JSON file named `layout-parser-paper-output.json` in that same directory:

           ```bash  theme={null}
           >>> from unstructured.partition.pdf import partition_pdf
           >>> from unstructured.staging.base import elements_to_json
           >>> elements = partition_pdf(filename="/app/example-docs/pdf/layout-parser-paper.pdf")
           >>> elements_to_json(elements=elements, filename="/app/example-docs/pdf/layout-parser-paper-output.json")
           ```

           After the last call finishes running, exit the Python interpreter, and then print the contents of the JSON file to the terminal:

           ```bash  theme={null}
           >>> exit()

           cat ./example-docs/pdf/layout-parser-paper-output.json
           ```

        3. To exit the terminal session, run the following command, or press `Ctrl+D`:

           ```bash  theme={null}
           exit
           ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="Interact with the Unstructured open source library by running code outside the container">
    You can also interact with the Unstructured open source library by running code that is on the
    same machine as the running container but not within the container itself. To do this, you can
    use the Docker CLI to create a container that mounts the local directory containing the
    code into the container itself, and then run that code from the container.

    1. Run one of the following commands, replacing the following placeholders with the appropriate values:

       * Replace `<host-path>` with the path to the directory containing your code, for example `/Users/<username>/my_example_code/`.
       * Replace `<container-path>` with the path to some directory within the container to mount `<host-path>` into, for example `/app/my_example_code/`. If
         `<container-path>` does not already exist, it will be created at the same time that the container is created.
       * Replace `<container-name>` with some name for your container, such as `unstructured_mount`.

       For AMD64-based systems, run the following command:

       ```bash  theme={null}
       # The AMD64 platform is the default.
       docker run -dt -v <host-path>:<container-path>--name <container-name> downloads.unstructured.io/unstructured-io/unstructured:latest

       # Or, to explicitly specify the AMD64 platform:
       docker run -dt -v <host-path>:<container-path> --platform=linux/amd64 --name <container-name> downloads.unstructured.io/unstructured-io/unstructured:latest
       ```

       For ARM64-based systems (such as Apple Silicon), run the following command instead:

       ```bash  theme={null}
       docker run -dt -v <host-path>:<container-path> --platform=linux/arm64 --name <container-name> downloads.unstructured.io/unstructured-io/unstructured:latest
       ```

    2. Start a terminal session inside the container by running the following command, replacing `<container-name>` with the name of your container, such as `unstructured_mount`:

       ```bash  theme={null}
       docker exec -it <container-name> bash
       ```

    3. Add `<container-path>` to the `PYTHONPATH` environment variable within the container by running the following commands,
       replacing `<container-path>` with the path to the target directory within the container:

       ```bash  theme={null}
       PYTHONPATH="${PYTHONPATH}:<container-path>"
       export PYTHONPATH
       ```

    4. Run Unstructured open source library calls, referencing your code from `<container-path>`.

       For example, if you have a file named `main.py` in `<host-path>`that contains the four commands following `>>>` from the previous step,
       you can run it as follows, replacing `<container-path>` with the path to the target directory within the container:

       ```bash  theme={null}
       python <container-path>/main.py
       ```

       To print the contents of the JSON file to the terminal, run the following command:

       ```bash  theme={null}
       cat /app/example-docs/pdf/layout-parser-paper-output.json
       ```

    5. To exit the terminal session, run the following command, or press `Ctrl+D`:

       ```bash  theme={null}
       exit
       ```
  </Step>

  <Step title="Stop running the container">
    If you do not need the keep running the container, you can stop it as follows:

    <Tabs>
      <Tab title="Docker Desktop UI">
        1. In the Docker Desktop UI, in the sidebar, click **Containers**.
        2. Next to your container, click the square (**Stop**) icon.
      </Tab>

      <Tab title="Docker CLI">
        Run the following command, replacing `<container-name>` with the name of your container, such as `unstructured` or `unstructured_mount`:

        ```bash  theme={null}
        docker stop <container-name>
        ```
      </Tab>
    </Tabs>
  </Step>
</Steps>

## Building your own Docker image

You can build your own Docker image instead of pulling the latest prebuilt image.
If you only plan to parse a single type of data, you can accelerate the
build process by excluding certain packages or requirements needed for other data types. Refer to the
[Dockerfile](https://github.com/Unstructured-IO/unstructured/blob/main/Dockerfile) to determine which lines
are necessary for your requirements.

```bash  theme={null}
make docker-build

# Start a Bash shell inside of the running Docker container.
make docker-start-bash
```

---

# Source: https://docs.unstructured.io/ui/document-elements.md

# Source: https://docs.unstructured.io/open-source/concepts/document-elements.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/document-elements.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Document elements and metadata

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

`unstructured` simplifies and streamline the preprocessing of structured and unstructured documents for downstream tasks.
What that means is no matter where your data is and no matter what format that data is in, Unstructured's toolkit will
transform and preprocess that data into an easily digestible and usable format that is uniform across data formats.

When you partition a document with Unstructured, the result is a list of document `Element` objects.
These element objects represent different components of the source document.

## Element example

Here's an example of what an element might look like:

```
{'type': 'NarrativeText',
 'element_id': '5ef1d1117721f0472c1ad825991d7d37',
 'text': 'The Unstructured API documentation covers the following API services:',
 'metadata': {'last_modified': '2024-05-01T14:15:22',
  'page_number': 1,
  'languages': ['eng'],
  'parent_id': '56f24319ae258b735cac3ec2a271b1d9',
  'file_directory': '/content',
  'filename': 'Unstructured - Unstructured.html',
  'filetype': 'text/html'}}
```

As you can see, every element will have a `type`, an `element_id`, the extracted `text`, and some `metadata` which may
vary depending on the element type, document structure, and some additional parameters used during partitioning and/or chunking.

Let's explore some of these document element components in more detail.

## Element type

Instead of treating all documents like a wall of plain text, Unstructured preserves the semantic structure of the documents.
This gives you more control and flexibility over how you further use the processed documents and allows you to take their
structure into consideration. At the same time, normalizing data from various file formats to the Unstructured element
type scheme lets you treat all documents the same in your downstream processing, regardless of source format.
For example, if you plan to summarize a document, you may only be interested in the narrative
of the document, and not care about footers and headers. You can easily filter out the elements you don't need using their type.

Here are some examples of the element types your document may contain:

| Element type        | Description                                                                                                                                        |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Formula`           | An element containing formulas in a document.                                                                                                      |
| `FigureCaption`     | An element for capturing text associated with figure captions.                                                                                     |
| `NarrativeText`     | NarrativeText is an element consisting of multiple, well-formulated sentences. This excludes elements such titles, headers, footers, and captions. |
| `ListItem`          | ListItem is a NarrativeText element that is part of a list.                                                                                        |
| `Title`             | A text element for capturing titles.                                                                                                               |
| `Address`           | A text element for capturing physical addresses.                                                                                                   |
| `EmailAddress`      | A text element for capturing email addresses.                                                                                                      |
| `Image`             | A text element for capturing image metadata.                                                                                                       |
| `PageBreak`         | An element for capturing page breaks.                                                                                                              |
| `Table`             | An element for capturing tables.                                                                                                                   |
| `Header`            | An element for capturing document headers.                                                                                                         |
| `Footer`            | An element for capturing document footers.                                                                                                         |
| `CodeSnippet`       | An element for capturing code snippets.                                                                                                            |
| `PageNumber`        | An element for capturing page numbers.                                                                                                             |
| `UncategorizedText` | Base element for capturing free text from within document.                                                                                         |

If you apply chunking during partitioning of a document or later, you will also see the `CompositeElement` type.
`CompositeElement` is a chunk formed from text (non-Table) elements. It is only produced by chunking.
A composite element may be formed by combining one or more sequential elements produced by partitioning. For example,
several individual list items may be combined into a single chunk.

## Element ID

By default, the element ID is a SHA-256 hash of the element's text, its position on the page, page number it's on,
and the name of the document file - this is to ensure that the ID is deterministic and unique at the document level.
To obtain globally unique IDs in the output (UUIDs), you can pass `unique_element_ids=True` into any of the partition
functions. This can be helpful if you'd like to use the IDs as a primary key in a database, for example.

## Metadata

Unstructured tracks a variety of metadata about the elements extracted from documents.
Here are a couple of examples of what element metadata enables you to do:

* filter document elements based on an element metadata value. For instance, you may want to limit your scope to elements from a certain page, or you may want to use only elements that have an email matching a regular expression in their metadata.
* map an element to the document page where it occurred so that original page can be retrieved when that element matches search criteria.

Metadata is tracked at the element level. You can access the metadata for a given document element with
`element.metadata`. For a dictionary representation, use `element.metadata.to_dict()`.

### Common metadata fields

All document types return the following metadata fields when the information is available from the source file:

| Metadata field name        | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `filename`                 | Filename                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `file_directory`           | File directory                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `last_modified`            | Last modified Date                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| `filetype`                 | File type                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `coordinates`              | XY Bounding Box Coordinates. See notes below for further details about the bounding box.                                                                                                                                                                                                                                                                                                                                                                                           |
| `parent_id`                | Element Hierarchy. `parent_id` may be used to infer where an element resides within the overall hierarchy of a document. For instance, a NarrativeText element may have a Title element as a parent (a “sub-title”), which in turn may have another Title element as its parent (a "title").                                                                                                                                                                                       |
| `category_depth`           | Element depth relative to other elements of the same category.  Category depth is the depth of an element relative to other elements of the same category. It’s set by a document partitioner and enables the hierarchy post-processor to compute more accurate hierarchies. Category depth may be set using native document hierarchies, e.g. reflecting \<H1>, \<H2>, or \<H3> tags within an HTML document or the indentation level of a bulleted list item in a Word document. |
| `image_base64`             | A Base64 representation of the detected image or table. Only applicable to image and table elements when High Res partitioning is used. After chunking, `image_base64` is not preserved in the output.                                                                                                                                                                                                                                                                             |
| `image_mime_type`          | MIME type of the image. Only applicable to image elements.                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `text_as_html`             | HTML representation of extracted tables. Only applicable to table elements.                                                                                                                                                                                                                                                                                                                                                                                                        |
| `languages`                | Document Languages.  At document level or element level. List is ordered by probability of being the primary language of the text.                                                                                                                                                                                                                                                                                                                                                 |
| `emphasized_text_contents` | Emphasized text (bold or italic) in the original document.                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `emphasized_text_tags`     | Tags on text that is emphasized in the original document.                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `orig_elements`            | For chunked elements, a list of the original elements that were used to create the current chunked element.                                                                                                                                                                                                                                                                                                                                                                        |
| `is_continuation`          | True if element is a continuation of a previous element.  Only relevant for chunking, if an element was divided into two due to max\_characters.                                                                                                                                                                                                                                                                                                                                   |
| `detection_class_prob`     | Detection model class probabilities.  From unstructured-inference, hi-res strategy.                                                                                                                                                                                                                                                                                                                                                                                                |

Notes on common metadata fields:

#### Metadata for document hierarchy

`parent_id` and `category_depth` enhance hierarchy detection to identify the document
structure in various file formats by measuring relative depth of an element within its category. This is especially
useful in documents with native hierarchies like HTML or Word files, where elements like headings or list items inherently define structure.

#### Element's coordinates

Some document types support location data for the elements, usually in the form of bounding boxes.
If it exists, an element's location data is available with `element.metadata.coordinates`. The `coordinates` property of an `ElementMetadata` stores:

* `points` : These specify the corners of the bounding box starting from the top left corner and proceeding counter-clockwise. The points represent pixels, the origin is in the top left and the `y` coordinate increases in the downward direction.
* `system`: The points have an associated coordinate system. A typical example of a coordinate system is `PixelSpace`, which is used for representing the coordinates of images. The coordinate system has a name, orientation, layout width, and layout height.

The Unstructured Open Source library offers a way to change the coordinates of an element to a new coordinate system by
using the `Element.convert_coordinates_to_new_system` method. If the `in_place` flag is `True`, the coordinate system
and points of the element are updated in place and the new coordinates are returned. If the `in_place` flag is `False`,
only the altered coordinates are returned.

```python  theme={null}
from unstructured.documents.elements import Element
from unstructured.documents.coordinates import PixelSpace, RelativeCoordinateSystem

coordinates = ((10, 10), (10, 100), (200, 100), (200, 10))
coordinate_system = PixelSpace(width=850, height=1100)

element = Element(coordinates=coordinates, coordinate_system=coordinate_system)
print(element.metadata.coordinates.to_dict())
print(element.metadata.coordinates.system.orientation)
print(element.metadata.coordinates.system.width)
print(element.metadata.coordinates.system.height)

element.convert_coordinates_to_new_system(RelativeCoordinateSystem(), in_place=True)
# Should now be in terms of new coordinate system
print(element.metadata.coordinates.to_dict())
print(element.metadata.coordinates.system.orientation)
print(element.metadata.coordinates.system.width)
print(element.metadata.coordinates.system.height)

```

### Additional metadata fields by document type

| Field name             | Applicable file types | Description                                                                                                                                                          |
| ---------------------- | --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `attached_to_filename` | MSG                   | The name of the file that the attached file is attached to.                                                                                                          |
| `bcc_recipient`        | EML                   | The related [email](#email) BCC recipient.                                                                                                                           |
| `cc_recipient`         | EML                   | The related [email](#email) CC recipient.                                                                                                                            |
| `email_message_id`     | EML                   | The related [email](#email) message ID.                                                                                                                              |
| `header_footer_type`   | Word Doc              | The pages that a header or footer applies to in a [Word document](#microsoft-word-files): `primary`, `even_only`, and `first_page`.                                  |
| `image_path`           | PDF                   | The path to the image. This is useful when you want to extract the image and save it in a specified path instead of serializing the image within the processed data. |
| `image_mime_type`      | PDF                   | The MIME type of the image.                                                                                                                                          |
| `image_url`            | HTML                  | The URL to the image.                                                                                                                                                |
| `link_start_indexes`   | HTML, PDF             | A list of the index locations within the extracted content where the `links` can be found.                                                                           |
| `link_texts`           | HTML                  | A list of text strings that are associated with the `link_urls`.                                                                                                     |
| `link_urls`            | HTML                  | A list of URLs within the extracted content.                                                                                                                         |
| `links`                | PDF                   | A list of links within the extracted content.                                                                                                                        |
| `page_name`            | XLSX                  | The related sheet's name in an [Excel file](#microsoft-excel-files).                                                                                                 |
| `page_number`          | DOCX, PDF, PPT, XLSX  | The related file's page number.                                                                                                                                      |
| `section`              | EPUB                  | The book section title corresponding to a table of contents.                                                                                                         |
| `sent_from`            | EML                   | The related [email](#email) sender.                                                                                                                                  |
| `sent_to`              | EML                   | The related [email](#email) recipient.                                                                                                                               |
| `signature`            | EML                   | The related [email](#email) signature.                                                                                                                               |
| `subject`              | EML                   | The related [email](#email) subject.                                                                                                                                 |

Notes on additional metadata by document type:

#### Email

For emails, metadata will contain the following fields, where available:

* `bcc_recipient`
* `cc_recipient`
* `email_message_id`
* `sent_from`
* `sent_to`
* `signature`
* `subject`

`sent_from` is a list of strings because the [RFC 822](https://www.rfc-editor.org/rfc/rfc822) spec for emails allows for multiple sent from email addresses.

#### Microsoft Excel documents

For Excel documents, `ElementMetadata` will contain a `page_name` element, which corresponds to the sheet name in the Excel
document.

#### Microsoft Word documents

Headers and footers in Word documents include a `header_footer_type` indicating which page a header or footer applies to.
Valid values are `"primary"`, `"even_only"`, and `"first_page"`.

### Table-specific metadata

For `Table` elements, the raw text of the table will be stored in the `text` attribute for the Element, and HTML representation
of the table will be available in the element metadata under `element.metadata.text_as_html`. By default,
Unstructured will automatically extract all tables for all doc types unless you set `skip_infer_table_types` parameter.

Here's an example of a table element. The `text` of the element will look like this:

```
Dataset Base Model1 Large Model Notes PubLayNet [38] F / M M Layouts of modern scientific documents PRImA [3] M - Layouts of scanned modern magazines and scientific reports Newspaper [17] F - Layouts of scanned US newspapers from the 20th century TableBank [18] F F Table region on modern scientific and business document HJDataset [31] F / M - Layouts of history Japanese documents
```

And the `text_as_html` metadata for the same element will look like this:

```py  theme={null}
<table><thead><th>Dataset</th><th>| Base Model’</th><th>| Notes</th></thead><tr><td>PubLayNet</td><td>[38] F/M</td><td>Layouts of modern scientific documents</td></tr><tr><td>PRImA [3]</td><td>M</td><td>Layouts of scanned modern magazines and scientific reports</td></tr><tr><td>Newspaper</td><td>F</td><td>Layouts of scanned US newspapers from the 20th century</td></tr><tr><td>TableBank</td><td>F</td><td>Table region on modern scientific and business document</td></tr><tr><td>HJDataset [31]</td><td>F/M</td><td>Layouts of history Japanese documents</td></tr></table>
```

### Data connector metadata fields

Documents processed through source connectors include additional document metadata. These additional fields only ever
appear if the source document was processed by a connector.

#### Common data connector metadata fields

* Data Source metadata (on json output):
  * url
  * version
  * date created
  * date modified
  * date processed
  * record locator
* Record locator is specific to each connector

#### Additional metadata fields by connector type (via record locator)

| Source connector      | Additional metadata              |
| --------------------- | -------------------------------- |
| airtable              | base id, table id, view id       |
| azure (from fsspec)   | protocol, remote file path       |
| box (from fsspec)     | protocol, remote file path       |
| confluence            | url, page id                     |
| discord               | channel                          |
| dropbox (from fsspec) | protocol, remote file path       |
| elasticsearch         | url, index name, document id     |
| fsspec                | protocol, remote file path       |
| google drive          | drive id, file id                |
| gcs (from fsspec)     | protocol, remote file path       |
| jira                  | base url, issue key              |
| onedrive              | user pname, server relative path |
| outlook               | message id, user email           |
| s3 (from fsspec)      | protocol, remote file path       |
| sharepoint            | server path, site url            |
| wikipedia             | page title, age url              |

---

# Source: https://docs.unstructured.io/support/issues/document-processing.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Document processing issues

## Issues

When Unstructured tries to partition or chunk a document, you get one of the following error types:

* `TooManyPageFailuresException`
* `ControllerException`: For example, an error wrapping partitioning or chunking errors such as code `512`.

## Possible causes

* There are issues with the document's underlying content or structure.
* Unstructured encountered too many errors while processing individual pages or elements within a specific document, exceeding some internal failure threshold.
* The document is corrupted or malformed. This is especially the case for some complex PDFs.
* The document has a highly unusual underlying structure or has highly unusual content, which the specified partitioning model cannot handle.
* The document is encrypted or password-protected.
* Some underlying issues are causing page-level failures. This could sometimes be related to quotas if a per-page VLM is used.

## Possible solutions

* **Inspect the document**: Examine the specific document. Make sure it opens correctly, does not seem to look unusual, and does not appear to be corrupted or encrypted.
* **Test a simpler document**: Try processing a known-good, simple document of the same type, to see if the error is document-specific.
* **Check quotas (if VLM related)**: If the underlying errors mention quotas, see [Quota, billing, and rate limiting issues](/support/issues/quota-billing-rate-limiting).
* **Report the problematic document to Unstructured Support**: If the issue seems specific to an otherwise seemingly valid document, [report it to Unstructured Support](/support/request). If possible, provide the document for Unstructured to further investigate.

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/api-reference/jobs/download-job-output.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Download Job output

> Download the output of a job from a workflow where the input file was provided at runtime.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/jobs/{job_id}/download
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/jobs/{job_id}/download:
    get:
      tags:
        - jobs
      summary: Download Job output
      description: >-
        Download the output of a job from a workflow where the input file was
        provided at runtime.
      operationId: download_job_output
      parameters:
        - name: job_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Job Id
        - name: file_id
          in: query
          required: true
          schema:
            type: string
            description: ID of the file to download
            title: File Id
          description: ID of the file to download
        - name: node_id
          in: query
          required: false
          schema:
            anyOf:
              - type: string
                format: uuid
              - type: 'null'
            description: >-
              Node ID to retrieve the corresponding output file.If not provided,
              uses the last node in the workflow.
            title: Node Id
          description: >-
            Node ID to retrieve the corresponding output file.If not provided,
            uses the last node in the workflow.
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/ui/sources/dropbox-source-quickstart.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Dropbox source connector quickstart

Unstructured can connect to several types of [sources](/ui/connectors#sources). In this quickstart, you create a [Dropbox](https://www.dropbox.com/) source connector that you can add to your Unstructured [workflows](/ui/workflows).
This source connector enables your Unstructured workflows to process your files that you store in a Dropbox account.

In this quickstart, you will:

* Create a free Dropbox Basic account.
* Create a Dropbox app in your Dropbox account. This app will provide the connection between your Dropbox account and your Unstructured account.
* Get specific information about your Dropbox app that Unstructured needs to connect to the app.
* Create a Dropbox source connector in your Unstructured account.
* Add the Dropbox source connector to a workflow in your Unstructured account.

If you are not able to complete the following steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

## Step 1: Create your Dropbox account

In this step, you create a [free Dropbox Basic account](https://www.dropbox.com/basic). This account is limited to a single user and 2 GB of storage.

If you already have a Dropbox account and want to use it instead, then skip ahead to Step 2.

1. Go to the Dropbox account sign up page, at [https://www.dropbox.com/register](https://www.dropbox.com/register).
2. Enter your email address, and then click **Continue**.
3. Enter your name and then, for **Password**, enter some password for your new account. Be sure to save this password to some secure location,
   as you will need it to access your Dropbox account later.
4. Click **Agree and sign up**.
5. To create a Dropbox Basic account, click the **Personal** tile.
6. When you are prompted to install the Dropbox desktop app or mobile app, click **Skip step**. (You can always install these apps later.)
7. To create a Dropbox Basic account, click **Continue with 2 GB Dropbox Basic plan** at the bottom of the account type selectionpage.
8. You are automatically signed in to your new Dropbox Basic account, and the Dropbox user interface (UI) appears.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Dropbox-UI.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=0beca3a2af745ab441706f123aa4e800" alt="Dropbox user interface" data-og-width="1877" width="1877" data-og-height="730" height="730" data-path="img/dropbox-quickstart/Dropbox-UI.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Dropbox-UI.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=83ae530f25a13af5484b5a08d186458d 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Dropbox-UI.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=67fbec1d3597b5de23b2a71b598dc4fe 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Dropbox-UI.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=9be6af38fc30b729026c07f8eaded821 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Dropbox-UI.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=a3309ad6cc63ab22d9e5cf7eb9db3e20 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Dropbox-UI.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1f758ca3da4cb8a754676ebfc71e28b8 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Dropbox-UI.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=4429f5bbbd6c70df4aea7161ba42beca 2500w" />

## Step 2: Create a Dropbox app

In this step, you create a Dropbox app in your Dropbox account. Unstructured will use this app to access your Dropbox account.

1. From a new tab in your web browser, open the Dropbox Developers page, at [https://www.dropbox.com/developers](https://www.dropbox.com/developers).

2. Click **Create apps**.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=df03181b070dc3b781aefc899268aed0" alt="Begin creating a Dropbox app" data-og-width="1318" width="1318" data-og-height="1206" height="1206" data-path="img/dropbox-quickstart/Create-Apps.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=f0c4dd3c7f20fbec293c9748156db2d3 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=538190fa3b5f0e58e2f2086ae1d97694 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ccc3280febfe8da9ca2cf8c67d1a3d28 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=a5962c771cc0b5cbf06ef56bd29d54a8 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=8d28f988433f5378b526121c563c9980 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7ddb665b12319931e25c9fca463878ca 2500w" />

   <Tip>
     If **Create apps** is not visible then, on the top navigation bar, click **App Console**, and then click **Create app**.
   </Tip>

3. For **Choose an API**, select the **Scoped access** radio button.

4. For **Choose the type of access you need**, select the **App folder** radio button.

5. For **Name your app**, enter some name for your Dropbox app.

6. Check the box labelled **I agree to Dropbox API Terms and Conditions**, if it appears.

7. Click **Create app**.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-App.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=70fe95f66275eb274501c245bd6090c3" alt="Finish creating the Dropbox app" data-og-width="1100" width="1100" data-og-height="779" height="779" data-path="img/dropbox-quickstart/Create-App.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-App.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=6a3c51438c2ea30a4f47dcf73e957a06 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-App.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=11096fd858977ffeb44e86d69671ef20 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-App.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=2dc24c6a5cf7f4688ae3cbee6d75d13e 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-App.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=a4ac96570b34a9eafa78cae4e91bdbd0 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-App.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d89c3c7989858a52fc68e105ba59ffde 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-App.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=2d4658f81da51559ce3b270570d6d51c 2500w" />

8. On the Dropbox app's **Permissions** tab, under **Files and folders**, check the box labelled **files.content.read**, and then click **Submit**.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Files-Content-Read.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=6f3a9d2df452cd419eb248e2823e490e" alt="Setting files.content.read for the Dropbox app" data-og-width="1096" width="1096" data-og-height="741" height="741" data-path="img/dropbox-quickstart/Files-Content-Read.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Files-Content-Read.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=09cf672b43953cf6cec7c5a25c951d90 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Files-Content-Read.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e090a83a7f1645963a80101c18c91de1 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Files-Content-Read.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b72f8b1bcaa4b6ac6dc2a8456d414c27 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Files-Content-Read.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1d47802fd4a63d3d9580741b1d6f314e 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Files-Content-Read.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=4d5bf278fa7fdd42cb1c638105fef81a 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Files-Content-Read.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1c9fbfa772ac07535971b8530912810a 2500w" />

9. On the app's **Settings** tab, note the value of the **App folder name** field. This is the name of the subfolder that Dropbox will create under the `Apps` top-level folder in your Dropbox account. Your new Dropbox app will use this subfolder for access.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Folder-Name.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=43d6e899019d1e071dc94d1e472ad5ca" alt="Noting the app folder's name" data-og-width="1092" width="1092" data-og-height="764" height="764" data-path="img/dropbox-quickstart/App-Folder-Name.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Folder-Name.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=241170fc4054b02d8a442c1aa7193413 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Folder-Name.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d8132327f6cef89a1ae0365760b39ce6 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Folder-Name.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=026270ae66b1edda7f8ca9cf8d698a2d 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Folder-Name.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=f24f8158101d5f22edc5530f535b6c75 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Folder-Name.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=cf95976ab7e72ba2a7e2a78c0b4a9ae7 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Folder-Name.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=bc93581d2e82e88250e22101a10fbbc4 2500w" />

10. With the app's **Settings** tab still showing, scroll down to **App key**.

11. Next to **App secret**, click **Show**.

12. Note the values of **App key** and **App secret**, as you will need them later for Steps 3 and 5.

    <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Key-App-Secret.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b807f7a326c3eed8e0c0c2df66c5c4ae" alt="Noting the app's key and secret" data-og-width="1102" width="1102" data-og-height="766" height="766" data-path="img/dropbox-quickstart/App-Key-App-Secret.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Key-App-Secret.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=cf2e50c38649436db2c5e2c3a9f43f41 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Key-App-Secret.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=42d2c32ed4eb28199d0013d38deed327 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Key-App-Secret.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b3da426a67b041732c75108e3073a5a9 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Key-App-Secret.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=5a633560d637a3b1a9dd078a3454c3f3 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Key-App-Secret.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=676f7de1eccab82091d58aa7a3e0dbfe 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/App-Key-App-Secret.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=628cee357caa73dced3b4730833557d6 2500w" />

## Step 3: Get a refresh token for your Dropbox app

In this step, you get a refresh token for your Dropbox app. Unstructured needs this refresh token, along with the
**App key** and **App secret** from the previous step, to be able to use your Dropbox app to connect to your Dropbox account.

1. In a new tab in your web browser, enter the following address. In this address,
   replace `<app-key>` with the **App key** you noted in Step 2:

   ```bash  theme={null}
   https://www.dropbox.com/oauth2/authorize?client_id=<app-key>&response_type=code&token_access_type=offline
   ```

   For example, if your **App key** is `aaa1aaaaaa1aaaa`, then your address should look like this:

   ```bash  theme={null}
   https://www.dropbox.com/oauth2/authorize?client_id=aaa1aaaaaa1aaaa&response_type=code&token_access_type=offline
   ```

2. Click **Continue**.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Continue.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=5512ebe1a9c77eb88d0884adfbb60a4d" alt="Continuing to get the access code" data-og-width="655" width="655" data-og-height="523" height="523" data-path="img/dropbox-quickstart/Access-Code-Continue.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Continue.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d240178f9e1187d4d6c8f59a3dba9af4 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Continue.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ba5aa07174b4d523c2319b02419f8dc4 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Continue.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b12fcde35f2ca2e7bd9c8e2f1430b33b 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Continue.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=8f1c0fc56a40e89b6a05d450d04829f5 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Continue.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=c8bad370f16aa76068f90b2fc73da5d0 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Continue.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e22a5c11bc6a598b6217b488c71e9f47 2500w" />

3. Click **Allow**.

   <img src="https://mintcdn.com/unstructured-53/sm3rwRONB4yNpTK7/img/dropbox-quickstart/Access-Code-Allow.png?fit=max&auto=format&n=sm3rwRONB4yNpTK7&q=85&s=5ee6666f40638018e4d7d714ec6f2ea2" alt="Allowing the access code to be generated" data-og-width="633" width="633" data-og-height="447" height="447" data-path="img/dropbox-quickstart/Access-Code-Allow.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/sm3rwRONB4yNpTK7/img/dropbox-quickstart/Access-Code-Allow.png?w=280&fit=max&auto=format&n=sm3rwRONB4yNpTK7&q=85&s=dd7bbd671704e6dd5af9a0382c3f8b5f 280w, https://mintcdn.com/unstructured-53/sm3rwRONB4yNpTK7/img/dropbox-quickstart/Access-Code-Allow.png?w=560&fit=max&auto=format&n=sm3rwRONB4yNpTK7&q=85&s=a0c064a5901cbe857cfcdc862564eda3 560w, https://mintcdn.com/unstructured-53/sm3rwRONB4yNpTK7/img/dropbox-quickstart/Access-Code-Allow.png?w=840&fit=max&auto=format&n=sm3rwRONB4yNpTK7&q=85&s=67662d82ee46621461b18d1bb8386369 840w, https://mintcdn.com/unstructured-53/sm3rwRONB4yNpTK7/img/dropbox-quickstart/Access-Code-Allow.png?w=1100&fit=max&auto=format&n=sm3rwRONB4yNpTK7&q=85&s=275f1b9a307a6414cef9fcf9b6910675 1100w, https://mintcdn.com/unstructured-53/sm3rwRONB4yNpTK7/img/dropbox-quickstart/Access-Code-Allow.png?w=1650&fit=max&auto=format&n=sm3rwRONB4yNpTK7&q=85&s=46ac8a544e8a1f30db7a04e3859cf803 1650w, https://mintcdn.com/unstructured-53/sm3rwRONB4yNpTK7/img/dropbox-quickstart/Access-Code-Allow.png?w=2500&fit=max&auto=format&n=sm3rwRONB4yNpTK7&q=85&s=48fe0774aafe5948090e82e825cbdf42 2500w" />

4. Note the value in the **Access Code Generated** box.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Generated.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=97ec73db8b40ea3bbc34c928bbe86828" alt="Viewing the generated access code" data-og-width="648" width="648" data-og-height="319" height="319" data-path="img/dropbox-quickstart/Access-Code-Generated.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Generated.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d306c988134824762a162f81777d3c32 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Generated.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=bd03db23304e0cf09ad1c74301523d79 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Generated.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=4ad73a83a027d954824213dac9412f28 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Generated.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ddc120c263c87cf0d265a136ce4bd4ac 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Generated.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=dc241c587e30fd5f7233c627efc467c3 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Access-Code-Generated.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ba8b0a1e217d2f73a1ce4df1cd635423 2500w" />

   <Tip>
     This same access code value also appears in the **auth\_code** parameter from the address bar. For example, if the address bar shows this:

     ```bash  theme={null}
     https://www.dropbox.com/oauth2/authorize_success?auth_code=ccc1ccc1ccc1-1cc-1cc&client_id=aaa1aaaaaa1aaaa&version=1
     ```

     Then the **auth\_code** value is `ccc1ccc1ccc1-1cc-1cc`.
   </Tip>

5. On your computer, open the Terminal if you're using macOS, a terminal window if you're using Linux, or the Command Prompt if you're using Windows.

6. Check to see if the `curl` utility is alredy installed on your computer by running the following command:

   ```bash  theme={null}
   curl --version
   ```

7. If the `curl` utility is already installed, then you will see the `curl` version number and some other information about `curl`.

   If some kind of error message appears instead, then install `curl` on your computer by following [instructions for your operating system](https://ec.haxx.se/install/index.html).

8. From your Terminal, terminal window, or Command Prompt, run the following `curl` command. In this command,
   replace the following placeholders:

   * Replace `<app-key>` with the **App key** you noted in Step 2.
   * Replace `<app-secret>` with the **App secret** you noted in Step 2.
   * Replace `<access-code>` with the **Access Code Generated** (or `auth_code`, if you followed the Tip) that you noted earlier in this step.

   <Icon icon="apple" />  <Icon icon="linux" />  For macOS or Linux, run this `curl` command:

   ```bash  theme={null}
   curl --location --request POST 'https://api.dropbox.com/oauth2/token' \
   --user '<app-key>:<app-secret>' \
   --header 'Content-Type: application/x-www-form-urlencoded' \
   --data-urlencode 'code=<access-code>' \
   --data-urlencode 'grant_type=authorization_code'
   ```

   <Icon icon="apple" />  <Icon icon="linux" />  For example, if your **App key** is `aaa1aaaaaa1aaaa`, your **App secret** is `bbb1bbb1bbb1bbb1`, and your **Access Code Generated** or `auth_code` value is `ccc1ccc1ccc1-1cc-1cc`, then your command should look like this:

   ```bash  theme={null}
   curl --location --request POST 'https://api.dropbox.com/oauth2/token' \
   --user 'aaa1aaaaaa1aaaa:bbb1bbb1bbb1bbb1' \
   --header 'Content-Type: application/x-www-form-urlencoded' \
   --data-urlencode 'code=ccc1ccc1ccc1-1cc-1cc' \
   --data-urlencode 'grant_type=authorization_code'
   ```

   <Icon icon="windows" />  For Windows, run this `curl` command:

   ```bash  theme={null}
   curl --location --request POST "https://api.dropbox.com/oauth2/token" ^
   --user "<app-key>:<app-secret>" ^
   --header "Content-Type: application/x-www-form-urlencoded" ^
   --data-urlencode "code=<access-code>" ^
   --data-urlencode "grant_type=authorization_code"
   ```

   <Icon icon="windows" />  For example, if your **App key** is `aaa1aaaaaa1aaaa`, your **App secret** is `bbb1bbb1bbb1bbb1`, and your **Access Code Generated** or `auth_code` value is `ccc1ccc1ccc1-1cc-1cc`, then your command should look like this:

   ```bash  theme={null}
   curl --location --request POST "https://api.dropbox.com/oauth2/token" ^
   --user "aaa1aaaaaa1aaaa:bbb1bbb1bbb1bbb1" ^
   --header "Content-Type: application/x-www-form-urlencoded" ^
   --data-urlencode "code=ccc1ccc1ccc1-1cc-1cc" ^
   --data-urlencode "grant_type=authorization_code"
   ```

9. In the response, note the vaue of the `refresh_token` field, which you will need later for Step 5. For example, if the response looks like this
   (line breaks are added here for readability—the actual response will not have these line breaks):

   ```bash  theme={null}
   {
       "access_token": "sl.u.AF-aaa1aaa1", 
       "token_type": "bearer", 
       "expires_in": 14400, 
       "refresh_token": "ddd1ddd1ddd1ddd1", <-- Note this value.
       "scope": "account_info.read files.content.read files.metadata.read", 
       "uid": "1111111111", 
       "account_id": "dbid:aaa1aaa1aaa1"
   }
   ```

   Then the value of the `refresh_token` field is `ddd1ddd1ddd1ddd1`.

## Step 4: Upload documents to your Dropbox app folder

In this step, you add your documents to your Dropbox app folder in your Dropbox account.

1. In the Dropbox UI from Step 1, expand `Apps`, and then click the folder that matches the **App folder name** field you noted in Step 2.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-App-Folder.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=fd18c9539cb6b109a95e28d0c37eab65" alt="Selecting the app folder" data-og-width="680" width="680" data-og-height="470" height="470" data-path="img/dropbox-quickstart/Select-App-Folder.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-App-Folder.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=eac05f03b10cdcf436919be572a23703 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-App-Folder.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=02801ec26e52b8d7f25995a90aae6e83 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-App-Folder.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=c4e879b05130505f20391614d93b3538 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-App-Folder.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=960ae8ce6e13ab01d073dc7f52d2a0d8 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-App-Folder.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=3c4e4df14cfdd92937ce5e45ee303461 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-App-Folder.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d20f651b0da0753ccfcc09eda56b45bc 2500w" />

   <Tip>
     If the `Apps` folder is not clickable, or if Dropbox reports that the folder is not found,
     then try refreshing the page and clicking the `Apps` folder again.
   </Tip>

2. Click **Create folder**.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Folder.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=76e91e850f97fc97b3203bc89cc75392" alt="Begin creating the subfolder" data-og-width="1958" width="1958" data-og-height="522" height="522" data-path="img/dropbox-quickstart/Create-Folder.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Folder.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=10d497357a505be443d8290a0d4d4bb9 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Folder.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=3f917c48c6f01c4ed5a567646697ef0a 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Folder.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e085f42e3ab391f6d74dea61ca76341b 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Folder.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1a42788df2b6670106872ad332e53b13 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Folder.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=74dd379b8925a27dd1b439df1df4d65f 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Folder.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=05c7419474732d2e9548fe7b6d7a2a57 2500w" />

3. Give the subfolder a name, and then click **Create**.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps-Subfolder.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b2ffe53e59fa04c3dcbe4535d95b8d24" alt="Finish creating the subfolder" data-og-width="389" width="389" data-og-height="407" height="407" data-path="img/dropbox-quickstart/Create-Apps-Subfolder.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps-Subfolder.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=397ed6556d2d2799f958ac1549f5e4e2 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps-Subfolder.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e1f4a01a05a2ba0b5167c9c78567da83 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps-Subfolder.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=6604b1d902fbc815697fb58fe021e239 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps-Subfolder.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=2923ebcb345275cf0b8e6da1313fa82a 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps-Subfolder.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=adbcb4f05ee0a230a060e132f7c8f5a5 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Create-Apps-Subfolder.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=5c52ab5250f73394df1125f1df87cc8a 2500w" />

4. Click **Upload or drop** (or **Upload > Files** or **Upload > Folder**), and then follow the on-screen instructions to upload some documents to this subfolder in your Dropbox app folder.
   For a Dropbox Basic account, the total size of all of the files you upload and store in your Dropbox account (not just this subfolder) cannot exceed 2 GB.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Upload-To-Apps-Subfolder.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1f78b1fc056da0384b33718748e3cc4c" alt="Upload files to subfolder" data-og-width="1194" width="1194" data-og-height="688" height="688" data-path="img/dropbox-quickstart/Upload-To-Apps-Subfolder.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Upload-To-Apps-Subfolder.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=61331b2f4c01178b79f991ab2ed4272a 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Upload-To-Apps-Subfolder.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=55170b33b1b37b6302314a728adbd161 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Upload-To-Apps-Subfolder.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=30110f1e0242e419700e64e799fc2286 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Upload-To-Apps-Subfolder.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=9f6023e93ee13cbe795ca4d69bc41da4 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Upload-To-Apps-Subfolder.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=9480fb585ec42dd36e8c84ac807f1bcc 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Upload-To-Apps-Subfolder.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=41b8a6159650d0fb6b7ded55af0594cc 2500w" />

## Step 5: Create the Dropbox source connector

In this step, you create a Dropbox source connector in your Unstructured account. This source connector
is used by Unstructured to connect to your Dropbox account and then process the documents in the specified folder.

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/ui/overview#how-am-i-billed%3F).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. On the sidebar, click **Connectors**.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7809fcd7555999c5923ea46a1b589db4" alt="Connectors sidebar icon" data-og-width="77" width="77" data-og-height="216" height="216" data-path="img/dropbox-quickstart/Sidebar-Connectors.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=cac397b00d8d715257b7415daec8a1ac 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=90e5ff953095c78bca22fd48b79fd00f 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=66abd4a711415f1fcbcb97755461e61a 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=88a7b3ae33a818ddbfb5a4f30d12c0a5 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=250ba3ba3b1c9dae9f3974406a333d07 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=376f4cad7cbd7ba944087988b1b33c6a 2500w" />

4. Click **+ New**.

5. Enter some unique name for this connector, for example `dropbox-source-connector`.

6. For **Type**, click **Source**.

7. For **Provider**, click **Dropbox**.

8. Click **Continue**.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Connector-Type.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=846a52dd2e971b4acb98136cc83b4afb" alt="Selecting the connector type" data-og-width="701" width="701" data-og-height="673" height="673" data-path="img/dropbox-quickstart/Select-Connector-Type.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Connector-Type.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b29dc78b6047e92f1514dfbe4efc013e 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Connector-Type.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=cc6e4d1188488eacf1392a27211d66d3 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Connector-Type.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=110a0a631f1b752b4594e6921c41b69b 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Connector-Type.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=29b4200e4ffa7f2945e9ef58b0c32941 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Connector-Type.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=a949f858ce74a0937073a21e65e84afd 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Connector-Type.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=bb8f43093583200a73ed8a29164d5da0 2500w" />

9. For **Data URL**, enter `dropbox://`, followed by the name of the subfolder you created in the previous step. For example,
   if the name of the subfolder is `my-folder`, then the data URL would be `dropbox://my-folder`.

   <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Apps-Subfolder-Name.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=a1a545f6fb42c712a3baa8276cbc8539" alt="Forming the data URL" data-og-width="571" width="571" data-og-height="303" height="303" data-path="img/dropbox-quickstart/Apps-Subfolder-Name.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Apps-Subfolder-Name.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=4344e66bd9ac57072b664e26e68f26ab 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Apps-Subfolder-Name.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=f4efb44145c0b750adc311ef8173ac3d 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Apps-Subfolder-Name.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=3ac38a5c92d1bf87b7f514a865afb21e 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Apps-Subfolder-Name.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=a210e774b62180a80aca705f0e449437 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Apps-Subfolder-Name.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=46f85b40bca38dde5d46b3bb952d9800 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Apps-Subfolder-Name.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=403cd8c0c7c309fa6208d412e2f99865 2500w" />

10. For **App key**, enter the **App key** you noted in Step 2.

11. For **App secret**, enter the **App secret** you noted in Step 2.

12. For **Refresh token**, enter the **Refresh token** you noted in Step 3.

13. Click **Save and Test**, and wait while Unstructured tests the connector.

<img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Connector-Settings.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7ce24fc4dd787e378839a9bd6d82490c" alt="Entering connector settings" data-og-width="701" width="701" data-og-height="719" height="719" data-path="img/dropbox-quickstart/Source-Connector-Settings.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Connector-Settings.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7908754b2e486ef8e3a7fe7c6e855212 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Connector-Settings.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=57a9a4e3483e89a9c44b90aa4c5bf5ab 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Connector-Settings.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1b59cb576e15e4e27dc66688a5d44ea4 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Connector-Settings.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d7c1af20e209987f679d19e2b526ea34 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Connector-Settings.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=8f163878e136a8c906a0d858753b1be1 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Connector-Settings.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=cb8f6bb49f7d5ddd0d0dd23a40482e91 2500w" />

14. If a green **Successful** message appears, then you have successfully created the connector.

    <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Connector-Test-Success.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=96a856445d7d7fe19a4da6528ca6d4db" alt="Successful connector test" data-og-width="927" width="927" data-og-height="161" height="161" data-path="img/dropbox-quickstart/Connector-Test-Success.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Connector-Test-Success.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=06c54f4baa5152f6be9644a9167ff081 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Connector-Test-Success.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b223df60d77ddd1cbd730510478e3928 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Connector-Test-Success.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=2374d62c4af0da0d98ca231adebc1df8 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Connector-Test-Success.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=37f8aa49bec955be73f60595b014c721 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Connector-Test-Success.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=48a535b024e773f001820710f1b7bc1d 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Connector-Test-Success.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=63e0637d07d72a1f76b87cfd1943c51d 2500w" />

    If, however, a red error message appears, fix the issue, and try this step again.

    If you cannot fix the issue, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

Congratulations! You have successfully created a Dropbox source connector in your Unstructured account.

If you are not able to complete these steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

## Next steps

* If you do not have a destination connector in your Unstructured account, then complete the [Pinecone destination connector quickstart](/ui/destinations/pinecone-destination-quickstart).

  If you're not sure if you have a destination connector, click **Connectors** in your Unstructured account's sidebar, and then click **Destinations** to see if there are any listed.

  <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7809fcd7555999c5923ea46a1b589db4" alt="Connectors sidebar icon" data-og-width="77" width="77" data-og-height="216" height="216" data-path="img/dropbox-quickstart/Sidebar-Connectors.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=cac397b00d8d715257b7415daec8a1ac 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=90e5ff953095c78bca22fd48b79fd00f 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=66abd4a711415f1fcbcb97755461e61a 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=88a7b3ae33a818ddbfb5a4f30d12c0a5 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=250ba3ba3b1c9dae9f3974406a333d07 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Connectors.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=376f4cad7cbd7ba944087988b1b33c6a 2500w" />

* If you already have a destination connector, then you can add this Dropbox source connector as well as your destination connector to a workflow in your Unstructured account. To do this:

  1. Click **Workflows** in your Unstructured account's sidebar.

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Workflows.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=fe6cae91b1bbb6657edc98fd739ad060" alt="Workflows sidebar icon" data-og-width="75" width="75" data-og-height="279" height="279" data-path="img/dropbox-quickstart/Sidebar-Workflows.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Workflows.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ebac804ce00f68186ce2c5318609ecc9 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Workflows.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e3eb968f82e57ef4f06477ca1df42bd2 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Workflows.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e240139d5635f7b89d260559df190960 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Workflows.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e86fadd7176e1624e17db70070512c2f 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Workflows.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=492b7387bccc0164296220f148b9f5fd 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Workflows.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=f620470f39e7e725b5dacf1bbd37496c 2500w" />

  2. Click **New Workflow +**.

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/New-Workflow.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=acef2a97aa6a38127834f0f681fec8a5" alt="Begin creating a workflow" data-og-width="2612" width="2612" data-og-height="250" height="250" data-path="img/dropbox-quickstart/New-Workflow.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/New-Workflow.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=2b7e81208e7741a4bfecbb2a65cd3a78 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/New-Workflow.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=af62730df3d6c59500cbbe7b3ae23af8 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/New-Workflow.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=02cc67b6aa23b47d2b0858787f4841a2 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/New-Workflow.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=172af9b01c5324cdd3b138a4f495312e 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/New-Workflow.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1ee1a185e5e9c89909a5fa58b53dbcbc 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/New-Workflow.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=60905eb7f776a98953af9912f5f2926e 2500w" />

  3. With **Build it Myself** already selected, click **Continue**.

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Build-It-Myself.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=f39c417338fdc42b6d1762ab9b41e5c7" alt="Build it myself workflow type" data-og-width="1162" width="1162" data-og-height="846" height="846" data-path="img/dropbox-quickstart/Build-It-Myself.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Build-It-Myself.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=f608ebcea60ba91f393ac8e00c02e851 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Build-It-Myself.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=91365c5854818955c30f7350a6843172 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Build-It-Myself.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d36c295c9a5f1286f0ed37cdc6ea7e2d 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Build-It-Myself.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e7228156bf7ff5cdfa7f00848f6f2d1a 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Build-It-Myself.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d3871592459cc92067577f6ff10342e1 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Build-It-Myself.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ff959f43b4f1c851b15ac7ee23a19a51 2500w" />

  4. Click the **Source** node. (Do not click **Drop file to test**.)

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Node.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=22df9f41b4db6897fb15e543f391c92c" alt="Workflow source node" data-og-width="237" width="237" data-og-height="182" height="182" data-path="img/dropbox-quickstart/Source-Node.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Node.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ac528908f9756e3688722343622992e2 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Node.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=70546577bdd29b73e55f0393d3976ad5 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Node.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=c8eca1cd848309c58c7cbd4a62ee5242 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Node.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=49d237824e6b06ea75a85c993343872d 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Node.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=07a8ea27ee57de806d9a98156102d9e4 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Source-Node.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=be37167f4d94ebb177ca272040c8b752 2500w" />

  5. On the **Details** tab, click **Connectors**, and then click the name of your Dropbox source connector.

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Source-Connector.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1da059d0b2bf301372663becd411578b" alt="Selecting the source connector" data-og-width="530" width="530" data-og-height="311" height="311" data-path="img/dropbox-quickstart/Select-Source-Connector.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Source-Connector.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=286e537f62cfb37911d6e39e87de8a19 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Source-Connector.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b8ce7eb59d82ed2e9cd69342c6bb7a9f 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Source-Connector.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=110c32ddd23c6378d15b8ce7c73943f4 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Source-Connector.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1dca60cb1771bbb624528eb2a6bf1296 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Source-Connector.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=353a8a3ad23881c394a07113d218da5d 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Source-Connector.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=6994adfbb3a05cfa1edd51a49283b172 2500w" />

  6. Click the **Destination** node.

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Destination-Node.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7f4fed222a8f2fd3499e1986cfc552dc" alt="Workflow destination node" data-og-width="287" width="287" data-og-height="118" height="118" data-path="img/dropbox-quickstart/Destination-Node.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Destination-Node.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=8940a2e174a4e09efc205593b66880c8 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Destination-Node.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=8f4ddd6998728191b1e8bb4db97e437f 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Destination-Node.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=aa0a7286a2a033f51a8a34e14382f6d2 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Destination-Node.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7a85c73b480c54332044a7d4abaf3336 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Destination-Node.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=91ee05c1d90cd9e6473e2b458cf24771 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Destination-Node.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=6224a7daff25ce5f360eb443526e43fb 2500w" />

  7. On the **Details** tab, click the name of your destination connector.

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Destination-Connector.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=700084df5d3f878428e707860a96099a" alt="Selecting the destination connector" data-og-width="524" width="524" data-og-height="133" height="133" data-path="img/dropbox-quickstart/Select-Destination-Connector.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Destination-Connector.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=599854572e812a11d32bbdd4a05ca174 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Destination-Connector.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=6b75cc27ad2fbd62e0ca98a79203a8f6 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Destination-Connector.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7e367af7239054bcd2bb27a42c76ad15 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Destination-Connector.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=cc1ed12ad23ea0cd234c2f10a12c7a7e 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Destination-Connector.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=2b20ccd9ce443adb01ccb22f27aab4f0 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Select-Destination-Connector.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=21bc2408c40c47b20240e2fcf14c2955 2500w" />

  8. Switch **Active** to on, and then click **Save**.

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Make-Workflow-Active.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=13b389cafe579fff4ef97899850201d8" alt="Making the workflow active" data-og-width="192" width="192" data-og-height="45" height="45" data-path="img/dropbox-quickstart/Make-Workflow-Active.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Make-Workflow-Active.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=67f2a3c3bd42b621c510615c7f36a580 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Make-Workflow-Active.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=1fd84c28d846aaba3fb88a2f0791900c 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Make-Workflow-Active.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b6b0d9fcc2fe46520377cb7ecfa353a9 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Make-Workflow-Active.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=2a9d3550f2f876baddaafe585483e6dd 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Make-Workflow-Active.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=70628c310653183beacda00e6cf4f824 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Make-Workflow-Active.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=e32edb8b00da2ad3026c34e14fbcedd7 2500w" />

  9. Next to your workflow's name, click **Run**.

     <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Run-Workflow.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b11755bcce69512120501e991414f31a" alt="Running the workflow" data-og-width="1457" width="1457" data-og-height="176" height="176" data-path="img/dropbox-quickstart/Run-Workflow.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Run-Workflow.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=93d0bb71b6fc209f1ef9cdede453ac51 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Run-Workflow.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b67f0d5bcb7fc463d251de3cb7b5319a 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Run-Workflow.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=510de50a2a7665649eb8df387413cdd0 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Run-Workflow.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=7319a1f4d84569a90863084b233044d4 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Run-Workflow.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=5053333465dc5bdfb0769800310a4caf 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Run-Workflow.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b032acb155befecf42bd5ffcab6feed2 2500w" />

  10. Click **Jobs** in your Unstructured account's sidebar.

      <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Jobs.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=c2fae030bdd046d1a8de983ab3939371" alt="Jobs sidebar icon" data-og-width="78" width="78" data-og-height="364" height="364" data-path="img/dropbox-quickstart/Sidebar-Jobs.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Jobs.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=aff1efcd8966fc59af383827eee66906 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Jobs.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=9e773fd9792a88b78299695e5d0c5e9c 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Jobs.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=61890d011b45e43bba7e5d3f01ebc456 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Jobs.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ae38f96ea981889bafc1782e90631cfc 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Jobs.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=c9b287ae715cdf51c414625f3b459d15 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Sidebar-Jobs.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=757012b87d165854ed64e42f4fb0d87d 2500w" />

  11. After the job shows **Finished** with a green checkmark, go to your destination's location to see Unstructured's processed data output.

  <img src="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Finished-Job.png?fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=d4f62ceee8057bab11846eb8788ed205" alt="Finished job status" data-og-width="786" width="786" data-og-height="382" height="382" data-path="img/dropbox-quickstart/Finished-Job.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Finished-Job.png?w=280&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=b82e7673d32051f0be4beb705fae9843 280w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Finished-Job.png?w=560&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=a8efa3d651e2959ea336f259ec7c957f 560w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Finished-Job.png?w=840&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=662f20e52c4b9513d03b854b70dacffc 840w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Finished-Job.png?w=1100&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=ca58cbc3ace98f66fbb7373016e388e9 1100w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Finished-Job.png?w=1650&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=cc42487ad1b04f8f1e72a5e08c984901 1650w, https://mintcdn.com/unstructured-53/vQCKW1wFRcqqOuSd/img/dropbox-quickstart/Finished-Job.png?w=2500&fit=max&auto=format&n=vQCKW1wFRcqqOuSd&q=85&s=bec116353e726af4d39ceba0696abb49 2500w" />

If you are not able to complete these steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

---

# Source: https://docs.unstructured.io/ui/sources/dropbox.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/dropbox.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/dropbox.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/dropbox.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Dropbox

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Dropbox.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/Vku5uYa-2N4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

1. A [Dropbox account](https://www.dropbox.com/try/teams).

2. A Dropbox app for your Dropbox account. To create a Dropbox app, do the following:

   a) Sign in to the [Dropbox Developers](https://www.dropbox.com/developers) portal with the same credentials as your Dropbox account.<br />
   b) Open your [App Console](https://www.dropbox.com/developers/apps).<br />
   c) Click **Create app**.<br />
   d) For **Choose an API**, select **Scoped access**.<br />
   e) For **Choose the type of access you need**, select **App folder**.<br />
   f) Enter a name for your app, and then click **Create app**.<br />
   g) On the app's **Permissions** tab, under **Files and folders**, check the boxes labelled **files.content.read** or **files.content.write** or both,
   depending on whether you want to read files, write files, or both. Then click **Submit**.<br />
   h) On the app's **Settings** tab, note the value of the **App folder name** field. This is the name of the app folder that Dropbox will create under the `Apps` top-level folder in your Dropbox
   account that the Dropbox app will use for access. If you change the value of **App folder name** field here, Dropbox will create an app folder with that name under the `Apps` top-level folder instead.<br />
   i) Under **OAuth 2**, next to **Generated access token**, click **Generate**. Copy the value of this access token. You should only click **Generate** after you have completed all of the preceding steps first.
   This is because the access token is scoped to the specific app folder and settings at the time the access token is generated. If you change the app folder name or any of the permissions later,
   you should regenerate the access token.<br />

   <Warning>
     Access tokens are valid for **only four hours** after they are created. After this four-hour period, you can no longer use the expired access token.
     Dropbox does not allow the creation of access tokens that are valid for more than four hours.

     To replace an expired access token, you must first generate a *refresh token* for the corresponding access token. To learn how to generate an access token and its corresponding refresh token,
     see [Replace an expired access token](#replace-an-expired-access-token), later in this article.

     If you do not already have the corresponding refresh token for an existing access token, or if you lose a refresh token after you generate it,
     you must generate a new access token and its corresponding refresh token.

     Instead of continualy replacing expired access tokens yourself, you can have Unstructured do it for you as needed; just supply Unstructured
     with the refresh token along with the Dropbox app's **App key** and **App secret** values.
     To learn how to supply these to Unstructured, look for mentions of "refresh token," "app key," and "app secret" in the connector settings later in this article.
   </Warning>

3. The app folder that your Dropbox app will use for access can be found in your Dropbox account under the `Apps` top-level folder. For example, if the value of the **App folder name**
   field above is `my-folder`, then the app folder that your Dropbox app will use for access can be found under `https://dropbox.com/home/Apps/my-folder`

   <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-folder.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=b5f9679d941b592ae0c62b0c661d6dac" alt="The my-folder app folder under the Apps top-level folder" data-og-width="254" width="254" data-og-height="279" height="279" data-path="img/connectors/dropbox-app-folder.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-folder.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=96784d3e87b315412327a8a051173244 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-folder.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=b3dc98eb9b0d97d17a91cc7c328c1ac1 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-folder.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=c3614d4e743e978e288681bc7876e91b 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-folder.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=3f63e54518ac1abc767a8506ee7c4b59 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-folder.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=27f3406da66a7be0ea1616c7ea7af434 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-folder.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2f3a1cb5d47837746c005256c9cbd204 2500w" />

   <Warning>
     Your Dropbox app will *not* have access to upload or download files from the root of the app folder. Instead, you *must* create a subfolder inside of the app folder for your Dropbox
     app to upload or download files from. You will use the name of that subfolder when specifying your remote URL in the next step. For example, if your Dropbox app uses an app folder named `my-folder`
     for access within the `Apps` top-level folder, and you create a subfolder named `data` within the `my-folder` app folder, then the subfolder that your Dropbox app will upload and download files from
     can be found under `https://dropbox.com/home/Apps/my-folder/data`

       <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=63e6a2c4483435a32ff6ece5f9a3c297" alt="The data subfolder under the my-folder subfolder" data-og-width="253" width="253" data-og-height="316" height="316" data-path="img/connectors/dropbox-app-subfolder.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=9370c2c5af305d9c8d5608de9e64a18e 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5feb79063a38cbc91aa4abe8cf4d06da 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=78afbfbd91904675d2c1e9c4b671da7f 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=e54cebed177b25e770559a8ba1855a4b 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=44e94fe9e106893a5013838610203a87 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5222a5db731a8ecf05b2fa7e4eee0540 2500w" />
   </Warning>

4. Note the remote URL to your subfolder inside of the app folder, which takes the format `dropbox://<subfolder-name>`. For example,
   if your Dropbox app uses an app folder named `my-folder` for access within the `Apps` top-level folder, and you create a subfolder named `data` within the `my-folder` app folder, then
   the remote URL is `dropbox://data`

   <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=63e6a2c4483435a32ff6ece5f9a3c297" alt="The data subfolder under the my-folder subfolder" data-og-width="253" width="253" data-og-height="316" height="316" data-path="img/connectors/dropbox-app-subfolder.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=9370c2c5af305d9c8d5608de9e64a18e 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5feb79063a38cbc91aa4abe8cf4d06da 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=78afbfbd91904675d2c1e9c4b671da7f 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=e54cebed177b25e770559a8ba1855a4b 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=44e94fe9e106893a5013838610203a87 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/dropbox-app-subfolder.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=5222a5db731a8ecf05b2fa7e4eee0540 2500w" />

## Replace an expired access token

Dropbox app access tokens are valid for **only four hours**. After this time, you can no longer use the expired access token.

To have Unstructured automatically replace expired access tokens on your behalf, do the following:

<iframe width="560" height="315" src="https://www.youtube.com/embed/PZyRgpPNEUs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

1. Get the app key and app secret values for your Dropbox app. To do this:

   a) Sign in to the [Dropbox Developers](https://www.dropbox.com/developers) portal with the same credentials as your Dropbox account.<br />
   b) Open your [App Console](https://www.dropbox.com/developers/apps).<br />
   c) Click your Dropbox app's icon.<br />
   d) On the **Settings** tab, next to **App key**, copy the value of the app key.<br />
   e) Next to **App secret**, click **Show**, and then copy the value of the app secret.

2. Use your web browser to browse to the following URL, replacing `<app-key>` with the app key for your Dropbox app:

   ```text  theme={null}
   https://www.dropbox.com/oauth2/authorize?client_id=<app-key>&response_type=code&token_access_type=offline
   ```

3. Click **Continue**.

4. Click **Allow**.

5. In the **Access code generated** tile, copy the access code that is shown.

6. Use the [curl](https://curl.se/) utility in your Terminal or Command Prompt, or use a REST API client such as
   [Postman](https://www.postman.com/product/api-client/), to make the following REST API call, replacing the following placeholders:

   * Replace `<app-key>` with the app key for your Dropbox app.
   * Replace `<app-secret>` with the app secret for your Dropbox app.
   * Replace `<access-code>` with the access code that you just copied.

   ```text  theme={null}
   curl --location --request POST 'https://api.dropbox.com/oauth2/token' \
   --user '<app-key>:<app-secret>' \
   --header 'Content-Type: application/x-www-form-urlencoded' \
   --data-urlencode 'code=<access-code>' \
   --data-urlencode 'grant_type=authorization_code'
   ```

7. In the response, copy the following two values:

   * The value of `access_token` (starting with the characters `sl`) is the new, valid access token.
   * The value of `refresh_token` is the refresh token that can be used to replace this access token much faster and easier next time.
     If you lose this refresh token, you must go back to Step 2.

   For the [Unstructured UI](/ui/overview), if you want Unstructured to use this refresh token to automatically replace the expired access token instead of replacing it yourself, then
   add the following values to your connector settings, and then stop here:

   * Add the `refresh_token` value to the connector settings **Refresh token** field.
   * Add the `<app-key>` value to the connector settings **App key** field.
   * Add the `<app-secret>` value to the connector settings **App secret** field.

   For the [Unstructured API](/api-reference/overview) and [Unstructured Ingest](/open-source/ingestion/overview), if you want Unstructured to use this refresh token to automatically replace the expired access token instead of replacing it yourself, then
   add the following values to your connector settings, and then stop here:

   * Add the `refresh_token` value to the `refresh_token` parameter.
   * Add the `<app-key>` value to the `app_key` parameter.
   * Add the `<app-secret>` value to the connector settings `app_secret` parameter.

8. If for some reason you need to manually replace the expired access token yourself instead of having Unstructured do it for you, you can use the refresh token that you just copied to get a new access token:

   * Replace `<refresh-token>` with the refresh token.
   * Replace `<app-key>` with the app key for your Dropbox app.
   * Replace `<app-secret>` with the app secret for your Dropbox app.

   ```text  theme={null}
   curl https://api.dropbox.com/oauth2/token \
   --data refresh_token=<refresh-token> \
   --data grant_type=refresh_token \
   --data client_id=<app-key> \
   --data client_secret=<app-secret>
   ```

9. In the response, copy the following two values:

   * The value of `access_token` (starting with the characters `sl`) is the new, valid access token. In the connector, replace the old,
     expired access token value with this new, valid access token value.

   * The value of `refresh_token` is the new, valid refresh token. To replace the expired access token yourself, go back to Step 8.

To create a Dropbox source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="dropbox",
                  config={
                      "remote_url": "<remote-url>",
                      "recursive": <True|False>,
                      "refresh_token": "<refresh-token>",
                      "app_key": "<app-key>",
                      "app_secret": "<app-secret>"
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "dropbox",
      "config": {
          "remote_url": "<remote-url>",
          "recursive": <true|false>,
          "refresh_token": "<refresh-token>",
          "app_key": "<app-key>",
          "app_secret": "<app-secret>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<remote-url>` (*required*) - The remote URL to the target subfolder inside of the app folder for the Dropbox app.
* Set `recursive` to `true` to recursively process data from subfolders within the target subfolder. The default is `false` if not otherwise specified.
* `<app-key>` (*required*) - The app key for your Dropbox app. This allows Unstructured to automatically replace expired access tokens.
* `<app-secret>` (*required*) - The app secret for your Dropbox app. This allows Unstructured automatically to replace expired access tokens.
* `<refresh-token>` (*required*) - The refresh token for the Dropbox app. This allows Unstructured to automatically replace expired access tokens.

---

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/duckdb.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# DuckDB

Batch process all your records to store structured outputs in a DuckDB installation.

The requirements are as follows.

* A [DuckDB installation](https://duckdb.org/docs/installation).

* A [persistent database](https://duckdb.org/docs/connect/overview.html#persistent-database), for example by running the
  [DuckDB CLI](https://duckdb.org/docs/api/cli) command `duckdb <my-database-filename>.db` or
  `duckdb <my-database-filename>.duckdb`, replacing `<my-database-filename>` with the name of the target file.

* The path to the target persistent database file.

* A schema in the target database.

  * [Create a schema](https://duckdb.org/docs/sql/statements/create_schema.html).
  * You can list available schemas and their parent catalogs by running the following DuckDB CLI command:

    ```sql  theme={null}
    SELECT * FROM information_schema.schemata;
    ```

  The DuckDB connector uses the default schema name of `main` if not otherwise specified.

* A table in the target schema.

  * [Create a table](https://duckdb.org/docs/sql/statements/create_table).
  * You can list available tables in a schema by running the following DuckDB CLI commands, replacing the target catalog and schema names:

    ```sql  theme={null}
    USE <catalog_name>.<schema_name>;
    SHOW TABLES;
    ```

  The DuckDB connector uses the default table name of `elements` if not otherwise specified.

  For maximum compatibility, Unstructured recommends the following table schema:

  ```sql  theme={null}
  CREATE TABLE elements (
      id VARCHAR,
      element_id VARCHAR,
      text TEXT,
      embeddings FLOAT[],
      type VARCHAR,
      system VARCHAR,
      layout_width DECIMAL,
      layout_height DECIMAL,
      points TEXT,
      url TEXT,
      version VARCHAR,
      date_created INTEGER,
      date_modified INTEGER,
      date_processed DOUBLE,
      permissions_data TEXT,
      record_locator TEXT,
      category_depth INTEGER,
      parent_id VARCHAR,
      attached_filename VARCHAR,
      filetype VARCHAR,
      last_modified TIMESTAMP,
      file_directory VARCHAR,
      filename VARCHAR,
      languages VARCHAR[],
      page_number VARCHAR,
      links TEXT,
      page_name VARCHAR,
      link_urls VARCHAR[],
      link_texts VARCHAR[],
      sent_from VARCHAR[],
      sent_to VARCHAR[],
      subject VARCHAR,
      section VARCHAR,
      header_footer_type VARCHAR,
      emphasized_text_contents VARCHAR[],
      emphasized_text_tags VARCHAR[],
      text_as_html TEXT,
      regex_metadata TEXT,
      detection_class_prob DECIMAL,
      partitioner_type VARCHAR
  );
  ```

  You can list the schema of a table by running the following DuckDB CLI commands, replacing the target catalog, schema, and table names:

  ```sql  theme={null}
  USE <catalog_name>.<schema_name>;
  DESCRIBE TABLE <table_name>;
  ```

The DuckDB connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[duckdb]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `DUCKDB_DATABASE` - The path to the target DuckDB persistent database file with the extension `.db` or `.duckdb`, represented by `--database` (CLI) or `database` (Python).
* `DUCKDB_DB_SCHEMA` - The name of the target schema in the database, represented by `--db-schema` (CLI) or `db_schema` (Python).
* `DUCKDB_TABLE` - The name of the target table in the schema, represented by `--table` (CLI) or `table` (Python).

Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --chunking-strategy by_title \
      --embedding-provider huggingface \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    duckdb \
      --database $DUCKDB_DATABASE \
      --db-schema $DUCKDB_DB_SCHEMA \
      --table $DUCKDB_TABLE
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.duckdb.duckdb import (
      DuckDBAccessConfig,
      DuckDBConnectionConfig,
      DuckDBUploadStagerConfig,
      DuckDBUploaderConfig
  )
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalConnectionConfig,
      LocalDownloaderConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=DuckDBConnectionConfig(
              access_config=DuckDBAccessConfig(),
              database=os.getenv("DUCKDB_DATABASE"),
              db_schema=os.getenv("DUCKDB_DB_SCHEMA"),
              table=os.getenv("DUCKDB_TABLE")
          ),
          stager_config=DuckDBUploadStagerConfig(),
          uploader_config=DuckDBUploaderConfig(batch_size=50)
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/elastic-search.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Elasticsearch

Connect Elasticsearch to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Elastic Cloud instances are supported.

* For [Unstructured Ingest](/open-source/ingestion/overview), Elastic Cloud instances and self-manged Elasticsearch instances are supported.

* For Elastic Cloud, you will need an [Elastic Cloud service instance](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html#hosted-elasticsearch-service).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/HEYIFC7uOus" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For self-managed Elasticsearch, you will need a [self-managed Elasticsearch instance](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html#elasticsearch-deployment-options).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/9znbAkNG1Ok" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* You will need the name of the index on the instance.

  For the destination connector, if you need to create an index, you can use for example the following `curl` command. Replace the following placeholders:

  * Replace `<host>:<port>` with the instance's host identifier and port number.
  * Replace one of the following, and remove the other:

    * For an instance's user and password combination, replace `<username>` with your Elasticsearch or Elastic Cloud instance's user name, replace `<password>` with your password, and then remove `--user "Authorization: ApiKey <your-api-key>"`.
    * For an Elastic Cloud API key, replace `<your-api-key>` with your Elastic Cloud API key, and then remove `--user "<username>:<password>"`.
  * Replace `<index-name>` with the name of the new index on the instance.
  * Replace `<index-schema>` with the schema for the index. A schema is optional; see the explanation
    following this `curl` command for more information.

  ```bash  theme={null}
  curl --request PUT "<host>:<port>/<index-name>" \
  --user "<username>:<password>" \
  --user "Authorization: ApiKey <your-api-key>" \
  [--header "Content-Type: application/json" \
  --data '<index-schema>']
  ```

  To learn more, see [Create index](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html) and [Get index](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-index.html).

  For the destination connector, the index does not need to contain a schema beforehand. If Unstructured encounters an index without a schema,
  Unstructured will automatically create a compatible schema for you before inserting items into the index. Nonetheless,
  to reduce possible schema compatibility issues, Unstructured recommends that you create a schema that is compatible with Unstructured's schema.
  Unstructured cannot provide a schema that is guaranteed to work in all
  circumstances. This is because these schemas will vary based on your source files' types; how you
  want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

  For objects in the `metadata` field that Unstructured produces and that you want to store in an Elasticsearch or Elastic Cloud index, you must create fields in your index's schema that
  follows Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
  child objects:

  ```json  theme={null}
  "metadata": {
    "is_extracted": "true",
    "coordinates": {
      "points": [
        [
          134.20055555555555,
          241.36027777777795
        ],
        [
          134.20055555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          241.36027777777795
        ]
      ],
      "system": "PixelSpace",
      "layout_width": 1654,
      "layout_height": 2339
    },
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_mime_type": "image/jpeg",
    "filename": "realestate.pdf",
    "data_source": {
      "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
      "record_locator": {
        "protocol": "file",
        "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
      }
    },
    "entities": {
      "items": [
        {
          "entity": "HOME FOR FUTURE",
          "type": "ORGANIZATION"
        },
        {
          "entity": "221 Queen Street, Melbourne VIC 3000",
          "type": "LOCATION"
        }
      ],
      "relationships": [
        {
          "from": "HOME FOR FUTURE",
          "relationship": "based_in",
          "to": "221 Queen Street, Melbourne VIC 3000"
        }
      ]
    }
  }
  ```

  You can adapt the following index schema example for your own needs. Note that outside of `metadata`, the following fields are
  required by Unstructured whenever you create your own index schema:

  * `element_id`
  * `record_id`, which is required by Unstructured for intelligent record updates.
  * `type`, which is not required, but highly recommended.
  * `text`
  * `embeddings` if embeddings are generated; make sure to set `dims` to the same number of dimensions as the embedding model generates.

  ```json  theme={null}
  {
    "mappings": {
      "properties": {
        "element_id": {
          "type": "keyword"
        },
        "record_id": {
          "type": "text"
        },
        "text": {
          "type": "text"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "embeddings": {
          "type": "dense_vector",
          "dims": 1536,
          "index": true,
          "similarity": "cosine"
        },
        "metadata": {
          "properties": {
            "is_extracted": {
              "type": "boolean"
            },
            "coordinates-points": {
              "type": "float"
            },
            "coordinates-system": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "coordinates-layout_width": {
              "type": "long"
            },
            "coordinates-layout_height": {
              "type": "long"
            },
            "filetype": {
              "type": "keyword"
            },
            "languages": {
              "type": "keyword"
            },
            "page_number": {
              "type": "integer"
            },
            "image_mime_type": {
              "type": "keyword"
            },
            "filename": {
              "type": "keyword"
            },
            "data_source-url": {
              "type": "keyword"
            },
            "data_source-record_locator-protocol": {
              "type": "keyword"
            },
            "data_source-record_locator-remote_file_path": {
              "type": "keyword"
            },
            "entities-items": {
              "properties": {
                "entity": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "type": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            },
            "entities-relationships": {
              "properties": {
                "from": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "relationship": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "to": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  ```

  See also:

  * [An Introduction to Elasticsearch Mapping](https://www.elastic.co/blog/found-elasticsearch-mapping-introduction)
  * [Explicit mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/explicit-mapping.html)
  * [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html)
  * [Unstructured document elements and metadata](/api-reference/legacy-api/partition/document-elements)

  <iframe width="560" height="315" src="https://www.youtube.com/embed/0f0tEcrvP8g" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For Elastic Cloud, you will need the Elastic Cloud service instance's API key. If you are using Unstructured Ingest, you will also need the instance's Cloud ID. To get these, see your Elasticsearch Service web console or [Elastic Cloud API keys](https://www.elastic.co/docs/deploy-manage/api-keys/elastic-cloud-api-keys).

  <Note>
    When generating an Elastic Cloud service instance's API key through the Elasticsearch Service web console, be sure after you click **Create API key** to select **Encoded** to get the API key's value.
    (The **Encoded** API key value is the Base64 encoding of the UTF-8 representation of the API key's ID and key value joined by a colon \[:].)
  </Note>

* For self-managed Elasticsearch, you will need:

  * The self-managed instance's hostname and port number. See [Networking](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html).

  * If you're using basic authentication to the self-managed instance, the user's name and password.

  * If you're using token-based authentication to the self-managed instance, the bearer token or API key for the instance. See [Token-based authentication services](https://www.elastic.co/guide/en/elasticsearch/reference/current/token-authentication-services.html) and [Create API key](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html).

    <Note>
      When generating an API key for a self-managed Elasticsearch instance by calling [POST /\_security/api\_key](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-security-create-api-key), be sure to use
      the `encoded` value in the response payload. (This `encoded` value is the Base64 encoding of the UTF-8 representation of the payload's `id` and `api_key` values joined by a colon \[:]). Do not use the `api_key` value by itself.
    </Note>

  * If you're using certificate authentication to the self-managed instance, the path to the Certificate Authority (CA) file on the instance, and the certificate fingerprint. See [SSL certificate API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-ssl.html) and [Where can I see my Certificate Fingerprint?](https://discuss.elastic.co/t/where-can-i-see-my-certificate-fingerprint/319335/3).

The Elasticsearch connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[elasticsearch]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `ELASTICSEARCH_HOST` - The hostname and port number, defined as `<hostname>:<port-number>` and represented by `--hosts` (CLI) or `hosts` (Python).
* `ELASTICSEARCH_INDEX_NAME` - The name of the search index, represented by `--index-name` (CLI) or `index_name` (Python).

If you're using Elastic Cloud to connect to ElasticSearch:

* `ELASTIC_CLOUD_ID` - The instance's Cloud ID, represented by `--cloud-id` (CLI) or `cloud_id` (Python).
* `ELASTIC_CLOUD_API_KEY` - The corresponding Cloud API key value, represented by `--api-key-id` (CLI) or `api_key_id` (Python).

If you're using basic authentication to the instance:

* `ELASTICSEARCH_USERNAME` - The user's name, represented by `--username` (CLI) or `username` (Python).
* `ELASTICSEARCH_PASSWORD` - The user's password, represented by `--password` (CLI) or `password` (Python).

If you're using token-based authentication to the instance instead:

* `ELASTICSEARCH_BEARER_TOKEN` - The bearer token's value, represented by `--bearer-auth` (CLI) or `bearer_auth` (Python).
* `ELASTIC_CLOUD_API_KEY_ID` - The API key's value, represented by `--api-key` (CLI) or `api_key` (Python).

If you're using certificates:

* `ELASTICSEARCH_CA_CERTS` - The path to the Certificate Authority (CA) file, represented by `--ca-certs` (CLI) or `ca_certs` (Python).
* `ELASTICSEARCH_SSL_ASSERT_FINGERPRINT` - The certificate's fingerprint, represented by `--ssl-assert-fingerprint` or `ssl_assert_fingerprint` (Python).

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector:

This example sends data to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  unstructured-ingest \
    elasticsearch \
      --metadata-exclude filename,file_directory,metadata.data_source.date_processed \
      --hosts $ELASTICSEARCH_HOST \
      --username $ELASTICSEARCH_USERNAME \
      --password $ELASTICSEARCH_PASSWORD \
      --index-name $ELASTICSEARCH_INDEX_NAME \
      --ca-certs $ELASTICSEARCH_CA_CERTS \
      --ssl-assert-fingerprint $ELASTICSEARCH_SSL_ASSERT_FINGERPRINT \
      --fields 'ethnicity, director, plot' \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --num-processes 2
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --strategy hi_res
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.elasticsearch.elasticsearch import (
      ElasticsearchIndexerConfig,
      ElasticsearchDownloaderConfig,
      ElasticsearchConnectionConfig,
      ElasticsearchAccessConfig,
      ElasticsearchUploadStagerConfig,
      ElasticsearchUploaderConfig
  )

  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=ElasticsearchIndexerConfig(index_name=os.getenv("ELASTICSEARCH_INDEX_NAME")),
          downloader_config=ElasticsearchDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
          source_connection_config=ElasticsearchConnectionConfig(
              access_config=ElasticsearchAccessConfig(
                  password=os.getenv("ELASTICSEARCH_PASSWORD"),
                  ssl_assert_fingerprint=os.getenv("ELASTICSEARCH_SSL_ASSERT_FINGERPRINT")
              ),
              hosts=[os.getenv("ELASTICSEARCH_HOST")],
              username=os.getenv("ELASTICSEARCH_USERNAME"),
              ca_certs=os.getenv("ELASTICSEARCH_CA_CERTS")
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=LocalConnectionConfig()
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/ui/sources/elasticsearch.md

# Source: https://docs.unstructured.io/ui/destinations/elasticsearch.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/elasticsearch.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/elasticsearch.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/elasticsearch.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Elasticsearch

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Elasticsearch.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Elastic Cloud instances are supported.

* For [Unstructured Ingest](/open-source/ingestion/overview), Elastic Cloud instances and self-manged Elasticsearch instances are supported.

* For Elastic Cloud, you will need an [Elastic Cloud service instance](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html#hosted-elasticsearch-service).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/HEYIFC7uOus" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For self-managed Elasticsearch, you will need a [self-managed Elasticsearch instance](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html#elasticsearch-deployment-options).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/9znbAkNG1Ok" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* You will need the name of the index on the instance.

  For the destination connector, if you need to create an index, you can use for example the following `curl` command. Replace the following placeholders:

  * Replace `<host>:<port>` with the instance's host identifier and port number.
  * Replace one of the following, and remove the other:

    * For an instance's user and password combination, replace `<username>` with your Elasticsearch or Elastic Cloud instance's user name, replace `<password>` with your password, and then remove `--user "Authorization: ApiKey <your-api-key>"`.
    * For an Elastic Cloud API key, replace `<your-api-key>` with your Elastic Cloud API key, and then remove `--user "<username>:<password>"`.
  * Replace `<index-name>` with the name of the new index on the instance.
  * Replace `<index-schema>` with the schema for the index. A schema is optional; see the explanation
    following this `curl` command for more information.

  ```bash  theme={null}
  curl --request PUT "<host>:<port>/<index-name>" \
  --user "<username>:<password>" \
  --user "Authorization: ApiKey <your-api-key>" \
  [--header "Content-Type: application/json" \
  --data '<index-schema>']
  ```

  To learn more, see [Create index](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html) and [Get index](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-index.html).

  For the destination connector, the index does not need to contain a schema beforehand. If Unstructured encounters an index without a schema,
  Unstructured will automatically create a compatible schema for you before inserting items into the index. Nonetheless,
  to reduce possible schema compatibility issues, Unstructured recommends that you create a schema that is compatible with Unstructured's schema.
  Unstructured cannot provide a schema that is guaranteed to work in all
  circumstances. This is because these schemas will vary based on your source files' types; how you
  want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

  For objects in the `metadata` field that Unstructured produces and that you want to store in an Elasticsearch or Elastic Cloud index, you must create fields in your index's schema that
  follows Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
  child objects:

  ```json  theme={null}
  "metadata": {
    "is_extracted": "true",
    "coordinates": {
      "points": [
        [
          134.20055555555555,
          241.36027777777795
        ],
        [
          134.20055555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          241.36027777777795
        ]
      ],
      "system": "PixelSpace",
      "layout_width": 1654,
      "layout_height": 2339
    },
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_mime_type": "image/jpeg",
    "filename": "realestate.pdf",
    "data_source": {
      "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
      "record_locator": {
        "protocol": "file",
        "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
      }
    },
    "entities": {
      "items": [
        {
          "entity": "HOME FOR FUTURE",
          "type": "ORGANIZATION"
        },
        {
          "entity": "221 Queen Street, Melbourne VIC 3000",
          "type": "LOCATION"
        }
      ],
      "relationships": [
        {
          "from": "HOME FOR FUTURE",
          "relationship": "based_in",
          "to": "221 Queen Street, Melbourne VIC 3000"
        }
      ]
    }
  }
  ```

  You can adapt the following index schema example for your own needs. Note that outside of `metadata`, the following fields are
  required by Unstructured whenever you create your own index schema:

  * `element_id`
  * `record_id`, which is required by Unstructured for intelligent record updates.
  * `type`, which is not required, but highly recommended.
  * `text`
  * `embeddings` if embeddings are generated; make sure to set `dims` to the same number of dimensions as the embedding model generates.

  ```json  theme={null}
  {
    "mappings": {
      "properties": {
        "element_id": {
          "type": "keyword"
        },
        "record_id": {
          "type": "text"
        },
        "text": {
          "type": "text"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "embeddings": {
          "type": "dense_vector",
          "dims": 1536,
          "index": true,
          "similarity": "cosine"
        },
        "metadata": {
          "properties": {
            "is_extracted": {
              "type": "boolean"
            },
            "coordinates-points": {
              "type": "float"
            },
            "coordinates-system": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "coordinates-layout_width": {
              "type": "long"
            },
            "coordinates-layout_height": {
              "type": "long"
            },
            "filetype": {
              "type": "keyword"
            },
            "languages": {
              "type": "keyword"
            },
            "page_number": {
              "type": "integer"
            },
            "image_mime_type": {
              "type": "keyword"
            },
            "filename": {
              "type": "keyword"
            },
            "data_source-url": {
              "type": "keyword"
            },
            "data_source-record_locator-protocol": {
              "type": "keyword"
            },
            "data_source-record_locator-remote_file_path": {
              "type": "keyword"
            },
            "entities-items": {
              "properties": {
                "entity": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "type": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            },
            "entities-relationships": {
              "properties": {
                "from": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "relationship": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "to": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  ```

  See also:

  * [An Introduction to Elasticsearch Mapping](https://www.elastic.co/blog/found-elasticsearch-mapping-introduction)
  * [Explicit mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/explicit-mapping.html)
  * [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html)
  * [Unstructured document elements and metadata](/api-reference/legacy-api/partition/document-elements)

  <iframe width="560" height="315" src="https://www.youtube.com/embed/0f0tEcrvP8g" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For Elastic Cloud, you will need the Elastic Cloud service instance's API key. If you are using Unstructured Ingest, you will also need the instance's Cloud ID. To get these, see your Elasticsearch Service web console or [Elastic Cloud API keys](https://www.elastic.co/docs/deploy-manage/api-keys/elastic-cloud-api-keys).

  <Note>
    When generating an Elastic Cloud service instance's API key through the Elasticsearch Service web console, be sure after you click **Create API key** to select **Encoded** to get the API key's value.
    (The **Encoded** API key value is the Base64 encoding of the UTF-8 representation of the API key's ID and key value joined by a colon \[:].)
  </Note>

* For self-managed Elasticsearch, you will need:

  * The self-managed instance's hostname and port number. See [Networking](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html).

  * If you're using basic authentication to the self-managed instance, the user's name and password.

  * If you're using token-based authentication to the self-managed instance, the bearer token or API key for the instance. See [Token-based authentication services](https://www.elastic.co/guide/en/elasticsearch/reference/current/token-authentication-services.html) and [Create API key](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html).

    <Note>
      When generating an API key for a self-managed Elasticsearch instance by calling [POST /\_security/api\_key](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-security-create-api-key), be sure to use
      the `encoded` value in the response payload. (This `encoded` value is the Base64 encoding of the UTF-8 representation of the payload's `id` and `api_key` values joined by a colon \[:]). Do not use the `api_key` value by itself.
    </Note>

  * If you're using certificate authentication to the self-managed instance, the path to the Certificate Authority (CA) file on the instance, and the certificate fingerprint. See [SSL certificate API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-ssl.html) and [Where can I see my Certificate Fingerprint?](https://discuss.elastic.co/t/where-can-i-see-my-certificate-fingerprint/319335/3).

To create an Elasticsearch destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="elasticsearch",
                  config={
                      "hosts": ["<host-url>"],
                      "es_api_key": "<es-api-key>",
                      "index_name": "<index-name>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "elasticsearch",
      "config": {
          "hosts": ["<host-url>"],
          "es_api_key": "<es-api-key>",
          "index_name": "<index-name>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `hosts` is a *required* array, with `<host-url>` specifying the URL of the Elastic Cloud instance to access, for example: `["https://<random-number>.<region>.<cloud-provider>.cloud.es.io"]`.
* `<es-api-key>` (*required*) - The Elastic Cloud API key for the target cluster.
* `<index-name>` (*required*) - The name of the target index in the cluster.

## Learn more

* <Icon icon="blog" />  [Finding Needles in a Haystack: PII Detection at Scale with Unstructured, Box, and Elasticsearch](https://unstructured.io/blog/finding-needles-in-a-haystack-pii-detection-at-scale-with-unstructured-box-and-elasticsearch)

---

# Source: https://docs.unstructured.io/support/issues/email-addresses.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unable to sign up with personal email address

## Issue

When you try to [sign up for a free Unstructured account](https://unstructured.io/?modal=try-for-free), and you enter a personal email address into
the **Company email** field in the sign-up form, the following error message appears:

```text  theme={null}
Please enter a different email address. This form does not accept addresses from <domain>.
```

This error message prevents you from clicking **Submit** to sign up for a free Unstructured account.

## Cause

Unstructured does not accept email addresses from free domains when signing up for a free Unstructured account.

## Solution

Use a company email address when signing up for a free Unstructured account.

## Blocked domains

Unstructured blocks the following free domains during the sign-up process. This list is
subject to change without notice:

```text  theme={null}
027168.com
0815.su
0sg.net
10mail.org
10minutemail.co.za
11mail.com
123.com
123box.net
123india.com
123mail.cl
123mail.org
123qwe.co.uk
126.com
139.com
150mail.com
150ml.com
15meg4free.com
163.com
16mail.com
188.com
189.cn
1ce.us
1chuan.com
1coolplace.com
1freeemail.com
1funplace.com
1internetdrive.com
1mail.ml
1mail.net
1me.net
1mum.com
1musicrow.com
1netdrive.com
1nsyncfan.com
1pad.de
1under.com
1webave.com
1webhighway.com
1zhuan.com
2-mail.com
20email.eu
20mail.in
20mail.it
212.com
21cn.com
24horas.com
2911.net
2980.com
2bmail.co.uk
2d2i.com
2die4.com
2trom.com
3000.it
30minutesmail.com
3126.com
321media.com
33mail.com
37.com
3ammagazine.com
3dmail.com
3email.com
3g.ua
3mail.ga
3xl.net
444.net
4email.com
4email.net
4mg.com
4newyork.com
4warding.net
4warding.org
4x4man.com
50mail.com
60minutemail.com
6ip.us
6mail.cf
6paq.com
74gmail.com
74.ru
7mail.ga
7mail.ml
88.am
8848.net
8mail.ga
8mail.ml
97rock.com
99experts.com
a45.in
aaamail.zzn.com
aamail.net
aapt.net.au
aaronkwok.net
abbeyroadlondon.co.uk
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.com
abcflash.net
abdulnour.com
aberystwyth.com
about.com
abusemail.de
abv.bg
abwesend.de
abyssmail.com
ac20mail.in
academycougars.com
acceso.or.cr
access4less.net
accessgcc.com
accountant.com
acdcfan.com
ace-of-base.com
acmemail.net
acninc.net
activist.com
adam.com.au
add3000.pp.ua
addcom.de
address.com
adelphia.net
adexec.com
adfarrow.com
adios.net
adoption.com
ados.fr
adrenalinefreak.com
advalvas.be
advantimo.com
aeiou.pt
aemail4u.com
aeneasmail.com
afreeinternet.com
africamail.com
africamel.net
ag.us.to
agoodmail.com
ahaa.dk
ahk.jp
aichi.com
aim.com
aircraftmail.com
airforce.net
airforceemail.com
airpost.net
ajacied.com
ajaxapp.net
ak47.hu
aknet.kg
albawaba.com
alex4all.com
alexandria.cc
algeria.com
alhilal.net
alibaba.com
alice.it
alive.cz
aliyun.com
allergist.com
allmail.net
alloymail.com
allracing.com
allsaintsfan.com
alpenjodel.de
alphafrau.de
alskens.dk
altavista.com
altavista.net
altavista.se
alternativagratis.com
alumni.com
alumnidirector.com
alvilag.hu
amail.com
amazonses.com
amele.com
america.hm
ameritech.net
amnetsal.com
amorki.pl
amrer.net
amuro.net
amuromail.com
ananzi.co.za
andylau.net
anfmail.com
angelfire.com
angelic.com
animail.net
animalhouse.com
animalwoman.net
anjungcafe.com
annsmail.com
ano-mail.net
anonmails.de
anonymous.to
anote.com
another.com
anotherdomaincyka.tk
anotherwin95.com
anti-social.com
antisocial.com
antispam24.de
antongijsen.com
antwerpen.com
anymoment.com
anytimenow.com
aol.com
aon.at
apexmail.com
apmail.com
apollo.lv
aport.ru
aport2000.ru
appraiser.net
approvers.net
arabia.com
arabtop.net
archaeologist.com
arcor.de
arcotronics.bg
arcticmail.com
argentina.com
aristotle.org
army.net
armyspy.com
arnet.com.ar
art-en-ligne.pro
artlover.com
artlover.com.au
as-if.com
asdasd.nl
asean-mail.com
asheville.com
asia-links.com
asia-mail.com
asiafind.com
asianavenue.com
asiancityweb.com
asiansonly.net
asianwired.net
asiapoint.net
ass.pp.ua
assala.com
assamesemail.com
astroboymail.com
astrolover.com
astrosfan.com
astrosfan.net
asurfer.com
atheist.com
athenachu.net
atina.cl
atl.lv
atlaswebmail.com
atmc.net
atozasia.com
atrus.ru
att.net
attglobal.net
attymail.com
au.ru
auctioneer.net
ausi.com
aussiemail.com.au
austin.rr.com
australia.edu
australiamail.com
austrosearch.net
autoescuelanerja.com
autograf.pl
autorambler.ru
avh.hu
avia-tonic.fr
awsom.net
axoskate.com
ayna.com
azazazatashkent.tk
azimiweb.com
azmeil.tk
bachelorboy.com
bachelorgal.com
backpackers.com
backstreet-boys.com
backstreetboysclub.com
bagherpour.com
baldmama.de
baldpapa.de
ballyfinance.com
bangkok.com
bangkok2000.com
bannertown.net
baptistmail.com
baptized.com
barcelona.com
bareed.ws
bartender.net
baseballmail.com
basketballmail.com
batuta.net
baudoinconsulting.com
bboy.zzn.com
bcvibes.com
beddly.com
beeebank.com
beenhad.com
beep.ru
beer.com
beethoven.com
belice.com
belizehome.com
bell.net
bellair.net
bellsouth.net
berlin.com
berlin.de
berlinexpo.de
bestmail.us
betriebsdirektor.de
bettergolf.net
bharatmail.com
big1.us
bigassweb.com
bigblue.net.au
bigboab.com
bigfoot.com
bigfoot.de
bigger.com
biggerbadder.com
bigmailbox.com
bigmir.net
bigpond.au
bigpond.com
bigpond.com.au
bigpond.net
bigpond.net.au
bigramp.com
bigstring.com
bikemechanics.com
bikeracer.com
bikeracers.net
bikerider.com
billsfan.com
billsfan.net
bimla.net
bin-wieder-da.de
bio-muesli.info
birdlover.com
birdowner.net
bisons.com
bitmail.com
bitpage.net
bizhosting.com
bk.ru
blackburnmail.com
blackplanet.com
blader.com
bladesmail.net
blazemail.com
bleib-bei-mir.de
blockfilter.com
blogmyway.org
bluebottle.com
bluehyppo.com
bluemail.ch
bluemail.dk
bluesfan.com
bluewin.ch
blueyonder.co.uk
blushmail.com
blutig.me
bmlsports.net
boardermail.com
boatracers.com
bodhi.lawlita.com
bol.com.br
bolando.com
bollywoodz.com
boltonfans.com
bombdiggity.com
bonbon.net
boom.com
bootmail.com
bootybay.de
bornnaked.com
bostonoffice.com
boun.cr
bounce.net
bounces.amazon.com
bouncr.com
box.az
box.ua
boxbg.com
boxemail.com
boxformail.in
boxfrog.com
boximail.com
boyzoneclub.com
bradfordfans.com
brasilia.net
brazilmail.com
brazilmail.com.br
breadtimes.press
breathe.com
brennendesreich.de
bresnan.net
brew-master.com
brew-meister.com
brfree.com.br
briefemail.com
bright.net
britneyclub.com
brittonsign.com
broadcast.net
brokenvalve.com
brusseler.com
bsdmail.com
btcmail.pw
btconnect.co.uk
btconnect.com
btinternet.com
btopenworld.co.uk
buerotiger.de
buffymail.com
bullsfan.com
bullsgame.com
bumerang.ro
bumpymail.com
bund.us
burnthespam.info
burstmail.info
buryfans.com
business-man.com
businessman.net
busta-rhymes.com
buyersusa.com
bvimailbox.com
byom.de
c2.hu
c2i.net
c3.hu
c4.com
c51vsgq.com
cabacabana.com
cable.comcast.com
cableone.net
caere.it
cairomail.com
calendar-server.bounces.google.com
calidifontain.be
californiamail.com
callnetuk.com
callsign.net
caltanet.it
camidge.com
canada-11.com
canada.com
canadianmail.com
canoemail.com
canwetalk.com
caramail.com
care2.com
careerbuildermail.com
carioca.net
cartelera.org
cartestraina.ro
casablancaresort.com
casema.nl
cash4u.com
cashette.com
casino.com
catcha.com
catchamail.com
catholic.org
catlover.com
cd2.com
celineclub.com
celtic.com
center-mail.de
centermail.at
centermail.de
centermail.info
centoper.it
centralpets.com
centrum.cz
centrum.sk
centurytel.net
certifiedmail.com
cfl.rr.com
cgac.es
cghost.s-a-d.de
chacuo.net
chaiyomail.com
chammy.info
chance2mail.com
chandrasekar.net
charmedmail.com
charter.net
chat.ru
chattown.com
chauhanweb.com
cheatmail.de
chechnya.conf.work
check.com
check1check.com
cheerful.com
chef.net
chek.com
chello.nl
chemist.com
chequemail.com
cheyenneweb.com
chez.com
chickmail.com
china.com
china.net.vg
chinamail.com
chirk.com
chocaholic.com.au
chong-mail.com
chong-mail.net
churchusa.com
cia-agent.com
cia.hu
ciaoweb.it
cicciociccio.com
cincinow.net
cinci.rr.com
citiz.net
citlink.net
citromail.hu
city-of-bath.org
city-of-birmingham.com
city-of-brighton.org
city-of-cambridge.com
city-of-coventry.com
city-of-edinburgh.com
city-of-lichfield.com
city-of-lincoln.com
city-of-liverpool.com
city-of-manchester.com
city-of-nottingham.com
city-of-oxford.com
city-of-swansea.com
city-of-westminster.com
city-of-westminster.net
city-of-york.net
cityofcardiff.net
cityoflondon.org
ckaazaza.tk
claramail.com
classicalfan.com
classicmail.co.za
clear.net.nz
clearwire.net
clerk.com
cliffhanger.com
clixser.com
close2you.net
clrmail.com
club4x4.net
clubalfa.com
clubbers.net
clubducati.com
clubhonda.net
clubmember.org
clubnetnoir.com
clubvdo.net
cluemail.com
cmail.net
cmpmail.com
cnnsimail.com
cntv.cn
codec.ro
coder.hu
coid.biz
coldmail.com
collectiblesuperstore.com
collector.org
collegeclub.com
collegemail.com
colleges.com
columbus.rr.com
columbusrr.com
columnist.com
comcast.net
comic.com
communityconnect.com
comporium.net
comprendemail.com
compuserve.com
computer-freak.com
computer4u.com
computermail.net
conexcol.com
conk.com
connect4free.net
connectbox.com
consultant.com
consumerriot.com
contractor.net
contrasto.cu.cc
cookiemonster.com
cool.br
coole-files.de
coolgoose.ca
coolgoose.com
coolkiwi.com
coollist.com
coolmail.com
coolmail.net
coolsend.com
coolsite.net
cooooool.com
cooperation.net
cooperationtogo.net
copacabana.com
copper.net
cornells.com
cornerpub.com
corporatedirtbag.com
correo.terra.com.gt
cortinet.com
cotas.net
counsellor.com
countrylover.com
cox.com
cox.net
coxinet.net
cracker.hu
crapmail.org
crazedanddazed.com
crazymailing.com
crazysexycool.com
cristianemail.com
critterpost.com
croeso.com
crosshairs.com
crosswinds.net
crwmail.com
cry4helponline.com
cs.com
csinibaba.hu
cuemail.com
curio-city.com
curryworld.de
cute-girl.com
cuteandcuddly.com
cutey.com
cww.de
cyber-africa.net
cyber-innovation.club
cyber-matrix.com
cyber-phone.eu
cyber-wizard.com
cyber4all.com
cyberbabies.com
cybercafemaui.com
cyberdude.com
cyberforeplay.net
cybergal.com
cybergrrl.com
cyberinbox.com
cyberleports.com
cybermail.net
cybernet.it
cyberservices.com
cyberspace-asia.com
cybertrains.org
cyclefanz.com
cynetcity.com
dabsol.net
dadacasa.com
daha.com
dailypioneer.com
dallasmail.com
dangerous-minds.com
dansegulvet.com
dasdasdascyka.tk
data54.com
davegracey.com
dawnsonmail.com
dawsonmail.com
dazedandconfused.com
dbzmail.com
dcemail.com
deadlymob.org
deagot.com
deal-maker.com
dearriba.com
death-star.com
deliveryman.com
deneg.net
depechemode.com
deseretmail.com
desertmail.com
desilota.com
deskpilot.com
destin.com
detik.com
deutschland-net.com
devotedcouples.com
dezigner.ru
dfwatson.com
di-ve.com
die-besten-bilder.de
die-genossen.de
die-optimisten.de
die-optimisten.net
diemailbox.de
digibel.be
digital-filestore.de
diplomats.com
directbox.com
dirtracer.com
discard.email
discard.ga
discard.gq
disciples.com
discofan.com
discoverymail.com
disign-concept.eu
disign-revelation.com
disinfo.net
dispomail.eu
disposable.com
dispose.it
dm.w3internet.co.uk
dmailman.com
dnainternet.net
dnsmadeeasy.com
doclist.bounces.google.com
docmail.cz
docs.google.com
doctor.com
dodgit.org
dodo.com.au
dodsi.com
dog.com
dogit.com
doglover.com
dogmail.co.uk
dogsnob.net
doityourself.com
domforfb1.tk
domforfb2.tk
domforfb3.tk
domforfb4.tk
domforfb5.tk
domforfb6.tk
domforfb7.tk
domforfb8.tk
domozmail.com
doneasy.com
donjuan.com
dontgotmail.com
dontmesswithtexas.com
doramail.com
dostmail.com
dotcom.fr
dotmsg.com
dott.it
download-privat.de
dplanet.ch
dr.com
dragoncon.net
dropmail.me
dropzone.com
drotposta.hu
dubaimail.com
dublin.com
dublin.ie
duck.com
dumpmail.com
dumpmail.de
dumpyemail.com
dunlopdriver.com
dunloprider.com
duno.com
duskmail.com
dutchmail.com
dwp.net
dygo.com
dynamitemail.com
dyndns.org
e-apollo.lv
e-mail.com.tr
e-mail.dk
e-mail.ru
e-mail.ua
e-mailanywhere.com
e-mails.ru
e-tapaal.com
earthalliance.com
earthcam.net
earthdome.com
earthling.net
earthlink.net
earthonline.net
eastcoast.co.za
eastmail.com
easy.to
easypost.com
easytrashmail.com
ec.rr.com
ecardmail.com
ecbsolutions.net
echina.com
ecolo-online.fr
ecompare.com
edmail.com
ednatx.com
edtnmail.com
educacao.te.pt
eelmail.com
ehmail.com
einrot.com
einrot.de
eintagsmail.de
eircom.net
elisanet.fi
elitemail.org
elsitio.com
elvis.com
elvisfan.com
email-fake.gq
email-london.co.uk
email.biz
email.cbes.net
email.com
email.cz
email.ee
email.it
email.nu
email.org
email.ro
email.ru
email.si
email.su
email.ua
email2me.net
email4u.info
emailacc.com
emailaccount.com
emailage.ga
emailage.gq
emailasso.net
emailchoice.com
emailcorner.net
emailem.com
emailengine.net
emailengine.org
emailer.hubspot.com
emailforyou.net
emailgo.de
emailgroups.net
emailinfive.com
emailit.com
emailpinoy.com
emailplanet.com
emailplus.org
emailproxsy.com
emails.ga
emails.incisivemedia.com
emails.ru
emailthe.net
emailto.de
emailuser.net
emailx.net
emailz.ga
emailz.gq
ematic.com
embarqmail.com
emeil.in
emeil.ir
emil.com
eml.cc
eml.pp.ua
end-war.com
enel.net
engineer.com
england.com
england.edu
englandmail.com
epage.ru
epatra.com
ephemail.net
epix.net
epost.de
eposta.hu
eqqu.com
eramail.co.za
eresmas.com
eriga.lv
estranet.it
ethos.st
etoast.com
etrademail.com
etranquil.com
etranquil.net
eudoramail.com
europamel.net
europe.com
europemail.com
euroseek.com
eurosport.com
every1.net
everyday.com.kh
everymail.net
everyone.net
everytg.ml
examnotes.net
excite.co.jp
excite.com
excite.it
execs.com
exemail.com.au
exg6.exghost.com
existiert.net
expressasia.com
extenda.net
extended.com
eyepaste.com
eyou.com
ezcybersearch.com
ezmail.egine.com
ezmail.ru
ezrs.com
f-m.fm
f1fans.net
facebook-email.ga
facebook.com
facebookmail.com
facebookmail.gq
fahr-zur-hoelle.org
fake-email.pp.ua
fake-mail.cf
fake-mail.ga
fake-mail.ml
fakemailz.com
falseaddress.com
fan.com
fansonlymail.com
fansworldwide.de
fantasticmail.com
farang.net
farifluset.mailexpire.com
faroweb.com
fast-email.com
fast-mail.fr
fast-mail.org
fastacura.com
fastchevy.com
fastchrysler.com
fastem.com
fastemail.us
fastemailer.com
fastermail.com
fastest.cc
fastimap.com
fastkawasaki.com
fastmail.ca
fastmail.cn
fastmail.co.uk
fastmail.com
fastmail.com.au
fastmail.es
fastmail.fm
fastmail.im
fastmail.in
fastmail.jp
fastmail.mx
fastmail.net
fastmail.nl
fastmail.se
fastmail.to
fastmail.tw
fastmail.us
fastmailbox.net
fastmazda.com
fastmessaging.com
fastmitsubishi.com
fastnissan.com
fastservice.com
fastsubaru.com
fastsuzuki.com
fasttoyota.com
fastyamaha.com
fatcock.net
fatflap.com
fathersrightsne.org
fax.ru
fbi-agent.com
fbi.hu
fdfdsfds.com
fea.st
federalcontractors.com
feinripptraeger.de
felicitymail.com
femenino.com
fetchmail.co.uk
fettabernett.de
feyenoorder.com
ffanet.com
fiberia.com
ficken.de
fightallspam.com
filipinolinks.com
financemail.net
financier.com
findmail.com
finebody.com
fire-brigade.com
fireman.net
fishburne.org
fishfuse.com
fixmail.tk
fizmail.com
flashbox.5july.org
flashmail.com
flashmail.net
fleckens.hu
flipcode.com
fmail.co.uk
fmailbox.com
fmgirl.com
fmguy.com
fnbmail.co.za
fnmail.com
folkfan.com
foodmail.com
footard.com
footballmail.com
foothills.net
for-president.com
force9.co.uk
forfree.at
forgetmail.com
fornow.eu
forpresident.com
fortuncity.com
fortunecity.com
forum.dk
foxmail.com
fr33mail.info
francemel.fr
free-email.ga
free-online.net
free-org.com
free.com.pe
free.fr
freeaccess.nl
freeaccount.com
freeandsingle.com
freedom.usa.com
freedomlover.com
freegates.be
freeghana.com
freelance-france.eu
freeler.nl
freemail.c3.hu
freemail.com.au
freemail.com.pk
freemail.de
freemail.et
freemail.gr
freemail.hu
freemail.it
freemail.lt
freemail.ms
freemail.nl
freemail.org.mk
freemails.ga
freemeil.gq
freenet.de
freenet.kg
freeola.com
freeola.net
freeserve.co.uk
freestart.hu
freesurf.fr
freesurf.nl
freeuk.com
freeuk.net
freeukisp.co.uk
freeweb.org
freewebemail.com
freeyellow.com
freezone.co.uk
fresnomail.com
freudenkinder.de
freundin.ru
friendlymail.co.uk
friends-cafe.com
friendsfan.com
from-africa.com
from-america.com
from-argentina.com
from-asia.com
from-australia.com
from-belgium.com
from-brazil.com
from-canada.com
from-china.net
from-england.com
from-europe.com
from-france.net
from-germany.net
from-holland.com
from-israel.com
from-italy.net
from-japan.net
from-korea.com
from-mexico.com
from-outerspace.com
from-russia.com
from-spain.net
fromalabama.com
fromalaska.com
fromarizona.com
fromarkansas.com
fromcalifornia.com
fromcolorado.com
fromconnecticut.com
fromdelaware.com
fromflorida.net
fromgeorgia.com
fromhawaii.net
fromidaho.com
fromillinois.com
fromindiana.com
fromiowa.com
fromjupiter.com
fromkansas.com
fromkentucky.com
fromlouisiana.com
frommaine.net
frommaryland.com
frommassachusetts.com
frommiami.com
frommichigan.com
fromminnesota.com
frommississippi.com
frommissouri.com
frommontana.com
fromnebraska.com
fromnevada.com
fromnewhampshire.com
fromnewjersey.com
fromnewmexico.com
fromnewyork.net
fromnorthcarolina.com
fromnorthdakota.com
fromohio.com
fromoklahoma.com
fromoregon.net
frompennsylvania.com
fromrhodeisland.com
fromru.com
fromsouthcarolina.com
fromsouthdakota.com
fromtennessee.com
fromtexas.com
fromthestates.com
fromutah.com
fromvermont.com
fromvirginia.com
fromwashington.com
fromwashingtondc.com
fromwestvirginia.com
fromwisconsin.com
fromwyoming.com
front.ru
frontier.com
frontiernet.net
frostbyte.uk.net
fsmail.net
ftc-i.net
ftml.net
fullmail.com
funkfan.com
fuorissimo.com
furnitureprovider.com
fuse.net
fut.es
fux0ringduh.com
fwnb.com
fxsmails.com
galaxy5.com
galaxyhit.com
gamebox.net
gamegeek.com
gamespotmail.com
gamno.config.work
garbage.com
gardener.com
gawab.com
gaybrighton.co.uk
gaza.net
gazeta.pl
gazibooks.com
gci.net
geecities.com
geek.com
geek.hu
geeklife.com
gelitik.in
gencmail.com
general-hospital.com
gentlemansclub.de
geocities.com
geography.net
geologist.com
geopia.com
germanymail.com
get.pp.ua
get1mail.com
getairmail.cf
getairmail.com
getairmail.ga
getairmail.gq
getonemail.net
ghanamail.com
ghostmail.com
ghosttexter.de
giga4u.de
gigileung.org
girl4god.com
givepeaceachance.com
glay.org
glendale.net
globalfree.it
globalpagan.com
globalsite.com.br
gmail.com
gmail.com.br
gmail.ru
gmx.at
gmx.ch
gmx.com
gmx.de
gmx.li
gmx.net
go.com
go.ro
go.ru
go2net.com
gocollege.com
gocubs.com
goemailgo.com
gofree.co.uk
gol.com
goldenmail.ru
goldmail.ru
goldtoolbox.com
golfemail.com
golfilla.info
golfmail.be
gonavy.net
goodnewsmail.com
goodstick.com
googlegroups.com
googlemail.com
goplay.com
gorillaswithdirtyarmpits.com
gorontalo.net
gospelfan.com
gothere.uk.com
gotmail.com
gotmail.org
gotomy.com
gotti.otherinbox.com
gportal.hu
graduate.org
graffiti.net
gramszu.net
grandmamail.com
grandmasmail.com
graphic-designer.com
grapplers.com
gratisweb.com
greenmail.net
groupmail.com
grr.la
grungecafe.com
gtmc.net
gua.net
guessmail.com
guju.net
gustr.com
guy.com
guy2.com
guyanafriends.com
gyorsposta.com
gyorsposta.hu
h-mail.us
hab-verschlafen.de
habmalnefrage.de
hacccc.com
hackermail.com
hackermail.net
hailmail.net
hairdresser.net
hamptonroads.com
handbag.com
handleit.com
hang-ten.com
hanmail.net
happemail.com
happycounsel.com
happypuppy.com
harakirimail.com
hardcorefreak.com
hartbot.de
hawaii.rr.com
hawaiiantel.net
heartthrob.com
heerschap.com
heesun.net
hehe.com
hello.hu
hello.net.au
hello.to
helter-skelter.com
herediano.com
herono1.com
herp.in
herr-der-mails.de
hetnet.nl
hey.to
hhdevel.com
hidzz.com
highmilton.com
highquality.com
highveldmail.co.za
hilarious.com
hiphopfan.com
hispavista.com
hitmail.com
hitthe.net
hkg.net
hkstarphoto.com
hockeymail.com
hollywoodkids.com
home-email.com
home.de
home.nl
home.no.net
home.ro
home.se
homelocator.com
homemail.com
homestead.com
honduras.com
hongkong.com
hookup.net
hoopsmail.com
hopemail.biz
horrormail.com
hot-mail.gq
hot-shot.com
hot.ee
hotbot.com
hotbrev.com
hotfire.net
hotletter.com
hotmail.ca
hotmail.ch
hotmail.co
hotmail.co.il
hotmail.co.jp
hotmail.co.nz
hotmail.co.uk
hotmail.co.za
hotmail.com
hotmail.com.au
hotmail.com.br
hotmail.com.tr
hotmail.de
hotmail.es
hotmail.fi
hotmail.fr
hotmail.it
hotmail.kg
hotmail.kz
hotmail.nl
hotmail.ru
hotmail.se
hotpop.com
hotpop3.com
hotvoice.com
housemail.com
hsuchi.net
hu2.ru
hughes.net
humanoid.net
humn.ws.gy
hunsa.com
hurting.com
hush.com
hushmail.com
hypernautica.com
i-connect.com
i-france.com
i-mail.com.au
i-p.com
i.am
i.ua
i12.com
i2pmail.org
iamawoman.com
iamwaiting.com
iamwasted.com
iamyours.com
icestorm.com
ich-bin-verrueckt-nach-dir.de
ich-will-net.de
icloud.com
icmsconsultants.com
icq.com
icqmail.com
icrazy.com
id-base.com
ididitmyway.com
idigjesus.com
idirect.com
ieatspam.eu
ieatspam.info
ieh-mail.de
iespana.es
ifoward.com
ig.com.br
ignazio.it
ignmail.com
ihateclowns.com
ihateyoualot.info
iheartspam.org
iinet.net.au
ijustdontcare.com
ikbenspamvrij.nl
ilkposta.com
ilovechocolate.com
ilovejesus.com
ilovetocollect.net
ilse.nl
imaginemail.com
imail.ru
imailbox.com
imap-mail.com
imap.cc
imapmail.org
imel.org
imgof.com
imgv.de
immo-gerance.info
imneverwrong.com
imposter.co.uk
imstations.com
imstressed.com
imtoosexy.com
in-box.net
in2jesus.com
iname.com
inbax.tk
inbound.plus
inbox.com
inbox.net
inbox.ru
inbox.si
inboxalias.com
incamail.com
incredimail.com
indeedemail.com
index.ua
indexa.fr
india.com
indiatimes.com
indo-mail.com
indocities.com
indomail.com
indyracers.com
inerted.com
inet.com
inet.net.au
info-media.de
info-radio.ml
info66.com
infohq.com
infomail.es
infomart.or.jp
infospacemail.com
infovia.com.ar
inicia.es
inmail.sk
inmail24.com
inmano.com
inmynetwork.tk
innocent.com
inorbit.com
inoutbox.com
insidebaltimore.net
insight.rr.com
instant-mail.de
instantemailaddress.com
instantmail.fr
instruction.com
instructor.net
insurer.com
interburp.com
interfree.it
interia.pl
interlap.com.ar
intermail.co.il
internet-e-mail.com
internet-mail.org
internet-police.com
internetbiz.com
internetdrive.com
internetegypt.com
internetemails.net
internetmailing.net
internode.on.net
invalid.com
inwind.it
iobox.com
iobox.fi
iol.it
iol.pt
iowaemail.com
ip3.com
ip4.pp.ua
ip6.pp.ua
ipoo.org
iprimus.com.au
iqemail.com
irangate.net
iraqmail.com
ireland.com
irelandmail.com
iremail.de
irj.hu
iroid.com
isellcars.com
iservejesus.com
islamonline.net
isleuthmail.com
ismart.net
isonfire.com
isp9.net
israelmail.com
ist-allein.info
ist-einmalig.de
ist-ganz-allein.de
ist-willig.de
italymail.com
itloox.com
itmom.com
ivebeenframed.com
ivillage.com
iwan-fals.com
iwmail.com
iwon.com
izadpanah.com
jahoopa.com
jakuza.hu
japan.com
jaydemail.com
jazzandjava.com
jazzfan.com
jazzgame.com
je-recycle.info
jerusalemmail.com
jet-renovation.fr
jetable.de
jetable.pp.ua
jetemail.net
jippii.fi
jmail.co.za
job4u.com
jobbikszimpatizans.hu
joelonsoftware.com
joinme.com
jokes.com
jordanmail.com
journalist.com
jourrapide.com
jovem.te.pt
joymail.com
jpopmail.com
jsrsolutions.com
jubiimail.dk
jump.com
jumpy.it
juniormail.com
junk1e.com
junkmail.com
junkmail.gq
juno.com
justemail.net
justicemail.com
kaazoo.com
kaffeeschluerfer.com
kaffeeschluerfer.de
kaixo.com
kalpoint.com
kansascity.com
kapoorweb.com
karachian.com
karachioye.com
karbasi.com
katamail.com
kayafmmail.co.za
kbjrmail.com
kcks.com
keg-party.com
keinpardon.de
keko.com.ar
kellychen.com
keromail.com
keyemail.com
kgb.hu
khosropour.com
kickassmail.com
killermail.com
kimo.com
kimsdisk.com
kinglibrary.net
kinki-kids.com
kissfans.com
kittymail.com
kitznet.at
kiwibox.com
kiwitown.com
klassmaster.net
km.ru
knol-power.nl
kolumbus.fi
kommespaeter.de
konx.com
korea.com
koreamail.com
kpnmail.nl
krim.ws
krongthip.com
krunis.com
ksanmail.com
ksee24mail.com
kube93mail.com
kukamail.com
kulturbetrieb.info
kumarweb.com
kuwait-mail.com
l33r.eu
la.com
labetteraverouge.at
ladymail.cz
lagerlouts.com
lags.us
lahoreoye.com
lakmail.com
lamer.hu
land.ru
lankamail.com
laoeq.com
laposte.net
lass-es-geschehen.de
last-chance.pro
lastmail.co
latemodels.com
latinmail.com
lavache.com
law.com
lawyer.com
lazyinbox.com
leehom.net
legalactions.com
legalrc.loan
legislator.com
lenta.ru
leonlai.net
letsgomets.net
letterboxes.org
letthemeatspam.com
levele.com
levele.hu
lex.bg
lexis-nexis-mail.com
libero.it
liberomail.com
lick101.com
liebt-dich.info
linkmaster.com
linktrader.com
linuxfreemail.com
linuxmail.org
lionsfan.com.au
liontrucks.com
liquidinformation.net
list.ru
listomail.com
littleapple.com
littleblueroom.com
live.at
live.be
live.ca
live.cl
live.cn
live.co.uk
live.co.za
live.com
live.com.ar
live.com.au
live.com.mx
live.com.pt
live.com.sg
live.de
live.dk
live.fr
live.ie
live.in
live.it
live.jp
live.nl
live.no
live.ru
live.se
liveradio.tk
liverpoolfans.com
llandudno.com
llangollen.com
lmxmail.sk
lobbyist.com
localbar.com
locos.com
login-email.ga
loh.pp.ua
lolfreak.net
lolito.tk
london.com
looksmart.co.uk
looksmart.com
looksmart.com.au
lopezclub.com
louiskoo.com
love.cz
loveable.com
lovecat.com
lovefall.ml
lovefootball.com
lovelygirl.net
lovemail.com
lover-boy.com
lovergirl.com
lovesea.gq
lovethebroncos.com
lovethecowboys.com
loveyouforever.de
lovingjesus.com
lowandslow.com
lr7.us
lroid.com
luso.pt
luukku.com
luv2.us
lvie.com.sg
lycos.co.uk
lycos.com
lycos.es
lycos.it
lycos.ne.jp
lycosmail.com
m-a-i-l.com
m-hmail.com
m4.org
m4ilweb.info
mac.com
macbox.com
macfreak.com
machinecandy.com
macmail.com
madcreations.com
madonnafan.com
madrid.com
maennerversteherin.com
maennerversteherin.de
maffia.hu
magicmail.co.za
magspam.net
mahmoodweb.com
mail.bg
mail-awu.de
mail-box.cz
mail-center.com
mail-central.com
mail-easy.fr
mail-filter.com
mail-me.com
mail-page.com
mail-tester.com
mail.austria.com
mail.az
mail.be
mail.bulgaria.com
mail.by
mail.co.za
mail.com
mail.com.tr
mail.de
mail.ee
mail.entrepeneurmag.com
mail.freetown.com
mail.gr
mail.hitthebeach.com
mail.htl22.at
mail.md
mail.misterpinball.de
mail.nu
mail.org.uk
mail.pf
mail.pt
mail.r-o-o-t.com
mail.ru
mail.sisna.com
mail.svenz.eu
mail.usa.com
mail.vasarhely.hu
mail.wtf
mail114.net
mail15.com
mail2007.com
mail2aaron.com
mail2abby.com
mail2abc.com
mail2actor.com
mail2admiral.com
mail2adorable.com
mail2adoration.com
mail2adore.com
mail2adventure.com
mail2aeolus.com
mail2aether.com
mail2affection.com
mail2afghanistan.com
mail2africa.com
mail2agent.com
mail2aha.com
mail2ahoy.com
mail2aim.com
mail2air.com
mail2airbag.com
mail2airforce.com
mail2airport.com
mail2alabama.com
mail2alan.com
mail2alaska.com
mail2albania.com
mail2alcoholic.com
mail2alec.com
mail2alexa.com
mail2algeria.com
mail2alicia.com
mail2alien.com
mail2allan.com
mail2allen.com
mail2allison.com
mail2alpha.com
mail2alyssa.com
mail2amanda.com
mail2amazing.com
mail2amber.com
mail2america.com
mail2american.com
mail2andorra.com
mail2andrea.com
mail2andy.com
mail2anesthesiologist.com
mail2angela.com
mail2angola.com
mail2ann.com
mail2anna.com
mail2anne.com
mail2anthony.com
mail2anything.com
mail2aphrodite.com
mail2apollo.com
mail2april.com
mail2aquarius.com
mail2arabia.com
mail2arabic.com
mail2architect.com
mail2ares.com
mail2argentina.com
mail2aries.com
mail2arizona.com
mail2arkansas.com
mail2armenia.com
mail2army.com
mail2arnold.com
mail2art.com
mail2artemus.com
mail2arthur.com
mail2artist.com
mail2ashley.com
mail2ask.com
mail2astronomer.com
mail2athena.com
mail2athlete.com
mail2atlas.com
mail2atom.com
mail2attitude.com
mail2auction.com
mail2aunt.com
mail2australia.com
mail2austria.com
mail2azerbaijan.com
mail2baby.com
mail2bahamas.com
mail2bahrain.com
mail2ballerina.com
mail2ballplayer.com
mail2band.com
mail2bangladesh.com
mail2bank.com
mail2banker.com
mail2bankrupt.com
mail2baptist.com
mail2bar.com
mail2barbados.com
mail2barbara.com
mail2barter.com
mail2basketball.com
mail2batter.com
mail2beach.com
mail2beast.com
mail2beatles.com
mail2beauty.com
mail2becky.com
mail2beijing.com
mail2belgium.com
mail2belize.com
mail2ben.com
mail2bernard.com
mail2beth.com
mail2betty.com
mail2beverly.com
mail2beyond.com
mail2biker.com
mail2bill.com
mail2billionaire.com
mail2billy.com
mail2bio.com
mail2biologist.com
mail2black.com
mail2blackbelt.com
mail2blake.com
mail2blind.com
mail2blonde.com
mail2blues.com
mail2bob.com
mail2bobby.com
mail2bolivia.com
mail2bombay.com
mail2bonn.com
mail2bookmark.com
mail2boreas.com
mail2bosnia.com
mail2boston.com
mail2botswana.com
mail2bradley.com
mail2brazil.com
mail2breakfast.com
mail2brian.com
mail2bride.com
mail2brittany.com
mail2broker.com
mail2brook.com
mail2bruce.com
mail2brunei.com
mail2brunette.com
mail2brussels.com
mail2bryan.com
mail2bug.com
mail2bulgaria.com
mail2business.com
mail2buy.com
mail2ca.com
mail2california.com
mail2calvin.com
mail2cambodia.com
mail2cameroon.com
mail2canada.com
mail2cancer.com
mail2capeverde.com
mail2capricorn.com
mail2cardinal.com
mail2cardiologist.com
mail2care.com
mail2caroline.com
mail2carolyn.com
mail2casey.com
mail2cat.com
mail2caterer.com
mail2cathy.com
mail2catlover.com
mail2catwalk.com
mail2cell.com
mail2chad.com
mail2champaign.com
mail2charles.com
mail2chef.com
mail2chemist.com
mail2cherry.com
mail2chicago.com
mail2chile.com
mail2china.com
mail2chinese.com
mail2chocolate.com
mail2christian.com
mail2christie.com
mail2christmas.com
mail2christy.com
mail2chuck.com
mail2cindy.com
mail2clark.com
mail2classifieds.com
mail2claude.com
mail2cliff.com
mail2clinic.com
mail2clint.com
mail2close.com
mail2club.com
mail2coach.com
mail2coastguard.com
mail2colin.com
mail2college.com
mail2colombia.com
mail2color.com
mail2colorado.com
mail2columbia.com
mail2comedian.com
mail2composer.com
mail2computer.com
mail2computers.com
mail2concert.com
mail2congo.com
mail2connect.com
mail2connecticut.com
mail2consultant.com
mail2convict.com
mail2cook.com
mail2cool.com
mail2cory.com
mail2costarica.com
mail2country.com
mail2courtney.com
mail2cowboy.com
mail2cowgirl.com
mail2craig.com
mail2crave.com
mail2crazy.com
mail2create.com
mail2croatia.com
mail2cry.com
mail2crystal.com
mail2cuba.com
mail2culture.com
mail2curt.com
mail2customs.com
mail2cute.com
mail2cutey.com
mail2cynthia.com
mail2cyprus.com
mail2czechrepublic.com
mail2dad.com
mail2dale.com
mail2dallas.com
mail2dan.com
mail2dana.com
mail2dance.com
mail2dancer.com
mail2danielle.com
mail2danny.com
mail2darlene.com
mail2darling.com
mail2darren.com
mail2daughter.com
mail2dave.com
mail2dawn.com
mail2dc.com
mail2dealer.com
mail2deanna.com
mail2dearest.com
mail2debbie.com
mail2debby.com
mail2deer.com
mail2delaware.com
mail2delicious.com
mail2demeter.com
mail2democrat.com
mail2denise.com
mail2denmark.com
mail2dennis.com
mail2dentist.com
mail2derek.com
mail2desert.com
mail2devoted.com
mail2devotion.com
mail2diamond.com
mail2diana.com
mail2diane.com
mail2diehard.com
mail2dilemma.com
mail2dillon.com
mail2dinner.com
mail2dinosaur.com
mail2dionysos.com
mail2diplomat.com
mail2director.com
mail2dirk.com
mail2disco.com
mail2dive.com
mail2diver.com
mail2divorced.com
mail2djibouti.com
mail2doctor.com
mail2doglover.com
mail2dominic.com
mail2dominica.com
mail2dominicanrepublic.com
mail2don.com
mail2donald.com
mail2donna.com
mail2doris.com
mail2dorothy.com
mail2doug.com
mail2dough.com
mail2douglas.com
mail2dow.com
mail2downtown.com
mail2dream.com
mail2dreamer.com
mail2dude.com
mail2dustin.com
mail2dyke.com
mail2dylan.com
mail2earl.com
mail2earth.com
mail2eastend.com
mail2eat.com
mail2economist.com
mail2ecuador.com
mail2eddie.com
mail2edgar.com
mail2edwin.com
mail2egypt.com
mail2electron.com
mail2eli.com
mail2elizabeth.com
mail2ellen.com
mail2elliot.com
mail2elsalvador.com
mail2elvis.com
mail2emergency.com
mail2emily.com
mail2engineer.com
mail2english.com
mail2environmentalist.com
mail2eos.com
mail2eric.com
mail2erica.com
mail2erin.com
mail2erinyes.com
mail2eris.com
mail2eritrea.com
mail2ernie.com
mail2eros.com
mail2estonia.com
mail2ethan.com
mail2ethiopia.com
mail2eu.com
mail2europe.com
mail2eurus.com
mail2eva.com
mail2evan.com
mail2evelyn.com
mail2everything.com
mail2exciting.com
mail2expert.com
mail2fairy.com
mail2faith.com
mail2fanatic.com
mail2fancy.com
mail2fantasy.com
mail2farm.com
mail2farmer.com
mail2fashion.com
mail2fat.com
mail2feeling.com
mail2female.com
mail2fever.com
mail2fighter.com
mail2fiji.com
mail2filmfestival.com
mail2films.com
mail2finance.com
mail2finland.com
mail2fireman.com
mail2firm.com
mail2fisherman.com
mail2flexible.com
mail2florence.com
mail2florida.com
mail2floyd.com
mail2fly.com
mail2fond.com
mail2fondness.com
mail2football.com
mail2footballfan.com
mail2found.com
mail2france.com
mail2frank.com
mail2frankfurt.com
mail2franklin.com
mail2fred.com
mail2freddie.com
mail2free.com
mail2freedom.com
mail2french.com
mail2freudian.com
mail2friendship.com
mail2from.com
mail2fun.com
mail2gabon.com
mail2gabriel.com
mail2gail.com
mail2galaxy.com
mail2gambia.com
mail2games.com
mail2gary.com
mail2gavin.com
mail2gemini.com
mail2gene.com
mail2genes.com
mail2geneva.com
mail2george.com
mail2georgia.com
mail2gerald.com
mail2german.com
mail2germany.com
mail2ghana.com
mail2gilbert.com
mail2gina.com
mail2girl.com
mail2glen.com
mail2gloria.com
mail2goddess.com
mail2gold.com
mail2golfclub.com
mail2golfer.com
mail2gordon.com
mail2government.com
mail2grab.com
mail2grace.com
mail2graham.com
mail2grandma.com
mail2grandpa.com
mail2grant.com
mail2greece.com
mail2green.com
mail2greg.com
mail2grenada.com
mail2gsm.com
mail2guard.com
mail2guatemala.com
mail2guy.com
mail2hades.com
mail2haiti.com
mail2hal.com
mail2handhelds.com
mail2hank.com
mail2hannah.com
mail2harold.com
mail2harry.com
mail2hawaii.com
mail2headhunter.com
mail2heal.com
mail2heather.com
mail2heaven.com
mail2hebe.com
mail2hecate.com
mail2heidi.com
mail2helen.com
mail2hell.com
mail2help.com
mail2helpdesk.com
mail2henry.com
mail2hephaestus.com
mail2hera.com
mail2hercules.com
mail2herman.com
mail2hermes.com
mail2hespera.com
mail2hestia.com
mail2highschool.com
mail2hindu.com
mail2hip.com
mail2hiphop.com
mail2holland.com
mail2holly.com
mail2hollywood.com
mail2homer.com
mail2honduras.com
mail2honey.com
mail2hongkong.com
mail2hope.com
mail2horse.com
mail2hot.com
mail2hotel.com
mail2houston.com
mail2howard.com
mail2hugh.com
mail2human.com
mail2hungary.com
mail2hungry.com
mail2hygeia.com
mail2hyperspace.com
mail2hypnos.com
mail2ian.com
mail2ice-cream.com
mail2iceland.com
mail2idaho.com
mail2idontknow.com
mail2illinois.com
mail2imam.com
mail2in.com
mail2india.com
mail2indian.com
mail2indiana.com
mail2indonesia.com
mail2infinity.com
mail2intense.com
mail2iowa.com
mail2iran.com
mail2iraq.com
mail2ireland.com
mail2irene.com
mail2iris.com
mail2irresistible.com
mail2irving.com
mail2irwin.com
mail2isaac.com
mail2israel.com
mail2italian.com
mail2italy.com
mail2jackie.com
mail2jacob.com
mail2jail.com
mail2jaime.com
mail2jake.com
mail2jamaica.com
mail2james.com
mail2jamie.com
mail2jan.com
mail2jane.com
mail2janet.com
mail2janice.com
mail2japan.com
mail2japanese.com
mail2jasmine.com
mail2jason.com
mail2java.com
mail2jay.com
mail2jazz.com
mail2jed.com
mail2jeffrey.com
mail2jennifer.com
mail2jenny.com
mail2jeremy.com
mail2jerry.com
mail2jessica.com
mail2jessie.com
mail2jesus.com
mail2jew.com
mail2jeweler.com
mail2jim.com
mail2jimmy.com
mail2joan.com
mail2joann.com
mail2joanna.com
mail2jody.com
mail2joe.com
mail2joel.com
mail2joey.com
mail2john.com
mail2join.com
mail2jon.com
mail2jonathan.com
mail2jones.com
mail2jordan.com
mail2joseph.com
mail2josh.com
mail2joy.com
mail2juan.com
mail2judge.com
mail2judy.com
mail2juggler.com
mail2julian.com
mail2julie.com
mail2jumbo.com
mail2junk.com
mail2justin.com
mail2justme.com
mail2k.ru
mail2kansas.com
mail2karate.com
mail2karen.com
mail2karl.com
mail2karma.com
mail2kathleen.com
mail2kathy.com
mail2katie.com
mail2kay.com
mail2kazakhstan.com
mail2keen.com
mail2keith.com
mail2kelly.com
mail2kelsey.com
mail2ken.com
mail2kendall.com
mail2kennedy.com
mail2kenneth.com
mail2kenny.com
mail2kentucky.com
mail2kenya.com
mail2kerry.com
mail2kevin.com
mail2kim.com
mail2kimberly.com
mail2king.com
mail2kirk.com
mail2kiss.com
mail2kosher.com
mail2kristin.com
mail2kurt.com
mail2kuwait.com
mail2kyle.com
mail2kyrgyzstan.com
mail2la.com
mail2lacrosse.com
mail2lance.com
mail2lao.com
mail2larry.com
mail2latvia.com
mail2laugh.com
mail2laura.com
mail2lauren.com
mail2laurie.com
mail2lawrence.com
mail2lawyer.com
mail2lebanon.com
mail2lee.com
mail2leo.com
mail2leon.com
mail2leonard.com
mail2leone.com
mail2leslie.com
mail2letter.com
mail2liberia.com
mail2libertarian.com
mail2libra.com
mail2libya.com
mail2liechtenstein.com
mail2life.com
mail2linda.com
mail2linux.com
mail2lionel.com
mail2lipstick.com
mail2liquid.com
mail2lisa.com
mail2lithuania.com
mail2litigator.com
mail2liz.com
mail2lloyd.com
mail2lois.com
mail2lola.com
mail2london.com
mail2looking.com
mail2lori.com
mail2lost.com
mail2lou.com
mail2louis.com
mail2louisiana.com
mail2lovable.com
mail2love.com
mail2lucky.com
mail2lucy.com
mail2lunch.com
mail2lust.com
mail2luxembourg.com
mail2luxury.com
mail2lyle.com
mail2lynn.com
mail2madagascar.com
mail2madison.com
mail2madrid.com
mail2maggie.com
mail2mail4.com
mail2maine.com
mail2malawi.com
mail2malaysia.com
mail2maldives.com
mail2mali.com
mail2malta.com
mail2mambo.com
mail2man.com
mail2mandy.com
mail2manhunter.com
mail2mankind.com
mail2many.com
mail2marc.com
mail2marcia.com
mail2margaret.com
mail2margie.com
mail2marhaba.com
mail2maria.com
mail2marilyn.com
mail2marines.com
mail2mark.com
mail2marriage.com
mail2married.com
mail2marries.com
mail2mars.com
mail2marsha.com
mail2marshallislands.com
mail2martha.com
mail2martin.com
mail2marty.com
mail2marvin.com
mail2mary.com
mail2maryland.com
mail2mason.com
mail2massachusetts.com
mail2matt.com
mail2matthew.com
mail2maurice.com
mail2mauritania.com
mail2mauritius.com
mail2max.com
mail2maxwell.com
mail2maybe.com
mail2mba.com
mail2me4u.com
mail2mechanic.com
mail2medieval.com
mail2megan.com
mail2mel.com
mail2melanie.com
mail2melissa.com
mail2melody.com
mail2member.com
mail2memphis.com
mail2methodist.com
mail2mexican.com
mail2mexico.com
mail2mgz.com
mail2miami.com
mail2michael.com
mail2michelle.com
mail2michigan.com
mail2mike.com
mail2milan.com
mail2milano.com
mail2mildred.com
mail2milkyway.com
mail2millennium.com
mail2millionaire.com
mail2milton.com
mail2mime.com
mail2mindreader.com
mail2mini.com
mail2minister.com
mail2minneapolis.com
mail2minnesota.com
mail2miracle.com
mail2missionary.com
mail2mississippi.com
mail2missouri.com
mail2mitch.com
mail2model.com
mail2moldova.commail2molly.com
mail2mom.com
mail2monaco.com
mail2money.com
mail2mongolia.com
mail2monica.com
mail2montana.com
mail2monty.com
mail2moon.com
mail2morocco.com
mail2morpheus.com
mail2mors.com
mail2moscow.com
mail2moslem.com
mail2mouseketeer.com
mail2movies.com
mail2mozambique.com
mail2mp3.com
mail2mrright.com
mail2msright.com
mail2museum.com
mail2music.com
mail2musician.com
mail2muslim.com
mail2my.com
mail2myboat.com
mail2mycar.com
mail2mycell.com
mail2mygsm.com
mail2mylaptop.com
mail2mymac.com
mail2mypager.com
mail2mypalm.com
mail2mypc.com
mail2myphone.com
mail2myplane.com
mail2namibia.com
mail2nancy.com
mail2nasdaq.com
mail2nathan.com
mail2nauru.com
mail2navy.com
mail2neal.com
mail2nebraska.com
mail2ned.com
mail2neil.com
mail2nelson.com
mail2nemesis.com
mail2nepal.com
mail2netherlands.com
mail2network.com
mail2nevada.com
mail2newhampshire.com
mail2newjersey.com
mail2newmexico.com
mail2newyork.com
mail2newzealand.com
mail2nicaragua.com
mail2nick.com
mail2nicole.com
mail2niger.com
mail2nigeria.com
mail2nike.com
mail2no.com
mail2noah.com
mail2noel.com
mail2noelle.com
mail2normal.com
mail2norman.com
mail2northamerica.com
mail2northcarolina.com
mail2northdakota.com
mail2northpole.com
mail2norway.com
mail2notus.com
mail2noway.com
mail2nowhere.com
mail2nuclear.com
mail2nun.com
mail2ny.com
mail2oasis.com
mail2oceanographer.com
mail2ohio.com
mail2ok.com
mail2oklahoma.com
mail2oliver.com
mail2oman.com
mail2one.com
mail2onfire.com
mail2online.com
mail2oops.com
mail2open.com
mail2ophthalmologist.com
mail2optometrist.com
mail2oregon.com
mail2oscars.com
mail2oslo.com
mail2painter.com
mail2pakistan.com
mail2palau.com
mail2pan.com
mail2panama.com
mail2paraguay.com
mail2paralegal.com
mail2paris.com
mail2park.com
mail2parker.com
mail2party.com
mail2passion.com
mail2pat.com
mail2patricia.com
mail2patrick.com
mail2patty.com
mail2paul.com
mail2paula.com
mail2pay.com
mail2peace.com
mail2pediatrician.com
mail2peggy.com
mail2pennsylvania.com
mail2perry.com
mail2persephone.com
mail2persian.com
mail2peru.com
mail2pete.com
mail2peter.com
mail2pharmacist.com
mail2phil.com
mail2philippines.com
mail2phoenix.com
mail2phonecall.com
mail2phyllis.com
mail2pickup.com
mail2pilot.com
mail2pisces.com
mail2planet.com
mail2platinum.com
mail2plato.com
mail2pluto.com
mail2pm.com
mail2podiatrist.com
mail2poet.com
mail2poland.com
mail2policeman.com
mail2policewoman.com
mail2politician.com
mail2pop.com
mail2pope.com
mail2popular.com
mail2portugal.com
mail2poseidon.com
mail2potatohead.com
mail2power.com
mail2presbyterian.com
mail2president.com
mail2priest.com
mail2prince.com
mail2princess.com
mail2producer.com
mail2professor.com
mail2protect.com
mail2psychiatrist.com
mail2psycho.com
mail2psychologist.com
mail2qatar.com
mail2queen.com
mail2rabbi.com
mail2race.com
mail2racer.com
mail2rachel.com
mail2rage.com
mail2rainmaker.com
mail2ralph.com
mail2randy.com
mail2rap.com
mail2rare.com
mail2rave.com
mail2ray.com
mail2raymond.com
mail2realtor.com
mail2rebecca.com
mail2recruiter.com
mail2recycle.com
mail2redhead.com
mail2reed.com
mail2reggie.com
mail2register.com
mail2rent.com
mail2republican.com
mail2resort.com
mail2rex.com
mail2rhodeisland.com
mail2rich.com
mail2richard.com
mail2ricky.com
mail2ride.com
mail2riley.com
mail2rita.com
mail2rob.com
mail2robert.com
mail2roberta.com
mail2robin.com
mail2rock.com
mail2rocker.com
mail2rod.com
mail2rodney.com
mail2romania.com
mail2rome.com
mail2ron.com
mail2ronald.com
mail2ronnie.com
mail2rose.com
mail2rosie.com
mail2roy.com
mail2rss.org
mail2rudy.com
mail2rugby.com
mail2runner.com
mail2russell.com
mail2russia.com
mail2russian.com
mail2rusty.com
mail2ruth.com
mail2rwanda.com
mail2ryan.com
mail2sa.com
mail2sabrina.com
mail2safe.com
mail2sagittarius.com
mail2sail.com
mail2sailor.com
mail2sal.com
mail2salaam.com
mail2sam.com
mail2samantha.com
mail2samoa.com
mail2samurai.com
mail2sandra.com
mail2sandy.com
mail2sanfrancisco.com
mail2sanmarino.com
mail2santa.com
mail2sara.com
mail2sarah.com
mail2sat.com
mail2saturn.com
mail2saudi.com
mail2saudiarabia.com
mail2save.com
mail2savings.com
mail2school.com
mail2scientist.com
mail2scorpio.com
mail2scott.com
mail2sean.com
mail2search.com
mail2seattle.com
mail2secretagent.com
mail2senate.com
mail2senegal.com
mail2sensual.com
mail2seth.com
mail2sevenseas.com
mail2sexy.com
mail2seychelles.com
mail2shane.com
mail2sharon.com
mail2shawn.com
mail2ship.com
mail2shirley.com
mail2shoot.com
mail2shuttle.com
mail2sierraleone.com
mail2simon.com
mail2singapore.com
mail2single.com
mail2site.com
mail2skater.com
mail2skier.com
mail2sky.com
mail2sleek.com
mail2slim.com
mail2slovakia.com
mail2slovenia.com
mail2smile.com
mail2smith.com
mail2smooth.com
mail2soccer.com
mail2soccerfan.com
mail2socialist.com
mail2soldier.com
mail2somalia.com
mail2son.com
mail2song.com
mail2sos.com
mail2sound.com
mail2southafrica.com
mail2southamerica.com
mail2southcarolina.com
mail2southdakota.com
mail2southkorea.com
mail2southpole.com
mail2spain.com
mail2spanish.com
mail2spare.com
mail2spectrum.com
mail2splash.com
mail2sponsor.com
mail2sports.com
mail2srilanka.com
mail2stacy.com
mail2stan.com
mail2stanley.com
mail2star.com
mail2state.com
mail2stephanie.com
mail2steve.com
mail2steven.com
mail2stewart.com
mail2stlouis.com
mail2stock.com
mail2stockholm.com
mail2stockmarket.com
mail2storage.com
mail2store.com
mail2strong.com
mail2student.com
mail2studio.com
mail2studio54.com
mail2stuntman.com
mail2subscribe.com
mail2sudan.com
mail2superstar.com
mail2surfer.com
mail2suriname.com
mail2susan.com
mail2suzie.com
mail2swaziland.com
mail2sweden.com
mail2sweetheart.com
mail2swim.com
mail2swimmer.com
mail2swiss.com
mail2switzerland.com
mail2sydney.com
mail2sylvia.com
mail2syria.com
mail2taboo.com
mail2taiwan.com
mail2tajikistan.com
mail2tammy.com
mail2tango.com
mail2tanya.com
mail2tanzania.com
mail2tara.com
mail2taurus.com
mail2taxi.com
mail2taxidermist.com
mail2taylor.com
mail2taz.com
mail2teacher.com
mail2technician.com
mail2ted.com
mail2telephone.com
mail2teletubbie.com
mail2tenderness.com
mail2tennessee.com
mail2tennis.com
mail2tennisfan.com
mail2terri.com
mail2terry.com
mail2test.com
mail2texas.com
mail2thailand.com
mail2therapy.com
mail2think.com
mail2tickets.com
mail2tiffany.com
mail2tim.com
mail2time.com
mail2timothy.com
mail2tina.com
mail2titanic.com
mail2toby.com
mail2todd.com
mail2togo.com
mail2tom.com
mail2tommy.com
mail2tonga.com
mail2tony.com
mail2touch.com
mail2tourist.com
mail2tracey.com
mail2tracy.com
mail2tramp.com
mail2travel.com
mail2traveler.com
mail2travis.com
mail2trekkie.com
mail2trex.com
mail2triallawyer.com
mail2trick.com
mail2trillionaire.com
mail2troy.com
mail2truck.com
mail2trump.com
mail2try.com
mail2tunisia.com
mail2turbo.com
mail2turkey.com
mail2turkmenistan.com
mail2tv.com
mail2tycoon.com
mail2tyler.com
mail2u4me.com
mail2uae.com
mail2uganda.com
mail2uk.com
mail2ukraine.com
mail2uncle.com
mail2unsubscribe.com
mail2uptown.com
mail2uruguay.com
mail2usa.com
mail2utah.com
mail2uzbekistan.com
mail2v.com
mail2vacation.com
mail2valentines.com
mail2valerie.com
mail2valley.com
mail2vamoose.com
mail2vanessa.com
mail2vanuatu.com
mail2venezuela.com
mail2venous.com
mail2venus.com
mail2vermont.com
mail2vickie.com
mail2victor.com
mail2victoria.com
mail2vienna.com
mail2vietnam.com
mail2vince.com
mail2virginia.com
mail2virgo.com
mail2visionary.com
mail2vodka.com
mail2volleyball.com
mail2waiter.com
mail2wallstreet.com
mail2wally.com
mail2walter.com
mail2warren.com
mail2washington.com
mail2wave.com
mail2way.com
mail2waycool.com
mail2wayne.com
mail2webmaster.com
mail2webtop.com
mail2webtv.com
mail2weird.com
mail2wendell.com
mail2wendy.com
mail2westend.com
mail2westvirginia.com
mail2whether.com
mail2whip.com
mail2white.com
mail2whitehouse.com
mail2whitney.com
mail2why.com
mail2wilbur.com
mail2wild.com
mail2willard.com
mail2willie.com
mail2wine.com
mail2winner.com
mail2wired.com
mail2wisconsin.com
mail2woman.com
mail2wonder.com
mail2world.com
mail2worship.com
mail2wow.com
mail2www.com
mail2wyoming.com
mail2xfiles.com
mail2xox.com
mail2yachtclub.com
mail2yahalla.com
mail2yemen.com
mail2yes.com
mail2yugoslavia.com
mail2zack.com
mail2zambia.com
mail2zenith.com
mail2zephir.com
mail2zeus.com
mail2zipper.com
mail2zoo.com
mail2zoologist.com
mail2zurich.com
mail3000.com
mail333.com
mail4trash.com
mail4u.info
mailandftp.com
mailandnews.com
mailas.com
mailasia.com
mailbolt.com
mailbomb.net
mailboom.com
mailbox.as
mailbox.co.za
mailbox.gr
mailbox.hu
mailbox72.biz
mailbox80.biz
mailbr.com.br
mailc.net
mailcan.com
mailcat.biz
mailcc.com
mailchoose.co
mailcity.com
mailclub.fr
mailclub.net
maildrop.cc
maildrop.gq
maildx.com
mailed.ro
mailexcite.com
mailfa.tk
mailfence.com
mailforce.net
mailforspam.com
mailfree.gq
mailfs.com
mailftp.com
mailgenie.net
mailguard.me
mailhaven.com
mailhood.com
mailimate.com
mailinator.com
mailinator.org
mailinator.us
mailinblack.com
mailingaddress.org
mailingweb.com
mailisent.com
mailismagic.com
mailite.com
mailmate.com
mailme.dk
mailme.gq
mailme24.com
mailmight.com
mailmij.nl
mailnator.com
mailnew.com
mailops.com
mailoye.com
mailpanda.com
mailpick.biz
mailpokemon.com
mailpost.zzn.com
mailpride.com
mailproxsy.com
mailpuppy.com
mailquack.com
mailrock.biz
mailroom.com
mailru.com
mailsac.com
mailseal.de
mailsent.net
mailservice.ms
mailshuttle.com
mailslapping.com
mailstart.com
mailstartplus.com
mailsurf.com
mailtag.com
mailtemp.info
mailto.de
mailtothis.com
mailueberfall.de
mailup.net
mailwire.com
mailworks.org
mailzi.ru
mailzilla.org
maktoob.com
malayalamtelevision.net
maltesemail.com
mamber.net
manager.de
mancity.net
mantrafreenet.com
mantramail.com
mantraonline.com
manybrain.com
marchmail.com
mariahc.com
marijuana.com
marijuana.nl
married-not.com
marsattack.com
martindalemail.com
mash4077.com
masrawy.com
matmail.com
mauimail.com
mauritius.com
maxleft.com
maxmail.co.uk
mbox.com.au
mchsi.com
me-mail.hu
me.com
medical.net.au
medscape.com
meetingmall.com
megago.com
megamail.pt
megapoint.com
mehrani.com
mehtaweb.com
meine-dateien.info
meine-diashow.de
meine-fotos.info
meine-urlaubsfotos.de
mekhong.com
melodymail.com
meloo.com
merda.flu.cc
merda.igg.biz
merda.nut.cc
merda.usa.cc
message.hu
message.myspace.com
messages.to
metacrawler.com
metalfan.com
metaping.com
metta.lk
mexicomail.com
mezimages.net
mfsa.ru
mierdamail.com
miesto.sk
mighty.co.za
migmail.net
migmail.pl
migumail.com
miho-nakayama.com
mikrotamanet.com
millionaireintraining.com
millionairemail.com
milmail.com
mindless.com
mindspring.com
minister.com
misery.net
mittalweb.com
mixmail.com
mjfrogmail.com
ml1.net
mlb.bounce.ed10.net
mm.st
mns.ru
moakt.com
mobilbatam.com
mobileninja.co.uk
mochamail.com
mohammed.com
mohmal.com
moldova.cc
moldova.com
moldovacc.com
momslife.com
monemail.com
money.net
montevideo.com.uy
monumentmail.com
moonman.com
moose-mail.com
mor19.uu.gl
mortaza.com
mosaicfx.com
moscowmail.com
most-wanted.com
mostlysunny.com
motormania.com
movemail.com
movieluver.com
mox.pp.ua
mp4.it
mr-potatohead.com
mscold.com
msgbox.com
msn.cn
msn.com
msn.nl
mt2015.com
mt2016.com
mttestdriver.com
muehlacker.tk
muell.icu
muellmail.com
muellemail.com
mundomail.net
munich.com
music.com
musician.org
musicscene.org
muskelshirt.de
muslim.com
muslimsonline.com
mutantweb.com
mvrht.com
my.com
my10minutemail.com
mybox.it
mycabin.com
mycity.com
mycool.com
mydomain.com
mydotcomaddress.com
myfamily.com
myfastmail.com
mygo.com
myiris.com
mymacmail.com
mynamedot.com
mynet.com
mynetaddress.com
mynetstore.de
myownemail.com
myownfriends.com
mypacks.net
mypad.com
mypersonalemail.com
myplace.com
myrambler.ru
myrealbox.com
myremarq.com
myself.com
myspaceinc.net
myspamless.com
mystupidjob.com
mytemp.email
mythirdage.com
myway.com
myworldmail.com
n2.com
n2baseball.com
n2business.com
n2mail.com
n2soccer.com
n2software.com
nabc.biz
nafe.com
nagpal.net
nakedgreens.com
name.com
nameplanet.com
nandomail.com
naplesnews.net
naseej.com
nativestar.net
nativeweb.net
naui.net
naver.com
navigator.lv
navy.org
naz.com
nc.rr.com
nchoicemail.com
neeva.net
nemra1.com
nenter.com
neo.rr.com
nervhq.org
net-c.be
net-c.ca
net-c.cat
net-c.com
net-c.es
net-c.fr
net-c.it
net-c.lu
net-c.nl
net-c.pl
net-pager.net
net-shopping.com
net4b.pt
net4you.at
netbounce.com
netbroadcaster.com
netby.dk
netc.eu
netc.fr
netc.it
netc.lu
netc.pl
netcenter-vn.net
netcmail.com
netcourrier.com
netexecutive.com
netexpressway.com
netgenie.com
netian.com
netizen.com.ar
netlane.com
netlimit.com
netmongol.com
netnet.com.sg
netnoir.net
netpiper.com
netposta.net
netralink.com
netscape.net
netscapeonline.co.uk
netspace.net.au
netspeedway.com
netsquare.com
netster.com
nettaxi.com
nettemail.com
netterchef.de
netti.fi
netzero.com
netzero.net
netzidiot.de
neue-dateien.de
neuro.md
newmail.com
newmail.net
newmail.ru
newsboysmail.com
newyork.com
nextmail.ru
nexxmail.com
nfmail.com
nicebush.com
nicegal.com
nicholastse.net
nicolastse.com
nightmail.com
nikopage.com
nimail.com
ninfan.com
nirvanafan.com
nmail.cf
noavar.com
nonpartisan.com
nonspam.eu
nonspammer.de
norika-fujiwara.com
norikomail.com
northgates.net
nospammail.net
nospamthanks.info
nowhere.org
ntelos.net
ntlhelp.net
ntlworld.com
ntscan.com
null.net
nullbox.info
nur-fuer-spam.de
nus.edu.sg
nwldx.com
nwytg.net
nxt.ru
ny.com
nybella.com
nyc.com
nycmail.com
nzoomail.com
o-tay.com
o2.co.uk
oaklandas-fan.com
oath.com
oceanfree.net
odaymail.com
oddpost.com
odmail.com
office-dateien.de
office-email.com
offroadwarrior.com
oicexchange.com
oida.icu
oikrach.com
okbank.com
okhuman.com
okmad.com
okmagic.com
okname.net
okuk.com
oldies104mail.com
ole.com
olemail.com
olympist.net
olypmall.ru
omaninfo.com
omen.ru
onebox.com
onenet.com.ar
oneoffmail.com
onet.com.pl
onet.eu
onet.pl
oninet.pt
online.ie
online.ms
online.nl
onlinewiz.com
onmilwaukee.com
onobox.com
op.pl
opayq.com
openmailbox.org
operafan.com
operamail.com
opoczta.pl
optician.com
optonline.net
optusnet.com.au
orange.fr
orbitel.bg
orgmail.net
orthodontist.net
osite.com.br
oso.com
otakumail.com
our-computer.com
our-office.com
our.st
ourbrisbane.com
ourklips.com
ournet.md
outgun.com
outlawspam.com
outlook.at
outlook.be
outlook.cl
outlook.co.id
outlook.co.il
outlook.co.nz
outlook.co.th
outlook.com
outlook.com.au
outlook.com.br
outlook.com.gr
outlook.com.pe
outlook.com.tr
outlook.com.vn
outlook.cz
outlook.de
outlook.dk
outlook.es
outlook.fr
outlook.hu
outlook.ie
outlook.in
outlook.it
outlook.jp
outlook.kr
outlook.lv
outlook.my
outlook.nl
outlook.ph
outlook.pt
outlook.sa
outlook.sg
outlook.sk
over-the-rainbow.com
ownmail.net
ozbytes.net.au
ozemail.com.au
pacbell.net
pacific-ocean.com
pacific-re.com
pacificwest.com
packersfan.com
pagina.de
pagons.org
pakistanmail.com
pakistanoye.com
palestinemail.com
pandora.be
papierkorb.me
parkjiyoon.com
parsmail.com
partlycloudy.com
partybombe.de
partyheld.de
partynight.at
parvazi.com
passwordmail.com
pathfindermail.com
pconnections.net
pcpostal.com
pcsrock.com
pcusers.otherinbox.com
pediatrician.com
penpen.com
peoplepc.com
peopleweb.com
pepbot.com
perfectmail.com
perso.be
personal.ro
personales.com
petlover.com
petml.com
pettypool.com
pezeshkpour.com
pfui.ru
phayze.com
phone.net
photo-impact.eu
photographer.net
phpbb.uu.gl
phreaker.net
phus8kajuspa.cu.cc
physicist.net
pianomail.com
pickupman.com
picusnet.com
pigpig.net
pinoymail.com
piracha.net
pisem.net
pjjkp.com
planet.nl
planetaccess.com
planetarymotion.net
planetearthinter.net
planetmail.com
planetmail.net
planetout.com
plasa.com
playersodds.com
playful.com
playstation.sony.com
plus.com
plus.google.com
plusmail.com.br
pmail.net
pobox.hu
pobox.sk
pochta.ru
poczta.fm
poczta.onet.pl
poetic.com
pokemail.net
pokemonpost.com
pokepost.com
polandmail.com
polbox.com
policeoffice.com
politician.com
polizisten-duzer.de
polyfaust.com
pool-sharks.com
poond.com
popaccount.com
popmail.com
popsmail.com
popstar.com
portugalmail.com
portugalmail.pt
portugalnet.com
positive-thinking.com
post.com
post.cz
post.sk
posta.ro
postaccesslite.com
postafree.com
postaweb.com
posteo.at
posteo.be
posteo.ch
posteo.cl
posteo.co
posteo.de
posteo.dk
posteo.es
posteo.gl
posteo.net
posteo.no
posteo.us
postfach.cc
postinbox.com
postino.ch
postmark.net
postmaster.co.uk
postmaster.twitter.com
postpro.net
pousa.com
powerfan.com
pp.inet.fi
praize.com
premium-mail.fr
premiumservice.com
presidency.com
press.co.jp
priest.com
primposta.com
primposta.hu
privy-mail.com
privymail.de
pro.hu
probemail.com
prodigy.net
progetplus.it
programist.ru
programmer.net
programozo.hu
proinbox.com
project2k.com
promessage.com
prontomail.com
protestant.com
protonmail.com
prydirect.info
psv-supporter.com
ptd.net
public-files.de
public.usa.com
publicist.com
pulp-fiction.com
punkass.com
purpleturtle.com
put2.net
pwrby.com
q.com
qatarmail.com
qmail.com
qprfans.com
qq.com
qrio.com
quackquack.com
quakemail.com
qualityservice.com
quantentunnel.de
qudsmail.com
quepasa.com
quickhosts.com
quickmail.nl
quicknet.nl
quickwebmail.com
quiklinks.com
quikmail.com
qv7.info
qwest.net
qwestoffice.net
r-o-o-t.com
raakim.com
racedriver.com
racefanz.com
racingfan.com.au
racingmail.com
radicalz.com
radiku.ye.vc
radiologist.net
ragingbull.com
ralib.com
rambler.ru
ranmamail.com
rastogi.net
ratt-n-roll.com
rattle-snake.com
raubtierbaendiger.de
ravearena.com
ravemail.com
razormail.com
rccgmail.org
rcn.com
realemail.net
reality-concept.club
reallyfast.biz
reallyfast.info
reallymymail.com
realradiomail.com
realtyagent.com
reborn.com
reconmail.com
recycler.com
recyclermail.com
rediff.com
rediffmail.com
rediffmailpro.com
rednecks.com
redseven.de
redsfans.com
regbypass.com
reggaefan.com
registerednurses.com
regspaces.tk
reincarnate.com
religious.com
remail.ga
renren.com
repairman.com
reply.hu
reply.ticketmaster.com
representative.com
rescueteam.com
resgedvgfed.tk
resource.calendar.google.com
resumemail.com
rezai.com
rhyta.com
richmondhill.com
rickymail.com
rin.ru
riopreto.com.br
rklips.com
rn.com
ro.ru
roadrunner.com
roanokemail.com
rock.com
rocketmail.com
rocketship.com
rockfan.com
rodrun.com
rogers.com
rome.com
roosh.com
rootprompt.org
roughnet.com
royal.net
rr.com
rrohio.com
rsub.com
rubyridge.com
runbox.com
rushpost.com
ruttolibero.com
rvshop.com
s-mail.com
sabreshockey.com
sacbeemail.com
saeuferleber.de
safe-mail.net
safrica.com
sagra.lu
sags-per-mail.de
sailormoon.com
saintly.com
saintmail.net
sale-sale-sale.com
salehi.net
salesperson.net
samerica.com
samilan.net
sammimail.com
sandelf.de
sanfranmail.com
sanook.com
sapo.pt
saudia.com
savelife.ml
sayhi.net
saynotospams.com
sbcglbal.net
sbcglobal.com
sbcglobal.net
scandalmail.com
scarlet.nl
schafmail.de
schizo.com
schmusemail.de
schoolemail.com
schoolmail.com
schoolsucks.com
schreib-doch-mal-wieder.de
schweiz.org
sci.fi
scientist.com
scifianime.com
scotland.com
scotlandmail.com
scottishmail.co.uk
scottsboro.org
scubadiving.com
seanet.com
search.ua
searchwales.com
sebil.com
seckinmail.com
secret-police.com
secretary.net
secretservices.net
secure-mail.biz
secure-mail.cc
seductive.com
seekstoyboy.com
seguros.com.br
selfdestructingmail.com
send.hu
sendme.cz
sendspamhere.com
sent.as
sent.at
sent.com
sentrismail.com
serga.com.ar
servemymail.com
servermaps.net
sesmail.com
sexmagnet.com
seznam.cz
shahweb.net
shaniastuff.com
shared-files.de
sharedmailbox.org
sharmaweb.com
shaw.ca
she.com
shieldedmail.com
shinedyoureyes.com
shitaway.cf
shitaway.cu.cc
shitaway.ga
shitaway.gq
shitaway.ml
shitaway.tk
shitaway.usa.cc
shitmail.de
shitmail.org
shitware.nl
shockinmytown.cu.cc
shootmail.com
shortmail.com
shotgun.hu
showslow.de
shuf.com
sialkotcity.com
sialkotian.com
sialkotoye.com
sify.com
silkroad.net
sina.cn
sina.com
sinamail.com
singapore.com
singles4jesus.com
singmail.com
singnet.com.sg
sinnlos-mail.de
siteposter.net
skafan.com
skeefmail.com
skim.com
skizo.hu
skrx.tk
sky.com
skynet.be
slamdunkfan.com
slave-auctions.net
slingshot.com
slippery.email
slipry.net
slo.net
slotter.com
smap.4nmv.ru
smapxsmap.net
smashmail.de
smellrear.com
smileyface.comsmithemail.net
smoothmail.com
sms.at
snail-mail.net
snakebite.com
snakemail.com
sndt.net
sneakemail.com
snet.net
sniper.hu
snkmail.com
snoopymail.com
snowboarding.com
snowdonia.net
socamail.com
socceramerica.net
soccermail.com
soccermomz.com
social-mailer.tk
socialworker.net
sociologist.com
sofort-mail.de
sofortmail.de
softhome.net
sogou.com
sohu.com
sol.dk
solar-impact.pro
solcon.nl
soldier.hu
solution4u.com
solvemail.info
songwriter.net
sonnenkinder.org
soodomail.com
soon.com
soulfoodcookbook.com
sp.nl
space-bank.com
space-man.com
space-ship.com
space-travel.com
space.com
spacemart.com
spacetowns.com
spacewar.com
spainmail.com
spam.2012-2016.ru
spamavert.com
spambob.com
spambob.org
spambog.net
spambooger.com
spam.care
spamcero.com
spamdecoy.net
spameater.com
spameater.org
spamex.com
spamfree24.info
spamfree24.net
spamgoes.in
spaminator.de
spamkill.info
spaml.com
spamoff.de
spamstack.net
spartapiet.com
spazmail.com
speedemail.net
speedpost.net
speedrules.com
speedrulz.com
speedymail.org
sperke.net
spils.com
spinfinder.com
spl.at
spoko.pl
spoofmail.de
sportemail.com
sportsmail.com
sporttruckdriver.com
spray.no
spray.se
spybox.de
spymac.com
sraka.xyz
srilankan.net
ssl-mail.com
st-davids.net
stade.fr
stalag13.com
stargateradio.com
starmail.com
starmail.org
starmedia.com
starplace.com
starspath.com
start.com.au
startkeys.com
stinkefinger.net
stipte.nl
stoned.com
stones.com
stop-my-spam.pp.ua
stopdropandroll.com
storksite.com
streber24.de
streetwisemail.com
stribmail.com
strompost.com
strongguy.com
student.su
studentcenter.org
stuffmail.de
subram.com
sudanmail.net
sudolife.me
sudolife.net
sudomail.biz
sudomail.com
sudomail.net
sudoverse.com
sudoverse.net
sudoweb.net
sudoworld.com
sudoworld.net
suhabi.com
suisse.org
sukhumvit.net
sunpoint.net
sunrise-sunset.com
sunsgame.com
sunumail.sn
suomi24.fi
superdada.com
supereva.it
supermail.ru
superrito.com
superstachel.de
surat.com
surf3.net
surfree.com
surfy.net
surgical.net
surimail.com
survivormail.com
susi.ml
svk.jp
swbell.net
sweb.cz
swedenmail.com
sweetville.net
sweetxxx.de
swift-mail.com
swiftdesk.com
swingeasyhithard.com
swingfan.com
swipermail.zzn.com
swirve.com
swissinfo.org
swissmail.com
swissmail.net
switchboardmail.com
switzerland.org
sx172.com
syom.com
syriamail.com
t-online.de
t.psh.me
t2mail.com
tafmail.com
takuyakimura.com
talk21.com
talkcity.com
talkinator.com
tamil.com
tampabay.rr.com
tankpolice.com
tatanova.com
tbwt.com
tcc.on.ca
tds.net
teachermail.net
teachers.org
teamdiscovery.com
teamtulsa.net
tech-center.com
tech4peace.org
techemail.com
techie.com
technisamail.co.za
technologist.com
techscout.com
techspot.com
teenagedirtbag.com
tele2.nl
telebot.com
telefonica.net
teleline.es
telenet.be
telepac.pt
telerymd.com
teleworm.us
telfort.nl
telfortglasvezel.nl
telinco.net
telkom.net
telpage.net
telstra.com
telstra.com.au
temp-mail.com
temp-mail.de
temp.headstrong.de
tempail.com
tempemail.biz
tempmail.us
tempmail2.com
tempmaildemo.com
tempmailer.com
temporarioemail.com.br
temporaryemail.us
tempthe.net
tempymail.com
temtulsa.net
tenchiclub.com
tenderkiss.com
tennismail.com
terminverpennt.de
terra.cl
terra.com
terra.com.ar
terra.com.br
terra.es
test.com
test.de
tfanus.com.er
tfz.net
thai.com
thaimail.com
thaimail.net
thanksnospam.info
the-african.com
the-airforce.com
the-aliens.com
the-american.com
the-animal.com
the-army.com
the-astronaut.com
the-beauty.com
the-big-apple.com
the-biker.com
the-boss.com
the-brazilian.com
the-canadian.com
the-canuck.com
the-captain.com
the-chinese.com
the-country.com
the-cowboy.com
the-davis-home.com
the-dutchman.com
the-eagles.com
the-englishman.com
the-fastest.net
the-fool.com
the-frenchman.com
the-galaxy.net
the-genius.com
the-gentleman.com
the-german.com
the-gremlin.com
the-hooligan.com
the-italian.com
the-japanese.com
the-lair.com
the-madman.com
the-mailinglist.com
the-marine.com
the-master.com
the-mexican.com
the-ministry.com
the-monkey.com
the-newsletter.net
the-pentagon.com
the-police.com
the-prayer.com
the-professional.com
the-quickest.com
the-russian.com
the-snake.com
the-spaceman.com
the-stock-market.com
the-student.net
the-whitehouse.net
the-wild-west.com
the18th.com
thecoolguy.com
thecriminals.com
thedoghousemail.com
thedorm.com
theend.hu
theglobe.com
thegolfcourse.com
thegooner.com
theheadoffice.com
theinternetemail.com
thelanddownunder.com
themail.com
themillionare.net
theoffice.net
theplate.com
thepokerface.com
thepostmaster.net
theraces.com
theracetrack.com
therapist.net
thestreetfighter.com
theteebox.com
thewatercooler.com
thewebpros.co.uk
thewizzard.com
thewizzkid.com
thezhangs.net
thirdage.com
thisgirl.com
thraml.com
throwam.com
thundermail.com
tidni.com
timein.net
tiscali.at
tiscali.be
tiscali.co.uk
tiscali.it
tiscali.lu
tiscali.se
tkcity.com
tmail.ws
toast.com
toke.com
tom.com
toolsource.com
toomail.biz
toothfairy.com
topchat.com
topgamers.co.uk
topletter.com
topmail-files.de
topmail.com.ar
topsurf.com
torchmail.com
torontomail.com
tortenboxer.de
totalmail.de
totalmusic.net
townisp.com
tpg.com.au
trash-amil.com
trash-mail.ga
trash-mail.ml
trash2010.com
trash2011.com
trashdevil.de
trashymail.net
travel.li
trayna.com
trialbytrivia.com
trickmail.net
trimix.cn
tritium.net
trmailbox.com
tropicalstorm.com
truckerz.com
truckracer.com
truckracers.com
trust-me.com
truthmail.com
tsamail.co.za
ttml.co.in
tunisiamail.com
turboprinz.de
turboprinzessin.de
turkey.com
turual.com
tut.by
tvstar.com
twc.com
twcny.com
twinstarsmail.com
tx.rr.com
tycoonmail.com
typemail.com
u14269.ml
u2club.com
ua.fm
uae.ac
uaemail.com
ubbi.com
ubbi.com.br
uboot.com
uk2.net
uk2k.com
uk2net.com
uk7.net
uk8.net
ukbuilder.com
ukcool.com
ukdreamcast.com
ukmail.org
ukmax.com
ukr.net
uku.co.uk
ultra.fyi
ultapulta.com
ultrapostman.com
ummah.org
umpire.com
unbounded.com
unforgettable.com
uni.de
unican.es
unihome.com
unitybox.de
universal.pt
uno.ee
uno.it
unofree.it
unterderbruecke.de
uol.com.ar
uol.com.br
uol.com.co
uol.com.mx
uol.com.ve
uole.com
uole.com.ve
uolmail.com
uomail.com
upc.nl
upcmail.nl
upf.org
uplipht.com
ureach.com
urgentmail.biz
urhen.com
uroid.com
usa.com
usa.net
usaaccess.net
usanetmail.com
used-product.fr
usermail.com
username.e4ward.com
usma.net
usmc.net
uswestmail.net
uymail.com
uyuyuy.com
v-sexi.com
vaasfc4.tk
vahoo.com
valemail.net
vampirehunter.com
varbizmail.com
vcmail.com
velnet.co.uk
velocall.com
verizon.net
verizonmail.com
verlass-mich-nicht.de
versatel.nl
veryfast.biz
veryrealemail.com
veryspeedy.net
vfemail.net
vickaentb.tk
videotron.ca
viditag.com
viewcastmedia.com
viewcastmedia.net
vinbazar.com
violinmakers.co.uk
vip.126.com
vip.21cn.com
vip.citiz.net
vip.gr
vip.onet.pl
vip.qq.com
vip.sina.com
vipmail.ru
virgilio.it
virgin.net
virginbroadband.com.au
virginmedia.com
virtualmail.com
visitmail.com
visitweb.com
visto.com
visualcities.com
vivavelocity.com
vivianhsu.net
vjtimail.com
vkcode.ru
vnet.citiz.net
vnn.vn
vodafone.nl
vodafonethuis.nl
volcanomail.com
vollbio.de
volloeko.de
vomoto.com
vorsicht-bissig.de
vorsicht-scharf.de
vote-democrats.com
vote-hillary.com
vote-republicans.com
vote4gop.org
votenet.com
vp.pl
vr9.com
vubby.com
w3.to
wahoye.com
walala.org
wales2000.net
walkmail.net
walkmail.ru
wam.co.za
wanadoo.es
wanadoo.fr
war-im-urlaub.de
warmmail.com
warpmail.net
warrior.hu
waumail.com
wazabi.club
wbdet.com
wearab.net
web-contact.info
web-emailbox.eu
web-ideal.fr
web-mail.com.ar
web-mail.pp.ua
web-police.com
web.de
webave.com
webcammail.com
webcity.ca
webcontact-france.eu
webdream.com
webindia123.com
webjump.com
webm4il.info
webmail.co.yu
webmail.co.za
webmail.hu
webmails.com
webname.com
webprogramming.com
webstation.com
websurfer.co.za
webtopmail.com
webuser.in
wee.my
weedmail.com
weekmail.com
weekonline.com
wefjo.grn.cc
weg-werf-email.de
wegas.ru
wegwerf-emails.de
wegwerfmail.info
wegwerpmailadres.nl
wehshee.com
weibsvolk.de
weibsvolk.org
weinenvorglueck.de
welsh-lady.com
westnet.com
westnet.com.au
wetrainbayarea.com
wfgdfhj.tk
whale-mail.com
whartontx.com
whatiaas.com
whatpaas.com
wheelweb.com
whipmail.com
whoever.com
whoopymail.com
whtjddn.33mail.com
wi.rr.com
wi.twcbc.com
wickmail.net
wideopenwest.com
wildmail.com
wilemail.com
will-hier-weg.de
windowslive.com
windrivers.net
windstream.net
wingnutz.com
winmail.com.au
winning.com
wir-haben-nachwuchs.de
wir-sind-cool.org
wirsindcool.de
witty.com
wiz.cc
wkbwmail.com
wmail.cf
wo.com.cn
woh.rr.com
wolf-web.com
wolke7.net
wollan.info
wombles.com
women-at-work.org
wongfaye.com
wooow.it
worker.com
workmail.com
worldemail.com
worldnet.att.net
wormseo.cn
wosaddict.com
wouldilie.com
wovz.cu.cc
wowgirl.com
wowmail.com
wowway.com
wp.pl
wptamail.com
wrexham.net
writeme.com
writemeback.com
wrongmail.com
wtvhmail.com
wwdg.com
www.com
www.e4ward.com
www2000.net
wx88.net
wxs.net
x-mail.net
x-networks.net
x5g.com
xagloo.com
xaker.ru
xing886.uu.gl
xmastime.com
xms.nl
xmsg.com
xoom.com
xoxox.cc
xpressmail.zzn.com
xs4all.nl
xsecurity.org
xsmail.com
xtra.co.nz
xuno.com
xww.ro
xy9ce.tk
xyzfree.net
xzapmail.com
y7mail.com
ya.ru
yada-yada.com
yaho.com
yahoo.ae
yahoo.at
yahoo.be
yahoo.ca
yahoo.ch
yahoo.cn
yahoo.co
yahoo.co.id
yahoo.co.il
yahoo.co.in
yahoo.co.jp
yahoo.co.kr
yahoo.co.nz
yahoo.co.th
yahoo.co.uk
yahoo.co.za
yahoo.com
yahoo.com.ar
yahoo.com.au
yahoo.com.br
yahoo.com.cn
yahoo.com.co
yahoo.com.hk
yahoo.com.is
yahoo.com.mx
yahoo.com.my
yahoo.com.ph
yahoo.com.ru
yahoo.com.sg
yahoo.com.tr
yahoo.com.tw
yahoo.com.vn
yahoo.cz
yahoo.de
yahoo.dk
yahoo.es
yahoo.fi
yahoo.fr
yahoo.gr
yahoo.hu
yahoo.ie
yahoo.in
yahoo.it
yahoo.jp
yahoo.nl
yahoo.no
yahoo.pl
yahoo.pt
yahoo.ro
yahoo.ru
yahoo.se
yahoofs.com
yalla.com
yalla.com.lb
yalook.com
yam.com
yandex.com
yandex.pl
yandex.ru
yandex.ua
yapost.com
yapped.net
yawmail.com
yeah.net
yebox.com
yehey.com
yemenmail.com
yepmail.net
yert.ye.vc
yesey.net
yifan.net
ymail.com
yogotemail.com
yomail.info
yopmail.com
yopmail.pp.ua
yopolis.com
yopweb.com
youareadork.com
youmailr.com
your-house.com
your-mail.com
yourinbox.com
yourlifesucks.cu.cc
yourlover.net
yourname.freeservers.com
yournightmare.com
yours.com
yourssincerely.com
yoursubdomain.zzn.com
yourteacher.net
yourwap.com
yuuhuu.net
yyhmail.com
z1p.biz
za.com
zahadum.com
zaktouni.fr
zeepost.nl
zetmail.com
zhaowei.net
zhouemail.510520.org
ziggo.nl
zionweb.org
zip.net
zipido.com
ziplip.com
zipmail.com
zipmail.com.br
zipmax.com
zmail.ru
zoemail.com
zoemail.org
zoho.com
zohomail.com
zomg.info
zonnet.nl
zoominternet.net
zubee.com
zuvio.com
zuzzurello.com
zwallet.com
zweb.in
zxcv.com
zxcvbnm.com
zybermail.com
zydecofan.com
zzn.com
zzom.co.uk
zzz.com
zzz.pl
```

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/open-source/ingestion/ingest-configuration/embedding-configuration.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Embedding configuration

A common embedding configuration is a critical component that allows for dynamic selection of embedders and their associated parameters to create vectors from data. This configuration provides the flexibility to choose from various embedding models and fine-tune parameters to optimize the quality and characteristics of the resulting vectors. It enables users to tailor the embedding process to the specific needs of their data and downstream applications, ensuring that the generated vectors effectively capture semantic relationships and contextual information within the dataset.

The core [Unstructured](https://github.com/Unstructured-IO/unstructured) open source library does not enable the generation of embeddings by default. (However you can generate embeddings as a separate step manually.
[Learn how](/open-source/core-functionality/embedding).)

You can configure the Unstructured CLI and Unstructured Ingest Python library to generate embeddings by specifying the
`embedding_provider`, `embedding_api_key`, and `embedding_model_name` options (and, for Amazon Bedrock, additional options), as follows.
You must provide your own API key for the specified embedding provider. To get this API key, you must first create an account with that
provider and set up billing directly with them. You are responsible for all costs associated with using that provider.

Calls to the Unstructured CLI or Unstructured Ingest Python library that are routed to Unstructured's software-as-a-service (SaaS)
for processing (for example, by specifying an Unstructured API key and an Unstructured SaaS URL) require an Unstructured account for billing purposes. Unstructured's costs for generating
embeddings are already included in its account pricing plans. To
generate embeddings, you must specify the `embedding_provider` and `embedding_model_name` configuration options as follows. Unstructured uses its own internal API key
when using the specified provider to generate the embeddings. These
`embedding_provider` and `embedding_model_name` options are limited only to the provider and model names that are supported by the Unstructured API.
[See the list of supported provider and model names](/api-reference/workflow/workflows#embedder-node). [Learn how to specify these options](/open-source/how-to/embedding).

## Configs

* `embedding_provider`: The embedding provider to use while doing embedding. Available values include `bedrock`, `azure-openai`, `huggingface`, `mixedbread-ai`, `octoai`, `openai`, `togetherai`, `vertexai`, and `voyageai`.

* `embedding_api_key`: The API key to use, if one is required to generate the embeddings through an API service, such as OpenAI.

* `embedding_aws_access_key_id`: The AWS access key ID to be used for AWS-based embedders, such as Amazon Bedrock.

* `embedding_aws_region`: The AWS Region ID to be used for AWS-based embedders, such as Amazon Bedrock.

* `embedding_aws_secret_access_key`: The AWS secret access key to be used for AWS-based embedders, such as Amazon Bedrock.

* `embedding_model_name`: The specific model to use for the embedding provider, if necessary.

The default `embedding_model_name` values unless otherwise specified are:

* `bedrock`: None

* `azure-openai`: `text-embedding-ada-002`, with 1536 dimensions

* `huggingface`: `sentence-transformers/all-MiniLM-L6-v2`, with 384 dimensions

* `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1`, with 1024 dimensions

* `octoai`: `thenlper/gte-large`, with 1024 dimensions

* `openai`: `text-embedding-ada-002`, with 1536 dimensions

* `togetherai`: `togethercomputer/m2-bert-80M-32k-retrieval`, with 768 dimensions

* `vertexai`: `text-embedding-05`, with 768 dimensions

* `voyageai`: None

---

# Source: https://docs.unstructured.io/ui/embedding.md

# Source: https://docs.unstructured.io/open-source/how-to/embedding.md

# Source: https://docs.unstructured.io/open-source/core-functionality/embedding.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Embedding

The Unstructured open-source library does not offer built-in support for calling embedding providers to obtain embeddings for pieces of text.

Alternatively, the [Unstructured Ingest CLI](/open-source/ingestion/overview#unstructured-ingest-cli) and the [Unstructured Ingest Python library](/open-source/ingestion/python-ingest)
offer built-in support for calling embedding providers as part of an ingest pipeline. [Learn how](/open-source/how-to/embedding).

Also, you can use common third-party tools and libraries to get embeddings for document elements' text within JSON files that are
produced by calling the Unstructured open-source library. For example, the following sample Python script:

1. Takes an Unstructured open-source library-generated JSON file as input.
2. Reads in the JSON file's contents as a JSON object.
3. Uses the [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
   model on Hugging Face to generate embeddings for each `text` field of each document element in the JSON file.
4. Adds the generated embeddings next to each corresponding `text` field in the original JSON.
5. Saves the results back to the original JSON file.

```python Python theme={null}
# Filename: embeddings.py
# pip install langchain sentence_transformers

import sys
import json
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

if __name__ == "__main__":
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    
    # Get the JSON file's path.
    if len(sys.argv) < 2:
        print("Error: Specify the path to the input JSON file.")
        print("For example, 'python embeddings.py myfile.json'")
        sys.exit(1)

    file_path = sys.argv[1]
    
    try:
        # Get the JSON file's contents.
        with open(file_path, 'r') as file:
            file_elements = json.load(file)

        # Process each element in the JSON file.
        for element in file_elements:
            # Get the element's "text" field.
            text = element["text"]
            # Generate the embeddings for that "text" field.
            query_result = embeddings.embed_query(text)
            # Add the embeddings to that element as an "embeddings" field.
            element["embeddings"] = query_result

        # Save the updated JSON back into the original file.
        with open(file_path, 'w') as file:
            json.dump(file_elements, file, indent=2)

        print(f"Done! Updated JSON saved to '{file_path}'.")

    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found.")
    except IOError:
        print(f"Error: Unable to access file '{file_path}'.")
```

## Additional resources

For information about how to use Python scripts to call various embedding providers, see for example:

* [Amazon Bedrock](https://docs.aws.amazon.com/code-library/latest/ug/python_3_bedrock-runtime_code_examples.html#amazon_titan_text_embeddings)
* [Hugging Face](https://huggingface.co/blog/getting-started-with-embeddings)
* [OctoAI](https://octo.ai/blog/introducing-octoais-embedding-api-to-power-your-rag-needs/)
* [OpenAI](https://platform.openai.com/docs/guides/embeddings)
* [together.ai](https://docs.together.ai/docs/embeddings-overview)
* [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings)
* [Voyage AI](https://docs.voyageai.com/docs/embeddings)

---

# Source: https://docs.unstructured.io/api-reference/workflow/errors.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Errors

For the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk),
the [Unstructured API's workflow operations](/api-reference/workflow/overview) returns errors primarily through
the `UnstructuredClientError` class (the base class for all errors raised by the Unstructured Python SDK) and
the `HTTPValidationError` class (inherited from `UnstructuredClientError`). Less common errors are returned through the following classes:

* `httpx.RequestError`, the base class for request errors.
* `httpx.ConnectError`, for HTTP connection request errors.
* `httpx.TimeoutException`, for HTTP request timeout errors.
* `ServerError` (inherited from `UnstructuredClientError`), for server-side errors.
* `ResponseValidationError` (inherited from `UnstructuredClientError`), for type mismatches between the response data and the expected Pydantic model.

Each of the preceding classes has the following members:

| Member         | Type             | Description                                                        |
| -------------- | ---------------- | ------------------------------------------------------------------ |
| `message`      | `str`            | The eror message.                                                  |
| `status_code`  | `int`            | The HTTP response status code, for example `401`.                  |
| `headers`      | `httpx.Headers`  | A collection of HTTP response headers.                             |
| `body`         | `str`            | The HTTP body. This can be an empty string if no body is returned. |
| `raw_response` | `httpx.Response` | The raw HTTP response.                                             |

The following example shows how to handle the preceding errors. In this example,
a required Unstructured API key is intentionally commented out of the code, so that a
`401 Unauthorized` error is intentionally thrown.

```python  theme={null}
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import ListWorkflowsRequest
from unstructured_client.models.errors import (
    UnstructuredClientError,
    HTTPValidationError
)
from unstructured_client.models.errors.servererror import ServerError
from unstructured_client.models.errors.responsevalidationerror import ResponseValidationError
import httpx

try:
    client = UnstructuredClient(
        # For example, intentionally leave out the API key to intentionally throw an error.
        # api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    response = client.workflows.list_workflows(
        request=ListWorkflowsRequest()
    )
    
    print(f"Found {len(response.response_list_workflows)} workflows.")

except HTTPValidationError as e:
    print("Validation error (HTTP 422):", e)
except ServerError as e:
    print("Server error (HTTP 5XX):", e)
except ResponseValidationError as e:
    print("Response validation/type mismatch:", e)
except UnstructuredClientError as e:
    # This catches any other UnstructuredClientError not already caught above.
    # This and all of the other error classes in this example expose the following members:
    print("Other Unstructured client error:")
    print(f"Message:      {e.message}")
    print(f"Status code:  {e.status_code}")
    print(f"Body:         {e.body}")
    print(f"Raw response: {e.raw_response}")
    print(f"Headers:")

    for header in e.headers.raw:
        key = header[0].decode('utf-8')
        value = header[1].decode('utf-8')
        print(f"              {key}: {value}")

except httpx.ConnectError as e:
    print("HTTP connection error:", e)
except httpx.TimeoutException as e:
    print("HTTP timeout error:", e)
except httpx.RequestError as e:
    # This catches catch-all network errors from HTTP not already caught above.
    print("Other HTTPX request error:", e)
except Exception as e:
    # Optional: this catches any other unforeseen errors.
    print("Unexpected error:", e)
```

The results of running the preceding code are similar to the following:

```text  theme={null}
Message:      API error occurred: Status 401. Body: {"detail":"API key is missing, please provide an API key in the header."}
Status code:  401
Body:         {"detail":"API key is missing, please provide an API key in the header."}
Raw response: <Response [401 Unauthorized]>
Headers:
              date: <date-and-time-of-the-error>
              server: <server-identifier>
              content-length: 73
              content-type: application/json
```

---

# Source: https://docs.unstructured.io/ui/examples.md

# Source: https://docs.unstructured.io/open-source/how-to/examples.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/examples.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Examples

> This page provides some examples of accessing Unstructured Partition Endpoint via different methods.

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

To use these examples, you'll first need to set an environment variable named `UNSTRUCTURED_API_KEY`,
representing your Unstructured API key. [Get your API key](/api-reference/legacy-api/partition/overview).

Also, you'll need to set an environment variable named `UNSTRUCTURED_API_URL` to the
value of the Unstructured API URL for your account. This API URL was provided to you when your Unstructured account was created.
If you do not have this API URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

<Note>
  For the Unstructured Python SDK, the API URL is set to `https://api.unstructuredapp.io/general/v0/general` by default.
  However, you should always use the API URL that was provided to you when your Unstructured account was created.
</Note>

### Changing partition strategy for a PDF

Here's how you can modify partition strategy for a PDF file, and select an alternative model to use with Unstructured API.

<AccordionGroup>
  <Accordion title="POST">
    <UseIngestOrPlatformInstead />

    ```bash POST theme={null}
    curl -X 'POST' $UNSTRUCTURED_API_URL \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -H 'unstructured-api-key: $UNSTRUCTURED_API_KEY' \
    -F 'files=@sample-docs/layout-parser-paper.pdf' \
    -F 'strategy=vlm' \
    -F 'vlm_model_provider=openai' \
    -F 'vlm_model=gpt-4o'
    ```
  </Accordion>

  <Accordion title="Python SDK">
    <UseIngestOrPlatformInstead />

    ```python Python theme={null}
    import asyncio
    import os
    import json
    import unstructured_client
    from unstructured_client.models import shared

    client = unstructured_client.UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    async def call_api(filename, input_dir, output_dir):
        req = {
            "partition_parameters": {
                "files": {
                    "content": open(filename, "rb"),
                    "file_name": os.path.basename(filename),
                },
                "strategy": shared.Strategy.VLM,
                "vlm_model": "gpt-4o",
                "vlm_model_provider": "openai",
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        }

        try:
            res = await client.general.partition_async(
                request=req
            )
            element_dicts = [element for element in res.elements]
            json_elements = json.dumps(element_dicts, indent=2)

            # Create the output directory structure.
            relative_path = os.path.relpath(os.path.dirname(filename), input_dir)
            output_subdir = os.path.join(output_dir, relative_path)
            os.makedirs(output_subdir, exist_ok=True)

            # Write the output file.
            output_filename = os.path.join(output_subdir, os.path.basename(filename) + ".json")
            with open(output_filename, "w") as file:
                file.write(json_elements)

        except Exception as e:
            print(f"Error processing {filename}: {e}")

    async def process_files(input_directory, output_directory):
        tasks = []

        for root, _, files in os.walk(input_directory):
            for file in files:
                if not file.endswith('.json'):
                    full_path = os.path.join(root, file)
                    tasks.append(call_api(full_path, input_directory, output_directory))

        await asyncio.gather(*tasks)

    if __name__ == "__main__":
        asyncio.run(process_files(
            input_directory=os.getenv("LOCAL_FILE_INPUT_DIR"), 
            output_directory=os.getenv("LOCAL_FILE_OUTPUT_DIR")
        ))
    ```
  </Accordion>

  <Accordion title="JavaScript/TypeScript SDK">
    <UseIngestOrPlatformInstead />

    ```typescript TypeScript theme={null}
    import { UnstructuredClient } from "unstructured-client";
    import * as fs from "fs";
    import * as path from "path";
    import { Strategy } from "unstructured-client/sdk/models/shared/index.js";
    import { PartitionResponse } from "unstructured-client/sdk/models/operations";

    // Send all files in the source path to Unstructured for processing.
    // Send the processed data to the destination path.
    function processFiles(
        client: UnstructuredClient,
        sourcePath: string,
        destinationPath: string
    ): void {

        // If an output directory does not exist for the corresponding input
        // directory, then create it.
        if (!fs.existsSync(destinationPath)) {
            fs.mkdirSync(destinationPath, { recursive: true });
        }

        // Get all folders and files at the current level of the input directory.
        const items = fs.readdirSync(sourcePath);

        // For each folder and file in the input directory...
        for (const item of items) {
            const inputPath = path.join(sourcePath, item);
            const outputPath = path.join(destinationPath, item)

            // If it's a folder, call this function recursively.
            if (fs.statSync(inputPath).isDirectory()) {
                processFiles(client, inputPath, outputPath);
            } else {
                // If it's a file, send it to Unstructured for processing.
                const data = fs.readFileSync(inputPath);

                client.general.partition({
                    partitionParameters: {
                        files: {
                            content: data,
                            fileName: inputPath
                        },
                        strategy: Strategy.HiRes,
                        hiResModelName: "layout_v1.1.0", 
                        splitPdfPage: true,
                        splitPdfConcurrencyLevel: 15,
                        splitPdfAllowFailed: true
                    }
                }).then((res: PartitionResponse) => {
                    // If successfully processed, write the processed data to
                    // the destination directory.
                    if (res.statusCode == 200) {
                        const jsonElements = JSON.stringify(res, null, 2)
                        fs.writeFileSync(outputPath + ".json", jsonElements)
                    }
                }).catch((e) => {
                    if (e.statusCode) {
                        console.log(e.statusCode);
                        console.log(e.body);
                    } else {
                        console.log(e);
                    }
                });
            }
        }
    }

    const client = new UnstructuredClient({
        security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
        serverURL: process.env.UNSTRUCTURED_API_URL
    });

    processFiles(
        client,
        process.env.LOCAL_FILE_INPUT_DIR,
        process.env.LOCAL_FILE_OUTPUT_DIR
    );
    ```
  </Accordion>
</AccordionGroup>

If you have a local deployment of the Unstructured API, you can use other supported models, such as `yolox`.

### Specifying the language of a document for better OCR results

For better OCR results, you can specify what languages your document is in using the `languages` parameter.
[View the list of available languages](https://github.com/tesseract-ocr/tessdata).

<AccordionGroup>
  <Accordion title="POST">
    <UseIngestOrPlatformInstead />

    ```bash POST theme={null}
    curl -X 'POST' $UNSTRUCTURED_API_URL \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -H 'unstructured-api-key: $UNSTRUCTURED_API_KEY' \
    -F 'files=@sample-docs/korean.png' \
    -F 'strategy=vlm' \
    -F 'vlm_model_provider=openai' \
    -F 'vlm_model=gpt-4o' \-F 'languages=kor'
    ```
  </Accordion>

  <Accordion title="Python SDK">
    <UseIngestOrPlatformInstead />

    ```python Python theme={null}
    import asyncio
    import os
    import json
    import unstructured_client
    from unstructured_client.models import shared

    client = unstructured_client.UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    async def call_api(filename, input_dir, output_dir):
        req = {
            "partition_parameters": {
                "files": {
                    "content": open(filename, "rb"),
                    "file_name": os.path.basename(filename),
                },
                "strategy": shared.Strategy.OCR_ONLY,
                "languages": ["kor"],
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        }

        try:
            res = await client.general.partition_async(
                request=req
            )
            element_dicts = [element for element in res.elements]
            json_elements = json.dumps(element_dicts, indent=2)

            # Create the output directory structure.
            relative_path = os.path.relpath(os.path.dirname(filename), input_dir)
            output_subdir = os.path.join(output_dir, relative_path)
            os.makedirs(output_subdir, exist_ok=True)

            # Write the output file.
            output_filename = os.path.join(output_subdir, os.path.basename(filename) + ".json")
            with open(output_filename, "w") as file:
                file.write(json_elements)

        except Exception as e:
            print(f"Error processing {filename}: {e}")

    async def process_files(input_directory, output_directory):
        tasks = []

        for root, _, files in os.walk(input_directory):
            for file in files:
                if not file.endswith('.json'):
                    full_path = os.path.join(root, file)
                    tasks.append(call_api(full_path, input_directory, output_directory))

        await asyncio.gather(*tasks)

    if __name__ == "__main__":
        asyncio.run(process_files(
            input_directory=os.getenv("LOCAL_FILE_INPUT_DIR"), 
            output_directory=os.getenv("LOCAL_FILE_OUTPUT_DIR")
        ))
    ```
  </Accordion>

  <Accordion title="JavaScript/TypeScript SDK">
    <UseIngestOrPlatformInstead />

    ```typescript TypeScript theme={null}
    import { UnstructuredClient } from "unstructured-client";
    import * as fs from "fs";
    import * as path from "path";
    import { Strategy } from "unstructured-client/sdk/models/shared/index.js";
    import { PartitionResponse } from "unstructured-client/sdk/models/operations";

    // Send all files in the source path to Unstructured for processing.
    // Send the processed data to the destination path.
    function processFiles(
        client: UnstructuredClient,
        sourcePath: string,
        destinationPath: string
    ): void {

        // If an output directory does not exist for the corresponding input
        // directory, then create it.
        if (!fs.existsSync(destinationPath)) {
            fs.mkdirSync(destinationPath, { recursive: true });
        }

        // Get all folders and files at the current level of the input directory.
        const items = fs.readdirSync(sourcePath);

        // For each folder and file in the input directory...
        for (const item of items) {
            const inputPath = path.join(sourcePath, item);
            const outputPath = path.join(destinationPath, item)

            // If it's a folder, call this function recursively.
            if (fs.statSync(inputPath).isDirectory()) {
                processFiles(client, inputPath, outputPath);
            } else {
                // If it's a file, send it to Unstructured for processing.
                const data = fs.readFileSync(inputPath);

                client.general.partition({
                    partitionParameters: {
                        files: {
                            content: data,
                            fileName: inputPath
                        },
                        strategy: Strategy.OcrOnly,
                        languages: ["kor"],
                        splitPdfPage: true,
                        splitPdfConcurrencyLevel: 15,
                        splitPdfAllowFailed: true
                    }
                }).then((res: PartitionResponse) => {
                    // If successfully processed, write the processed data to
                    // the destination directory.
                    if (res.statusCode == 200) {
                        const jsonElements = JSON.stringify(res, null, 2)
                        fs.writeFileSync(outputPath + ".json", jsonElements)
                    }
                }).catch((e) => {
                    if (e.statusCode) {
                        console.log(e.statusCode);
                        console.log(e.body);
                    } else {
                        console.log(e);
                    }
                });
            }
        }
    }

    const client = new UnstructuredClient({
        security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
        serverURL: process.env.UNSTRUCTURED_API_URL
    });

    processFiles(
        client,
        process.env.LOCAL_FILE_INPUT_DIR,
        process.env.LOCAL_FILE_OUTPUT_DIR
    );
    ```
  </Accordion>
</AccordionGroup>

### Saving bounding box coordinates

When elements are extracted from PDFs or images, it may be useful to get their bounding boxes as well.
Set the `coordinates` parameter to `true` to add this field to the elements in the response.

<AccordionGroup>
  <Accordion title="POST">
    <UseIngestOrPlatformInstead />

    ```bash POST theme={null}
    curl -X 'POST' $UNSTRUCTURED_API_URL \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -H 'unstructured-api-key: $UNSTRUCTURED_API_KEY' \
    -F 'files=@sample-docs/layout-parser-paper.pdf' \
    -F 'coordinates=true' \
    -F 'strategy=hi_res'
    ```
  </Accordion>

  <Accordion title="Python SDK">
    <UseIngestOrPlatformInstead />

    ```python Python theme={null}
    import asyncio
    import os
    import json
    import unstructured_client
    from unstructured_client.models import shared

    client = unstructured_client.UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    async def call_api(filename, input_dir, output_dir):
        req = {
            "partition_parameters": {
                "files": {
                    "content": open(filename, "rb"),
                    "file_name": os.path.basename(filename),
                },
                "strategy": shared.Strategy.HI_RES,
                "coordinates": True,
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        }

        try:
            res = await client.general.partition_async(
                request=req
            )
            element_dicts = [element for element in res.elements]
            json_elements = json.dumps(element_dicts, indent=2)

            # Create the output directory structure.
            relative_path = os.path.relpath(os.path.dirname(filename), input_dir)
            output_subdir = os.path.join(output_dir, relative_path)
            os.makedirs(output_subdir, exist_ok=True)

            # Write the output file.
            output_filename = os.path.join(output_subdir, os.path.basename(filename) + ".json")
            with open(output_filename, "w") as file:
                file.write(json_elements)

        except Exception as e:
            print(f"Error processing {filename}: {e}")

    async def process_files(input_directory, output_directory):
        tasks = []

        for root, _, files in os.walk(input_directory):
            for file in files:
                if not file.endswith('.json'):
                    full_path = os.path.join(root, file)
                    tasks.append(call_api(full_path, input_directory, output_directory))

        await asyncio.gather(*tasks)

    if __name__ == "__main__":
        asyncio.run(process_files(
            input_directory=os.getenv("LOCAL_FILE_INPUT_DIR"), 
            output_directory=os.getenv("LOCAL_FILE_OUTPUT_DIR")
        ))
    ```
  </Accordion>

  <Accordion title="JavaScript/TypeScript SDK">
    <UseIngestOrPlatformInstead />

    ```typescript TypeScript theme={null}
    import { UnstructuredClient } from "unstructured-client";
    import * as fs from "fs";
    import * as path from "path";
    import { Strategy } from "unstructured-client/sdk/models/shared/index.js";
    import { PartitionResponse } from "unstructured-client/sdk/models/operations";

    // Send all files in the source path to Unstructured for processing.
    // Send the processed data to the destination path.
    function processFiles(
        client: UnstructuredClient,
        sourcePath: string,
        destinationPath: string
    ): void {

        // If an output directory does not exist for the corresponding input
        // directory, then create it.
        if (!fs.existsSync(destinationPath)) {
            fs.mkdirSync(destinationPath, { recursive: true });
        }

        // Get all folders and files at the current level of the input directory.
        const items = fs.readdirSync(sourcePath);

        // For each folder and file in the input directory...
        for (const item of items) {
            const inputPath = path.join(sourcePath, item);
            const outputPath = path.join(destinationPath, item)

            // If it's a folder, call this function recursively.
            if (fs.statSync(inputPath).isDirectory()) {
                processFiles(client, inputPath, outputPath);
            } else {
                // If it's a file, send it to Unstructured for processing.
                const data = fs.readFileSync(inputPath);

                client.general.partition({
                    partitionParameters: {
                        files: {
                            content: data,
                            fileName: inputPath
                        },
                        strategy: Strategy.HiRes,
                        coordinates: true,
                        splitPdfPage: true,
                        splitPdfConcurrencyLevel: 15,
                        splitPdfAllowFailed: true
                    }
                }).then((res: PartitionResponse) => {
                    // If successfully processed, write the processed data to
                    // the destination directory.
                    if (res.statusCode == 200) {
                        const jsonElements = JSON.stringify(res, null, 2)
                        fs.writeFileSync(outputPath + ".json", jsonElements)
                    }
                }).catch((e) => {
                    if (e.statusCode) {
                        console.log(e.statusCode);
                        console.log(e.body);
                    } else {
                        console.log(e);
                    }
                });
            }
        }
    }

    const client = new UnstructuredClient({
        security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
        serverURL: process.env.UNSTRUCTURED_API_URL
    });

    processFiles(
        client,
        process.env.LOCAL_FILE_INPUT_DIR,
        process.env.LOCAL_FILE_OUTPUT_DIR
    );
    ```
  </Accordion>
</AccordionGroup>

### Returning unique element IDs

By default, the element ID is a SHA-256 hash of the element text. This is to ensure that
the ID is deterministic. One downside is that the ID is not guaranteed to be unique.
Different elements with the same text will have the same ID, and there could also be hash collisions.
To use UUIDs in the output instead, set `unique_element_ids=true`. Note: this means that the element IDs
will be random, so with every partition of the same file, you will get different IDs.
This can be helpful if you'd like to use the IDs as a primary key in a database, for example.

<AccordionGroup>
  <Accordion title="POST">
    <UseIngestOrPlatformInstead />

    ```bash POST theme={null}
    curl -X 'POST' $UNSTRUCTURED_API_URL \
    -H 'accept: application/json'  \
    -H 'Content-Type: multipart/form-data' \
    -H 'unstructured-api-key: $UNSTRUCTURED_API_KEY' \
    -F 'files=@sample-docs/layout-parser-paper-fast.pdf' \
    -F 'unique_element_ids=true' \
    -F 'strategy=vlm' \
    -F 'vlm_model_provider=openai' \
    -F 'vlm_model=gpt-4o'
    ```
  </Accordion>

  <Accordion title="Python SDK">
    <UseIngestOrPlatformInstead />

    ```python Python theme={null}
    import asyncio
    import os
    import json
    import unstructured_client
    from unstructured_client.models import shared

    client = unstructured_client.UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    async def call_api(filename, input_dir, output_dir):
        req = {
            "partition_parameters": {
                "files": {
                    "content": open(filename, "rb"),
                    "file_name": os.path.basename(filename),
                },
                "strategy": shared.Strategy.VLM,
                "vlm_model": "gpt-4o",
                "vlm_model_provider": "openai",
                "unique_element_ids": True,
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        }

        try:
            res = await client.general.partition_async(
                request=req
            )
            element_dicts = [element for element in res.elements]
            json_elements = json.dumps(element_dicts, indent=2)

            # Create the output directory structure.
            relative_path = os.path.relpath(os.path.dirname(filename), input_dir)
            output_subdir = os.path.join(output_dir, relative_path)
            os.makedirs(output_subdir, exist_ok=True)

            # Write the output file.
            output_filename = os.path.join(output_subdir, os.path.basename(filename) + ".json")
            with open(output_filename, "w") as file:
                file.write(json_elements)

        except Exception as e:
            print(f"Error processing {filename}: {e}")

    async def process_files(input_directory, output_directory):
        tasks = []

        for root, _, files in os.walk(input_directory):
            for file in files:
                if not file.endswith('.json'):
                    full_path = os.path.join(root, file)
                    tasks.append(call_api(full_path, input_directory, output_directory))

        await asyncio.gather(*tasks)

    if __name__ == "__main__":
        asyncio.run(process_files(
            input_directory=os.getenv("LOCAL_FILE_INPUT_DIR"), 
            output_directory=os.getenv("LOCAL_FILE_OUTPUT_DIR")
        ))
    ```
  </Accordion>

  <Accordion title="JavaScript/TypeScript SDK">
    <UseIngestOrPlatformInstead />

    ```typescript TypeScript theme={null}
    import { UnstructuredClient } from "unstructured-client";
    import * as fs from "fs";
    import * as path from "path";
    import { Strategy } from "unstructured-client/sdk/models/shared/index.js";
    import { PartitionResponse } from "unstructured-client/sdk/models/operations";

    // Send all files in the source path to Unstructured for processing.
    // Send the processed data to the destination path.
    function processFiles(
        client: UnstructuredClient,
        sourcePath: string,
        destinationPath: string
    ): void {

        // If an output directory does not exist for the corresponding input
        // directory, then create it.
        if (!fs.existsSync(destinationPath)) {
            fs.mkdirSync(destinationPath, { recursive: true });
        }

        // Get all folders and files at the current level of the input directory.
        const items = fs.readdirSync(sourcePath);

        // For each folder and file in the input directory...
        for (const item of items) {
            const inputPath = path.join(sourcePath, item);
            const outputPath = path.join(destinationPath, item)

            // If it's a folder, call this function recursively.
            if (fs.statSync(inputPath).isDirectory()) {
                processFiles(client, inputPath, outputPath);
            } else {
                // If it's a file, send it to Unstructured for processing.
                const data = fs.readFileSync(inputPath);

                client.general.partition({
                    partitionParameters: {
                        files: {
                            content: data,
                            fileName: inputPath
                        },
                        uniqueElementIds: true,
                        strategy: Strategy.HiRes,
                        splitPdfPage: true,
                        splitPdfConcurrencyLevel: 15,
                        splitPdfAllowFailed: true
                    }
                }).then((res: PartitionResponse) => {
                    // If successfully processed, write the processed data to
                    // the destination directory.
                    if (res.statusCode == 200) {
                        const jsonElements = JSON.stringify(res, null, 2)
                        fs.writeFileSync(outputPath + ".json", jsonElements)
                    }
                }).catch((e) => {
                    if (e.statusCode) {
                        console.log(e.statusCode);
                        console.log(e.body);
                    } else {
                        console.log(e);
                    }
                });
            }
        }
    }

    const client = new UnstructuredClient({
        security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
        serverURL: process.env.UNSTRUCTURED_API_URL
    });

    processFiles(
        client,
        process.env.LOCAL_FILE_INPUT_DIR,
        process.env.LOCAL_FILE_OUTPUT_DIR
    );
    ```
  </Accordion>
</AccordionGroup>

### Adding the chunking step after partitioning

You can combine partitioning and subsequent chunking in a single request by setting the `chunking_strategy` parameter.
By default, the `chunking_strategy` is set to `None`, and no chunking is performed.

[//]: # "TODO: add a link to the concepts section about chunking strategies. Need to create the shared Concepts section first"

<AccordionGroup>
  <Accordion title="POST">
    <UseIngestOrPlatformInstead />

    ```bash POST theme={null}
    curl -X 'POST' $UNSTRUCTURED_API_URL \
    -H 'accept: application/json'  \
    -H 'Content-Type: multipart/form-data' \
    -H 'unstructured-api-key: $UNSTRUCTURED_API_KEY' \
    -F 'files=@sample-docs/layout-parser-paper-fast.pdf' \
    -F 'chunking_strategy=by_title' \
    -F 'max_characters=1024' \
    -F 'strategy=vlm' \
    -F 'vlm_model_provider=openai' \
    -F 'vlm_model=gpt-4o'
    ```
  </Accordion>

  <Accordion title="Python SDK">
    <UseIngestOrPlatformInstead />

    ```python Python theme={null}
    import asyncio
    import os
    import json
    import unstructured_client
    from unstructured_client.models import shared

    client = unstructured_client.UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    async def call_api(filename, input_dir, output_dir):
        req = {
            "partition_parameters": {
                "files": {
                    "content": open(filename, "rb"),
                    "file_name": os.path.basename(filename),
                },
                "chunking_strategy": "by_title",
                "max_characters": 1024,
                "strategy": shared.Strategy.VLM,
                "vlm_model": "gpt-4o",
                "vlm_model_provider": "openai",
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        }

        try:
            res = await client.general.partition_async(
                request=req
            )
            element_dicts = [element for element in res.elements]
            json_elements = json.dumps(element_dicts, indent=2)

            # Create the output directory structure.
            relative_path = os.path.relpath(os.path.dirname(filename), input_dir)
            output_subdir = os.path.join(output_dir, relative_path)
            os.makedirs(output_subdir, exist_ok=True)

            # Write the output file.
            output_filename = os.path.join(output_subdir, os.path.basename(filename) + ".json")
            with open(output_filename, "w") as file:
                file.write(json_elements)

        except Exception as e:
            print(f"Error processing {filename}: {e}")

    async def process_files(input_directory, output_directory):
        tasks = []

        for root, _, files in os.walk(input_directory):
            for file in files:
                if not file.endswith('.json'):
                    full_path = os.path.join(root, file)
                    tasks.append(call_api(full_path, input_directory, output_directory))

        await asyncio.gather(*tasks)

    if __name__ == "__main__":
        asyncio.run(process_files(
            input_directory=os.getenv("LOCAL_FILE_INPUT_DIR"), 
            output_directory=os.getenv("LOCAL_FILE_OUTPUT_DIR")
        ))
    ```
  </Accordion>

  <Accordion title="JavaScript/TypeScript SDK">
    <UseIngestOrPlatformInstead />

    ```typescript TypeScript theme={null}
    import { UnstructuredClient } from "unstructured-client";
    import * as fs from "fs";
    import * as path from "path";
    import { ChunkingStrategy, Strategy } from "unstructured-client/sdk/models/shared/index.js";
    import { PartitionResponse } from "unstructured-client/sdk/models/operations";

    // Send all files in the source path to Unstructured for processing.
    // Send the processed data to the destination path.
    function processFiles(
        client: UnstructuredClient,
        sourcePath: string,
        destinationPath: string
    ): void {

        // If an output directory does not exist for the corresponding input
        // directory, then create it.
        if (!fs.existsSync(destinationPath)) {
            fs.mkdirSync(destinationPath, { recursive: true });
        }

        // Get all folders and files at the current level of the input directory.
        const items = fs.readdirSync(sourcePath);

        // For each folder and file in the input directory...
        for (const item of items) {
            const inputPath = path.join(sourcePath, item);
            const outputPath = path.join(destinationPath, item)

            // If it's a folder, call this function recursively.
            if (fs.statSync(inputPath).isDirectory()) {
                processFiles(client, inputPath, outputPath);
            } else {
                // If it's a file, send it to Unstructured for processing.
                const data = fs.readFileSync(inputPath);

                client.general.partition({
                    partitionParameters: {
                        files: {
                            content: data,
                            fileName: inputPath
                        },
                        strategy: Strategy.HiRes,
                        chunkingStrategy: ChunkingStrategy.ByTitle,
                        maxCharacters: 1024,
                        splitPdfPage: true,
                        splitPdfConcurrencyLevel: 15,
                        splitPdfAllowFailed: true
                    }
                }).then((res: PartitionResponse) => {
                    // If successfully processed, write the processed data to
                    // the destination directory.
                    if (res.statusCode == 200) {
                        const jsonElements = JSON.stringify(res, null, 2)
                        fs.writeFileSync(outputPath + ".json", jsonElements)
                    }
                }).catch((e) => {
                    if (e.statusCode) {
                        console.log(e.statusCode);
                        console.log(e.body);
                    } else {
                        console.log(e);
                    }
                });
            }
        }
    }

    const client = new UnstructuredClient({
        security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
        serverURL: process.env.UNSTRUCTURED_API_URL
    });

    processFiles(
        client,
        process.env.LOCAL_FILE_INPUT_DIR,
        process.env.LOCAL_FILE_OUTPUT_DIR
    );
    ```
  </Accordion>
</AccordionGroup>

---

# Source: https://docs.unstructured.io/open-source/how-to/extract-image-block-types.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/extract-image-block-types.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Extract images and tables from documents

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

## Task

You want to get, decode, and show elements, such as images and tables, that are embedded in a PDF document.

## Approach

Extract the Base64-encoded representation of specific elements, such as images and tables, in the document.
For each of these extracted elements, decode the Base64-encoded representation of the element into its original visual representation
and then show it.

## To run this example

You will need a document that is one of the document types supported by the `extract_image_block_types` argument.
See the `extract_image_block_types` entry in [API Parameters](/api-reference/legacy-api/partition/api-parameters).
This example uses a PDF file with embedded images and tables.

## Code

For the [Unstructured Python SDK](/api-reference/legacy-api/partition/sdk-python), you'll need:

These environment variables:

* `UNSTRUCTURED_API_KEY` - Your Unstructured API key value.
* `UNSTRUCTURED_API_URL` - Your Unstructured API URL.

```python Python SDK theme={null}
from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared
from unstructured.staging.base import elements_from_dicts, elements_to_json

import os
import base64
from PIL import Image
import io

if __name__ == "__main__":
    client = UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    # Source: https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/embedded-images-tables.pdf
    
    # Where to get the input file and store the processed data, relative to this .py file.
    local_input_filepath = "local-ingest-input-pdf/embedded-images-tables.pdf"
    local_output_filepath = "local-ingest-output/embedded-images-tables.json"

    with open(local_input_filepath, "rb") as f:
        files = shared.Files(
            content=f.read(),
            file_name=local_input_filepath
        )

    request = operations.PartitionRequest(
        shared.PartitionParameters(
            files=files,
            split_pdf_page=True,
            split_pdf_allow_failed=True,
            split_pdf_concurrency_level=15,
            # Extract the Base64-encoded representation of each
            # processed "Image" and "Table" element. Extract each into
            # an "image_base64" object, as a child of the
            # "metadata" object, for that element in the result.
            # Element type names, such as "Image" and "Table" here,
            # are case-insensitive.
            # Any available Unstructured element type is allowed.
            extract_image_block_types=["Image", "Table"]
        )
    )

    try:
        result = client.general.partition(
            request=request
        )

        for element in result.elements:
            if "image_base64" in element["metadata"]:
                # Decode the Base64-encoded representation of the 
                # processed "Image" or "Table" element into its original
                # visual representation, and then show it.
                image_data = base64.b64decode(element["metadata"]["image_base64"])
                image = Image.open(io.BytesIO(image_data))
                image.show()
        
        # Optionally, prepare to print or save the elements as JSON.
        dict_elements = elements_from_dicts(
            element_dicts=result.elements
        )

        # Print the elements as JSON...
        json_elements = elements_to_json(
            elements=dict_elements,
            indent=2
        )

        print(json_elements)

        # ...or save as JSON.
        elements_to_json(
            elements=dict_elements,
            indent=2,
            filename=local_output_filepath
        )
    except Exception as e:
        print(e)
```

## See also

* [Extract text as HTML](/api-reference/legacy-api/partition/text-as-html)
* [Table extraction from PDF](/examplecode/codesamples/apioss/table-extraction-from-pdf)

---

# Source: https://docs.unstructured.io/open-source/core-functionality/extracting.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Extracting

## `extract_datetimetz`

Extracts the date, time, and timezone in the `Received` field(s) from an `.eml` file. `extract_datetimetz` takes in a string and returns a datetime.datetime object from the input string.

```python  theme={null}
from unstructured.cleaners.extract import extract_datetimetz

text = """from ABC.DEF.local ([ba23::58b5:2236:45g2:88h2]) by
  \n ABC.DEF.local2 ([ba23::58b5:2236:45g2:88h2%25]) with mapi id\
  n 32.88.5467.123; Fri, 26 Mar 2021 11:04:09 +1200"""

# Returns datetime.datetime(2021, 3, 26, 11, 4, 9, tzinfo=datetime.timezone(datetime.timedelta(seconds=43200)))
extract_datetimetz(text)

```

For more information about the `extract_datetimetz` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `extract_email_address`

Extracts email addresses from a string input and returns a list of all the email addresses in the input string.

```python  theme={null}
from unstructured.cleaners.extract import extract_email_address

text = """Me me@email.com and You <You@email.com>
    ([ba23::58b5:2236:45g2:88h2]) (10.0.2.01)"""

# Returns "['me@email.com', 'you@email.com']"
extract_email_address(text)

```

For more information about the `extract_email_address` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `extract_ip_address`

Extracts IPv4 and IPv6 IP addresses in the input string and returns a list of all IP address in input string.

```python  theme={null}
from unstructured.cleaners.extract import extract_ip_address

text = """Me me@email.com and You <You@email.com>
  ([ba23::58b5:2236:45g2:88h2]) (10.0.2.01)"""

# Returns "['ba23::58b5:2236:45g2:88h2', '10.0.2.01']"
extract_ip_address(text)

```

For more information about the `extract_ip_address` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `extract_ip_address_name`

Extracts the names of each IP address in the `Received` field(s) from an `.eml` file. `extract_ip_address_name` takes in a string and returns a list of all IP addresses in the input string.

```python  theme={null}
from unstructured.cleaners.extract import extract_ip_address_name

text = """from ABC.DEF.local ([ba23::58b5:2236:45g2:88h2]) by
  \n ABC.DEF.local2 ([ba23::58b5:2236:45g2:88h2%25]) with mapi id\
  n 32.88.5467.123; Fri, 26 Mar 2021 11:04:09 +1200"""

# Returns "['ABC.DEF.local', 'ABC.DEF.local2']"
extract_ip_address_name(text)

```

For more information about the `extract_ip_address_name` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `extract_mapi_id`

Extracts the `mapi id` in the `Received` field(s) from an `.eml` file. `extract_mapi_id` takes in a string and returns a list of a string containing the `mapi id` in the input string.

```python  theme={null}
from unstructured.cleaners.extract import extract_mapi_id

text = """from ABC.DEF.local ([ba23::58b5:2236:45g2:88h2]) by
  \n ABC.DEF.local2 ([ba23::58b5:2236:45g2:88h2%25]) with mapi id\
  n 32.88.5467.123; Fri, 26 Mar 2021 11:04:09 +1200"""

# Returns "['32.88.5467.123']"
extract_mapi_id(text)

```

For more information about the `extract_mapi_id` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `extract_ordered_bullets`

Extracts alphanumeric bullets from the beginning of text up to three “sub-section” levels.

Examples:

```python  theme={null}
from unstructured.cleaners.extract import extract_ordered_bullets

# Returns ("1", "1", None)
extract_ordered_bullets("1.1 This is a very important point")

# Returns ("a", "1", None)
extract_ordered_bullets("a.1 This is a very important point")

```

For more information about the `extract_ordered_bullets` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `extract_text_after`

Extracts text that occurs after the specified pattern.

Options:

* If `index` is set, extract after the `(index + 1)`th occurrence of the pattern. The default is `0`.

* Strips trailing whitespace if `strip` is set to `True`. The default is `True`.

Examples:

```python  theme={null}
from unstructured.cleaners.extract import extract_text_after

text = "SPEAKER 1: Look at me, I'm flying!"

# Returns "Look at me, I'm flying!"
extract_text_after(text, r"SPEAKER \d{1}:")

```

For more information about the `extract_text_after` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `extract_text_before`

Extracts text that occurs before the specified pattern.

Options:

* If `index` is set, extract before the `(index + 1)`th occurrence of the pattern. The default is `0`.

* Strips leading whitespace if `strip` is set to `True`. The default is `True`.

Examples:

```python  theme={null}
from unstructured.cleaners.extract import extract_text_before

text = "Here I am! STOP Look at me! STOP I'm flying! STOP"

# Returns "Here I am!"
extract_text_before(text, r"STOP")

```

For more information about the `extract_text_before` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `extract_us_phone_number`

Extracts a phone number from a section of text.

Examples:

```python  theme={null}
from unstructured.cleaners.extract import extract_us_phone_number

# Returns "215-867-5309"
extract_us_phone_number("Phone number: 215-867-5309")

```

For more information about the `extract_us_phone_number` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/extract.py).

## `group_broken_paragraphs`

Groups together paragraphs that are broken up with line breaks for visual or formatting purposes. This is common in `.txt` files. By default, `group_broken_paragraphs` groups together lines split by `\n`. You can change that behavior with the `line_split` kwarg. The function considers `\n\n` to be a paragraph break by default. You can change that behavior with the `paragraph_split` kwarg.

Examples:

```python  theme={null}
from unstructured.cleaners.core import group_broken_paragraphs

text = """The big brown fox
was walking down the lane.

At the end of the lane, the
fox met a bear."""

group_broken_paragraphs(text)

```

```python  theme={null}
import re
from unstructured.cleaners.core import group_broken_paragraphs

para_split_re = re.compile(r"(\s*\n\s*){3}")

text = """The big brown fox

was walking down the lane.


At the end of the lane, the

fox met a bear."""

group_broken_paragraphs(text, paragraph_split=para_split_re)

```

For more information about the `group_broken_paragraphs` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `remove_punctuation`

Removes ASCII and unicode punctuation from a string.

Examples:

```python  theme={null}
from unstructured.cleaners.core import remove_punctuation

# Returns "A lovely quote"
remove_punctuation("“A lovely quote!”")

```

For more information about the `remove_punctuation` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `replace_unicode_quotes`

Replaces unicode quote characters such as `\x91` in strings.

Examples:

```python  theme={null}
from unstructured.cleaners.core import replace_unicode_quotes

# Returns "“A lovely quote!”"
replace_unicode_characters("\x93A lovely quote!\x94")

# Returns ""‘A lovely quote!’"
replace_unicode_characters("\x91A lovely quote!\x92")

```

For more information about the `replace_unicode_quotes` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/core.py).

## `translate_text`

The `translate_text` cleaning function translates text between languages. `translate_text` uses the [Helsinki NLP MT models](https://huggingface.co/Helsinki-NLP) from `transformers` for machine translation. Works for Russian, Chinese, Arabic, and many other languages.

Parameters:

* `text`: the input string to translate.

* `source_lang`: the two letter language code for the source language of the text. If `source_lang` is not specified, the language will be detected using `langdetect`.

* `target_lang`: the two letter language code for the target language for translation. Defaults to `"en"`.

Examples:

```python  theme={null}
from unstructured.cleaners.translate import translate_text

# Output is "I'm a Berliner!"
translate_text("Ich bin ein Berliner!")

# Output is "I can also translate Russian!"
translate_text("Я тоже можно переводать русский язык!", "ru", "en")

```

For more information about the `translate_text` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/cleaners/translate.py).

---

# Source: https://docs.unstructured.io/open-source/how-to/filter-files.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Process a subset of files

<Note>
  The following information applies only to the [Unstructured Ingest CLI](/open-source/ingestion/overview#unstructured-ingest-cli) and the [Unstructured Ingest Python library](/open-source/ingestion/python-ingest).

  The Unstructured SDKs for Python and JavaScript/TypeScript and the Unstructured open-source library do not support this functionality.
</Note>

## Task

You want to process only files with specified extensions, only files at or below a specified size, or both.

## Approach

For the Ingest CLI, use the following command options. For the Ingest Python library, use the following parameters for the `FiltererConfig` object.

* Use `--file-glob` (CLI) or `file_glob` (Python) to specify the list of file extensions to process.
* Use `--max-file-size` (CLI) or `max_file_size` (Python) to specify the maximum size of files to process, in bytes.

## To run this example

The following example processes only `.pdf` and `.eml` files that have a file size of 100 KB or less. To run this example, you should have a directory
with a mixture of files, including at least one `.pdf` file and one `.eml` file, and with at least one of these files having a file size of 100 KB or less.

## Code

<CodeGroup>
  ```bash CLI Ingest v2 theme={null}
  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --file-glob "*.pdf,*.eml" \
      --max-file-size 100000 \
      --partition-by-api \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --api-key $UNSTRUCTURED_API_KEY \
      --strategy hi_res
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig,
      LocalUploaderConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.filter import FiltererConfig

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          filterer_config=FiltererConfig(
              file_glob=["*.pdf","*.eml"],
              max_file_size=100000
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "unique_element_ids": True,
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()
  ```
</CodeGroup>

---

# Source: https://docs.unstructured.io/examplecode/tools/firecrawl.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Firecrawl

[Firecrawl](https://firecrawl.dev/) is a tool that allows you to crawl websites and extract data from them.

You can use Unstructured to process the data that Firecrawl extracts from these websites. Unstructured turns this extracted
website data into a single set of standard [Unstructured document elements and metadata](/ui/document-elements),
improving the accuracy, relevance, and performance of your RAG applications, agentic AI workflows, model fine tuning, and more.

This example provides a hands-on walkthrough of how to use Firecrawl with Unstructured. In this example, you use Python code to have Firecrawl
extract data from a ficticious website and then add the extracted data to an Amazon S3 bucket within your AWS account. From there, you use the Unstructured user interface (UI) to set up an
ETL+ workflow that takes the extracted data from the S3 bucket, converts it into Unstructured document elements and metadata, adds
vector embeddings, and then sends the results over into a serverless index within a Pinecone vector store.

## Requirements

To complete this walkthrough, you must first have:

* An Unstructured account.
* A Firecrawl account and a Firecrawl API key for that account.
* An AWS account and a properly configured Amazon S3 bucket with bucket access credentials within that account.
* A Pinecone account and a Pinecone API key for that account.
* Python and the necessary development toolchain installed on your local development machine.

The following sections describe how to get these requirements.

### Unstructured account

Before you begin, you must have an Unstructured account and an Unstructured API key, as follows:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

### Firecrawl account and API key

Before you begin, you must have a Firecrawl account and a Firecrawl API key, as follows:

1. [Sign up or sign in to your Firecrawl account](https://www.firecrawl.dev/signin/signup).
2. After you sign in, [get your Firecrawl API key](https://www.firecrawl.dev/app/api-keys).

### Amazon S3 bucket with access permissions and credentials

This walkthrough uses an Amazon S3 bucket as the Unstructured ETL+ workflow's source. So, before you begin, you must have an AWS account, and within that account, a properly configured Amazon S3 bucket with bucket access credentials
consisting of an AWS access key ID and secret access key for the AWS IAM user that has access to the bucket.
Follow the S3 connector instructions to
[create and configure the bucket and get the bucket access credentials](/ui/sources/s3) if you do not already have this all set up. (In these
instructions, do not follow the directions to use the Unstructured UI to create the S3 source connector. You will do this later when you create the Unstructured ETL+ workflow.)

### Pinecone account and API key

Before you begin, you must have a Pinecone account and a Pinecone API key, as follows:

1. [Sign up](https://app.pinecone.io/?sessionType=signup) or [log in](https://app.pinecone.io/?sessionType=login) to your Pinecone account.
2. After you log in, [get your Pinecone API key](https://docs.pinecone.io/guides/projects/manage-api-keys#create-an-api-key).

### Python and development toolchain installation

Before you can start coding on your local machine, you must install Python, and you should also install a Python package and project manager to manage your project's Firecrawl, AWS, Pinecone, and other code dependencies.
This walkthrough uses the popular Python package and project manager [uv](https://docs.astral.sh/uv/) (although `uv` is not required to use Firecrawl, AWS, or Pinecone with Python).

<Steps>
  <Step title="Install uv">
    To install `uv`, run one of the following commands, depending on your operating system:

    <Tabs>
      <Tab title="macOS, Linux">
        To use `curl` with `sh`:

        ```bash  theme={null}
        curl -fsSL https://get.uv.dev | bash
        ```

        To use `wget` with `sh` instead:

        ```bash  theme={null}
        wget -qO- https://astral.sh/uv/install.sh | sh
        ```
      </Tab>

      <Tab title="Windows">
        To use PowerShell with `irm` to download the script and run it with `iex`:

        ```powershell  theme={null}
        powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
        ```
      </Tab>
    </Tabs>

    If you need to install `uv` by using other approaches such as PyPI, Homebrew, or WinGet,
    see [Installing uv](https://docs.astral.sh/uv/getting-started/installation/).
  </Step>

  <Step title="Install Python">
    <Note>This walkthrough should work with Python 3.9 or later and has been tested with Python 3.12.</Note>

    `uv` will detect and use Python if you already have it installed.
    To view a list of installed Python versions, run the following command:

    ```bash  theme={null}
    uv python list
    ```

    If, however, you do not already have Python installed, you can install a version of Python for use with `uv`
    by running the following command. For example, this command installs Python 3.12 for use with `uv`:

    ```bash  theme={null}
    uv python install 3.12
    ```
  </Step>

  <Step title="Create the project directory">
    Switch to the directory on your local development machine where you want to create the project directory for this walkthrough.
    This example creates a project directory named `firecrawl_unstructured_demo` within your current working directory and then
    switches to this new project directory:

    ```bash  theme={null}
    mkdir firecrawl_unstructured_demo
    cd firecrawl_unstructured_demo
    ```
  </Step>

  <Step title="Intiialize the project">
    From within the new project directory, use `uv` to initialize the project by running the following command:

    ```bash  theme={null}
    uv init
    ```
  </Step>

  <Step title="Create a venv virtual environment">
    To isolate and manage your project's code dependencies, you should create a virtual environment. This walkthrough uses
    the popular Python virtual environment manager [venv](https://docs.python.org/3/library/venv.html) (although `venv` is not required to use CrewAI or the Unstructured API's workflow operations MCP Server).
    From the root of your project directory, use `uv` to create a virtual environment with `venv` by running the following command:

    ```bash  theme={null}
    # Create the virtual environment by using the current Python version:
    uv venv

    # Or, if you want to use a specific Python version:
    uv venv --python 3.12
    ```
  </Step>

  <Step title="Activate the virtual environment">
    To activate the `venv` virtual environment, run one of the following commands from the root of your project directory:

    <Tabs>
      <Tab title="macOS, Linux">
        * For `bash` or `zsh`, run `source .venv/bin/activate`
        * For `fish`, run `source .venv/bin/activate.fish`
        * For `csh` or `tcsh`, run `source .venv/bin/activate.csh`
        * For `pwsh`, run `.venv/bin/Activate.ps1`
      </Tab>

      <Tab title="Windows">
        * For `cmd.exe`, run `.venv\Scripts\activate.bat`
        * For `PowerShell`, run `.venv\Scripts\Activate.ps1`
      </Tab>
    </Tabs>

    If you need to deactivate the virtual environment at any time, run the following command:

    ```bash  theme={null}
    deactivate
    ```
  </Step>

  <Step title="Install the Firecrawl Python SDK">
    You will use the Firecrawl Python SDK later to extract data from a website.

    Use `uv` to install the Firecrawl Python SDK package, by running the following command:

    ```bash  theme={null}
    uv add firecrawl-py
    ```
  </Step>

  <Step title="Install the AWS SDK for Python">
    You will use the AWS SDK for Python later to have Firecrawl add the extracted data to your Amazon S3 bucket.

    1. Use `uv` to install the AWS SDK for Python package, by running the following command:

       ```bash  theme={null}
       uv add boto3
       ```

    2. To enable the AWS SDK for Python to authenticate with your AWS account, create a file named `credentials` in the `~/.aws/` directory for macOS or Linux, or the `<drive>:\Users\<username>\.aws\` directory for Windows. Then add the
       AWS access key ID and secret access key of the AWS IAM user that has access to your Amazon S3 bucket, and the [short code for the AWS Region](https://docs.aws.amazon.com/global-infrastructure/latest/regions/aws-regions.html) of the bucket (for example, `us-east-1`), to the `credentials` file.

       In the following `credentials` file example, replace the following placeholders:

       * Replace `<your-access-key-id>` with the AWS access key ID of the AWS IAM user that has access to the bucket.
       * Replace `<your-secret-access-key>` with the secret access key for the related access key ID.
       * Replace `<the-aws-short-region-code-for-your-bucket>` with the short code for the AWS Region of the bucket.

       ```text  theme={null}
       [default]
       aws_access_key_id = <your-access-key-id>
       aws_secret_access_key = <your-secret-access-key>
       region = <the-aws-short-region-code-for-your-bucket>
       ```
  </Step>

  <Step title="Install the Pinecone Python SDK">
    You will use the Pinecone Python SDK later to create the Pinecone serverless index and, in an optional step, to help enable RAG chat on the data that will be storedin the index.

    Use `uv` to install the Pinecone Python SDK package, along with the `grpc` extra to enable the programmatic creation of a Pinecone serverless index later, by running the following command:

    ```bash  theme={null}
    uv add "pinecone[grpc]"
    ```
  </Step>
</Steps>

## Step 1: Extract the website data and add the data to the bucket

In this step, you use Firecrawl to extract data from a website and then add the extracted data to the Amazon S3 bucket.

<Steps>
  <Step title="Add your Firecrawl API key to your project">
    1. Set your Firecrawl API key in your project as an environment variable for your code to use. To do this:

       a. In the root of your code project, create a file named `.env`.<br />
       b. Add the following environment variable to this `.env` file, and replace
       `<your-firecrawl-api-key>` with your Firecrawl API key:<br />

       ```text  theme={null}
       FIRECRAWL_API_KEY=<your-firecrawl-api-key>
       ```

    2. To enable your Firecrawl code to access the environment variable, use `uv` to install the `dotenv` package by running the following command:

       ```bash  theme={null}
       uv add dotenv
       ```

    3. In the `.gitignore` file in the root of your project, add the following line, to help prevent
       accidentally checking in your Firecrawl API key (or anything else in the `.env` file) into any shared code repositories later:

       ```text  theme={null}
       .env
       ```
  </Step>

  <Step title="Create the Python script to extract the website data">
    1. In the `.env` file, add the following line, which defines an environment variable representing the base URL
       of the website to crawl. This walkthrough uses a website named [Books to Scrape](https://books.toscrape.com/) that contains ficticious data (although you can use any accessible website you want):

       ```text  theme={null}
       FIRECRAWL_BASE_URL=https://books.toscrape.com/
       ```

    2. Also in the `.env` file, add the following line, which defines an environment variable representing the name of the target Amazon S3 bucket to
       have Firecrawlstore the website crawl results in. Replace `<the-name-of-your-bucket>` with the name of the bucket:

       ```text  theme={null}
       FIRECRAWL_S3_BUCKET_NAME=<the-name-of-your-bucket>
       ```

       <Note>Enter only the name of the bucket. Do not prefix the bucket's name with `s3://`. Do not include any trailing slash (`/`) after the bucket name.</Note>

    3. Create a file named `firecrawl_extract.py` in the root of your project directory, and add the following code to it:

       ```python  theme={null}
       from firecrawl import FirecrawlApp, ScrapeOptions
       from dotenv import load_dotenv
       import os, boto3, json
       from datetime import datetime

       load_dotenv()

       def save_to_s3(crawl_result, bucket_name, prefix):
           s3 = boto3.client("s3")
           timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

           # Save the full crawl results to the bucket as a single JSON file.
           full_results_key = f"{prefix}/{timestamp}/full_results.json"
           s3.put_object(
               Bucket=bucket_name,
               Key=full_results_key,
               Body=json.dumps(crawl_result.model_dump(), indent=2, default=str),
           )

           # Save individually crawled pages to the bucket as HTML files.
           for idx, page in enumerate(crawl_result.data):
               if page.html is not None:
                   page_key = f"{prefix}/{timestamp}/pages/{idx}.html"
                   s3.put_object(Bucket=bucket_name, Key=page_key, Body=page.html)
                   print(f"Successfully saved {idx}.html to {bucket_name}/{page_key}")
           print(f"Successfully saved {len(crawl_result.data)} pages to {bucket_name}/{full_results_key}")

       def main():
           app = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))

           result = app.crawl_url(
               url=os.getenv("FIRECRAWL_BASE_URL"),
               limit=5, # Maximum number of pages to crawl.
               scrape_options=ScrapeOptions(
                   formats=["html"], # Save the crawled pages as HTML files.
                   waitFor=1000, # Wait for one second before crawling the currentpage.
                   timeout=10000 # Move on to the next page if no response within 10 seconds.
               )
           )
           
           save_to_s3(
               crawl_result=result,
               bucket_name=os.getenv("FIRECRAWL_S3_BUCKET_NAME"),
               prefix="crawls"
           )

       if __name__ == "__main__":
           main()
       ```

       The preceding code does the following:

       * Loads from your `.env` file your Firecrawl API key, the base URL for Firecrawl to use for website crawling, and the S3 bucket name for Firecrawl to send the website crawl results to.
       * Calls a function named `main` that uses Firecrawl to crawl the website.
       * The `main` function then calls the `save_to_s3` function, which adds the website crawl results to the S3 bucket.

       You can change the where the website crawl results are saved in your S3 bucket by changing the `full_results_key` and `page_key` variables. For example, you might want to save the
       website crawl results in a single folder and then keep overwriting those results with new results as they come in, instead of adding new results to separate subfolders.

       You can also change the number of crawled pages by changing the `limit` argument, outputting the results as Markdown instead of HTML, and so on. For more
       information, see the [Firecrawl Python SDK documentation](https://docs.firecrawl.dev/sdks/python).
  </Step>

  <Step title="Run the script to extract the data">
    From the root of your project, use `uv` to run the script to extract the data from the website by running the following command:

    ```bash  theme={null}
    uv run firecrawl_extract.py
    ```

    After the script runs, you should see the following output, where `<your-bucket-name>` is the name of your bucket, and `<timestamp>` is the timestamp generated by the script:

    ```bash  theme={null}
    Successfully saved 0.html to <your-bucket-name>/crawls/<timestamp>/pages/0.html
    Successfully saved 1.html to <your-bucket-name>/crawls/<timestamp>/pages/1.html
    Successfully saved 2.html to <your-bucket-name>/crawls/<timestamp>/pages/2.html
    Successfully saved 3.html to <your-bucket-name>/crawls/<timestamp>/pages/3.html
    Successfully saved 4.html to <your-bucket-name>/crawls/<timestamp>/pages/4.html
    Successfully saved 5 pages to <your-bucket-name>/crawls/<timestamp>/full_results.json
    ```

    To verify that the data was successfully added, open your AmazonS3 console
    and look in the `crawls` folder within your bucket.
  </Step>
</Steps>

## Step 2: Create a Pinecone serverless index

In this step, you create a Pinecone serverless index that Unstructured will use to store its results based on the
extracted website data that Firecrawl sends to your Amazon S3 bucket.

<Steps>
  <Step title="Add your Pinecone API key to your project">
    Set your Pinecone API key as an environment variable for your code to use. To do this, add the following environment variable to your `.env` file, and replace
    `<your-pinecone-api-key>` with your Pinecone API key:

    ```text  theme={null}
    PINECONE_API_KEY=<your-pinecone-api-key>
    ```
  </Step>

  <Step title="Add the Pinecone index name to your project">
    Add the following line to your `.env` file, and replace `<the-name-of-your-index>` with the name of the serverlessindex you want to create:

    ```text  theme={null}
    PINECONE_INDEX_NAME=<the-name-of-your-index>
    ```
  </Step>

  <Step title="Create the Python script to create the Pinecone serverless index">
    Create a file named `pinecone_create_index.py` in the root of your project directory and add the following code to it:

    ```python  theme={null}
    from pinecone.grpc import PineconeGRPC as Pinecone
    from pinecone import ServerlessSpec
    import os
    from dotenv import load_dotenv

    load_dotenv()

    pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

    pc.create_index(
        name=os.getenv("PINECONE_INDEX_NAME"),
        vector_type="dense",
        dimension=3072,
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )
    ```

    The preceding code creates in your Pinecone account a serverless index with the specified name for the specified cloud provider and within the specified region for that provider.
    The index is configured to accept vectors with 3072 dimensions with a cosine metric and a dense vector type.
  </Step>

  <Step title="Run the script to create the index">
    From the root of your project, use `uv` to run the script to create the Pinecone serverless index by running the following command:

    ```bash  theme={null}
    uv run pinecone_create_index.py
    ```

    After the script runs, to verify that the index was successfully added, log in to your Pinecone account. Then open your default project,
    and click **Indexes** under **Database** in the sidebar to view the new index that was added.
  </Step>
</Steps>

## Step 3: Set up and run the Unstructured ETL+ workflow

In this step, you set up and run the Unstructured ETL+ workflow that will take the extracted website data from your Amazon S3 bucket, process it to turn it into
Unstructured elements and metadata, add vector embeddings to the elements, and then add all of this information to your Pinecone serverless index.

<Steps>
  <Step title="Create the source connector">
    1. If you are not already signed in, sign in to your Unstructured account.
    2. On the sidebar, click **Connectors**.
    3. With **Sources** selected, **New**.
    4. Enter some name for the source connector, such as `s3-firecrawl-source`.
    5. For **Provider**, click **Amazon S3**.
    6. Click **Continue**.
    7. For **Bucket URI**, enter `s3://<your-bucket-name>/crawls/<timestamp>/pages/`, replacing `<your-bucket-name>` with the name of your bucket, and `<timestamp>` with the timestamp generated by the script in the previous step..
    8. For **Authentication Method**, select **Secret**.
    9. Enter your AWS access key ID and secret access key.
    10. Click **Continue**.
    11. Click **Save and Test**.
  </Step>

  <Step title="Create the destination connector">
    1. On the sidebar, click **Connectors**.
    2. With **Destinations** selected, **New**.
    3. Enter some name for the destination connector, such as `pinecone-firecrawl-destination`.
    4. For **Provider**, click **Pinecone**.
    5. Click **Continue**.
    6. For **Index Name**, enter the name of the Pinecone index you created earlier, such as `firecrawl-dense-index`.
    7. Leave **Namespace** blank. Unstructured will automatically use the default namespace in the index.
    8. Leave **Batch Size** set to the default value of `50`.
    9. For **API Key**, enter your Pinecone API key.
    10. Click **Save and Test**.
  </Step>

  <Step title="Create the workflow">
    1. On the sidebar, click **Workflows**.
    2. Click **New Workflow**.
    3. With **Build it Myself** selected, click **Continue**.
    4. In the workflow designer, click the **Source** workflow node.
    5. In the node settings pane, on the **Connectors** tab, select the name of the source connector you created earlier, such as `s3-firecrawl-source`.
    6. Click the **Destination** workflow node.
    7. In the node settings pane, on the **Connectors** tab, select the name of the destination connector you created earlier, such as `pinecone-firecrawl-destination`.
    8. Between the **Partitioner** and **Destination** nodes, click the **+** (add node) button.
    9. Click **Transform > Embedder**.
    10. Click the **Embedder** workflow node.
    11. In the **Settings** pane, under **Azure OpenAI**, select **text-embedding-3-large**.
    12. Click any blank space in the workflow designer.
    13. In the workflow settings pane, click **Details**.
    14. For **Name**, click the pencil (edit) icon and then enter some name for the workflow, such as `firecrawl-s3-to-pinecone-workflow`.
        Then click the checkmark (save) icon.
    15. Switch **Active** to on.
    16. Click **Save**.
  </Step>

  <Step title="Run the workflow as a job">
    1. On the sidebar, click **Workflows**.
    2. Next to the name of the workflow you created earlier, click **Run**.
  </Step>

  <Step title="Monitor the job">
    1. On the sidebar, click **Jobs**.
    2. Click the name of the workflow you created earlier.
    3. Monitor the job's progress in the job details pane until the status changes to **Finished**.
  </Step>

  <Step title="View the results">
    1. One way to view the results is to create a file named `pinecone_fetch_from_index.py` in the root of your project directory and add the following code to it:

       ```python  theme={null}
       from pinecone import Pinecone
       import os
       from dotenv import load_dotenv

       load_dotenv()

       pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
       index = pc.Index(os.getenv("PINECONE_INDEX_NAME"))
       ids = []

       # The list_paginated() method returns a dict with 'vectors' key
       paginated_results = index.list_paginated(namespace="default")
       for page in paginated_results['_data_store']['vectors']:
           ids.append(page['id'])

       response = index.fetch(
           ids=ids,
           namespace="default"
       )

       for vector_id, vector_data in response["vectors"].items():
           print(f"ID: {vector_id}\nMetadata: {vector_data.get('metadata')}\n\n")
       ```

    2. From the root of your project, use `uv` to run the script as follows:

       ```bash  theme={null}
       uv run pinecone_fetch_from_index.py
       ```

       The output will show each record's ID and metadata. For brevity and performance,
       the records' vector embeddings are not included in the output.
  </Step>
</Steps>

## Additional resources

* [Mastering Firecrawl's Crawl Endpoint: A Complete Web Scraping Guide](https://www.firecrawl.dev/blog/mastering-the-crawl-endpoint-in-firecrawl)

---

# Source: https://docs.unstructured.io/open-source/installation/full-installation.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Full installation

To install the Unstructured open source library on a local development machine, run one or more of the following commands.

These commands assume that you are using the Python package and project manager
[uv](https://docs.astral.sh/uv/), running within an activated [venv](https://docs.python.org/3/library/venv.html)
virtual environment that was created with `uv`. However, `uv` and `venv` are not required.

To work with all [supported file types](/open-source/introduction/supported-file-types), run:

```bash  theme={null}
uv add "unstructured[all-docs]"
```

To conserve disk space and reduce code dependencies, you can run the following command instead to work with a
default set of supported file types:

```bash  theme={null}
uv add unstructured
```

The preceding command supports plain text files (`.txt`), HTML files (`.html`), XML files (`.xml`), and emails (`.eml`, `.msg`, and `.p7s`) by default.

To further conserve disk space and reduce code dependencies, you can run the following command instead, replacing `<extra>` with the appropriate extra for the target file type:

```bash  theme={null}
uv add "unstructured[<extra>]"
```

The following file type extras are available:

* `all-docs` (for all supported file types in this list)
* `csv` (for `.csv` files only)
* `docx` (for `.doc` and `.docx` files only)
* `epub` (for `.epub` files only)
* `image` (for all supported image file types: `.bmp`, `.heic`, `.jpeg`, `.png`, and `.tiff`)
* `md` (for `.md` files only)
* `odt` (for `.odt` files only)
* `org` (for `.org` files only)
* `pdf` (for `.pdf` files only)
* `pptx` (for `.ppt` and `.pptx` files only)
* `rst` (for `.rst` files only)
* `rtf` (for `.rtf` files only)
* `tsv` (for `.tsv` files only)
* `xlsx` (for `.xls` and `.xlsx` files only)

Note that you can install multiple extras at the same time by separating them with commas, for example:

```bash  theme={null}
uv add "unstructured[pdf,docx]"
```

For maximum compatiblity, you should also install the following system dependencies:

* [libmagic-dev](https://man7.org/linux/man-pages/man3/libmagic.3.html) (for filetype detection)
* [poppler-utils](https://poppler.freedesktop.org/) and [tesseract-ocr](https://github.com/tesseract-ocr/tesseract) (for images and PDFs), and `tesseract-lang` (for additional language support)
* [libreoffice](https://www.libreoffice.org/discover/libreoffice/) (for Microsoft Office documents)
* [pandoc](https://pandoc.org/) (for `.epub`, `.odt`, and `.rtf` files. For `.rtf` files, you must have version 2.14.2 or newer. Running [this script](https://github.com/Unstructured-IO/unstructured/blob/main/scripts/install-pandoc.sh) will install the correct version for you.)

Installation instructured for these system dependencies vary by operating system type. For details, follow the preceding links or see your
operating system's documentation.

---

# Source: https://docs.unstructured.io/examplecode/tools/gcs-events.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Cloud Storage event triggers

You can use Google Cloud Storage events, such as adding new files to—or updating existing files within—Google Cloud Storage buckets, to automatically run Unstructured ETL+ workflows
that rely on those buckets as sources. This enables a no-touch approach to having Unstructured automatically process new and updated files in Google Cloud Storage as they are added or updated.

This example shows how to automate this process by adding a custom [Google Apps Script](https://developers.google.com/apps-script) project in your Google account. This project runs
a script on a regular time interval. This script automatically checks for new or updated files within the specified Google Cloud Storage bucket. If the script
detects at least one new or updated file, it then calls the [Unstructured API's workflow operations](/api-reference/workflow/overview) to automatically run the
specified corresponding Unstructured ETL+ workflow in your Unstructured account.

<Note>
  This example uses a custom Google Apps Script that you create and maintain.
  Any issues with file detection, timing, or script execution could be related to your custom script,
  rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom
  script's execution logs first for any informational and error messages.
</Note>

## Requirements

To use this example, you will need the following:

* An Unstructured account, and an Unstructured API key for your account, as follows:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

* The Unstructured API's workflow operations URL for your account, as follows:

  1. In the Unstructured UI, click **API Keys** on the sidebar.<br />
  2. Note the value of the **Unstructured API's workflow operations** field.

* A Google Cloud Storage source connector in your Unstructured account. [Learn how](/ui/sources/google-cloud).

* Some available [destination connector](/ui/destinations/overview) in your Unstructured account.

* A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).

* An OAuth 2.0 client ID and client secret to call the Google API, as follows:

  1. Sign in to your [Google Cloud account](https://cloud.google.com).
  2. Go to the [Google Cloud APIs dashboard](https://console.cloud.google.com/apis/dashboard).
  3. Click **+ Enable APIs and services**.
  4. In the **Search for APIs & Services** box, enter `Cloud Storage API`.
  5. In the list of search results, click **Cloud Storage API**.
  6. Make sure that **API Enabled** is shown. If not, click **Enable**.
  7. Go to your [Google Cloud console welcome page](https://console.cloud.google.com/welcome).
  8. In the **Search (/) for resources, docs, products, and more** box, enter `Credentials`.
  9. Click **Credentials (APIs & Services)**.
  10. Click **+ Create credentials > OAuth client ID**.
  11. For **Application type**, select **Web application**.
  12. (Optional) Enter some non-default **Name** for this OAuth 2.0 client to be shown in the list of created clients in your Google Cloud Console.
  13. Click **Create**.
  14. After the OAuth client is created, click **Download JSON** to save the client ID and client secret values to a JSON file on your local
      machine. Store this JSON file in a secure location.

## Step 1: Create the Google Apps Script project

1. Go to [http://script.google.com/](http://script.google.com/).
2. Click **+ New project**.
3. Click the new project's default name (such as **Untitled project**), and change it to something more descriptive, such as **Unstructured Scripts for GCS**.

## Step 2: Add the script

1. With the project still open, on the sidebar, click the **\< >** (**Editor**) icon.

2. In the **Files** tab, click **Code.gs**.

3. Replace the contents of the `Code.gs` file with the following code instead:

   ```javascript  theme={null}
   // Configure the OAuth2 service.
   function getOAuthService() {
     return OAuth2.createService('GCS')
       .setAuthorizationBaseUrl('https://accounts.google.com/o/oauth2/auth')
       .setTokenUrl('https://oauth2.googleapis.com/token')
       .setClientId(CLIENT_ID)
       .setClientSecret(CLIENT_SECRET)
       .setCallbackFunction('authCallback')
       .setPropertyStore(PropertiesService.getUserProperties())
       .setScope(OAUTH_SCOPE);
   }

   // OAuth2 callback function.
   function authCallback(request) {
     const service = getOAuthService();
     const isAuthorized = service.handleCallback(request);
     return HtmlService.createHtmlOutput(isAuthorized ? 'Success!' : 'Denied');
   }

   // Get a valid access token (refreshes automatically if expired).
   function getAccessToken() {
     const service = getOAuthService();
     if (!service.hasAccess()) {
       const authorizationUrl = service.getAuthorizationUrl();
       Logger.log('Open the following URL and re-run the script: %s', authorizationUrl);
       throw new Error('Authorization required. Open the URL in the log.');
     }
     return service.getAccessToken();
   }

   // Main function: checks for new or updated files in the bucket.
   function checkForNewOrUpdatedGCSFiles() {
     const thresholdMillis = 5 * 60 * 1000; // 5 minutes.
     const now = new Date();

     // Get (and refresh if needed) the access token.
     const accessToken = getAccessToken();

     // List objects in the bucket.
     const apiUrl = `https://storage.googleapis.com/storage/v1/b/${BUCKET_PATH}/o`;
     const response = UrlFetchApp.fetch(apiUrl, {
       method: 'get',
       headers: {
         'Authorization': 'Bearer ' + accessToken,
         'Accept': 'application/json'
       }
     });
     const data = JSON.parse(response.getContentText());
     const files = data.items || [];

     for (let i = 0; i < files.length; i++) {
       const file = files[i];
       const fileName = file.name;
       const created = new Date(file.timeCreated);
       const updated = new Date(file.updated);

       const millisSinceCreated = now - created;
       const createdWithinThreshold = millisSinceCreated < thresholdMillis;
       const millisSinceUpdated = now - updated;
       const updatedWithinThreshold = millisSinceUpdated < thresholdMillis;

       console.log('File Name:                       ' + fileName);
       console.log('Created:                         ' + created);
       console.log('Last updated:                    ' + updated);
       console.log('Milliseconds since created:      ' + millisSinceCreated);
       console.log('Milliseconds since last updated: ' + millisSinceUpdated);
       console.log('Created within threshold of ' + thresholdMillis + ' milliseconds? ' + createdWithinThreshold);
       console.log('Updated within threshold of ' + thresholdMillis + ' milliseconds? ' + updatedWithinThreshold);
       console.log('-----');

       if (createdWithinThreshold || updatedWithinThreshold) {
         // Trigger your workflow.
         UrlFetchApp.fetch(UNSTRUCTURED_API_URL, {
           method: 'post',
           headers: {
             'accept': 'application/json',
             'unstructured-api-key': UNSTRUCTURED_API_KEY
           }
         });
         console.log('At least one file created or updated within threshold of ' + thresholdMillis + ' milliseconds.');
         console.log('Unstructured workflow request sent to ' + UNSTRUCTURED_API_URL);
         return;
       }
     }
     console.log('No files created or updated within threshold of ' + thresholdMillis + ' milliseconds. No Unstructured workflow request sent.');
   }

   ```

4. Click the **Save project to Drive** button.

## Step 3: Customize the script for your workflow

1. With the project still open, on the **Files** tab, click the **Add a file** button, and then click **Script**.

2. Name the new file `Constants`. The `.gs` extension is added automatically.

3. Replace the contents of the `Constants.gs` file with the following code instead:

   ```javascript  theme={null}
   const BUCKET_PATH = '<bucket-path>';
   const UNSTRUCTURED_API_URL = '<unstructured-api-url>' + '/workflows/<workflow-id>/run';
   const UNSTRUCTURED_API_KEY = '<unstructured-api-key>';
   const CLIENT_ID = '<client-id>';
   const CLIENT_SECRET = '<client-secret>';
   const OAUTH_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'; // Or .read_write or .full_control
   ```

   Replace the following placeholders:

   * Replace `<bucket-path>` with the path to your Google Cloud Storage bucket. This is the same path that you specified
     when you created your Google Cloud Storage source connector in your Unstructured account. Do not include the `gs://` prefix here.
   * Replace `<unstructured-api-url>` with your Unstructured API URL value.
   * Replace `<workflow-id>` with the ID of your Unstructured workflow.
   * Replace `<unstructured-api-key>` with your Unstructured API key value.
   * Replace `<client-id>` with your OAuth 2.0 client ID value.
   * Replace `<client-secret>` with your OAuth 2.0 client secret value.

4. Click the disk (**Save project to Drive**) icon.

## Step 4: Generate an initial OAuth 2.0 access token

1. On the sidebar, click the gear (**Project Settings**) icon.
2. In the **IDs** area, next to **Script ID**, click **Copy** to copy the script's ID value to your system's clipboard.
3. In a separate tab in your web browser, open your [Google Cloud Console welcome page](https://console.cloud.google.com/welcome).
4. In the **Search (/) for resources, docs, products, and more** box, enter `Credentials`.
5. Click **Credentials (APIs & Services)**.
6. In the **OAuth 2.0 client IDs** list, click the link for the client ID that you created earlier in the requirements.
7. Under **Authorized redirect URIs**, click **Add URI**.
8. In the **URIs 1** box, enter `https://script.google.com/macros/d/<script-id>/usercallback`, replacing `<script-id>` with the script's ID value that you copied earlier.
9. Click **Save**.
10. On the original tab in your web browser, with the Google Apps Script project still open to the **Constants.gs** file, on the sidebar, next to **Libraries**, click the **+** (**Add a library**) icon.
11. For **Script ID**, enter `1B7FSrk5Zi6L1rSxxTDgDEUsPzlukDsi4KGuTMorsTQHhGBzBkMun4iDF`, and then click **Look up**.
12. For **Version**, make sure the largest number is selected.
13. Click **Add**.
14. In the sidebar, click the **Code.gs** file to open it.
15. In the file's top navigation bar, select **getAccessToken**.
16. Click the **Run** icon.
17. In the **Execution log** area, next to the message `Open the following URL and re-run the script`, copy the entire URL into
    a separate tab in your web browser and then browse to that URL.
18. When prompted, click **Review permissions**, and follow the on-screen instructions to grant the necessary permissions.

## Step 5: Create the script trigger

1. On the original tab in your web browser, with the Google Apps Script project still open, on the sidebar, click the alarm clock (**Triggers**) icon.

2. Click the **+ Add Trigger** button.

3. Set the following values:

   * For **Choose which function to run**, select `checkForNewOrUpdatedGCSFiles`.

   * For **Choose which deployment should run**, select **Head**.

   * For **Select event source**, select **Time-driven**.

   * For **Select type of time based trigger**, select **Minutes timer**.

   * For **Select minute interval**, select **Every 5 minutes**.

     <Note>
       If you change **Minutes timer** or **Every 5 minutes** to a different interval, you should also go back and change the number `5` in the following
       line of code in the `checkForNewOrUpdatedFiles` function. Change the number `5` to the number of minutes that correspond to the alternate interval you
       selected:

       ```javascript  theme={null}
       const thresholdMillis = 5 * 60 * 1000;
       ```
     </Note>

   * For **Failure notification settings**, select an interval such as immediately, hourly, or daily.

4. Click **Save**.

## Step 6: View trigger results

1. With the Google Apps Script project still open, on the sidebar, click the three lines (**Executions**) icon.

2. As soon as the first script execution completes, you should see a corresponding message appear in the **Executions** list. If the **Status** column shows
   **Completed**, then keep going with this procedure.

   If the **Status** column shows **Failed**, expand the message to
   get any details about the failure. Fix the failure, and then wait for the next script execution to complete.

3. When the **Status** column shows **Completed** then, in your Unstructured account's user interface, click **Jobs** on the sidebar to see if a new job
   is running for that worklow.

   If no new job is running for that workflow, then add at least one new file to—or update at least one existing file within—the Google Cloud Storage bucket,
   within 5 minutes of the next script execution. After the next script execution, check the **Jobs** list again.

## Step 7 (Optional): Delete the trigger

1. To stop the script from automatically executing on a regular basis, with the Google Apps Script project still open, on the sidebar, click the alarm clock (**Triggers**) icon.
2. Rest your mouse pointer on the trigger you created in Step 5.
3. Click the ellipsis (three dots) icon, and then click **Delete trigger**.

---

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/generate-schema.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Generate a JSON schema for a file

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

## Task

You want to generate a schema for a JSON file that Unstructured produces, so that you can validate, test,
and document related JSON files across your systems.

## Approach

Use a Python package such as [genson](https://pypi.org/project/genson/) to generate schemas for your
JSON files.

<Info>The `genson` package is not owned or supported by Unstructured. For questions and
requests, see the [Issues](https://github.com/wolverdude/genson/issues) tab of the
`genson` repository in GitHub.</Info>

## Generate a schema from the terminal

<Steps>
  <Step title="Install genson">
    Use [pip](https://pip.pypa.io/en/stable/installation/) to install the [genson](https://pypi.org/project/genson/) package.

    ```bash  theme={null}
    pip install genson
    ```
  </Step>

  <Step title="Install jq">
    By default, `genson` generates the JSON schema as a single string without any line breaks or indented whitespace.

    To pretty-print the schema that `genson` produces, install the [jq](https://jqlang.github.io/jq/) utility.

    <Info>The `jq` utility is not owned or supported by Unstructured. For questions and
    requests, see the [Issues](https://github.com/jqlang/jq/issues) tab of the
    `jq` repository in GitHub.</Info>
  </Step>

  <Step title="Generate the schema">
    1. Run the `genson` command, specifying the path to the input (source) JSON file, and the path to
       the output (target) JSON schema file to be generated. Use `jq` to pretty-print the schema's content
       into the file to be generated.

       ```bash  theme={null}
       genson "/path/to/input/file.json" | jq '.' > "/path/to/output/schema.json"
       ```

    2. You can find the generated JSON schema file in the output path that you specified.
  </Step>
</Steps>

## Generate a schema from Python code

<Steps>
  <Step title="Install dependencies">
    In your Python project, install the [genson](https://pypi.org/project/genson/) package.

    ```bash  theme={null}
    pip install genson
    ```
  </Step>

  <Step title="Add and run the schema generation code">
    1. Set the following local environment variables:

       * Set `LOCAL_FILE_INPUT_PATH` to the local path to the input (source) JSON file.
       * Set `LOCAL_FILE_OUTPUT_PATH` to the local path to the output (target) JSON schema file to be generated.

    2. Add the following Python code file to your project:

       ```python  theme={null}
       import os, json
       from genson import SchemaBuilder

       def json_schema_from_file(
           input_file_path: str,
           output_schema_path: str
       ) -> None:
           try:
               with open(input_file_path, "r") as file:
                   json_data = json.load(file)

               builder = SchemaBuilder()
               builder.add_object(json_data)

               schema = builder.to_schema()

               try:
                   with open(output_schema_path, "w") as schema_file:
                       json.dump(schema, schema_file, indent=2)
               except IOError as e:
                   raise IOError(f"Error writing to output file: {e}")
               
               print(f"JSON schema successfully generated and saved to '{output_schema_path}'.")
           except FileNotFoundError:
               print(f"Error: Input file '{input_file_path}' not found.")
           except IOError as e:
               print(f"I/O error occurred: {e}")
           except Exception as e:
               print(f"An unexpected error occurred: {e}")

       if __name__ == "__main__":
           json_schema_from_file(
               input_file_path=os.getenv("LOCAL_FILE_INPUT_PATH"),
               output_schema_path=os.getenv("LOCAL_FILE_OUTPUT_PATH")
           )
       ```

    3. Run the Python code file.

    4. Check the path specified by `LOCAL_FILE_OUTPUT_PATH` for the generated JSON schema file.
  </Step>
</Steps>

---

# Source: https://docs.unstructured.io/ui/enriching/generative-ocr.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Generative OCR optimization

After partitioning, you can have a vision language model (VLM) optimize the fidelity of text blocks that Unstructured
initially processed during its partitioning phase.

Here are a few examples of Unstructured's output of text blocks that were initially processed, and the more accurate
version of these text blocks that were optimized by using Claude Sonnet 4. Irrelevant lines of output have been omitted here for brevity.

Example 1: Vertical watermarked text

<img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-watermark.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=7b0f23e8446dd7ce620e7651c50edeea" alt="Vertical watermarked text example" data-og-width="143" width="143" data-og-height="626" height="626" data-path="img/ui/enriching/generative-ocr-watermark.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-watermark.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=15e47c50df569cbb2fe69924355c0c81 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-watermark.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=62218af951eb0ffaa3db8cad0ac60ad7 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-watermark.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=b1ea9bf9adc17b1b822445eccd16753b 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-watermark.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=7de690d74c56300e08f4caa282dd0e83 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-watermark.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=c4a2b211e386c3c091bb1b0490c52e22 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-watermark.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=9fe7a85cdf4b6b92d8cf8791f8af9163 2500w" />

Before (vertical watermarked text, represented incorrectly):

```json  theme={null}
{
    "...": "...",
    "text": "3 2 0 2 t c O 9 2 ] V C . s c [ 2 v 9 0 8 6 1 . 0 1 3 2 : v i X r",
    "...": "..."
}
```

After (vertical watermarked text, now represented correctly from the original content):

```json  theme={null}
{
    "...": "...",
    "text": "arXiv:2310.16809v2 [cs.CV] 29 Oct 2023",
    "...": "..."
}
```

Example 2: Hyperlink

<img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-hyperlink.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=b6839365e38048b688ad2e110198cb14" alt="Hyperlinked text example" data-og-width="507" width="507" data-og-height="188" height="188" data-path="img/ui/enriching/generative-ocr-hyperlink.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-hyperlink.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=7676f083d0aae46beae3730507fea500 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-hyperlink.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=72123904e13ff21a48fc7dfbe9d1b9c2 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-hyperlink.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=3c45a6ad472957fce954c320e8a866c4 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-hyperlink.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=dec0c975b72e6d5b00d60082d05fdcb3 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-hyperlink.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=ff0c0b77aef00bb4add76ba19b7c3ae0 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-hyperlink.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=4e03fb162af79f94c482d061f346ccf9 2500w" />

Before (hyperlink, represented incorrectly):

```json  theme={null}
{
    "...": "...",
    "text": "con/Yuliang-Liu/MultinodalOCR|",
    "...": "..."
}
```

After (hyperlink, now represented correctly from the original content):

```json  theme={null}
{
    "...": "...",
    "text": "https://github.com/Yuliang-Liu/MultimodalOCR",
    "...": "..."
}
```

Example 3: Chinese characters

<img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-image.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=d7c523324999fc3da8752920d5fb3795" alt="Chinese characters example" data-og-width="269" width="269" data-og-height="200" height="200" data-path="img/ui/enriching/generative-ocr-image.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-image.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=f134ff22d50f41c8394b9d6edbda3f3d 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-image.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=b00d4d697ece67fe41c24a03cf5cf048 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-image.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=160372e44d78bbe97e7ccbbfbf542e45 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-image.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=5a08bd9af73e30dc858c6e8d1eaa1a44 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-image.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=e3c54125e96cbef101f10b65ca93e696 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/enriching/generative-ocr-image.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=996c067c1840b4b82eb0f5d25a7d70c6 2500w" />

Before (Chinese characters, represented incorrectly):

```json  theme={null}
{
    "...": "...",
    "text": "GT SHE GPT4-V: EHES",
    "...": "..."
}
```

After (Chinese characters, now represented correctly from the original content, expressed as Unicode):

```json  theme={null}
{
    "...": "...",
    "text": "GT : \u91d1\u724c\u70e7\u814a GPT4-V: \u6587\u9759\u5019\u9e1f",
    "...": "..."
}
```

## Improve text fidelity with generative OCR

To produce generative OCR optimizations, in an **Enrichment** node in a workflow, click the following
in the node's settings pane's **Details** tab:

* **Image** under **Input Type**.

* One of the following providers and models:

  * **Anthropic** under **Provider** and any choice under **Model**
  * **OpenAI** under **Provider** and any choice under **Model**

* **Generative OCR** under **Task**.

<Warning>
  Generative OCR does not process any text blocks by default. You must also explicitly specify which document element
  types containing text that you want generative OCR to process. To do this, in the workflow editor for your workflow:

  1. Click the **Partitioner** node.
  2. In the node's settings pane, scroll down to and then click a blank area inside of the **Extract Image Block Types** list.
  3. Select each [document element types](/ui/document-elements#element-type) that you want generative OCR to process. For this
     walkthrough, select only **NarrativeText**.

  Generative OCR does not process the text of any `Image` or `Table` elements if they have already been processed by
  [image description](#image-description-task) or [table description](#table-description-task) enrichments, respectively. Do
  not remove the **Image** or **Table** document elements types from this **Extract Image Block Types** list, or else
  the image description and table description enrichments in your workflow might produce unexepcted results or might not work at all.
</Warning>

<Note>
  The **Generative OCR** enrichment appears under the **Input Type** of **Image**, even though this is not an image-related enrichment.
  This is a known issue and will be addressed in a future release.
</Note>

<Note>
  You can change a workflow's image description settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings.

  For workflows that use [chunking](/ui/chunking), the **Chunker** node should be placed after all **Enrichment** nodes. Placing the
  **Chunker** node before an image descriptions **Enrichment** node could cause incomplete or no image descriptions to be generated.
</Note>

<Warning>
  Unstructured can produce generative OCR optimizations for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and a generative OCR optimizations node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No generative OCR optimization node is needed (or allowed).

  Unstructured never produces generative OCR optimizations for workflows with a **Partitioner** node set to use the **Fast** partitioning strategy.
</Warning>

---

# Source: https://docs.unstructured.io/support/issues/get-authenticated-user-error.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get authenticated user error

## Issue

When you try to sign in to Unstructured, you get an error that contains text similar to the following:

```text  theme={null}
Error in getAuthenticatedUser
```

This error can prevent you from accessing your
Unstructured account dashboard or performing other actions in your Unstructured account.

## Possible causes

* Unstructured is having trouble verifying your login session.
* Your Unstructured login session has expired.
* Your web browser's cache or cookies are causing some kind of conflict.
* There is a temporary issue with Unstructured's authentication service.

## Possible solutions

* Log out of your Unstructured account completely, if you can. To do this, in the Unstructured user interface, on the sidebar,
  click your user icon, and then click **Logout**.
* Clear your web browser's cache and cookies for the Unstructured website.
* Close and then reopen your web browser, and then try to sign in again to your Unstructured account.

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/open-source/how-to/get-chunked-elements.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/get-chunked-elements.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get chunked elements

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

## Task

You want to get, and print or save, the contents of elements that have been chunked.

## Approach

Chunked elements are typically represented in this format:

```json  theme={null}
{
  "type": "CompositeElement",
  "element_id": "3800f7ff047e97cbf0a5f7df7ad52c80",
  "text": "4,5 Exception noted in \u201cSpecial Considerations for Promotional Labeling and Advertising Material\u201d\n\n3",
  "metadata": {
    "filetype": "application/pdf",
    "languages": ["eng"],
    "page_number": 6,
    "orig_elements": "eJy ... Full content omitted for brevity ... x8=",
    "filename": "Portable-Document-Format-Specifications.pdf"
  }
}
```

```json  theme={null}
{
  "type": "Table",
  "element_id": "758127b42c51b93b59abf08640d1ccab",
  "text": "Gels and karyotypes High pressure liquid chromatography 600 dpi (8 bit grayscale depth) 300 dpi",
  "metadata": {
    "text_as_html": "<table><tbody><tr><td>Gels and karyotypes</td><td>600 dpi (8 bit grayscale depth)</td></tr><tr><td>High pressure liquid chromatography</td><td>300</td></tr></tbody></table>",
    "filetype": "application/pdf",
    "languages": ["eng"],
    "page_number": 8,
    "orig_elements": "eJy ... Full content omitted for brevity ... MnD",
    "filename": "Portable-Document-Format-Specifications.pdf"
}
```

To get the element's chunked content, extract the contents of the element's `text` field.

The chunked content might not contain all of its associated content. To get all of the elements that were used to derive this chunked content, extract the contents of the element's `orig_elements` field, which is nested inside of its parent `metadata` field.

The contents of the `orig_elements` field is in compressed Base64 gzipped format. To get the content, Base64-decode the bytes, decompress them, and then decode them using UTF-8.

## To run this example

You will need to chunk a document during processing. This example uses a PDF file chunked into 200- to 300-character elements.

## Code

For the [Unstructured Python SDK](/api-reference/legacy-api/partition/sdk-python), you'll need:

These environment variables:

* `UNSTRUCTURED_API_KEY` - Your Unstructured API key value.
* `UNSTRUCTURED_API_URL` - Your Unstructured API URL.

```python  theme={null}
from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared
from unstructured.staging.base import elements_from_dicts, elements_to_json

import os, base64, zlib
from typing import List, Dict, Any

# Extract the contents of an orig_elements field.
def extract_orig_elements(orig_elements):
    decoded_orig_elements = base64.b64decode(orig_elements)
    decompressed_orig_elements = zlib.decompress(decoded_orig_elements)
    return decompressed_orig_elements.decode('utf-8')

# Source file: https://www.fda.gov/files/drugs/published/Portable-Document-Format-Specifications.pdf
input_filepath = "local-ingest-input-pdf/Portable-Document-Format-Specifications.pdf"
output_filepath = "local-ingest-output-json/Portable-Document-Format-Specifications.json"

client = UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
)

with open(input_filepath, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=input_filepath
    )

# Chunk the document with a basic chunking strategy.
# Create chunks that are at least around 200 characters (soft limit)
# but never more than 300 characters (hard maximum).
req = operations.PartitionRequest(
    shared.PartitionParameters(
        files=files,
        strategy=shared.Strategy.VLM,
        vlm_model="gpt-4o",
        vlm_model_provider="openai",
        split_pdf_page=True,
        split_pdf_allow_failed=True,
        split_pdf_concurrency_level=15,
        chunking_strategy="basic",
        new_after_n_chars=200,
        max_characters=300
    )
)

try:
    res = await client.general.partition_async(
        request=req
    )

    # Create a dictionary that will hold only
    # a transposed version of the returned elements. 
    # For instance, we just want to capture each element's ID,
    # the chunk's text, and the chunk's associated elements in context.
    orig_elements_dict: List[Dict[str, Any]] = []

    for element in res.elements:
        # For each chunk that has an "orig_elements" field...
        if "orig_elements" in element["metadata"]:
            # ...get the chunk's associated elements in context...
            orig_elements = extract_orig_elements(element["metadata"]["orig_elements"])
            # ...and then transpose it and other associated fields into a separate dictionary.
            orig_elements_dict.append({
                "element_id": element["element_id"],
                "text": element["text"],
                "orig_elements": json.loads(orig_elements)
            })
    
    # Convert the elements into a JSON object.
    orig_elements_json = json.dumps(orig_elements_dict, indent=2)

    # Write the JSON to a file.
    with open(output_filepath, "w") as file:
        file.write(orig_elements_json)
except Exception as e:
    print(e)
```

## See also

* [Recovering chunk elements](/open-source/core-functionality/chunking#recovering-chunk-elements)
* [Chunking strategies](/api-reference/legacy-api/partition/chunking)

---

# Source: https://docs.unstructured.io/api-reference/destinations/get-destination-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get destination connector

> Retrieve detailed information for a specific destination connector by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/destinations/{destination_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/destinations/{destination_id}:
    get:
      tags:
        - destinations
      summary: Get destination connector
      description: >-
        Retrieve detailed information for a specific destination connector by
        its ID.
      operationId: get_destination
      parameters:
        - name: destination_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Destination Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DestinationConnectorInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    DestinationConnectorInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/DestinationConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureDestinationConnectorConfig'
            - $ref: '#/components/schemas/AstraDBConnectorConfig'
            - $ref: '#/components/schemas/AzureAISearchConnectorConfig'
            - $ref: '#/components/schemas/CouchbaseDestinationConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVDTDestinationConnectorConfig'
            - $ref: '#/components/schemas/DeltaTableConnectorConfig'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfig'
            - $ref: '#/components/schemas/GCSDestinationConnectorConfig'
            - $ref: '#/components/schemas/KafkaCloudDestinationConnectorConfig'
            - $ref: '#/components/schemas/MilvusDestinationConnectorConfig'
            - $ref: '#/components/schemas/MongoDBConnectorConfig'
            - $ref: '#/components/schemas/Neo4jDestinationConnectorConfig'
            - $ref: '#/components/schemas/OneDriveDestinationConnectorConfig'
            - $ref: '#/components/schemas/OpenSearchConnectorConfig'
            - $ref: '#/components/schemas/PineconeDestinationConnectorConfig'
            - $ref: '#/components/schemas/PostgresDestinationConnectorConfig'
            - $ref: '#/components/schemas/RedisDestinationConnectorConfig'
            - $ref: '#/components/schemas/QdrantCloudDestinationConnectorConfig'
            - $ref: '#/components/schemas/S3DestinationConnectorConfig'
            - $ref: '#/components/schemas/SnowflakeDestinationConnectorConfig'
            - $ref: '#/components/schemas/TeradataDestinationConnectorConfig'
            - $ref: '#/components/schemas/WeaviateDestinationConnectorConfig'
            - $ref: '#/components/schemas/IBMWatsonxS3DestinationConnectorConfig'
            - additionalProperties: true
              type: object
          title: Config
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
      type: object
      required:
        - id
        - name
        - type
        - config
        - created_at
      title: DestinationConnectorInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    DestinationConnectorType:
      type: string
      enum:
        - azure
        - astradb
        - azure_ai_search
        - couchbase
        - databricks_volumes
        - databricks_volume_delta_tables
        - delta_table
        - elasticsearch
        - gcs
        - kafka-cloud
        - milvus
        - mongodb
        - motherduck
        - neo4j
        - onedrive
        - opensearch
        - pinecone
        - postgres
        - redis
        - qdrant-cloud
        - s3
        - snowflake
        - teradata
        - weaviate-cloud
        - ibm_watsonx_s3
      title: DestinationConnectorType
    AzureDestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
      type: object
      required:
        - remote_url
      title: AzureDestinationConnectorConfig
    AstraDBConnectorConfig:
      properties:
        collection_name:
          type: string
          title: Collection Name
        keyspace:
          anyOf:
            - type: string
            - type: 'null'
          title: Keyspace
        batch_size:
          type: integer
          title: Batch Size
        binary_encode_vectors:
          type: boolean
          title: Binary Encode Vectors
          default: true
        api_endpoint:
          type: string
          title: Api Endpoint
        token:
          type: string
          title: Token
      additionalProperties: true
      type: object
      required:
        - collection_name
        - batch_size
        - api_endpoint
        - token
      title: AstraDBConnectorConfig
    AzureAISearchConnectorConfig:
      properties:
        endpoint:
          type: string
          title: Endpoint
        index:
          type: string
          title: Index
        key:
          type: string
          title: Key
      type: object
      required:
        - endpoint
        - index
        - key
      title: AzureAISearchConnectorConfig
    CouchbaseDestinationConnectorConfig:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
      title: CouchbaseDestinationConnectorConfig
    DatabricksVolumesConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfig
    DatabricksVDTDestinationConnectorConfig:
      properties:
        server_hostname:
          type: string
          title: Server Hostname
        http_path:
          type: string
          title: Http Path
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        client_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Id
        client_secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Secret
        catalog:
          type: string
          title: Catalog
        database:
          type: string
          title: Database
          default: default
        table_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Table Name
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          anyOf:
            - type: string
            - type: 'null'
          title: Volume Path
      type: object
      required:
        - server_hostname
        - http_path
        - catalog
        - volume
      title: DatabricksVDTDestinationConnectorConfig
    DeltaTableConnectorConfig:
      properties:
        aws_access_key_id:
          type: string
          title: Aws Access Key Id
        aws_secret_access_key:
          type: string
          title: Aws Secret Access Key
        aws_region:
          type: string
          title: Aws Region
        table_uri:
          type: string
          title: Table Uri
      type: object
      required:
        - aws_access_key_id
        - aws_secret_access_key
        - aws_region
        - table_uri
      title: DeltaTableConnectorConfig
    ElasticsearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfig
    GCSDestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
      type: object
      required:
        - remote_url
        - service_account_key
      title: GCSDestinationConnectorConfig
    KafkaCloudDestinationConnectorConfig:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          title: Port
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - bootstrap_servers
        - port
        - topic
        - kafka_api_key
        - secret
        - batch_size
      title: KafkaCloudDestinationConnectorConfig
    MilvusDestinationConnectorConfig:
      properties:
        uri:
          type: string
          title: Uri
        user:
          anyOf:
            - type: string
            - type: 'null'
          title: User
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        db_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Db Name
        collection_name:
          type: string
          title: Collection Name
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - uri
        - collection_name
        - record_id_key
      title: MilvusDestinationConnectorConfig
    MongoDBConnectorConfig:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfig
    Neo4jDestinationConnectorConfig:
      properties:
        uri:
          type: string
          title: Uri
        database:
          type: string
          title: Database
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - uri
        - database
        - username
        - password
        - batch_size
      title: Neo4jDestinationConnectorConfig
    OneDriveDestinationConnectorConfig:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        remote_url:
          type: string
          title: Remote Url
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - remote_url
      title: OneDriveDestinationConnectorConfig
    OpenSearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfig
      description: >-
        OpenSearch connector configuration.


        OpenSearch is a fork of Elasticsearch with similar functionality.

        Supports two authentication methods:

        1. Basic auth: username + password

        2. AWS IAM auth: aws_access_key_id + aws_secret_access_key (+ optional
        aws_session_token)


        For AWS OpenSearch Service or OpenSearch Serverless, region and service
        type

        are auto-detected from the host URL.
    PineconeDestinationConnectorConfig:
      properties:
        index_name:
          type: string
          title: Index Name
        api_key:
          type: string
          title: Api Key
        namespace:
          type: string
          title: Namespace
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - index_name
        - api_key
        - namespace
        - batch_size
      title: PineconeDestinationConnectorConfig
    PostgresDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
      title: PostgresDestinationConnectorConfig
    RedisDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        port:
          type: integer
          title: Port
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        uri:
          anyOf:
            - type: string
            - type: 'null'
          title: Uri
        database:
          type: integer
          title: Database
        ssl:
          type: boolean
          title: Ssl
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - port
        - database
        - ssl
        - batch_size
      title: RedisDestinationConnectorConfig
    QdrantCloudDestinationConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        api_key:
          type: string
          title: Api Key
        collection_name:
          type: string
          title: Collection Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - url
        - api_key
        - collection_name
        - batch_size
      title: QdrantCloudDestinationConnectorConfig
    S3DestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
      type: object
      required:
        - remote_url
        - anonymous
      title: S3DestinationConnectorConfig
    SnowflakeDestinationConnectorConfig:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
      title: SnowflakeDestinationConnectorConfig
    TeradataDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - host
        - user
        - password
      title: TeradataDestinationConnectorConfig
    WeaviateDestinationConnectorConfig:
      properties:
        cluster_url:
          type: string
          title: Cluster Url
        api_key:
          type: string
          title: Api Key
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
      type: object
      required:
        - cluster_url
        - api_key
      title: WeaviateDestinationConnectorConfig
    IBMWatsonxS3DestinationConnectorConfig:
      properties:
        iam_api_key:
          type: string
          title: Iam Api Key
        access_key_id:
          type: string
          title: Access Key Id
        secret_access_key:
          type: string
          title: Secret Access Key
        iceberg_endpoint:
          type: string
          title: Iceberg Endpoint
        object_storage_endpoint:
          type: string
          title: Object Storage Endpoint
        object_storage_region:
          type: string
          title: Object Storage Region
        catalog:
          type: string
          title: Catalog
        max_retries_connection:
          type: integer
          title: Max Retries Connection
        namespace:
          type: string
          title: Namespace
        table:
          type: string
          title: Table
        max_retries:
          type: integer
          title: Max Retries
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - iam_api_key
        - access_key_id
        - secret_access_key
        - iceberg_endpoint
        - object_storage_endpoint
        - object_storage_region
        - catalog
        - max_retries_connection
        - namespace
        - table
        - max_retries
        - record_id_key
      title: IBMWatsonxS3DestinationConnectorConfig
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/open-source/how-to/get-elements.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/get-elements.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get element contents

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

## Task

You want to get, manipulate, and print or save, the contents of the [document elements and metadata](/api-reference/legacy-api/partition/document-elements) from the processed data that Unstructured returns.

## Approach

Each element in the document elements contains fields for that element's type, its ID, the extracted text, and associated metadata.

The programmatic approach you take to get these document elements will depend on which SDK you use:

<AccordionGroup>
  <Accordion title="Python SDK">
    For the [Unstructured Python SDK](/api-reference/legacy-api/partition/sdk-python), calling an `UnstructuredClient` object's `general.partition_async` method returns a `PartitionResponse` object.

    This `PartitionResponse` object's `elements` variable contains a list of key-value dictionaries (`List[Dict[str, Any]]`). For example:

    ```python Python theme={null}
    # ...

    res = await client.general.partition_async(request=req)

    # Do something with the elements, for example:
    save_elements_to_file(res.elements)

    # ...
    ```

    You can use standard Python [list operations](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) on this list.

    You can also use standard Python [looping techniques](https://docs.python.org/3/tutorial/datastructures.html#looping-techniques) on this list to access each element in this list.

    To work with an individual element's contents, you can use standard [dictionary operations](https://docs.python.org/3/library/stdtypes.html#dict) on the element.

    For example:

    ```python Python theme={null}
    # ...

    res = await client.general.partition_async(request=req)

    for element in res.elements:
        # Do something with each element, for example:
        save_element_to_database(f"{element["element_id"]}")
        save_element_to_database(f"{element["text"]}")
        save_element_to_database(f"{element["metadata"]["filename"]}\n")

    # ...
    ```

    To serialize this list as JSON, you can:

    1. Use the `elements_from_dicts` function to convert the list of key-value dictionaries (`Iterable[Dict[str, Any]]`) into a list of elements (`Iterable[Element]`).
    2. Use the `elements_to_json` function to convert the list of elements into a JSON-formatted string and then print or save that string.

    For example:

    ```python Python theme={null}
    from unstructured.staging.base import elements_from_dicts, elements_to_json

    # ...

    res = await client.general.partition_async(request=req)

    dict_elements = elements_from_dicts(
        element_dicts=res.elements
    )

    elements_to_json(
        elements=dict_elements,
        indent=2,
        filename=output_filepath
    )
    # ...
    ```
  </Accordion>

  <Accordion title="JavaScript/TypeScript SDK">
    For the [Unstructured JavaScript/TypeScript SDK](/api-reference/legacy-api/partition/sdk-jsts), calling an `UnstructuredClient` object's `general.partition` method returns a `Promise<PartitionResponse>` object.

    This `PartitionResponse` object's `elements` property contains an `Array` of string-value objects (`{ [k: string]: any; }[]`). For example:

    ```typescript TypeScript theme={null}
    // ...

    client.general.partition({
        partitionParameters: {
            files: {
                content: data,
                fileName: inputFilepath
            },
            strategy: Strategy.HiRes,
            splitPdfPage: true,
            splitPdfAllowFailed: true,
            splitPdfConcurrencyLevel: 15
        }
    }).then((res) => {
        if (res.statusCode == 200) {
            // Do something with the elements, for example:
            saveElementsToFile(res)
        }
    } // ...
    ```

    You can use standard [Array operations](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array) on this array.

    You can also use standard `Array` techniques such as [forEach](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/forEach) to access each object in this array. For example:

    ```typescript TypeScript theme={null}
    // ...

    client.general.partition({
        partitionParameters: {
            files: {
                content: data,
                fileName: inputFilepath
            },
            strategy: Strategy.HiRes,
            splitPdfPage: true,
            splitPdfAllowFailed: true,
            splitPdfConcurrencyLevel: 15
        }
    }).then((res) => {
        if (res.statusCode == 200) {
            res.forEach(element => {
                // Do something with each element, for example:
                saveElementToDatabase(`${element["element_id"]}`)
                saveElementToDatabase(`${element["text"]}`)
                saveElementToDatabase(`${element["metadata"]["filename"]}`)
            }
        }
    } // ...
    ```

    To serialize this list as JSON, you can use the standard [JSON.stringify](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify) function to serialize it to JSON-formatted string and the Node.js [fs.WriteFileSync](https://nodejs.org/api/fs.html#fswritefilesyncfile-data-options) function to save it as a file. For example:

    ```typescript TypeScript theme={null}
    // ...

    client.general.partition({
        partitionParameters: {
            files: {
                content: data,
                fileName: inputFilepath
            },
            strategy: Strategy.HiRes,
            splitPdfPage: true,
            splitPdfAllowFailed: true,
            splitPdfConcurrencyLevel: 15
        }
    }).then((res) => {
        if (res.statusCode == 200) {    
            const jsonElements = JSON.stringify(res, null, 2)

            fs.writeFileSync(outputFilepath, jsonElements)
        }
    } // ...
    ```
  </Accordion>
</AccordionGroup>

---

# Source: https://docs.unstructured.io/api-reference/jobs/get-job-failed-files.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get Job Failed Files

> Retrieve failed files for a specific job by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/jobs/{job_id}/failed-files
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/jobs/{job_id}/failed-files:
    get:
      tags:
        - jobs
      summary: Get Job Failed Files
      description: Retrieve failed files for a specific job by its ID.
      operationId: get_job_failed_files
      parameters:
        - name: job_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Job Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobFailedFiles'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    JobFailedFiles:
      properties:
        failed_files:
          items:
            $ref: '#/components/schemas/FailedFile'
          type: array
          title: Failed Files
      type: object
      required:
        - failed_files
      title: JobFailedFiles
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    FailedFile:
      properties:
        document:
          type: string
          title: Document
        error:
          type: string
          title: Error
      type: object
      required:
        - document
        - error
      title: FailedFile
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/jobs/get-job-processing-details.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get Job processing details

> Retrieve processing details for a specific job by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/jobs/{job_id}/details
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/jobs/{job_id}/details:
    get:
      tags:
        - jobs
      summary: Get Job processing details
      description: Retrieve processing details for a specific job by its ID.
      operationId: get_job_details
      parameters:
        - name: job_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Job Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobDetails'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    JobDetails:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        processing_status:
          $ref: '#/components/schemas/JobProcessingStatus'
        node_stats:
          items:
            $ref: '#/components/schemas/JobNodeDetails'
          type: array
          title: Node Stats
        message:
          anyOf:
            - type: string
            - type: 'null'
          title: Message
      type: object
      required:
        - id
        - processing_status
        - node_stats
      title: JobDetails
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    JobProcessingStatus:
      type: string
      enum:
        - SCHEDULED
        - IN_PROGRESS
        - SUCCESS
        - COMPLETED_WITH_ERRORS
        - STOPPED
        - FAILED
      title: JobProcessingStatus
    JobNodeDetails:
      properties:
        node_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Node Name
        node_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Node Type
        node_subtype:
          anyOf:
            - type: string
            - type: 'null'
          title: Node Subtype
        ready:
          type: integer
          title: Ready
        in_progress:
          type: integer
          title: In Progress
        success:
          type: integer
          title: Success
        failure:
          type: integer
          title: Failure
      type: object
      required:
        - ready
        - in_progress
        - success
        - failure
      title: JobNodeDetails
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/jobs/get-job.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get Job

> Retrieve detailed information for a specific job by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/jobs/{job_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/jobs/{job_id}:
    get:
      tags:
        - jobs
      summary: Get Job
      description: Retrieve detailed information for a specific job by its ID.
      operationId: get_job
      parameters:
        - name: job_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Job Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    JobInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        workflow_id:
          type: string
          format: uuid
          title: Workflow Id
        workflow_name:
          type: string
          title: Workflow Name
        status:
          $ref: '#/components/schemas/JobStatus'
        created_at:
          type: string
          format: date-time
          title: Created At
        runtime:
          anyOf:
            - type: string
              format: duration
            - type: 'null'
          title: Runtime
        input_file_ids:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Input File Ids
        output_node_files:
          anyOf:
            - items:
                $ref: '#/components/schemas/NodeFileMetadata'
              type: array
            - type: 'null'
          title: Output Node Files
        job_type:
          $ref: '#/components/schemas/WorkflowJobType'
          default: ephemeral
      type: object
      required:
        - id
        - workflow_id
        - workflow_name
        - status
        - created_at
      title: JobInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    JobStatus:
      type: string
      enum:
        - SCHEDULED
        - IN_PROGRESS
        - COMPLETED
        - STOPPED
        - FAILED
      title: JobStatus
    NodeFileMetadata:
      properties:
        node_id:
          type: string
          format: uuid
          title: Node Id
        file_id:
          type: string
          title: File Id
        node_type:
          type: string
          title: Node Type
        node_subtype:
          type: string
          title: Node Subtype
      type: object
      required:
        - node_id
        - file_id
        - node_type
        - node_subtype
      title: NodeFileMetadata
    WorkflowJobType:
      type: string
      enum:
        - ephemeral
        - persistent
        - scheduled
        - template
      title: WorkflowJobType
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/sources/get-source-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get source connector

> Retrieve detailed information for a specific source connector by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/sources/{source_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/sources/{source_id}:
    get:
      tags:
        - sources
      summary: Get source connector
      description: Retrieve detailed information for a specific source connector by its ID.
      operationId: get_source
      parameters:
        - name: source_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Source Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SourceConnectorInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    SourceConnectorInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/SourceConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureSourceConnectorConfig'
            - $ref: '#/components/schemas/BoxSourceConnectorConfig'
            - $ref: '#/components/schemas/ConfluenceSourceConnectorConfig'
            - $ref: '#/components/schemas/CouchbaseSourceConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfig'
            - $ref: '#/components/schemas/DropboxSourceConnectorConfig'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfig'
            - $ref: '#/components/schemas/GCSSourceConnectorConfig'
            - $ref: '#/components/schemas/GoogleDriveSourceConnectorConfig'
            - $ref: '#/components/schemas/KafkaCloudSourceConnectorConfig'
            - $ref: '#/components/schemas/MongoDBConnectorConfig'
            - $ref: '#/components/schemas/OneDriveSourceConnectorConfig'
            - $ref: '#/components/schemas/OpenSearchConnectorConfig'
            - $ref: '#/components/schemas/OutlookSourceConnectorConfig'
            - $ref: '#/components/schemas/PostgresSourceConnectorConfig'
            - $ref: '#/components/schemas/S3SourceConnectorConfig'
            - $ref: '#/components/schemas/SalesforceSourceConnectorConfig'
            - $ref: '#/components/schemas/SharePointSourceConnectorConfig'
            - $ref: '#/components/schemas/SnowflakeSourceConnectorConfig'
            - $ref: '#/components/schemas/TeradataSourceConnectorConfig'
            - $ref: '#/components/schemas/JiraSourceConnectorConfig'
            - $ref: '#/components/schemas/ZendeskSourceConnectorConfig'
            - additionalProperties: true
              type: object
          title: Config
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
      type: object
      required:
        - id
        - name
        - type
        - config
        - created_at
      title: SourceConnectorInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    SourceConnectorType:
      type: string
      enum:
        - azure
        - box
        - confluence
        - couchbase
        - databricks_volumes
        - dropbox
        - elasticsearch
        - gcs
        - google_drive
        - kafka-cloud
        - mongodb
        - onedrive
        - opensearch
        - outlook
        - postgres
        - s3
        - salesforce
        - sharepoint
        - slack
        - snowflake
        - teradata
        - jira
        - zendesk
      title: SourceConnectorType
    AzureSourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - recursive
      title: AzureSourceConnectorConfig
    BoxSourceConnectorConfig:
      properties:
        box_app_config:
          type: string
          title: Box App Config
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - box_app_config
        - recursive
      title: BoxSourceConnectorConfig
    ConfluenceSourceConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        api_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Token
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          type: boolean
          title: Cloud
        extract_images:
          type: boolean
          title: Extract Images
          default: false
        extract_files:
          type: boolean
          title: Extract Files
          default: false
        max_num_of_spaces:
          type: integer
          title: Max Num Of Spaces
        max_num_of_docs_from_each_space:
          type: integer
          title: Max Num Of Docs From Each Space
        spaces:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Spaces
      type: object
      required:
        - url
        - username
        - cloud
        - max_num_of_spaces
        - max_num_of_docs_from_each_space
        - spaces
      title: ConfluenceSourceConnectorConfig
    CouchbaseSourceConnectorConfig:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        collection_id:
          type: string
          title: Collection Id
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
        - collection_id
      title: CouchbaseSourceConnectorConfig
    DatabricksVolumesConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfig
    DropboxSourceConnectorConfig:
      properties:
        token:
          type: string
          title: Token
        remote_url:
          type: string
          title: Remote Url
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - token
        - remote_url
        - recursive
      title: DropboxSourceConnectorConfig
    ElasticsearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfig
    GCSSourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - service_account_key
        - recursive
      title: GCSSourceConnectorConfig
    GoogleDriveSourceConnectorConfig:
      properties:
        drive_id:
          type: string
          title: Drive Id
        service_account_key:
          anyOf:
            - $ref: '#/components/schemas/SecretReference'
            - type: string
          title: Service Account Key
        extensions:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Extensions
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - drive_id
        - service_account_key
        - recursive
      title: GoogleDriveSourceConnectorConfig
    KafkaCloudSourceConnectorConfig:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          title: Port
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        num_messages_to_consume:
          type: integer
          title: Num Messages To Consume
      type: object
      required:
        - bootstrap_servers
        - port
        - topic
        - kafka_api_key
        - secret
        - num_messages_to_consume
      title: KafkaCloudSourceConnectorConfig
    MongoDBConnectorConfig:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfig
    OneDriveSourceConnectorConfig:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
        path:
          type: string
          title: Path
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - recursive
        - path
      title: OneDriveSourceConnectorConfig
    OpenSearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfig
      description: >-
        OpenSearch connector configuration.


        OpenSearch is a fork of Elasticsearch with similar functionality.

        Supports two authentication methods:

        1. Basic auth: username + password

        2. AWS IAM auth: aws_access_key_id + aws_secret_access_key (+ optional
        aws_session_token)


        For AWS OpenSearch Service or OpenSearch Serverless, region and service
        type

        are auto-detected from the host URL.
    OutlookSourceConnectorConfig:
      properties:
        authority_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Authority Url
        tenant:
          anyOf:
            - type: string
            - type: 'null'
          title: Tenant
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        outlook_folders:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Outlook Folders
        recursive:
          type: boolean
          title: Recursive
        user_email:
          type: string
          title: User Email
      type: object
      required:
        - client_id
        - client_cred
        - recursive
        - user_email
      title: OutlookSourceConnectorConfig
    PostgresSourceConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
        - id_column
      title: PostgresSourceConnectorConfig
    S3SourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - anonymous
        - recursive
      title: S3SourceConnectorConfig
    SalesforceSourceConnectorConfig:
      properties:
        username:
          type: string
          title: Username
        consumer_key:
          type: string
          title: Consumer Key
        private_key:
          type: string
          title: Private Key
        categories:
          items:
            type: string
          type: array
          title: Categories
      type: object
      required:
        - username
        - consumer_key
        - private_key
        - categories
      title: SalesforceSourceConnectorConfig
    SharePointSourceConnectorConfig:
      properties:
        site:
          type: string
          title: Site
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
          default: https://login.microsoftonline.com
        user_pname:
          type: string
          title: User Pname
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
        path:
          anyOf:
            - type: string
            - type: 'null'
          title: Path
      type: object
      required:
        - site
        - tenant
        - user_pname
        - client_id
        - client_cred
        - recursive
      title: SharePointSourceConnectorConfig
    SnowflakeSourceConnectorConfig:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
        - table_name
        - id_column
      title: SnowflakeSourceConnectorConfig
    TeradataSourceConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        id_column:
          type: string
          title: Id Column
          default: id
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - user
        - password
        - table_name
      title: TeradataSourceConnectorConfig
    JiraSourceConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Cloud
          default: false
        projects:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Projects
        boards:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Boards
        issues:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Issues
        status_filters:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Status Filters
        download_attachments:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Download Attachments
          default: false
      type: object
      required:
        - url
        - username
      title: JiraSourceConnectorConfig
    ZendeskSourceConnectorConfig:
      properties:
        subdomain:
          type: string
          title: Subdomain
        email:
          type: string
          title: Email
        api_token:
          type: string
          title: Api Token
        item_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Item Type
          default: tickets
        batch_size:
          anyOf:
            - type: integer
            - type: 'null'
          title: Batch Size
          default: 2
      type: object
      required:
        - subdomain
        - email
        - api_token
      title: ZendeskSourceConnectorConfig
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    SecretReference:
      properties:
        id:
          type: string
          title: Id
        type:
          $ref: '#/components/schemas/EncryptionType'
          default: rsa
      type: object
      required:
        - id
      title: SecretReference
    EncryptionType:
      type: string
      enum:
        - rsa
        - rsa_aes
      title: EncryptionType

````

---

# Source: https://docs.unstructured.io/api-reference/templates/get-template.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get Template

> Retrieve detailed information and DAG for a specific template.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/templates/{template_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/templates/{template_id}:
    get:
      tags:
        - templates
      summary: Get Template
      description: Retrieve detailed information and DAG for a specific template.
      operationId: get_template
      parameters:
        - name: template_id
          in: path
          required: true
          schema:
            type: string
            title: Template Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TemplateDetail'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    TemplateDetail:
      properties:
        id:
          type: string
          title: Id
        name:
          type: string
          title: Name
        version:
          type: string
          title: Version
        last_updated:
          type: string
          title: Last Updated
        description:
          type: string
          title: Description
        nodes:
          items:
            $ref: '#/components/schemas/TemplateNode'
          type: array
          title: Nodes
      type: object
      required:
        - id
        - name
        - version
        - last_updated
        - description
        - nodes
      title: TemplateDetail
      description: Full template details including nodes.
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    TemplateNode:
      properties:
        id:
          type: string
          title: Id
        name:
          type: string
          title: Name
        type:
          type: string
          title: Type
        subtype:
          type: string
          title: Subtype
        settings:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Settings
      type: object
      required:
        - id
        - name
        - type
        - subtype
      title: TemplateNode
      description: A node in a template DAG.
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/destinations/get-the-latest-destination-connector-connection-check.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get the latest destination connector connection check

> Retrieves the most recent connection check for the specified destination connector.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/destinations/{destination_id}/connection-check
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/destinations/{destination_id}/connection-check:
    get:
      tags:
        - destinations
      summary: Get the latest destination connector connection check
      description: >-
        Retrieves the most recent connection check for the specified destination
        connector.
      operationId: get_connection_check_destinations
      parameters:
        - name: destination_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Destination Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DagNodeConnectionCheck'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    DagNodeConnectionCheck:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        status:
          $ref: '#/components/schemas/ConnectionCheckStatus'
        reason:
          anyOf:
            - type: string
            - type: 'null'
          title: Reason
        created_at:
          type: string
          format: date-time
          title: Created At
        reported_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Reported At
      type: object
      required:
        - id
        - status
        - created_at
      title: DagNodeConnectionCheck
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ConnectionCheckStatus:
      type: string
      enum:
        - SCHEDULED
        - SUCCESS
        - FAILURE
      title: ConnectionCheckStatus
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/sources/get-the-latest-source-connector-connection-check.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get the latest source connector connection check

> Retrieves the most recent connection check for the specified source connector.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/sources/{source_id}/connection-check
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/sources/{source_id}/connection-check:
    get:
      tags:
        - sources
      summary: Get the latest source connector connection check
      description: >-
        Retrieves the most recent connection check for the specified source
        connector.
      operationId: get_connection_check_sources
      parameters:
        - name: source_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Source Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DagNodeConnectionCheck'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    DagNodeConnectionCheck:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        status:
          $ref: '#/components/schemas/ConnectionCheckStatus'
        reason:
          anyOf:
            - type: string
            - type: 'null'
          title: Reason
        created_at:
          type: string
          format: date-time
          title: Created At
        reported_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Reported At
      type: object
      required:
        - id
        - status
        - created_at
      title: DagNodeConnectionCheck
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ConnectionCheckStatus:
      type: string
      enum:
        - SCHEDULED
        - SUCCESS
        - FAILURE
      title: ConnectionCheckStatus
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/workflows/get-workflow.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Get Workflow

> Retrieve detailed information for a specific workflow by its ID.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/workflows/{workflow_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/workflows/{workflow_id}:
    get:
      tags:
        - workflows
      summary: Get Workflow
      description: Retrieve detailed information for a specific workflow by its ID.
      operationId: get_workflow
      parameters:
        - name: workflow_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Workflow Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/WorkflowInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    WorkflowInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        sources:
          items:
            type: string
            format: uuid
          type: array
          title: Sources
        destinations:
          items:
            type: string
            format: uuid
          type: array
          title: Destinations
        workflow_type:
          anyOf:
            - $ref: '#/components/schemas/WorkflowType'
            - type: 'null'
        workflow_nodes:
          items:
            $ref: '#/components/schemas/WorkflowNode'
          type: array
          title: Workflow Nodes
        schedule:
          anyOf:
            - $ref: '#/components/schemas/WorkflowSchedule'
            - type: 'null'
        status:
          $ref: '#/components/schemas/WorkflowState'
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
        reprocess_all:
          type: boolean
          title: Reprocess All
          default: false
      type: object
      required:
        - id
        - name
        - sources
        - destinations
        - workflow_nodes
        - status
        - created_at
      title: WorkflowInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    WorkflowType:
      type: string
      enum:
        - basic
        - advanced
        - platinum
        - custom
      title: WorkflowType
    WorkflowNode:
      properties:
        id:
          anyOf:
            - type: string
              format: uuid
            - type: 'null'
          title: Id
        name:
          type: string
          title: Name
        type:
          type: string
          title: Type
        subtype:
          type: string
          title: Subtype
        settings:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Settings
      type: object
      required:
        - name
        - type
        - subtype
      title: WorkflowNode
    WorkflowSchedule:
      properties:
        crontab_entries:
          items:
            $ref: '#/components/schemas/CronTabEntry'
          type: array
          title: Crontab Entries
      type: object
      required:
        - crontab_entries
      title: WorkflowSchedule
      examples:
        - crontab_entries:
            - cron_expression: 0 0 * * *
    WorkflowState:
      type: string
      enum:
        - active
        - inactive
      title: WorkflowState
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    CronTabEntry:
      properties:
        cron_expression:
          type: string
          title: Cron Expression
      type: object
      required:
        - cron_expression
      title: CronTabEntry

````

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/github.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# GitHub

Connect GitHub to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

* The user or organization name, and the repository name, for the target GitHub repository, for example `Unstructured-IO/docs`.
* The name of a branch to access in the repository. By default, the repository's default branch is used.
* A GitHub [account](https://docs.github.com/get-started/start-your-journey/creating-an-account-on-github).
* A GitHub [personal access token](https://docs.github.com/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) for the account.
* If you are accessing a private GitHub repository, the repository must grant the appropriate access
  [permissions](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/managing-repository-settings/managing-teams-and-people-with-access-to-your-repository)
  to the GitHub account.

The GitHub connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[github]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `GITHUB_URL` - The user or organization name, and the repository name, for the target GitHub repository; for example `Unstructured-IO/docs`, represented by `--url` (CLI) or `url` (Python).
* `GITHUB_BRANCH` - The name of the target branch in the repository, for example `main`, represented by `--branch` (CLI) or `branch` (Python). By default, the repository's default branch is used.
* `GITHUB_ACCESS_TOKEN` - The value of your GitHub access token, represented by `--access-token` (CLI) or `access_token` (Python).

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    github \
      --access-token $GITHUB_ACCESS_TOKEN \
      --url $GITHUB_URL \
      --branch $GITHUB_BRANCH \
      --recursive \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --strategy hi_res \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.github import (
      GithubIndexerConfig,
      GithubDownloaderConfig,
      GithubConnectionConfig,
      GithubAccessConfig
  )

  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  from unstructured_ingest.processes.connectors.local import LocalUploaderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=GithubIndexerConfig(
              branch=os.getenv("GITHUB_BRANCH"),
              recursive=True
          ),
          downloader_config=GithubDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
          source_connection_config=GithubConnectionConfig(
              access_config=GithubAccessConfig(access_token=os.getenv("GITHUB_ACCESS_TOKEN")),
              url=os.getenv("GITHUB_URL")
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()

  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/gitlab.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# GitLab

Connect GitLab to your preprocessing pipeline, and use the Unstructured CLI or Python to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

* A [GitLab account](https://gitlab.com/users/sign_up).
* The URL for the target [GitLab repository](https://docs.gitlab.com/ee/user/project/repository/).
* A [GitLab access token](https://docs.gitlab.com/ee/security/tokens/) that allows access to the repository.
* Optionally, the name of a specific branch to access in the repository. The repository's default repository branch is used if not otherwise specified.
* Optionally, a list of file globs to limit which types of files are accepted, for example `*.html` or `*.txt`.

The GitLab connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[gitlab]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `GITLAB_REPO_URL` - The URL for the target GitLab repository, represented by `--url` (CLI) or `url` (Python).
* `GITLAB_TOKEN` - The GitLab access token that allows access to the repository, represented by `--git-access-token` (CLI) or `access_token` (Python).

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    gitlab \
    --url $GITLAB_URL \
    --git-access-token $GITLAB_TOKEN \
    --git-branch v0.0.7 \
    --git-file-glob "*.html,*.txt" \
    --output-dir $LOCAL_FILE_OUTPUT_DIR \
    --partition-by-api \
    --api-key $UNSTRUCTURED_API_KEY \
    --partition-endpoint $UNSTRUCTURED_API_URL \
    --strategy hi_res \
    --chunking-strategy by_title \
    --embedding-provider huggingface \
    --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
  ```

  ```python Python Ingest theme={null}
  # pip install "unstructured-ingest[gitlab]"

  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.gitlab import (
      GitLabIndexerConfig,
      GitLabDownloaderConfig,
      GitLabConnectionConfig,
      GitLabAccessConfig
  )

  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig
  from unstructured_ingest.processes.connectors.local import LocalUploaderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=GitLabIndexerConfig(
              path="/",
              recursive=True,
              git_branch="v0.0.7"
          ),
          downloader_config=GitLabDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
          source_connection_config=GitLabConnectionConfig(
              access_config=GitLabAccessConfig(access_token=os.getenv("GITLAB_TOKEN")),
              url=os.getenv("GITLAB_URL")
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/google-cloud-service.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Cloud Storage

Batch process all your records to store structured outputs in Google Cloud Service.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/R57duYCxLi8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A Google Cloud service account. [Create a service account](https://cloud.google.com/iam/docs/service-accounts-create#console).

* A service account key for the service account. See [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating) in
  [Create and delete service account keys](https://cloud.google.com/iam/docs/keys-create-delete).

  To ensure maximum compatibility across Unstructured service offerings, you should give the service account key information to Unstructured as
  a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself).
  To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt.
  In this command, replace `<path-to-downloaded-key-file>` with the path to the service account key file that you downloaded by following the preceding instructions.

  * For macOS or Linux:
    ```text  theme={null}
    tr -d '\n' < <path-to-downloaded-key-file>
    ```
  * For Windows:
    ```text  theme={null}
    (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
    ```

* The URI for a Google Cloud Storage bucket. This URI consists of the target bucket name, plus any target folder within the bucket, expressed as `gs://<bucket-name>[/folder-name]`. [Create a bucket](https://cloud.google.com/storage/docs/creating-buckets#console).

  This bucket must have, at minimum, one of the following roles applied to the target Google Cloud service account:

  * `Storage Object Viewer` for bucket read access.
  * `Storage Object User` for bucket write access.
  * The `Storage Object Admin` role provides read and write access, plus access to additional bucket operations.

  To apply one of these roles to a service account for a bucket, see [Add a principal to a bucket-level policy](https://cloud.google.com/storage/docs/access-control/using-iam-permissions#bucket-add)
  in [Set and manage IAM policies on buckets](https://cloud.google.com/storage/docs/access-control/using-iam-permissions).

The Google Cloud Storage connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[gcs]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `GCS_SERVICE_ACCOUNT_KEY` - The Google Cloud service account key for Google Cloud Storage, represented by `--service-account-key` (CLI) or `service_account_key` (Python).
* `GCS_REMOTE_URL` - The Google Cloud Storage bucket URL, represented by `--remote-url` (CLI) or `remote_url` (Python).

Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector:

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --strategy hi_res \
      --chunk-elements \
      --embedding-provider huggingface \
      --num-processes 2 \
      --verbose \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
    gcs \
      --service-account-key $GCS_SERVICE_ACCOUNT_KEY \
      --remote-url $GCS_REMOTE_URL
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.fsspec.gcs import (
      GcsConnectionConfig,
      GcsAccessConfig,
      GcsUploaderConfig
  )

  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )

  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=GcsConnectionConfig(
              access_config=GcsAccessConfig(
                  service_account_key=os.getenv("GCS_SERVICE_ACCOUNT_KEY")
              )
          ),
          uploader_config=GcsUploaderConfig(remote_url=os.getenv("GCS_REMOTE_URL"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/google-cloud-storage.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Cloud Storage

Connect Google Cloud Storage to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/R57duYCxLi8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A Google Cloud service account. [Create a service account](https://cloud.google.com/iam/docs/service-accounts-create#console).

* A service account key for the service account. See [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating) in
  [Create and delete service account keys](https://cloud.google.com/iam/docs/keys-create-delete).

  To ensure maximum compatibility across Unstructured service offerings, you should give the service account key information to Unstructured as
  a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself).
  To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt.
  In this command, replace `<path-to-downloaded-key-file>` with the path to the service account key file that you downloaded by following the preceding instructions.

  * For macOS or Linux:
    ```text  theme={null}
    tr -d '\n' < <path-to-downloaded-key-file>
    ```
  * For Windows:
    ```text  theme={null}
    (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
    ```

* The URI for a Google Cloud Storage bucket. This URI consists of the target bucket name, plus any target folder within the bucket, expressed as `gs://<bucket-name>[/folder-name]`. [Create a bucket](https://cloud.google.com/storage/docs/creating-buckets#console).

  This bucket must have, at minimum, one of the following roles applied to the target Google Cloud service account:

  * `Storage Object Viewer` for bucket read access.
  * `Storage Object User` for bucket write access.
  * The `Storage Object Admin` role provides read and write access, plus access to additional bucket operations.

  To apply one of these roles to a service account for a bucket, see [Add a principal to a bucket-level policy](https://cloud.google.com/storage/docs/access-control/using-iam-permissions#bucket-add)
  in [Set and manage IAM policies on buckets](https://cloud.google.com/storage/docs/access-control/using-iam-permissions).

The Google Cloud Storage connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[gcs]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `GCS_SERVICE_ACCOUNT_KEY` - The Google Cloud service account key for Google Cloud Storage, represented by `--service-account-key` (CLI) or `service_account_key` (Python).
* `GCS_REMOTE_URL` - The Google Cloud Storage bucket URL, represented by `--remote-url` (CLI) or `remote_url` (Python).

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector:

This example sends data to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  unstructured-ingest \
    gcs \
      --service-account-key $GCS_SERVICE_ACCOUNT_KEY \
      --remote-url $GCS_REMOTE_URL \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --num-processes 2 \
      --recursive \
      --verbose \
      --strategy hi_res \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.fsspec.gcs import (
      GcsIndexerConfig,
      GcsDownloaderConfig,
      GcsConnectionConfig,
      GcsAccessConfig
  )

  from unstructured_ingest.processes.connectors.local import LocalUploaderConfig

  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=GcsIndexerConfig(remote_url=os.getenv("GCS_REMOTE_URL")),
          downloader_config=GcsDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
          source_connection_config=GcsConnectionConfig(
              access_config=GcsAccessConfig(
                  service_account_key=os.getenv("GCS_SERVICE_ACCOUNT_KEY")
              )
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/ui/sources/google-cloud.md

# Source: https://docs.unstructured.io/ui/destinations/google-cloud.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/google-cloud.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/google-cloud.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Cloud Storage

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Google Cloud Storage.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/R57duYCxLi8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A Google Cloud service account. [Create a service account](https://cloud.google.com/iam/docs/service-accounts-create#console).

* A service account key for the service account. See [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating) in
  [Create and delete service account keys](https://cloud.google.com/iam/docs/keys-create-delete).

  To ensure maximum compatibility across Unstructured service offerings, you should give the service account key information to Unstructured as
  a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself).
  To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt.
  In this command, replace `<path-to-downloaded-key-file>` with the path to the service account key file that you downloaded by following the preceding instructions.

  * For macOS or Linux:
    ```text  theme={null}
    tr -d '\n' < <path-to-downloaded-key-file>
    ```
  * For Windows:
    ```text  theme={null}
    (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
    ```

* The URI for a Google Cloud Storage bucket. This URI consists of the target bucket name, plus any target folder within the bucket, expressed as `gs://<bucket-name>[/folder-name]`. [Create a bucket](https://cloud.google.com/storage/docs/creating-buckets#console).

  This bucket must have, at minimum, one of the following roles applied to the target Google Cloud service account:

  * `Storage Object Viewer` for bucket read access.
  * `Storage Object User` for bucket write access.
  * The `Storage Object Admin` role provides read and write access, plus access to additional bucket operations.

  To apply one of these roles to a service account for a bucket, see [Add a principal to a bucket-level policy](https://cloud.google.com/storage/docs/access-control/using-iam-permissions#bucket-add)
  in [Set and manage IAM policies on buckets](https://cloud.google.com/storage/docs/access-control/using-iam-permissions).

To create a Google Cloud Storage destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="gcs",
                  config={
                      "remote_url": "<remote-url>",
                      "service_account_key": "<service-account-key>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "type": "gcs",
      "name": "<name>",
      "config": {
          "remote_url": "<remote-url>",
          "service_account_key": "<service-account-key>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<service-account-key>` (*required*) - The contents of a service account key file, expressed as a single string without line breaks, for a Google Cloud service account that has the required access permissions to the bucket.
* `<remote-url>` (*required*) - The URI for the Google Cloud Storage bucket and any target folder path within the bucket. This URI takes the format `gs://<bucket-name>[/folder-name]`.
* For `recursive` (source connector only), set to `true` to ingest data recursively from any subfolders, starting from the path specified by `<remote-url>`. The default is `false` if not otherwise specified.

---

# Source: https://docs.unstructured.io/examplecode/tools/google-drive-events.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Drive event triggers

You can use Google Drive events, such as adding new files to—or updating existing files in—Google Drive shared folders or shared drives, to automatically run Unstructured ETL+ workflows
that rely on those folders or drives as sources. This enables a no-touch approach to having Unstructured automatically process new and updated files in Google Drive as they are added or updated.

This example shows how to automate this process by adding a custom [Google Apps Script](https://developers.google.com/apps-script) project in your Google account. This project runs
a script on a regular time interval. This script automatically checks for new or updated files within the specified Google Drive shared folder or shared drive. If the script
detects at least one new or updated file, it then calls the [Unstructured API's workflow operations](/api-reference/workflow/overview) to automatically run the
specified corresponding Unstructured ETL+ workflow in your Unstructured account.

<Note>
  This example uses a custom Google Apps Script that you create and maintain.
  Any issues with file detection, timing, or script execution could be related to your custom script,
  rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom
  script's execution logs first for any informational and error messages.
</Note>

## Requirements

To use this example, you will need the following:

* An Unstructured account, and an Unstructured API key for your account, as follows:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

* The Unstructured API's workflow operations URL for your account, as follows:

  1. In the Unstructured UI, click **API Keys** on the sidebar.<br />
  2. Note the value of the **Unstructured API's workflow operations** field.

* A Google Drive source connector in your Unstructured account. [Learn how](/ui/sources/google-drive).

* Some available [destination connector](/ui/destinations/overview) in your Unstructured account.

* A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).

## Step 1: Create the Google Apps Script project

1. Sign in to your Google account.
2. Go to [http://script.google.com/](http://script.google.com/).
3. Click **+ New project**.
4. Click the new project's default name (such as **Untitled project**), and change it to something more descriptive, such as **Unstructured ETL Scripts**.

## Step 2: Add the script

1. With the project still open, on the sidebar, click the **\< >** (**Editor**) icon.

2. In the **Files** tab, click **Code.gs**.

3. Replace the contents of the `Code.gs` file with the following code instead:

   ```javascript  theme={null}
   function checkForNewOrUpdatedFiles() {
     const folder = DriveApp.getFolderById(FOLDER_ID);
     const files = folder.getFiles();
     const now = new Date();
     const thresholdMillis = 5 * 60 * 1000; // 5 minutes (adjust as needed).
     
     while (files.hasNext()) {
       const file = files.next();
       const created = file.getDateCreated();
       const lastUpdated = file.getLastUpdated();
       const fileName = file.getName();

       // Calculate time differences.
       const millisSinceCreated = now - created;
       const createdWithinThreshold = millisSinceCreated < thresholdMillis;
       const millisSinceUpdated = now - lastUpdated;
       const updatedWithinThreshold = millisSinceUpdated < thresholdMillis;

       // Log file details and calculations.
       console.log('File Name:                       ' + fileName);
       console.log('Created:                         ' + created);
       console.log('Last updated:                    ' + lastUpdated);
       console.log('Milliseconds since created:      ' + millisSinceCreated);
       console.log('Milliseconds since last updated: ' + millisSinceUpdated);
       console.log('Created within threshold of ' + thresholdMillis + ' milliseconds? ' + createdWithinThreshold);
       console.log('Updated within threshold of ' + thresholdMillis + ' milliseconds? ' + updatedWithinThreshold);
       console.log('-----')

       // If at least one file was created or updated within the last 5 minutes...
       if ((createdWithinThreshold) || (updatedWithinThreshold)) {
         // ...then make the HTTP POST request.
         UrlFetchApp.fetch(UNSTRUCTURED_API_URL, {
           method: 'post',
           headers: {
             'accept': 'application/json',
             'unstructured-api-key': UNSTRUCTURED_API_KEY
           }
         });
         // Then stop the script after the first fetch (no need to check any more files).
         console.log('At least one file created or updated within threshold of ' + thresholdMillis + ' milliseconds.')
         console.log('Unstructured workflow request sent to ' + UNSTRUCTURED_API_URL)
         return;
       }
     }
     console.log('No files created or updated within threshold of ' + thresholdMillis + ' milliseconds. No Unstructured workflow request sent.')
   }
   ```

4. Click the **Save project to Drive** button.

## Step 3: Customize the script for your workflow

1. With the project still open, on the **Files** tab, click the **Add a file** button, and then click **Script**.

2. Name the new file `Constants`. The `.gs` extension is added automatically.

3. Replace the contents of the `Constants.gs` file with the following code instead:

   ```javascript  theme={null}
   const FOLDER_ID = '<folder-id>';
   const UNSTRUCTURED_API_URL = '<unstructured-api-url>' + '/workflows/<workflow-id>/run';
   const UNSTRUCTURED_API_KEY = '<unstructured-api-key>';
   ```

   Replace the following placeholders:

   * Replace `<folder-id>` with the ID of your Google Drive shared folder or shared drive. This is the same ID that you specified
     when you created your Google Drive source connector in your Unstructured account.
   * Replace `<unstructured-api-url>` with your Unstructured API URL value.
   * Replace `<workflow-id>` with the ID of your Unstructured workflow.
   * Replace `<unstructured-api-key>` with your Unstructured API key value.

4. Click the disk (**Save project to Drive**) icon.

## Step 4: Create the script trigger

1. With the project still open, on the sidebar, click the alarm clock (**Triggers**) icon.

2. Click the **+ Add Trigger** button.

3. Set the following values:

   * For **Choose which function to run**, select `checkForNewOrUpdatedFiles`.

   * For **Choose which deployment should run**, select **Head**.

   * For **Select event source**, select **Time-driven**.

   * For **Select type of time based trigger**, select **Minutes timer**.

   * For **Select minute interval**, select **Every 5 minutes**.

     <Note>
       If you change **Minutes timer** or **Every 5 minutes** to a different interval, you should also go back and change the number `5` in the following
       line of code in the `checkForNewOrUpdatedFiles` function. Change the number `5` to the number of minutes that correspond to the alternate interval you
       selected:

       ```javascript  theme={null}
       const thresholdMillis = 5 * 60 * 1000;
       ```
     </Note>

   * For **Failure notification settings**, select an interval such as immediately, hourly, or daily.

4. Click **Save**.

## Step 5: View trigger results

1. With the project still open, on the sidebar, click the three lines (**Executions**) icon.

2. As soon as the first script execution completes, you should see a corresponding message appear in the **Executions** list. If the **Status** column shows
   **Completed**, then keep going with this procedure.

   If the **Status** column shows **Failed**, expand the message to
   get any details about the failure. Fix the failure, and then wait for the next script execution to complete.

3. When the **Status** column shows **Completed** then, in your Unstructured account, click **Jobs** on the sidebar to see if a new job
   is running for that worklow.

   If no new job is running for that workflow, then add at least one new file to—or update at least one existing file in—the Google Drive shared folder or shared drive,
   within 5 minutes of the next script execution. After the next script execution, check the **Jobs** list again.

## Step 6 (Optional): Delete the trigger

1. To stop the script from automatically executing on a regular basis, with the project still open, on the sidebar, click the alarm clock (**Triggers**) icon.
2. Rest your mouse pointer on the trigger you created in Step 4.
3. Click the ellipsis (three dots) icon, and then click **Delete trigger**.

---

# Source: https://docs.unstructured.io/support/issues/google-drive-schema-validation.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Drive schema validation failure

## Issue

When you save and test a Google Drive source connector, you get an error that contains text similar to the following:

```text  theme={null}
Schema validation failure for GoogleDriveConnectionConfig: 
Value error, Expecting value: line 1 column 1 (char 0) at 
access_config,service_account_key got -----BEGIN PRIVATE KEY-----
...
```

Although you can finish creating and saving this connector, the connection test fails, and you cannot successfully run any
workflows that use this failed connector.

## Cause

You have configured the Google Drive source connector with the Google Drive service account's key in an incorrect format.
You likely have specified only a portion of the file, such as just the private key part. Unstructured expects the entire contents
of the Google Drive service account's key file.

## Solution

1. Find the original Google Drive service account's key file in JSON format, which you downloaded earlier from your Google Cloud account when
   you originally created the Google Drive source connector.
2. Open the file in a text editor such as Notepad, Visual Studio Code, or TextEdit.
3. Copy everything in the file, starting from the first opening brace `{` to the final closing brace `}`.
4. In the original Google Drive source connector's settings, delete the original contents of the **Service Account Key** field, and paste
   the complete JSON content that you just copied.
5. Save the changes and re-test and re-save the connector.

## Additional resources

* If the connector test still fails, try using a tool such as `tr` for macOS or Linux, or PowerShell for Windows, to copy
  the entire contents of the file, instead of a text editor, as follows:

  * For macOS or Linux:

    ```text  theme={null}
    tr -d '\n' < <path-to-downloaded-key-file>
    ```

  * For Windows:

    ```text  theme={null}
    (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
    ```

* To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/ui/sources/google-drive.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/google-drive.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/google-drive.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Google Drive

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Google Drive.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/ZiRD7hC-wh8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A [Google Cloud account](https://console.cloud.google.com).

* The **Google Drive API** enabled in the account.
  [Learn how](https://cloud.google.com/apis/docs/getting-started#enabling_apis).

* Within the account, a Google Cloud service account and its related `credentials.json` key file or its contents in JSON format.
  [Create a service account](https://developers.google.com/workspace/guides/create-credentials#create_a_service_account).
  [Create credentials for a service account](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account).

  To ensure maximum compatibility across Unstructured service offerings, you should give the service account key information to Unstructured as
  a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself).
  To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt.
  In this command, replace `<path-to-downloaded-key-file>` with the path to the `credentials.json` key file that you downloaded by following the preceding instructions.

  * For macOS or Linux:

    ```text  theme={null}
    tr -d '\n' < <path-to-downloaded-key-file>
    ```

  * For Windows:

    ```text  theme={null}
    (Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
    ```

* A Google Drive [shared folder](https://support.google.com/drive/answer/2375091) or [shared drive](https://support.google.com/a/users/answer/7212025).

* Give the service account access to the shared folder or shared drive. To do this, share the folder or drive with the service account's email address.
  [Learn how](https://support.google.com/drive/answer/7166529).
  [Learn more](https://www.googlecloudcommunity.com/gc/Workspace-Q-A/Can-i-give-access-to-document-of-google-drive-to-service-account/m-p/530106).

* Get the shared folder's ID or shared drive's ID. This is a part of the URL for your Google Drive shared folder or shared drive, represented in the following URL as `{folder_id}`: `https://drive.google.com/drive/folders/{folder-id}`.

## Document permissions metadata

The source connector outputs any permissions information that it can find in the source location about the processed source documents and associates that information with each
corresponding element that is generated. This permissions information is output into the `permissions_data` field, which is within the
`data_source` field under the element's `metadata` field. This information lists the users or groups, if any, that have
permissions to read, update, or delete the element's associated source document.

The following example shows what the output looks like. Ellipses indicate content that has been omitted from this example for brevity.

```json  theme={null}
[
    {
        "...": "...",
        "metadata": {
            "...": "...",
            "data_source": {
                "...": "...",
                "permissions_data": [
                    {
                        "read": {
                            "users": [
                                "11111111111111111111"
                            ],
                            "groups": [
                                "22222222222222222222",
                                "33333333333333333333"
                            ]
                        }
                    },
                    {
                        "update": {
                            "users": [
                                "44444444444444444444",
                                "55555555555555555555"
                            ],
                            "groups": [
                                "66666666666666666666",
                            ]
                        }
                    },
                    {
                        "delete": {
                            "users": [
                                "77777777777777777777"
                            ],
                            "groups": [
                                "88888888888888888888"
                            ]
                        }
                    }
                ],
                "...": "..."
            }
        }
    }
]
```

To look up information about a particular Google Cloud user, use the user's ID along with the
[Admin SDK API](https://developers.google.com/workspace/admin/directory/reference/rest/v1/users/get) or the
[People API](https://developers.google.com/people/api/rest/v1/people/get) for Google Cloud.

To look up information about a particular Google Cloud group, use the group's ID along with the
[Admin SDK API](https://developers.google.com/workspace/admin/directory/reference/rest/v1/groups/get) or the
[Cloud Identity API](https://cloud.google.com/identity/docs/reference/rest/v1/groups/get) for Google Cloud.

To create a Google Drive source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="google_drive",
                  config={
                      "drive_id": "<drive-id>",
                      "service_account_key": "<service-account-key>",
                      "extensions": [
                          "<extension>",
                          "<extension>"
                      ],
                      "recursive": <True|False>
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "google_drive",
      "config": {
          "drive_id": "<drive-id>",
          "service_account_key": "<service-account-key>",
          "extensions": [
              "<extension>",
              "<extension>"
          ],
          "recursive": <true|false>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.

* `<drive-id>` - The ID for the target Google Drive folder or drive.

* `<service-account-key>` - The contents of the `credentials.json` key file as a single-line string.

* For `extensions`, set one or more `<extension>` values (such as `pdf` or `docx`) to process files with only those extensions. The default is to include all extensions.

  <Note>
    Do not include the leading dot in the file extensions. For example, use `pdf` or `docx` instead of `.pdf` or `.docx`.
  </Note>

* Set `recursive` to `true` to recursively process data from subfolders within the target folder or drive. The default is `false` if not otherwise specified.

---

# Source: https://docs.unstructured.io/examplecode/codesamples/api/huggingchat.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Query processed PDF with HuggingChat

This example uses the [Unstructured Ingest Python library](/open-source/ingestion/python-ingest) or the
[Unstructured JavaScript/TypeScript SDK](/api-reference/legacy-api/partition/sdk-jsts) to send a PDF file to
the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing. Unstructured processes the PDF and extracts the PDF's content.
This example then sends some of the content to [HuggingChat](https://huggingface.co/chat/), Hugging Face's open-source AI chatbot,
along with some queries about this content.

To run this example, you'll need:

* The [hugchat](https://pypi.org/project/hugchat/) package for Python, or the [huggingface-chat](https://www.npmjs.com/package/huggingface-chat) package for JavaScript/TypeScript.
* Your Unstructured API key and API URL. [Get an API key and API URL](/api-reference/legacy-api/partition/overview).
* Your Hugging Face account's email address and account password. [Get an account](https://huggingface.co/join).
* A PDF file for Unstructured to process. This example uses a sample PDF file containing the text of the United States Constitution,
  available for download from [https://constitutioncenter.org/media/files/constitution.pdf](https://constitutioncenter.org/media/files/constitution.pdf).

These environment variables:

* `UNSTRUCTURED_API_KEY` - Your Unstructured API key value.
* `UNSTRUCTURED_API_URL` - Your Unstructured API URL.

These environment variables:

* `HUGGING_FACE_EMAIL`, representing your Hugging Face account's email address.
* `HUGGING_FACE_PASSWORD`, representing your Hugging Face account's email address.
* `HUGGING_FACE_COOKIE_DIR_PATH`, representing a login cookie cache path, typically `./cookies/` or `./login_cache/`.

This example's code is as follows:

<AccordionGroup>
  <Accordion title="Ingest Python library">
    ```python Python theme={null}
    from unstructured_ingest.pipeline.pipeline import Pipeline
    from unstructured_ingest.interfaces import ProcessorConfig
    from unstructured_ingest.processes.connectors.local import (
        LocalIndexerConfig,
        LocalDownloaderConfig,
        LocalConnectionConfig,
        LocalUploaderConfig
    )
    from unstructured_ingest.processes.partitioner import PartitionerConfig

    import os, json
    from hugchat import hugchat
    from hugchat.login import Login
        
    def generate_json_from_local(
            input_path: str,
            output_dir: str,
            partition_by_api: bool = False,
            api_key: str = None,
            partition_endpoint: str = None,
            split_pdf_page: bool = True,
            split_pdf_allow_failed: bool = True,
            split_pdf_concurrency_level: int = 15
        ):
        Pipeline.from_configs(
            context=ProcessorConfig(),
            indexer_config=LocalIndexerConfig(input_path=input_path),
            downloader_config=LocalDownloaderConfig(),
            source_connection_config=LocalConnectionConfig(),
            partitioner_config=PartitionerConfig(
                partition_by_api=partition_by_api,
                api_key=api_key,
                partition_endpoint=partition_endpoint,
                additional_partition_args={
                    "split_pdf_page": split_pdf_page,
                    "split_pdf_allow_failed": split_pdf_allow_failed,
                    "split_pdf_concurrency_level": split_pdf_concurrency_level
                }
            ),
            uploader_config=LocalUploaderConfig(output_dir=output_dir)
        ).run()

    def extract_matching_texts_from_local(input_json_file_path: str, text_to_match: str) -> str:
        voting_texts = ""

        with open(input_json_file_path, 'r') as file:
            file_elements = json.load(file)

        for element in file_elements:
            if text_to_match in element["text"]:
                voting_texts += " " + element["text"]

        return voting_texts

    def log_in_to_hugging_face(email: str, passwd: str, cookie_dir_path: str) -> hugchat.ChatBot:
        sign = Login(
            email=email,
            passwd=passwd
        )

        cookies = sign.login(cookie_dir_path=cookie_dir_path)

        return hugchat.ChatBot(cookies=cookies.get_dict())

    if __name__ == "__main__":
        generate_json_from_local(
            input_path=os.getenv("LOCAL_FILE_INPUT_DIR"),
            output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"),
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")
        )

        chatbot = log_in_to_hugging_face(
            email=os.getenv("HUGGING_FACE_EMAIL"),
            passwd=os.getenv("HUGGING_FACE_PASSWORD"),
            cookie_dir_path=os.getenv("HUGGING_FACE_COOKIE_DIR_PATH")
        )

        voting_texts = extract_matching_texts_from_local(
            input_json_file_path=f"{os.getenv("LOCAL_FILE_OUTPUT_DIR")}/constitution.json",
            text_to_match="vot"
        )

        print("\n-----\n")
        print("Querying HuggingChat...")
        print("\n-----\n")

        req = f"Given the following information, what is the minimum voting age in the United States? {voting_texts}"
        print(req)
        print("\n-----\n")
        print(chatbot.chat(text=req))

        print("\n-----\n")
        print("Querying HuggingChat again...")
        print("\n-----\n")

        follow_up = "And when were women given the right to vote in the United States?"
        print(follow_up)
        print("\n-----\n")

        print(chatbot.chat(text=follow_up))
    ```
  </Accordion>

  <Accordion title="JavaScript/TypeScript SDK">
    <Warning>
      Unstructured recommends that you use the [Unstructured Ingest Python library](/open-source/ingestion/python-ingest) instead.

      The Ingest Python library provides faster processing of larger individual files, and faster and easier processing of multiple files at a time in batches.
    </Warning>

    ```typescript TypeScript theme={null}
    import { UnstructuredClient } from "unstructured-client";
    import { Strategy } from "unstructured-client/sdk/models/shared/index.js";
    import { PartitionResponse } from "unstructured-client/sdk/models/operations";

    import * as fs from "fs";
    import * as path from 'path';
    import { fileURLToPath } from 'url';
    import { Login, ChatBot} from "huggingface-chat";

    const key = process.env.UNSTRUCTURED_API_KEY || '';
    const url = process.env.UNSTRUCTURED_API_URL || '';
    const username = process.env.HUGGING_FACE_EMAIL || '';
    const userpassword = process.env.HUGGING_FACE_PASSWORD || '';
    const cookieDirPath = process.env.HUGGING_FACE_COOKIE_DIR_PATH || '';

    const inputFilepath = path.dirname(fileURLToPath(import.meta.url)) + 
                          "/local-ingest-pdf-source/constitution.pdf"

    async function getChatBot(email: string, password: string, cachePath: string): Promise<ChatBot> {
        const signin = new Login(email, password)
        const result = await signin.login(cachePath)
        const chat = new ChatBot(result)

        return chat
    }

    async function queryAndResponse(chatbot: ChatBot, query: string) {
        let data = await chatbot.chat(query)

        if (! data) {
            console.log("No data available.");
            return;
        }
        
        let reader = data.stream?.getReader();

        if (! reader) {
            console.log("No stream available.");
            return;
        }

        while (true) {
            const { done, value } = await reader?.read();
            if (done) break;
            process.stdout.write(value);
        }    
    }

    const client = new UnstructuredClient({
        security: { apiKeyAuth: key },
        serverURL: new URL(url).origin
    });

    const data = fs.readFileSync(inputFilepath);

    client.general.partition({
        partitionParameters: {
            files: {
                content: data,
                fileName: inputFilepath
            },
            strategy: Strategy.HiRes,
            splitPdfPage: true,
            splitPdfAllowFailed: true,
            splitPdfConcurrencyLevel: 15
        }
    }).then(async (res: PartitionResponse) => {
        if (res.statusCode == 200) {
            let voting_texts = ""

            if (res)
                for (const element of res) {
                    if (element.text.includes("vot")) {
                        voting_texts += " " + element.text;
                    }
            }

            let chat = await getChatBot(username, userpassword, cachePath)
            
            await chat.initialize()

            const firstQuery = 'Given the following information, what is the minimum voting age in the United States?'
            const secondQuery = 'And when were women given the right to vote in the United States?'

            console.log(`${firstQuery} ${voting_texts}\n\n-----\n`)
            await queryAndResponse(chat, firstQuery)
            console.log(`\n\n-----\n`)
            console.log(`${secondQuery}\n\n-----\n`)
            await queryAndResponse(chat, secondQuery)
        }    
    }).catch((e) => {
        if (e.statusCode) {
            console.log(e.statusCode);
            console.log(e.body);
        } else {
            console.log(e);
        }
    });
    ```
  </Accordion>
</AccordionGroup>

---

# Source: https://docs.unstructured.io/examplecode/tools/ibm-orchestrate.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# IBM watsonx Orchestrate

[IBM watsonx Orchestrate](https://www.ibm.com/products/watsonx-orchestrate) helps you build, deploy and manage powerful
AI assistants and agents that automate workflows and processes with generative AI.

This article provides a hands-on, step-by-step walkthrough that uses IBM watsonx Orchestrate to
build a simple AI chat app. This chat app relies on data that is stored in an Astra DB or Milvus vector database. This data
is generated by Unstructured and is based on your organization's source documents and semi-structured data.

## Requirements

To use this example, you will need the following:

If you want to connect Unstructured to Astra DB for providing the source data to IBM watsonx Orchestrate, you will need the following:

* An [IBM Cloud account](https://cloud.ibm.com/registration) or [DataStax account](https://astra.datastax.com/signup) account.

* An [Astra DB database](https://accounts.datastax.com/session-service/v1/login).

* To complete this example, you will need the following settings for the Astra DB database:

  * The database's API endpoint.
  * An application token for the database.
  * The name of the target keyspace in the database.
  * The name of the target collection in the keyspace.

  To get these settings, see the [Astra DB destination connector](/ui/destinations/astradb) documentation.

* Within your Unstructured account, a workflow that contains an Astra DB destination connector.

  * Create an [Unstructured account](https://unstructured.io/?modal=try-for-free).
  * Create an [Astra DB destination connector](/ui/destinations/astradb) in your Unstructured account.
  * Create a [custom workflow](/ui/workflows#create-a-custom-workflow) that contains the Astra DB destination connector in your Unstructured account.

    The workflow must generate [embeddings](/ui/embedding). The workflow's **Embedder** node's selected embedding model provider must be **IBM**. The
    node's selected embedding model must match the embedding model that you will specify later in **Step 2**.

    Be sure to [run the workflow](/ui/workflows#edit%2C-delete%2C-or-run-a-workflow) to have Unstructured generate the data and store it in your Astra DB database.

After you meet the preceding requirements, skip ahead to [Step 1](#step-1%3A-add-an-ibm-watsonx-orchestrate-subscription-to-your-ibm-cloud-account).

If, however, you want to connect Unstructured to Milvus on IBM watsonx.data for providing the source data to IBM watsonx Orchestrate, you will need the following:

* Within your IBM Cloud account, an IBM watsonx.data subscription that contains a Milvus service instance.

  * Create an [IBM Cloud account](https://cloud.ibm.com/registration).

    To complete this example, you must [create an API key](https://www.ibm.com/docs/en/masv-and-l/cd?topic=cli-creating-your-cloud-api-key) for your IBM Cloud account.

  * Create an [IBM watsonx.data subscription](https://cloud.ibm.com/watsonxdata) in your IBM Cloud account.

  * Create a [Milvus service instance](/ui/destinations/milvus) within your IBM watsonx.data subscription plan. This instance
    must contain a database, a collection, and an index to store and manage the data that is generated by Unstructured.

    To complete this example, you will need the following settings for the Milvus service instance:

    * The instance's GRPC host value.
    * The instance's GRPC port value.
    * The name of the target database on the instance.
    * The name of the target collection in the database.
    * The name of the target index in the collection.

    To get these settings, see the [Milvus destination connector](/ui/destinations/milvus) documentation.

* Within your Unstructured account, a workflow that contains a Milvus destination connector.

  * Create an [Unstructured account](https://unstructured.io/signup).
  * Create a [Milvus destination connector](/ui/destinations/milvus) in your Unstructured account.
  * Create a [custom workflow](/ui/workflows#create-a-custom-workflow) that contains the Milvus destination connector in your Unstructured account.

    The workflow must generate [embeddings](/ui/embedding). The workflow's **Embedder** node's selected embedding model provider must be **IBM**. The
    node's selected embedding model must match the embedding model that you will specify later in **Step 2**.

    Be sure to [run the workflow](/ui/workflows#edit%2C-delete%2C-or-run-a-workflow) to have Unstructured generate the data and store it in your Milvus instance's database.

## Step 1: Add an IBM watsonx Orchestrate subscription to your IBM Cloud account

If you already have an IBM watsonx Orchestrate subscription, then skip ahead to [Step 2](#step-2%3A-create-the-chat-app-in-ibm-watsonx-orchestrate)

1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).
2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
   top navigation bar.
3. Click **Create resource**.
4. With **IBM Cloud catalog** selected, search for and select **watsonx Orchestrate**.
5. Complete the on-screen instructions to finish creating the IBM watsonx Orchestrate subscription.

## Step 2: Create the chat app in IBM watsonx Orchestrate

In this step, you use IBM watsonx Orchestrate to create a chat app. This chat app allows you to ask questions about
your organization's documents and semi-structured data. This data is stored in your Astra DB or Milvus vector database and was
generated by Unstructured in a format that is well-suited for your chat app.

1. Open your IBM watsonx Orchestrate subscription, if it is not already open. To do this:

   a. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).<br />
   b. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
   top navigation bar.<br />
   c. In the list of resources, expand **AI / Machine Learning**, and then click the target watsonx Orchestrate subscription.<br />
   d. Click **Launch watsonx Orchestrate**.<br />

2. If the **Chat** page is not already open, click the **Open the main menu** icon to the far left of the
   top navigation bar, and then click **Chat**.

3. Toward the bottom of the sidebar, click **Create new agent**.

4. On the **Create an agent** page, with **Create from scratch** already selected, enter some **Name** and **Description** for the agent, and then click **Create**.

5. Under **Knowledge source**, in the **Start by adding knowledge** tile, click **Choose knowledge**.

If you are using Astra DB for providing the source data to IBM watsonx Orchestrate, then do the following:

1. On the **Select source** page, click **Astra DB**, and then click **Next**.

2. On the **Connect Astra DB** page, specify the following settings:

   a. For **URL**, enter the API endpoint for the Astra DB database.<br />
   b. Leave **Astra DB port** blank.<br />
   c. For **API key**, enter the application token for your Astra DB database.<br />
   d. Click **Next**.<br />

3. On the **Settings details** page, specify the following settings:

   a. For **Keyspace**, select the name of the target keyspace in the Astra DB database.<br />
   b. For **Data type**, select **Collection**.<br />
   c. For **Collection**, select the name of the target collection in the database.<br />
   d. For **Embedding mode**, select **Client**.<br />
   e. For **Embedding model**, select the name of the embedding model that matches the one that you specified earlier in your Unstructured workflow.<br />
   f. For **Search mode**, select **Vector**.<br />
   g. For **Title**, enter `record_id`.<br />
   h. For **Body**, enter `content`.<br />
   i. Leave **URL** and **Advanced settings** blank.<br />
   j. Click **Next**.<br />

4. On the **Description** page, enter some description for the agent, and then click **Save**.

5. In the list of pages, click **Behavior**.

6. At the bottom of the **Behavior** section, turn on **Chat with documents**.

7. In the upper-right corner of the page, click **Deploy**.

8. On the **Pre-deployment summary** page, click **Deploy**.

Skip ahead to [Step 3](#step-3%3A-run-the-chat-app).

If, however, you are using Milvus on IBM watsonx.data for providing the source data to IBM watsonx Orchestrate, then do the following:

1. On the **Select source** page, click **Milvus**, and then click **Next**.

2. On the **Connect Milvus** page, specify the following settings:

   a. For **GRPC host**, enter the GRPC host value for the Milvus service instance within your IBM watsonx.data subscription.<br />
   b. For **GRPC port**, enter the GRPC port value for the instance.<br />
   c. For **Choose an authentication type**, select **Basic authentication**.<br />
   d. For **Username**, enter `ibmlhapikey`.<br />
   e. For **Password**, enter the API key for your IBM Cloud account.<br />
   f. Click **Next**.<br />

3. On the **Select index** page, specify the following settings:

   a. For **Database**, select the name of the target database on the Milvus service instance within your IBM watsonx.data subscription.<br />
   b. For **Use Collection or Alias**, select **Collection**.<br />
   c. For **Collection**, select the name of the target collection in the database.<br />
   d. For **Index**, select the name of the target index in the collection.<br />
   e. For **Embedding model**, select the name of the embedding model that matches the one that you specified earlier in your Unstructured workflow.<br />
   f. For **Title**, select **element\_id**.<br />
   g. For **Body**, select **text**.<br />
   h. Click **Next**.<br />

4. On the **Description** page, enter some description for the agent, and then click **Save**.

5. At the bottom of the **Behavior** section, turn on **Chat with documents**.

6. In the upper-right corner of the page, click **Deploy**.

7. On the **Pre-deployment summary** page, click **Deploy**.

## Step 3: Run the chat app

In this step, you ask questions about your organization's source documents and semi-structured data. The chat app then
attempts to answer your questions by searching the related data that Unstructured generated and stored in your Astra DB or Milvus vector database.

1. If the **Chat** page is not already open in IBM watsonx Orchestrate, click the **Open the main menu** icon to the far left of the
   top navigation bar, and then click **Chat**.
2. In the sidebar, in the **Agents** list, select the name of the agent that you created in the previous step.
3. In the **Type something** box, enter a question, and then press `Enter`.
4. The agent will provide an answer.
5. Keep asking as many questions as you want to.

---

# Source: https://docs.unstructured.io/ui/destinations/ibm-watsonxdata.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/ibm-watsonxdata.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/ibm-watsonxdata.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# IBM watsonx.data

<Tip>
  The IBM watsonx.data destination connector relies on an Apache Iceberg-based catalog within the watsonx.data data store instance.
  Apache Iceberg is suitable for managed data storage and cataloging, but not for embedding storage or semantic similarity
  queries. For embedding storage and semantic similarity queries, Unstructured recommends that you use the following destination connectors
  instead:

  * [Astra DB](/api-reference/workflow/destinations/astradb)
  * [Milvus](/api-reference/workflow/destinations/milvus) on IBM watsonx.data
</Tip>

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to IBM watsonx.data.

The requirements are as follows.

* An [IBM Cloud account](https://cloud.ibm.com/login). [Create an IBM Cloud account](https://cloud.ibm.com/registration) if you do not already have one.

* An API key for the IBM Cloud account. If you do not have one already, create one as follows:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/AsV8Edq_Lko" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).
  2. In the top navigation bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.
  3. On the sidebar, under **Manage identities**, click **API keys**.
  4. With the **View** list showing **My IBM Cloud API keys**, click **Create**.
  5. Enter some **Name** and an optional **Description** for the API key.
  6. Leave **Leaked action** set to **Disable the leaked key** and **Session management** set to **No**.
  7. Click **Create**.
  8. Click **Copy** or **Download** to copy or save the API key to a secure location. You won't be able to access this API key from this screen again.

* An IBM Cloud Object Storage (COS) instance in the account, and a bucket within that instance. If you do not have them already,
  create them as follows:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/C_7q1EM8w20" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).
  2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
     top navigation bar.
  3. Click **Create resource**.
  4. With **IBM Cloud catalog** selected, search for and select **Object Storage**.
  5. Complete the on-screen instructions to finish creating the COS instance.
  6. With the COS instance's settings page shown, on the **Buckets** tab, click **Create bucket**.
  7. Complete the on-screen instructions to finish creating the bucket.

* The name, region, and public endpoint for the target bucket within the target Cloud Object Storage (COS) instance. To get these:

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).
  2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
     top navigation bar.
  3. In the list of resources, expand **Storage**, and then click the target COS instance.
  4. On the **Buckets** tab, click the target bucket.
  5. On the **Configuration** tab, note the following:

     * Under **Bucket details**, note the **Bucket name**. This is the bucket's name.
     * Under **Bucket details** section, note the value inside of the parentheses inside **Location**, for example `us-east`. This is the bucket's region.
     * Under **Endpoints**, note the value of **Public**, for example `s3.us-east.cloud-object-storage.appdomain.cloud`. (Ignore the values of
       **Private** and **Direct**). This is the bucket's public endpoint.

* An HMAC access key ID and secret access key for the target Cloud Object Storage (COS) instance. If you do not have them already,
  get or create them as follows:

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).

  2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
     top navigation bar.

  3. In the list of resources, expand **Storage**, and then click the target COS instance.

  4. On the **Service credentials** tab, if there is a credential that you want to use in the list, expand the credential, and copy the following values to a secure location:

     * `access_key_id` under `cos_hmac_keys`, which represents the HMAC access key ID.
     * `secret_access_key` under `cos_hmac_keys`, which represents the HMAC secret access key.

     After you have copied the preceding values, you have completed this procedure.

  5. If there is not a credential that you want to use, or there are no credentials at all, click **New Credential**.

  6. Enter some **Name** for the credential.

  7. For **Role**, select at least **Writer**, leave **Select Service ID** set to **Auto Generated**,
     switch on **Include HMAC Credential**, and then click **Add**.

  8. In the list of credentials, expand the credential, and copy the following values to a secure location:

     * `access_key_id` under `cos_hmac_keys`, which represents the HMAC access key ID.
     * `secret_access_key` under `cos_hmac_keys`, which represents the HMAC secret access key.

* An IBM watsonx.data data store instance in the IBM Cloud account. If you do not have one already, create one as follows:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/oU21hQ9TUnU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).
  2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
     top navigation bar.
  3. Click **Create resource**.
  4. With **IBM Cloud catalog** selected, search for and select **watsonx.data**.
  5. Complete the on-screen instructions to finish creating the watsonx.data data store instance.

* An Apache Iceberg-based catalog within the watsonx.data data store instance. If you do not have one already, create one as follows:

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).

  2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
     top navigation bar.

  3. In the list of resources, expand **Databases**, and then click the target watsonx.data data store instance.

  4. Click **Open web console**.

  5. If prompted, log in to the web console.

  6. On the sidebar, click **Infrastructure manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the
     top navigation bar.

  7. Click **Add component**.

  8. Under **Storage**, click **IBM Cloud Object Storage**, and then click **Next**.

  9. Complete the on-screen instructions to finish creating the Iceberg catalog. This includes providing the following settings:

     If you select **Discover COS instance**, you must provide the following settings:

     * The name of the target COS instance.
     * The name of the target storage object (such as the target bucket) within the COS instance.
     * Some display name for the component.
     * After you provide this information, do the following:

       a. Check the box labelled **Associate Catalog**.<br />
       b. For **Catalog type**, select **Apache Iceberg**.<br />
       c. Enter some **Catalog name**.<br />
       d. Click **Associate**.<br />

     If you select **Register my own**, you must provide the following settings:

     * Some display name for the component.
     * The name of the target bucket within the target Cloud Object Storage (COS) instance that you noted earlier.
     * The region for the target bucket, which you noted earlier.
     * The public endpoint for the target bucket, which you noted earlier. For this screen only, be sure to prefix the public endpoint with `https://`.
     * The HMAC access key ID for the target COS instance, which you noted earlier.
     * The HMAC secret access key for the target COS instance, which you noted earlier.
     * After you provide this information, do the following:

       a. Check the box labelled **Associate Catalog**.<br />
       b. For **Catalog type**, select **Apache Iceberg**.<br />
       c. Enter some **Catalog name**.<br />
       d. Click **Associate**.<br />

  10. On the sidebar, click **Infrastructure manager**. Make sure the catalog is associated with the appropriate engines. If it is not, rest your mouse
      on an unassociated target engine, click the **Manage associations** icon, check the box next to the target catalog's name, and then
      click **Save and restart engine**.

      To create an engine if one is not already shown, click **Add component**, and follow the on-screen to add an appropriate engine from the list of available **Engines**
      (for example, an **IBM Presto** engine).

* The catalog name and metastore REST endpoint for the target Iceberg catalog. To get this:

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).
  2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
     top navigation bar.
  3. In the list of resources, expand **Databases**, and then click the target watsonx.data data store instance.
  4. Click **Open web console**.
  5. If prompted, log in to the web console.
  6. On the sidebar, click **Infrastructure manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the
     top navigation bar.
  7. In the **Catalogs** section, click the target Iceberg catalog.
  8. On the **Details** tab, note the value of **Name** representing the catalog name, and **Metastore REST endpoint** representing the metastore REST endpoint. (Ignore the **Metastore Thrift endpoint** value.)

* A namespace (also known as a schema) and a table in the target catalog. If you do not have these already, create them as follows:

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).
  2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
     top navigation bar.
  3. In the list of resources, expand **Databases**, and then click the target watsonx.data data store instance.
  4. Click **Open web console**.
  5. If prompted, log in to the web console.
  6. On the sidebar, click **Data manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the
     top navigation bar.
  7. On the **Browse data** tab, under **Catalogs associated**, click the target catalog.
  8. Click the ellipses, and then click **Create schema**.
  9. Enter some **Name** for the schema, and then click **Create**.
  10. On the sidebar, click **Query workspace**.
  11. In the SQL editor, enter and run a table creation statement such as the following one that uses
      [Presto SQL](https://prestodb.io/docs/current/connector/iceberg.html) syntax, replacing `<catalog-name>` with the name of the target
      catalog and `<schema-name>` with the name of the target schema, with at minimum the fields `id`, `type`, `text`, `element_id`, and `record_id`:

      ```sql  theme={null}
      CREATE TABLE <catalog-name>.<schema-name>.elements (
         id varchar,
         type varchar,
         text varchar,
         element_id varchar,
         record_id varchar,
         <additional-fields-as-needed>
      )
      WITH (
         delete_mode = 'copy-on-write',
         format = 'PARQUET',
         format_version = '2'
      )
      ```

      For objects in the `metadata` field that Unstructured produces and that you want to store in an Apache Iceberg table in IBM watsonx.data,
      you must create fields in your table schema that follows Unstructured's `metadata` field naming convention. For example, if Unstructured
      produces a `metadata` field with the following child objects:

      ```json  theme={null}
      "metadata": {
        "is_extracted": "true",
        "coordinates": {
          "points": [
            [
              134.20055555555555,
              241.36027777777795
            ],
            [
              134.20055555555555,
              420.0269444444447
            ],
            [
              529.7005555555555,
              420.0269444444447
            ],
            [
              529.7005555555555,
              241.36027777777795
            ]
          ],
          "system": "PixelSpace",
          "layout_width": 1654,
          "layout_height": 2339
        },
        "filetype": "application/pdf",
        "languages": [
          "eng"
        ],
        "page_number": 1,
        "image_base64": "aaa...bbb...111",
        "image_mime_type": "image/jpeg",
        "filename": "realestate.pdf",
        "data_source": {
          "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
          "record_locator": {
            "protocol": "file",
            "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
          }
        },
        "entities": {
          "items": [
            {
              "entity": "HOME FOR FUTURE",
              "type": "ORGANIZATION"
            },
            {
              "entity": "221 Queen Street, Melbourne VIC 3000",
              "type": "LOCATION"
            }
          ],
          "relationships": [
            {
              "from": "HOME FOR FUTURE",
              "relationship": "based_in",
              "to": "221 Queen Street, Melbourne VIC 3000"
            }
          ]
        }
      }
      ```

      You could create corresponding fields in your table schema with the following field names and data types:

      | **Field name**    | **Apache Iceberg data type**                                                                                                        |
      | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
      | `is_extracted`    | `varchar`                                                                                                                           |
      | `points`          | `varchar`                                                                                                                           |
      | `system`          | `varchar`                                                                                                                           |
      | `layout_width`    | `bigint`                                                                                                                            |
      | `layout_height`   | `bigint`                                                                                                                            |
      | `filetype`        | `varchar`                                                                                                                           |
      | `languages`       | `array(varchar)`                                                                                                                    |
      | `page_number`     | `varchar`                                                                                                                           |
      | `image_base64`    | `varchar`                                                                                                                           |
      | `image_mime_type` | `varchar`                                                                                                                           |
      | `filename`        | `varchar`                                                                                                                           |
      | `url`             | `varchar`                                                                                                                           |
      | `record_locator`  | `varchar`                                                                                                                           |
      | `entities`        | `row(items array(row(entity varchar, type varchar)), relationships array(row("from" varchar, relationship varchar, "to" varchar)))` |

      Incoming elements that do not have matching column
      names will be dropped upon record insertion. For example, if the incoming data has an element named `sent_from` and there is no
      column named `sent_from` in the table, the `sent_from` element will be dropped upon record insertion. You should modify the preceding
      sample table creation statement to add columns for any additional elements that you want to be included upon record
      insertion.

      To increase query performance, Iceberg uses [hidden partitioning](https://iceberg.apache.org/docs/latest/partitioning/) to
      group similar rows together when writing. You can also
      [explicitly define partitions](https://prestodb.io/docs/current/connector/iceberg.html#create-table) as part of the
      preceding `CREATE TABLE` statement.

* The name of the target namespace (also known as a schema) within the target catalog, and name of the target table within that schema. To get these:

  1. [Log in to your IBM Cloud account](https://cloud.ibm.com/login).
  2. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the
     top navigation bar.
  3. In the list of resources, expand **Databases**, and then click the target watsonx.data data store instance.
  4. Click **Open web console**.
  5. If prompted, log in to the web console.
  6. On the sidebar, click **Data manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the
     top navigation bar.
  7. On the **Browse data** tab, expand the name of the target catalog, and note the names of the target schema and target table.

* The name of the column in the target table that uniquely identifies each of the records in the table.

* To improve performance, the target table should be set to regularly remove old metadata files. To do this, run the following Python script.
  (You cannot use the preceding `CREATE TABLE` statement, or other SQL statements such as `ALTER TABLE`, to set this behavior.) To get the
  values for the specified environment variables, see the preceding instructions.

  ```python  theme={null}
  # Improves performance by setting the target table to regularly remove 
  # old metadata files. 
  #
  # First, install the following dependencies into your Python virtual 
  # environment:
  # 
  # pip install requests pyiceberg pyarrow
  #
  # Then, set the following environment variables:
  #
  # IBM_IAM_API_KEY - An API key value for the target IBM Cloud account.
  # IBM_ICEBERG_CATALOG_METASTORE_REST_ENDPOINT - The metastore REST endpoint 
  #     value for the target Apache Iceberg catalog in the target IBM watsonx.data 
  #     data store instance.
  # IBM_COS_BUCKET_PUBLIC_ENDPOINT - The target IBM Cloud Object Storage (COS) 
  #     instance’s endpoint value.
  # IBM_COS_ACCESS_KEY - An HMAC access key ID for the target COS instance.
  # IBM_COS_SECRET_ACCESS_KEY - The associated HMAC secret access key ID for the 
  #     target HMAC access key.
  # IBM_COS_BUCKET_REGION - The target COS instance’s region short ID.
  # IBM_ICEBERG_CATALOG - The name of the target Iceberg catalog.
  # IBM_ICEBERG_SCHEMA - The name of the target namespace (also known as a schema) 
  #     in the target catalog.
  # IBM_ICEBERG_TABLE - The name of the target table in the target schema.
  #
  # To get these values, see the Unstructured documentation for the 
  #     IBM watsonx.data connector.

  import os
  import requests
  from pyiceberg.catalog import load_catalog

  def main():
     # Get a bearer token for the target IBM Cloud account.   
     bearer_token = requests.post(
        url="https://iam.cloud.ibm.com/identity/token",
        headers={
              "Content-Type": "application/x-www-form-urlencoded",
              "Accept": "application/json"
        },
        data={
              "grant_type": "urn:ibm:params:oauth:grant-type:apikey", 
              "apikey": os.getenv("IBM_IAM_API_KEY")
        }
     ).json().get("access_token")

     # Connect to the target Iceberg catalog.
     catalog = load_catalog(
        os.getenv("IBM_ICEBERG_CATALOG"),
        **{
              "type": "rest",
              "uri": f"https://{os.getenv("IBM_ICEBERG_CATALOG_METASTORE_REST_ENDPOINT")}/mds/iceberg",
              "token": bearer_token,
              "warehouse": os.getenv("IBM_ICEBERG_CATALOG"),
              "s3.endpoint": os.getenv("IBM_COS_BUCKET_PUBLIC_ENDPOINT"),
              "s3.access-key-id": os.getenv("IBM_COS_ACCESS_KEY"),
              "s3.secret-access-key": os.getenv("IBM_COS_SECRET_ACCESS_KEY"),
              "s3.region": os.getenv("IBM_COS_BUCKET_REGION")
        },
     )
              
     # Load the target table.
     table = catalog.load_table(f"{os.getenv("IBM_ICEBERG_SCHEMA")}.{os.getenv("IBM_ICEBERG_TABLE")}")

     # Set the target table's properties to remove old metadata files.
     with table.transaction() as transaction:
        transaction.set_properties(
              {
                 "commit.manifest.min-count-to-merge": 10,
                 "commit.manifest-merge.enabled": True,
                 "write.metadata.previous-versions-max": 10,
                 "write.metadata.delete-after-commit.enabled": True,
              }
        )

     # Confirm that the target table's properties were set as expected.
     print(table.metadata.properties)

  if __name__ == "__main__":
     main()
  ```

To create an IBM watsonx.data destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="ibm_watsonx_s3",
                  config={
                      "iceberg_endpoint": "<iceberg-endpoint>",
                      "object_storage_endpoint": "<object-storage-endpoint>",
                      "object_storage_region": "<object-storage-region>",
                      "iam_api_key": "<iam-api-key>",
                      "access_key_id": "<access-key-id>",
                      "secret_access_key": "<secret-access-key>",
                      "catalog": "<catalog>",
                      "namespace": "<namespace>",
                      "table": "<table>",
                      "max_retries": <max-retries>,
                      "max_retries_connection": <max-retries-connection>,
                      "record_id_key": "<record-id-key>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "ibm_watsonx_s3",
      "config": {
          "iceberg_endpoint": "<iceberg-endpoint>",
          "object_storage_endpoint": "<object-storage-endpoint>",
          "object_storage_region": "<object-storage-region>",
          "iam_api_key": "<iam-api-key>",
          "access_key_id": "<access-key-id>",
          "secret_access_key": "<secret-access-key>",
          "catalog": "<catalog>",
          "namespace": "<namespace>",
          "table": "<table>",
          "max_retries": <max-retries>,
          "max_retries_connection": <max-retries-connection>,
          "record_id_key": "<record-id-key>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<iceberg-endpoint>` (*required*): The metastore REST endpoint for the target Apache Iceberg-based catalog within the IBM watsonx.data data store instance. Do not include `https://` in this value.
* `<object-storage-endpoint>` (*required*): The public endpoint for the target bucket within the IBM Cloud Object Storage (COS) instance that is associated with the catalog. Do not include `https://` in this value.
* `<object-storage-region>` (*required*): The region short ID (such as us-east) for the bucket.
* `<iam-api-key>` (*required*): A valid API key value for the IBM Cloud account.
* `<access-key-id>` (*required*): A valid hash-based message authentication code (HMAC) access key ID for the COS instance.
* `<secret-access-key>` (*required*): The HMAC secret access key for the access key ID.
* `<catalog>` (*required*): The name of the target Apache Iceberg-based catalog within the IBM watsonx.data data store instance.
* `<namespace>` (*required*): The name of the target namespace (also known as a schema) within the catalog.
* `<table>` (*required*): The name of the target table within the namespace (schema).
* `<max-retries>`: The maximum number of retries for the upload process. Typically, an optimal setting is `150`. The default is `50`. If specified, it must be a number between `2` and `500`, inclusive.
* `<max-retries-connection>`: The maximum number of retries when connecting to the catalog. Typically, an optimal setting is `15`. The default is `10`. If specified, it must be a number between `2` and `100`, inclusive.
* `<record-id-key>`: The name of the column that uniquely identifies each record in the target table. The default is `record_id`.

## Learn more

* <Icon icon="blog" />  [Unstructured + IBM watsonx.data: A New OEM Partnership Powering the Future of Enterprise AI](https://unstructured.io/blog/unstructured-ibm-watsonx-data-a-new-oem-partnership-powering-the-future-of-enterprise-ai)

---

# Source: https://docs.unstructured.io/ui/enriching/image-descriptions.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Image descriptions

<iframe width="560" height="315" src="https://www.youtube.com/embed/pMQm9ymM3N8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

After partitioning, you can have Unstructured generate text-based summaries of detected images.

This summarization is done by using models offered through various model providers.

Here is an example of the output of a detected image using GPT-4o. Note specifically the `text` field that is added.
In this `text` field, `type` indicates the kind of image that was detected (in this case, a `diagram`), and `description` is a summary of the image.
Line breaks have been inserted here for readability. The output will not contain these line breaks.

<img src="https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Diagram-Example.png?fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=16c8e1ac1fcbd00932c9f2b4ebfa2005" alt="Example of a diagram" data-og-width="1112" width="1112" data-og-height="842" height="842" data-path="img/enriching/Diagram-Example.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Diagram-Example.png?w=280&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=7a41102ba0d5a1d7824838935e7e0a69 280w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Diagram-Example.png?w=560&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=1f0ed45c549623d0f5edab66e1fa4d7e 560w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Diagram-Example.png?w=840&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=06b0dab8f7ba8be41463acc962957b77 840w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Diagram-Example.png?w=1100&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=0523313f67193a0169705a9737f6cfc0 1100w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Diagram-Example.png?w=1650&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=9cc970b99319944f0bf87e370080e9e9 1650w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Diagram-Example.png?w=2500&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=e4d21ad523642bb60aac78be8b98e89d 2500w" />

```json  theme={null}
{
  "type": "Image",
  "element_id": "dd1fb72db7937725c9a781906098e6f8",
  "text": "{\n    
    \"type\": \"diagram\",\n    
    \"description\": \"User uploads a flowchart image via a Web Browser, which is then 
      converted to a Base64 Encoded Image. This image is sent to the Back-end System 
      (Node.js) where it is processed by the AI Model Adapter. The output undergoes 
      Validation and Rendering, resulting in Normalized Mermaid Code. AI Assisted 
      Editing is available through an AI Assistant, which allows for the Regenerated 
      Flowchart Image to be viewed again in the Web Browser.\\n\\n
      Text in the image:\\n
        - User\\n
        - Upload flowchart image\\n
        - Web Browser\\n
        - Base64 Encoded Image\\n
        - Back-end System (Node.js)\\n
        - AI Model Adapter\\n
        - Validation and Rendering\\n
        - Normalized Mermaid Code\\n
        - AI Assisted Editing\\n
        - AI Assistant\\n
        - Regenerated Flowchart Image\"\n
  }",
  "metadata": {
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_base64": "/9j...<full results omitted for brevity>...Q==",
    "image_mime_type": "image/jpeg",
    "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
    "data_source": {}
  }
}
```

For technical drawings, the `text` field will contain a `type` of `technical drawing`; `description` with `texts` containing text strings found in the drawing,
`tables` containing HTML representations of tables found in the drawing, and a `description` containing a summary of the drawing.
Here is an example. Line breaks have been inserted here for readability. The output will not contain these line breaks.

<img src="https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Technical-Drawing-Example.png?fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=e4d3fb249208b3166b0e79a9d54b4cdf" alt="Example of a technical drawing" data-og-width="1468" width="1468" data-og-height="782" height="782" data-path="img/enriching/Technical-Drawing-Example.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Technical-Drawing-Example.png?w=280&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=1dc71713c55e51adc75a04a4cb2f379e 280w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Technical-Drawing-Example.png?w=560&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=45d530c526f19e4a297853089bccbc9f 560w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Technical-Drawing-Example.png?w=840&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=1f7b15ac7c1927436daa9232be2a8944 840w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Technical-Drawing-Example.png?w=1100&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=f610544e6d84ce84e88402549b66eca8 1100w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Technical-Drawing-Example.png?w=1650&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=3137b95737ca36bf07f989c3f95b0cd7 1650w, https://mintcdn.com/unstructured-53/8osKRiNz_JLtvBTP/img/enriching/Technical-Drawing-Example.png?w=2500&fit=max&auto=format&n=8osKRiNz_JLtvBTP&q=85&s=53d8f2bd14dfff5c2c95dbd56994f7be 2500w" />

```json  theme={null}
{
  "type": "Image",
  "element_id": "7877acdd762f2afc65b193fa89d8ef46",
  "text": "{\n  
    \"type\": \"technical drawing\",\n  
    \"description\": {\n    
      \"texts\": [\n
        \"RTD 1\",\n      
        \"RTD 2\",\n      
        \"01\",\n      
        \"18.50\\\" Cable Length\",\n      
        \"02\",\n      
        \"1/4\\\" Heat Shrink\",\n      
        \"6X Strip wires 0.100\\\" - 0.115\\\" before crimping\",\n      
        \"2X 1.50\",\n      
        \"22.25\\\" Cable Length\"\n    
      ],\n    
      \"tables\": "<table>
        <thead>
          <tr>
            <th>Item</th>
            <th>Quantity</th>
            <th>Part Number</th>
            <th>Description</th>
            <th>Supplier</th>
            <th>Supplier PN</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>1</td>
            <td>6</td>
            <td>002622</td>
            <td>Conn Socket 20-24AWG Gold</td>
            <td>Digikey</td>
            <td>WM7082CT-ND</td>
          </tr>
          <tr>
            <td>2</td>
            <td>1</td>
            <td>002647</td>
            <td>Conn Recept 16pos 3mm Dual Row</td>
            <td>Digikey</td>
            <td>WM2490-ND</td>
          </tr>
          <tr>
              <td>3</td>
              <td>2</td>
              <td>102961-01</td>
              <td>M12 Q/D Cable, Elbow, 4-Pole, 5m</td>
              <td>Automation Direct</td>
              <td>EVT222</td>
          </tr>
        </tbody>
      </table>",\n    
      \"description\": \"The technical drawing depicts a wiring setup involving two 
          RTDs (Resistance Temperature Detectors) labeled RTD 1 and RTD 2. Each RTD 
          is connected via cables with specified lengths: RTD 1 has an 18.50-inch 
          cable length, and RTD 2 has a 22.25-inch cable length. The drawing 
          includes annotations for stripping wires, indicating that six wires should 
          be stripped to a length between 0.100 inches and 0.115 inches before 
          crimping. There is a section labeled '1/4\\\" Heat Shrink' and a dimension 
          marked '2X 1.50'. The drawing uses numbered circles to reference specific 
          parts or steps in the process.\"\n  
      }\n
  }",
  "metadata": {
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_base64": "/9j...<full results omitted for brevity>...Q==",
    "image_mime_type": "image/jpeg",
    "filename": "Material-Callouts-c4655c0c.pDF",
    "data_source": {}
  }
}
```

<Note>
  The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
  documents or PDF pages that are partitioned by using the Fast or VLM strategy.
</Note>

For workflows that use [chunking](/ui/chunking), note the following changes:

* Each `Image` element is replaced by a `CompositeElement` element.
* This `CompositeElement` element will contain the image's summary description as part of the element's `text` field.
* This `CompositeElement` element will not contain an `image_base64` field.

Here are three examples of the descriptions for detected images. These descriptions are generated with GPT-4o by OpenAI:

<img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/enriching/Image-Description-1.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8a0b06395070f17ab506d0fcaf16215a" alt="Description of an image showing a scatter plot graph" data-og-width="2970" width="2970" data-og-height="458" height="458" data-path="img/enriching/Image-Description-1.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/enriching/Image-Description-1.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=2259f59c869211cbbb5e6769d6f43962 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/enriching/Image-Description-1.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=88d07f255d81c05c441e83c5dc772c2a 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/enriching/Image-Description-1.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=8f5ebefe720961bebbf7349ff73a0f69 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/enriching/Image-Description-1.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=71f10d3cf8948e9a3a546149f33c1277 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/enriching/Image-Description-1.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=853e08bbe6845220c8dc0381e4c0f15b 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/enriching/Image-Description-1.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=dfda06da7c7701d66863fd9ab25bcd6e 2500w" />

<img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-2.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=267621de3b05f57c3c028dedb956d627" alt="Description of an image showing the Matthews Correlation Coefficient for different VQA datasets" data-og-width="2970" width="2970" data-og-height="442" height="442" data-path="img/enriching/Image-Description-2.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-2.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=3e7007067f76ff4b8d07b336fe49a7ac 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-2.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=262bb14e2df8f968662f2e0e3f370d54 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-2.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=ba99980d6cbd1d3d7170fb5c136968e1 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-2.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=da7930915a560295d588f0726e19a517 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-2.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=93d92082e3589c83085dfc988e1aed28 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-2.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=50a968b980b7b0ef45c6170b8f979c2d 2500w" />

<img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-3.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=c79e8f0ac5f35e1712ae4518d0b5636e" alt="Description of an image showing three scatter plots" data-og-width="2982" width="2982" data-og-height="374" height="374" data-path="img/enriching/Image-Description-3.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-3.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=006c177dfac538612bb3d580e22d446f 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-3.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=60d9df431bcf556b432314608f249979 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-3.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=266a91b667d5a82755b7f3a29355ac14 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-3.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=95b6bf5e8e1edd48065b6aa22c9415ed 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-3.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=804f527e7d42487fd5390add70483c7a 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Image-Description-3.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=e94315d34545a4be2936bd6b107fbb3e 2500w" />

Any embeddings that are produced after these summaries are generated will be based on the `text` field's contents.

## Generate image descriptions

To generate image descriptions, in an **Enrichment** node in a workflow, select **Image**, and then choose one of the available provider (and model) combinations that are shown.

<Note>
  You can change a workflow's image description settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings.

  For workflows that use [chunking](/ui/chunking), the **Chunker** node should be placed after all **Enrichment** nodes. Placing the
  **Chunker** node before an image descriptions **Enrichment** node could cause incomplete or no image descriptions to be generated.
</Note>

<Warning>
  The following models are no longer available as of the following dates:

  * Amazon Bedrock Claude Sonnet 3.5: October 22, 2025
  * Anthropic Claude Sonnet 3.5: October 22, 2025

  Unstructured recommends the following actions:

  * For new workflows, do not use any of these models.
  * For any workflow that uses any of these models, update that workflow as soon as possible to use a different model.

  Workflows that attempt to use any of these models on or after its associated date will return errors.
</Warning>

<Warning>
  Unstructured can potentially generate image summary descriptions only for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and an image summary description node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No image summary description node is needed (or allowed).

  Even with these configurations, Unstructured actually generates image summary descriptions only for files that contain images and are also eligible
  for processing with the following partitioning strategies:

  * **High Res**, when the workflow's **Partitioner** node is set to use **Auto** or **High Res**.
  * **VLM** or **High Res**, when the workflow's **Partitioner** node is set to use **VLM**.

  Unstructured never generates image summary descriptions for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Fast** partitioning strategy.
  * With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain images.
</Warning>

---

# Source: https://docs.unstructured.io/open-source/ingestion/ingest-cli.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Process files in batches by using the Unstructured Ingest CLI

The Unstructured Ingest CLI enables you to use command-line scripts to send files in batches to Unstructured for processing, and to tell Unstructured where to deliver the processed data. [Learn more](/open-source/ingestion/overview#unstructured-ingest-cli).

<Note>
  The Unstructured Ingest CLI does not work with the Unstructured API.

  For information about the Unstructured API, see the [Unstructured API Overview](/api-reference/workflow/overview).
</Note>

## Getting started

You can use the Unstructured Ingest CLI to process files locally, or you can use the Ingest CLI to send files in batches to Unstructured for processing.

Local processing does not use an Unstructured API key or API URL.

Using the Ingest CLI to send files in batches to Unstructured for processing is more robust, and usage is billed to you on a pay-as-you-go basis. Usage requires an Unstructured API key and API URL, as follows:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

3) The Unstructured API URL for Unstructured Ingest was provided to you when your Unstructured account was created.
   If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

<Note>
  The default URL for Unstructured Ingest is the same as the default URL for the legacy Unstructured Partition Endpoint: `https://api.unstructuredapp.io/general/v0/general`.
  However, you should always use the URL that was provided to you when your Unstructured account was created.
</Note>

## Installation

One approach to get started quickly with the Unstructured Ingest CLI is to install Python and then run the following command:

```bash  theme={null}
pip install unstructured-ingest
```

This default installation option enables the ingestion of plain text files, HTML, XML, JSON and emails that do not require any extra dependencies. This default option also enables you to specify local source and destination locations.

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

For additional installation options, see [Unstructured Ingest CLI](/open-source/ingestion/overview#unstructured-ingest-cli) in the [Ingest](/open-source/ingestion/overview) section.

<Info>To migrate from older, deprecated versions of the Ingest CLI that used `pip install unstructured`, see the [migration guide](/open-source/ingestion/overview#migration-guide).</Info>

## Usage

To call the Unstructured Ingest CLI, follow this calling pattern, where:

* `<source>` is the command name for one of the available [source](/open-source/ingestion/source-connectors/overview) (input) connectors, such as `local` for a local source location, `azure` for an Azure Storage account source, `s3` for an Amazon S3 bucket source, and so on.
* `<destination>` is the command name for one of the available [destination](/open-source/ingestion/destination-connectors/overview) (output) connectors, such as `local` for a local destination, `azure` for an Azure Storage account destination, `s3` for an Amazon S3 bucket destination, and so on.
* `<setting>` is one or more command-line options for specifying how and where Unstructured will ingest the files from the `<source>`, or how and where to deliver the processed data to the `<destination>`.

```bash CLI theme={null}
#!/usr/bin/env bash

unstructured-ingest \
  <source> \
    --<setting1> <value1> \
    --<setting2> <value2> \
    --<settingN> <valueN> \
  <destination> \
    --<setting1> <value1> \
    --<setting2> <value2> \
    --<settingN> <valueN>
```

To learn how to use the Unstructured Ingest CLI to work with a specific source (input) and destination (output) location, see the CLI script examples for the [source](/open-source/ingestion/source-connectors/overview) and [destination](/open-source/ingestion/destination-connectors/overview) connectors that are available for you to choose from.

See also the [ingest configuration](/open-source/ingestion/ingest-configuration/overview) settings for command-line options that enable you to further control how batches are sent and processed.

---

# Source: https://docs.unstructured.io/open-source/ingestion/ingest-dependencies.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Ingest dependencies

When you install the [Unstructured Ingest CLI](/open-source/ingestion/ingest-cli) and the
[Unstructured Ingest Python library](/open-source/ingestion/python-ingest) by running the command
`pip install unstructured-ingest` by itself, you get the following by default:

* The [local source connector](/open-source/ingestion/source-connectors/local) and the [local destination connector](/open-source/ingestion/destination-connectors/local).
* Support for the following file types:

| File type |
| --------- |
| `.bmp`    |
| `.eml`    |
| `.heic`   |
| `.html`   |
| `.jpg`    |
| `.jpeg`   |
| `.tiff`   |
| `.png`    |
| `.txt`    |
| `.xml`    |

To add support for additional file types, run the following:

| Command                                   | File type |
| ----------------------------------------- | --------- |
| `pip install "unstructured-ingest[csv]"`  | `.csv`    |
| `pip install "unstructured-ingest[doc]"`  | `.doc`    |
| `pip install "unstructured-ingest[docx]"` | `.docx`   |
| `pip install "unstructured-ingest[epub]"` | `.epub`   |
| `pip install "unstructured-ingest[md]"`   | `.md`     |
| `pip install "unstructured-ingest[msg]"`  | `.msg`    |
| `pip install "unstructured-ingest[odt]"`  | `.odt`    |
| `pip install "unstructured-ingest[org]"`  | `.org`    |
| `pip install "unstructured-ingest[pdf]"`  | `.pdf`    |
| `pip install "unstructured-ingest[ppt]"`  | `.ppt`    |
| `pip install "unstructured-ingest[pptx]"` | `.pptx`   |
| `pip install "unstructured-ingest[rtf]"`  | `.rtf`    |
| `pip install "unstructured-ingest[rst]"`  | `.rst`    |
| `pip install "unstructured-ingest[tsv]"`  | `.tsv`    |
| `pip install "unstructured-ingest[xlsx]"` | `.xlsx`   |

To add support for additional connectors, run the following:

| Command                                                 | Connector type       |
| ------------------------------------------------------- | -------------------- |
| `pip install "unstructured-ingest[airtable]"`           | Airtable             |
| `pip install "unstructured-ingest[astra]"`              | Astra DB             |
| `pip install "unstructured-ingest[azure]"`              | Azure Blob Storage   |
| `pip install "unstructured-ingest[azure-ai-search]"`    | Azure AI Search      |
| `pip install "unstructured-ingest[box]"`                | Box                  |
| `pip install "unstructured-ingest[chroma]"`             | Chroma               |
| `pip install "unstructured-ingest[confluence]"`         | Confluence           |
| `pip install "unstructured-ingest[couchbase]"`          | Couchbase            |
| `pip install "unstructured-ingest[databricks-volumes]"` | Databricks Volumes   |
| `pip install "unstructured-ingest[delta-table]"`        | Delta Tables         |
| `pip install "unstructured-ingest[dropbox]"`            | Dropbox              |
| `pip install "unstructured-ingest[dropbox]"`            | DuckDB, MotherDuck   |
| `pip install "unstructured-ingest[elasticsearch]"`      | Elasticsearch        |
| `pip install "unstructured-ingest[gcs]"`                | Google Cloud Storage |
| `pip install "unstructured-ingest[github]"`             | GitHub               |
| `pip install "unstructured-ingest[gitlab]"`             | GitLab               |
| `pip install "unstructured-ingest[google-drive]"`       | Google Drive         |
| `pip install "unstructured-ingest[jira]"`               | JIRA                 |
| `pip install "unstructured-ingest[kafka]"`              | Apache Kafka         |
| `pip install "unstructured-ingest[milvus]"`             | Milvus               |
| `pip install "unstructured-ingest[mongodb]"`            | MongoDB              |
| `pip install "unstructured-ingest[neo4j]"`              | Neo4j                |
| `pip install "unstructured-ingest[notion]"`             | Notion               |
| `pip install "unstructured-ingest[onedrive]"`           | OneDrive             |
| `pip install "unstructured-ingest[opensearch]"`         | OpenSearch           |
| `pip install "unstructured-ingest[outlook]"`            | Outlook              |
| `pip install "unstructured-ingest[pinecone]"`           | Pinecone             |
| `pip install "unstructured-ingest[postgres]"`           | PostgreSQL, SQLite   |
| `pip install "unstructured-ingest[qdrant]"`             | Qdrant               |
| `pip install "unstructured-ingest[redis]"`              | Redis                |
| `pip install "unstructured-ingest[s3]"`                 | Amazon S3            |
| `pip install "unstructured-ingest[sharepoint]"`         | SharePoint           |
| `pip install "unstructured-ingest[salesforce]"`         | Salesforce           |
| `pip install "unstructured-ingest[singlestore]"`        | SingleStore          |
| `pip install "unstructured-ingest[snowflake]"`          | Snowflake            |
| `pip install "unstructured-ingest[sftp]"`               | SFTP                 |
| `pip install "unstructured-ingest[slack]"`              | Slack                |
| `pip install "unstructured-ingest[teradata]"`           | Teradata             |
| `pip install "unstructured-ingest[vectara]"`            | Vectara              |
| `pip install "unstructured-ingest[weaviate]"`           | Weaviate             |

To add support for available embedding libraries, run the following:

| Command                                                 | Embedding library type |
| ------------------------------------------------------- | ---------------------- |
| `pip install "unstructured-ingest[bedrock]"`            | Amazon Bedrock         |
| `pip install "unstructured-ingest[embed-huggingface]"`  | Hugging Face           |
| `pip install "unstructured-ingest[embed-octoai]"`       | OctoAI                 |
| `pip install "unstructured-ingest[embed-vertexai]"`     | Google Vertex AI       |
| `pip install "unstructured-ingest[embed-voyageai]"`     | Voyage AI              |
| `pip install "unstructured-ingest[embed-mixedbreadai]"` | Mixedbread             |
| `pip install "unstructured-ingest[openai]"`             | OpenAI, Azure OpenAI   |
| `pip install "unstructured-ingest[togetherai]"`         | together.ai            |

For details about the specific dependencies that are installed, see:

* [Common dependencies](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/requirements/common)
* [File type dependencies](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/requirements/local_partition)
* [Connector dependencies](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/requirements/connectors)
* [Embedding library dependencies](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/requirements/embed)

See also [setup.py](https://github.com/Unstructured-IO/unstructured-ingest/blob/main/setup.py).

---

# Source: https://docs.unstructured.io/open-source/integrations.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Integrations

> Integrate your model development pipeline with your favorite machine learning frameworks and libraries, and prepare your data for ingestion into downstream systems. Most of our integrations come in the form of [staging functions](/open-source/core-functionality/staging), which take a list of `Element` objects as input and return formatted dictionaries as output.

## `Integration with Argilla`

You can convert a list of `Text` elements to an [Argilla](https://www.argilla.io/) `Dataset` using the [stage\_for\_argilla](/open-source/core-functionality/staging#stage-for-argilla) staging function. Specify the type of dataset to be generated using the `argilla_task` parameter. Valid values are `"text_classification"`, `"token_classification"`, and `"text2text"`. Follow the link for more details on usage.

## `Integration with Baseplate`

[Baseplate](https://docs.baseplate.ai/introduction) is a backend optimized for use with LLMs that has an easy to use spreadsheet interface. The `unstructured` library offers a staging function to convert a list of `Element` objects into the [rows format](https://docs.baseplate.ai/api-reference/documents/overview) required by the Baseplate API. See the [stage\_for\_baseplate](/open-source/core-functionality/staging#stage-for-baseplate) documentation for information on how to stage elements for ingestion into Baseplate.

## `Integration with Datasaur`

You can format a list of `Text` elements as input to token based tasks in [Datasaur](https://datasaur.ai/) using the [stage\_for\_datasaur](/open-source/core-functionality/staging#stage-for-datasaur) staging function. You will obtain a list of dictionaries indexed by the keys `"text"` with the content of the element, and `"entities"` with an empty list. Follow the link to learn how to customise your entities and for more details on usage.

## `Integration with Hugging Face`

You can prepare `Text` elements for processing in Hugging Face [Transformers](https://huggingface.co/docs/transformers/index) pipelines by splitting the elements into chunks that fit into the model’s attention window using the [stage\_for\_transformers](/open-source/core-functionality/staging#stage-for-transformers) staging function. You can customise the transformation by defining the `buffer` and `window_size`, the `split_function` and the `chunk_separator`. if you need to operate on text directly instead of `unstructured` `Text` objects, use the [chunk\_by\_attention\_window](/open-source/core-functionality/staging#stage-for-transformers) helper function. Follow the links for more details on usage.

## `Integration with Labelbox`

You can format your outputs for use with [LabelBox](https://labelbox.com/) using the [stage\_for\_label\_box](/open-source/core-functionality/staging#stage-for-label-box) staging function. LabelBox accepts cloud-hosted data and does not support importing text directly. With this integration you can stage the data files in the `output_directory` to be uploaded to a cloud storage service (such as S3 buckets) and get a config of type `List[Dict[str, Any]]` that can be written to a `.json` file and imported into LabelBox. Follow the link to see how to generate the `config.json` file that can be used with LabelBox, how to upload the staged data files to an S3 bucket, and for more details on usage.

## `Integration with Label Studio`

You can format your outputs for upload to [Label Studio](https://labelstud.io/) using the [stage\_for\_label\_studio](/open-source/core-functionality/staging#stage-for-label-studio) staging function. After running `stage_for_label_studio`, you can write the results to a JSON folder that is ready to be included in a new Label Studio project. You can also include pre-annotations and predictions as part of your upload.

See the [Label Studio docs](https://labelstud.io/tags/labels.html) for a full list of options for labels and annotations.

## `Integration with LangChain`

Our integration with [LangChain](https://github.com/hwchase17/langchain) makes it incredibly easy to combine language models with your data, no matter what form it is in. The [Unstructured.io File Loader](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html) extracts the text from a variety of unstructured text files using our `unstructured` library. It is designed to be used as a way to load data into LangChain. Here is the simplest way to use the `UnstructuredFileLoader` in `langchain`.

```python  theme={null}
from langchain.document_loaders import UnstructuredFileLoader

loader = UnstructuredFileLoader("state_of_the_union.txt")
loader.load()

```

Checkout the [LangChain docs](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html) for more examples about how to use Unstructured data loaders.

## `Integration with LlamaIndex`

To use `Unstructured.io File Loader` you will need to have [LlamaIndex](https://github.com/jerryjliu/llama_index) 🦙 (GPT Index) installed in your environment. Just `pip install llama-index` and then pass in a `Path` to a local file. Optionally, you may specify split\_documents if you want each element generated by `unstructured` to be placed in a separate document. Here is a simple example of how to use it:

```python  theme={null}
from pathlib import Path
from llama_index import download_loader


UnstructuredReader = download_loader("UnstructuredReader")

loader = UnstructuredReader()
documents = loader.load_data(file=Path('./10k_filing.html'))

```

See [here](https://llamahub.ai/) for more LlamaHub examples.

## `Integration with Pandas`

You can convert a list of `Element` objects to a Pandas dataframe with columns for the text from each element and their types such as `NarrativeText` or `Title` using the [convert\_to\_dataframe](/open-source/core-functionality/staging#convert-to-dataframe) staging function. Follow the link for more details on usage.

## `Integration with Prodigy`

You can format your JSON or CSV outputs for use with [Prodigy](https://prodi.gy/docs/api-loaders) using the [stage\_for\_prodigy](/open-source/core-functionality/staging#stage-for-prodigy) and [stage\_csv\_for\_prodigy](/open-source/core-functionality/staging#stage-csv-for-prodigy) staging functions. After running `stage_for_prodigy` | `stage_csv_for_prodigy`, you can write the results to a `.json` | `.jsonl` or a `.csv` file that is ready to be used with Prodigy. Follow the links for more details on usage.

## `Integration with Weaviate`

[Weaviate](https://weaviate.io/) is an open-source vector database that allows you to store data objects and vector embeddings from a variety of ML models. Storing text and embeddings in a vector database such as Weaviate is a key component of the [emerging LLM tech stack](https://medium.com/@unstructured-io/llms-and-the-emerging-ml-tech-stack-bdb189c8be5c). See the [stage\_for\_weaviate](open-source/core-functionality/staging#stage-for-weaviate) docs for details on how to upload `unstructured` outputs to Weaviate. An example notebook is also available [here](https://github.com/Unstructured-IO/unstructured/tree/main/examples/weaviate).

---

# Source: https://docs.unstructured.io/support/issues/internal-file-handling.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Internal issues

## Issues

Unstructured returns one of the following error types:

* `NotFoundException`: For example, an error retrieving intermediate file data from internal storage, with a pattern such as `/f8c6.../element_dicts/...json`.
* `FileNotFoundError`: For example, an error with internal system paths that use a pattern such as `/home/etl/node/staged/...` or `/home/etl/node/downloads/...`.
* `ValueError`: For example, a `Failed to decrypt secret` error.
* `AttributeError`: For example, a `'TimeoutError' object has no attribute 'status_code'` error.
* `asyncio.exceptions.CancelledError`

## Possible causes

* There is an error in a previous workflow or ingestion pipeline stage within Unstructured's environment that is preventing intermediate files from being created correctly.
* There are file system or storage issues within Unstructured's environment.
* There are problems with Unstructured's internal management of secret keys or configurations.
* There are issues with Unstructured's internal error handling or service lifecycle management.
* There are some other general issues with Unstructured's internal workings, often related to managing temporary files or internal state.

## Possible solutions

* **Retry the job**: Retry the entire job, as the issue might have been temporary.
* **Contact Unstructured Support**: These issues are generally fixable only by Unstructured. These issues typically indicate internal issues that customers cannot fix themselves.
  Be sure when you [contact Unstructured Support](/support/request) to include the full error message, timestamp, and job details.

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/support/how-to/invoice-issue.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Raise an invoice issue or request a refund

To raise an issue about an unpaid invoice that Unstructured has sent you, or to request a refund for an invoice that you have already paid, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

In your email, include the following information:

* The email address of the primary contact for your Unstructured account.
* The related invoice number.
* The monetary amount of the unpaid invoice that you are disputing, or the monetary amount of the already-paid invoice for which you are requesting a refund.
* The specific reasons why you are disputing the charges or requesting a refund.
* Any supporting details, documentation, or other information that helps support your claim.

After Unstructured Support receives your request, within one business day, they will respond with an email to the primary contact, acknowledging your
dispute or refund request. If you have not received this acknowledgement email within 3 business days, you can email Unstructured Support at
[support@unstructured.io](mailto:support@unstructured.io) to check on its status.

The timing and details for resolving the dispute or refund request will depend on the nature of the issue. If you have any follow-up questions about the issue
or just want to check on its current status, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

---

# Source: https://docs.unstructured.io/ui/sources/jira.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/jira.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/jira.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Jira

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Jira.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/JR5mcukOSUQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A [Jira Cloud account](https://www.atlassian.com/try/cloud/signup?bundle=jira-software\&edition=free) or
  [Jira Data Center installation](https://confluence.atlassian.com/adminjiraserver/installing-jira-data-center-938846870.html).
* The site URL for your [Jira Data Center installation](https://confluence.atlassian.com/jirakb/find-your-site-url-to-set-up-the-jira-data-center-and-server-mobile-app-954244798.html) or Jira Cloud account.
  For Jira Cloud, open Jira in your web browser and copy the address from the browser's address bar.
  If you're unsure, check the dashboard URL, or if viewing an issue, project or board, the site URL is typically everything that comes before **but not including** `/jira`, such as
  `https://<organization>.atlassian.net`.
* To process Jira projects, provide the IDs for the target projects. To get a project's ID, sign in to your Jira Cloud account or Jira Data Center installation, and then go to the following URL: `https://<organization>.atlassian.net/rest/api/latest/project/<project-key>`,
  replacing `<organization>` with yours, and replacing `<project-key>` with the target project's key. In the
  response, look for the URL `https://<organization>.atlassian.net/rest/api/3/project/<project-id>`, where `<project-id>` is the target project's ID.
* To process Jira boards, the IDs for the target boards. To get a board's ID, sign in to your Jira Cloud account or Jira Data Center installation, and then go to the following URL: `https://<organization>.atlassian.net/rest/agile/1.0/board?projectKeyOrId=<project-key-or-id>`,
  replacing `<organization>` with yours, and `<project-key-or-id>` with the associated project's key or ID. In the
  response, look for the URL `https://<organization>.atlassian.net/rest/agile/1.0/board/<board-id>`, where `<board-id>` is the board's ID.
* To process Jira issues, the IDs for the target issues. To get an issue's ID, sign in to your Jia Cloud account or Jira Data Center installation, open the issue, and then look at the URL in your browser's address bar. The issue ID is the string of characters after the final slash in the URL.
* A user in your [Jira Cloud account](https://support.atlassian.com/jira-cloud-administration/docs/manage-users-groups-permissions-and-roles-in-jira-cloud/) or
  [Jira Data Center installation](https://confluence.atlassian.com/adminjiraserver/create-edit-or-remove-a-user-938847025.html).
* The user must have the correct permissions in your
  [Jira Cloud account](https://support.atlassian.com/jira-cloud-administration/docs/manage-users-groups-permissions-and-roles-in-jira-cloud/) or
  [Jira Data Center installation](https://confluence.atlassian.com/jirakb/permissions-made-simple-for-jira-server-717062767.html) to
  access the target projects, boards, and issues.
* One of the following:

  * For Jira Cloud or Jira Data Center, the target user's name or email address, and password.
    [Change a Jira Cloud user's password](https://support.atlassian.com/user-management/docs/change-password-for-portal-only-customers/).
    [Change a Jira Data Center user's password](https://confluence.atlassian.com/adminjiraserver/create-edit-or-remove-a-user-938847025.html).
  * For Jira Cloud only, the target user's name or email address, and API token.
    [Create an API token](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).
  * For Jira Data Center only, the target user's personal access token (PAT).
    [Create a PAT](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html).

To create a Jira source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="jira",
                  config={
                      "url": "<url>",

                      # For password or API token authentication:
                      # "username": "<username>",
                      # "password": "<password>",

                      # For personal access token authentication:
                      # "token": "<token>",

                      "projects": [
                          "<project-id>",
                          "<project-id>"
                      ],
                      "boards": [
                          "<board-id>",
                          "<board-id>"
                      ],
                      "issues": [
                          "<issue-id>",
                          "<issue-id>"
                      ],
                      "status_filters": [
                          "<status>",
                          "<status>"
                      ],
                      "download_attachments": <True|False>,
                      "cloud": <True|False>
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "jira",
      "config": {
          "url": "<url>",

          # For password or API token authentication:
          "username": "<username>",
          "password": "<password">,

          # For personal access token authentication:
          "token": "<token>",
          
          "projects": [ 
              "<project-id>",
              "<project-id>"
          ],
          "boards": [ 
              "<board-id>",
              "<board-id>"
          ],
          "issues": [ 
              "<issue-id>",
              "<issue-id>"
          ],
          "status_filters": [
              "<status>",
              "<status>"
          ],
          "download_attachments": <true|false>,
          "cloud": <true|false>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*): A unique name for this connector.
* `<url>` (*required*): The URL of the Jira instance.
* `<username>` (*required* for password or API token authentication): The username of the Jira user.
* `<password>` (*required* for password or API token authentication): For password authentication, the password of the Jira user.
  For API token authentication, the API token of the Jira user.
* `<token>` (*required* for personal access token authentication): The personal access token of the Jira user.
* `<project-id>`: The ID of a target project in Jira to access.
* `<board-id>`: The ID of a target board in Jira to access.
* `<issue-id>`: The ID of a target issue in Jira to access.
* `<status>`: The name of a status to filter Jira issues by.
* Set `download_attachments` to `true` to download attachments from Jira issues. The default is `false`.
* Set `cloud` to `true` to specify using Jira Cloud or `false` to specify using Jira Data Center. The default is to use Jira Data Center.

## Learn more

* <Icon icon="blog" />  [Jira Integration in Unstructured: Build RAG Systems with Project Management Data](https://unstructured.io/blog/jira-integration-in-unstructured-build-rag-systems-with-project-management-data)

---

# Source: https://docs.unstructured.io/ui/jobs.md

# Source: https://docs.unstructured.io/api-reference/workflow/jobs.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Jobs

To use the [Unstructured API's workflow operations](/api-reference/workflow/overview) to manage jobs, do the following:

* To get a list of available jobs, use the `UnstructuredClient` object's `jobs.list_jobs` function (for the Python SDK) or
  the `GET` method to call the `/jobs` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#list-jobs).
* To get basic information about a job, use the `UnstructuredClient` object's `jobs.get_job` function (for the Python SDK) or
  the `GET` method to call the `/jobs/<job-id>` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#get-a-job).
* To get information about a job's current processing status, use the `UnstructuredClient` object's `jobs.get_job_details` function (for the Python SDK) or
  the `GET` method to call the `/jobs/<job-id>/details` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#get-processing-details-for-a-job).
* To get the list of any failed files for a job and why those files failed, use the `UnstructuredClient` object's `jobs.get_failed_files` function (for the Python SDK) or
  the `GET` method to call the `/jobs/<job-id>/failed-files` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#get-failed-file-details-for-a-job).
* To run a workflow that takes one or more local files only as input, and the workflow exists only for the duration of
  that job's run (known as an *on-demand job*), use the `POST` method to call the `/jobs/` endpoint for `curl` or Postman. [Learn more](/api-reference/workflow/overview#run-an-on-demand-job).
* A job is created automatically whenever a workflow runs on a schedule; see [Create a workflow](/api-reference/workflow/workflows#create-a-workflow).
  A job is also created automatically whenever you run a workflow manually; see [Run a workflow](/api-reference/workflow/overview#run-a-workflow).
* To cancel a running job, use the `UnstructuredClient` object's `jobs.cancel_job` function (for the Python SDK) or
  the `POST` method to call the `/jobs/<job-id>/cancel` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#cancel-a-job).

---

# Source: https://docs.unstructured.io/examplecode/tools/jq.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Query JSON with jq

[jq](https://jqlang.org/) is a lightweight and flexible command-line JSON processor. You can use `jq` on a local development machine to
slice, filter, map, and transform the JSON data that Unstructured outputs in much the same ways that tools such as `sed`, `awk`, and `grep` let you work with text.

To get `jq`, see the [Download jq](https://jqlang.org/download/) page.

<Info>
  `jq` is not owned or supported by Unstructured. For questions about `jq`and
  feature requests for future versions of `jq`, see the [Issues](https://github.com/jqlang/jq/issues) tab of the
  `jq` repository in GitHub.
</Info>

The following command examples use `jq` with the
[spring-weather.html.json](https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/spring-weather.html.json) file in the
**example-docs** directory within the **Unstructured-IO/unstructured** repository in GitHub.

Find the element with a `type` of `Address`, and print the element's `text` field's value.

```bash  theme={null}
jq '.[] 
  | select(.type == "Address") 
  | .text' spring-weather.html.json
```

The output is:

```bash  theme={null}
"Silver Spring, MD 20910"
```

Find all elements with a `type` of `Title`, and print the `text` field of each found element as a string in a JSON array.

```bash  theme={null}
jq '[
  .[] 
  | select(.type == "Title") 
  | .text]' spring-weather.html.json
```

The output is:

```bash  theme={null}
[
  "News Around NOAA",
  "National Program",
  "Are You Weather-Ready for the Spring?",
  "Weather.gov >",
  "News Around NOAA > Are You Weather-Ready for the Spring?",
  "US Dept of Commerce",
  "National Oceanic and Atmospheric Administration",
  "National Weather Service",
  "News Around NOAA",
  "1325 East West Highway",
  "Comments? Questions? Please Contact Us.",
  "Disclaimer",
  "Information Quality",
  "Help",
  "Glossary",
  "Privacy Policy",
  "Freedom of Information Act (FOIA)",
  "About Us",
  "Career Opportunities"
]
```

Find all elements with a `type` of `Title`. Of these, find the ones that have a `text` field that contains the phrase `Contact Us`, and print the contents of each found element's `metadata.link_urls` field.

```bash  theme={null}
jq '.[] 
  | select(.type == "Title") 
  | select(.text 
  | contains("Contact Us")) 
  | .metadata.link_urls' spring-weather.html.json
```

The output is:

```bash  theme={null}
[
    "https://www.weather.gov/news/contact"
]
```

Find all elements with a `type` of `ListItem`. Of these, find the ones that have a `text` field that contains the phrase `Weather Safety`.
For each item in `metadata.link_texts`, print the item's value as the key, followed by the matching item in
`metadata.link_urls` as the value. Trim any leading and trailing whitespace from all values. Wrap the output in a JSON array.

```bash  theme={null}
jq '[
  .[]
  | select(.type == "ListItem")
  | select(.text | test("Weather Safety"; "i"))
  | [.metadata.link_texts, .metadata.link_urls]
  | transpose[]
  | {
      (.[0] | gsub("^\\s+|\\s+$"; "")) : (.[1] | gsub("^\\s+|\\s+$"; ""))
    }
]' spring-weather.html.json
```

The output is:

```bash  theme={null}
[
  {
    "Weather Safety": "http://www.weather.gov/safetycampaign"
  },
  {
    "Air Quality": "https://www.weather.gov/safety/airquality"
  },
  {
    "Beach Hazards": "https://www.weather.gov/safety/beachhazards"
  },
  {
    "Cold": "https://www.weather.gov/safety/cold"
  },
  {
    "Cold Water": "https://www.weather.gov/safety/coldwater"
  },
  {
    "Drought": "https://www.weather.gov/safety/drought"
  },
  {
    "Floods": "https://www.weather.gov/safety/flood"
  },
  {
    "Fog": "https://www.weather.gov/safety/fog"
  },
  {
    "Heat": "https://www.weather.gov/safety/heat"
  },
  {
    "Hurricanes": "https://www.weather.gov/safety/hurricane"
  },
  {
    "Lightning Safety": "https://www.weather.gov/safety/lightning"
  },
  {
    "Rip Currents": "https://www.weather.gov/safety/ripcurrent"
  },
  {
    "Safe Boating": "https://www.weather.gov/safety/safeboating"
  },
  {
    "Space Weather": "https://www.weather.gov/safety/space"
  },
  {
    "Sun (Ultraviolet Radiation)": "https://www.weather.gov/safety/heat-uv"
  },
  {
    "Thunderstorms & Tornadoes": "https://www.weather.gov/safety/thunderstorm"
  },
  {
    "Tornado": "https://www.weather.gov/safety/tornado"
  },
  {
    "Tsunami": "https://www.weather.gov/safety/tsunami"
  },
  {
    "Wildfire": "https://www.weather.gov/safety/wildfire"
  },
  {
    "Wind": "https://www.weather.gov/safety/wind"
  },
  {
    "Winter": "https://www.weather.gov/safety/winter"
  }
]
```

## Additional resources

* [jq Tutorial](https://jqlang.org/tutorial/)
* [jq Manual](https://jqlang.org/manual/)
* [jq Playground](https://play.jqlang.org/)

---

# Source: https://docs.unstructured.io/examplecode/tools/json-tools.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Working with Unstructured JSON output

You can use tools such as the following to work with Unstructured JSON output.

## Visual Studio Code

<Note>
  Visual Studio Code is not owned or supported by Unstructured. Unstructured makes no guarantees about the accuracy,
  reliability, or security of this tool or the extensions that you install into it. Use this tool and these
  extensions at your own risk.
</Note>

You can use [Visual Studio Code](https://code.visualstudio.com/) to work with Unstructured JSON output, as follows.

### Format JSON for easier reading in Visual Studio Code

To format an Unstructured JSON output file for easier reading, do the following after you download the Unstructured JSON output file locally and then open the downloaded file in Visual Studio Code:

1. Click anywhere in the JSON file on-screen to make it the active file.

2. Do one of the following:

   * Press **Shift + Option + F** for macOS, or **Shift + Alt + F** for Windows or Linux.
   * On the menu bar, click **View > Command Palette**. Then, in the **Command Palette**, type `>Format Document`, and press **Enter**.

3. The JSON file's contents display in a format that is easier to read.

### Visually render text\_as\_html field contents in Visual Studio Code

To visually render the contents of a `text_as_html` field in Unstructured JSON output, do the following in Visual Studio Code:

1. Install the [HTML Preview](https://marketplace.visualstudio.com/items?itemName=george-alisson.html-preview-vscode)
   extension into your installation of Visual Studio Code, as follows:

   a. On the menu bar, click **View > Extensions**.<br />
   b. In the **Search Extensions in Marketplace** box, type `HTML Preview`, and then press **Enter**.<br />
   c. In the list of search results, for the **HTML Preview** extension, click **Install**.<br />
   d. If prompted, click **Trust Publisher & Install**.<br />

2. Select the contents of the `text_as_html` field that you want to render. Do not copy the
   field name `"text_as_html"` or the opening or closing quotation marks around the field's contents.

3. Create a new file in Visual Studio Code, and paste the contents of the `text_as_html` that you copied, into the new file.

4. With the new file active on-screen, press the key combination `Control + Shift + V` or `Command + Shift + V` to visually render the new file's content.

See also:

* [Table-specific metadata](/ui/document-elements#table-specific-metadata)
* [VLM generated HTML elements](/ui/document-elements#vlm-generated-html-elements)
* [Extract tables as HTML](/api-reference/legacy-api/partition/text-as-html) (Unstructured API Reference)
* [Extract tables as HTML](/open-source/how-to/text-as-html) (Unstructured Open Source Reference)

### View image\_base64 field contents as an image in Visual Studio Code

The contents of `image_base64` fields in Unstructured JSON output are in Base64-encoded format. To view the contents of an `image_base64` field as a human-viewable image,
do the following in Visual Studio Code.

1. Install the [Base64Viewer](https://marketplace.visualstudio.com/items?itemName=JasonMejane.base64viewer)
   extension into your installation of Visual Studio Code, as follows:

   a. On the menu bar, click **View > Extensions**.<br />
   b. In the **Search Extensions in Marketplace** box, type `Base64Viewer`, and then press **Enter**.<br />
   c. In the list of search results, for the **Base64Viewer** extension, click **Install**.<br />
   d. If prompted, click **Trust Publisher & Install**.<br />

2. Copy to your system clipboard the contents of the `image_base64` field that you want to decode. Do not copy the
   field name `"image_base64"` or the opening or closing quotation marks around the field's contents.

3. On the menu bar, click **View > Command Palette**.

4. In the **Command Palette**, type `>Base64Viewer: Decode From Base64`, and press **Enter**.

5. When prompted, paste the contents of the `image_base64` field that you copied, and then press **Enter**. The
   human-viewable image appears in a separate editor tab.

See also:

* [Extract images and tables from documents](/api-reference/legacy-api/partition/extract-image-block-types) (Unstructured API Reference)
* [Extract images and tables from documents](/open-source/how-to/extract-image-block-types) (Unstructured Open Source Reference)

### View orig\_elements field contents as original JSON in Visual Studio Code

The contents of `orig_elements` fields in Unstructured JSON output are in gzipped Base64-encoded format. To view the contents of an `orig_elements` field in a human-readable format,
do the following in Visual Studio Code:

1. Install the [Gzip / Unzip text](https://marketplace.visualstudio.com/items?itemName=xballoy.gzip-unzip-text)
   extension into your installation of Visual Studio Code, as follows:

   a. On the menu bar, click **View > Extensions**.<br />
   b. In the **Search Extensions in Marketplace** box, type `Gzip / Unzip text`, and then press **Enter**.<br />
   c. In the list of search results, for the **Gzip / Unzip text** extension, click **Install**.<br />
   d. If prompted, click **Trust Publisher & Install**.<br />

2. Download the Unstructured JSON output file locally, and then open the downloaded file in Visual Studio Code.

   <Note>
     We do not recommend that you format the JSON file for easier reading after you open it in Visual Studio Code.
     This is because the formatting operation could introduce
     unnecessary whitespace and line breaks into the contents of the `orig_elements`
     fields. These additional characters could cause the extension to produce unexpected results or fail.
   </Note>

3. Select the contents of the `orig_elements` field that you want to decode and view. Do not select the
   field name `"orig_elements"` or the opening or closing quotation marks around the field's contents.

4. Right-click the selected contents, and then click **Command Palette**.

5. In the **Command Palette**, type `>Unzip Base64 encoded text`, and then press **Enter**. The human-readable JSON
   appears in a separate editor tab.

See also:

* [Get chunked elements](/api-reference/legacy-api/partition/get-chunked-elements) (Unstructured API Reference)
* [Get chunked elements](/open-source/how-to/get-chunked-elements) (Unstructured Open Source Reference)

## Online web tools

You can use online web tools such as the following to work with Unstructured JSON output.

### Format JSON for easier reading online

<Warning>
  Do not enter any sensitive information into this publicly-available website.
</Warning>

<Note>
  The following website is not owned or supported by Unstructured. Unstructured makes no guarantees about the accuracy,
  reliability, or security of this website. Use this website at your own risk.
</Note>

To format Unstructured JSON output for easier reading, do the following:

1. Go to the [JSON Viewer Online](https://easyjsonviewer.com).

2. Do one of the following:

   * Click **Load**, and then select an Unstructured JSON output file that you have saved locally.
   * Copy the Unstructured JSON output to your system clipboard, and then click **Paste**.

3. Click **Format**. The JSON is formatted on-screen for easier reading.

You should now be able to use your web browser's standard "find" function to search for specific text in
the formatted JSON, as needed.

### Visually render text\_as\_html field contents online

<Warning>
  Do not enter any sensitive information into this publicly-available website.
</Warning>

<Note>
  The following website is not owned or supported by Unstructured. Unstructured makes no guarantees about the accuracy,
  reliability, or security of this website. Use this website at your own risk.
</Note>

To visually render the contents of a `text_as_html` field in Unstructured JSON output, do the following:

1. Go to the [HTML Online Viewer](https://html.onlineviewer.net/).
2. Do one of the following:

   * Copy the contents of the `text_as_html` field to your system clipboard, and then paste it into the **Put your HTML text here** box.
     This content must not contain the field name `"text_as_html"` or the opening or closing quotation marks around the field's contents.
   * Click **Import**, and then select a file containing only the contents of the `text_as_html` field. This file
     must not contain the field name `"text_as_html"` or the opening or closing quotation marks around the field's contents.

<Tip>
  To add a one-pixel, black border to each cell in the rendered HTML for easier reading,
  add the following CSS directive before the opening `<table>` tag:

  ```text  theme={null}
  <style>table { border-collapse: collapse; } th, td { border: 1px solid black; }</style> 
  ```

  To add a bit of visual padding to each cell, add the following CSS directive before the
  opening `<table>` tag instead:

  ```text  theme={null}
  <style>th, td { border: 1px solid black; padding: 5px; }</style> 
  ```
</Tip>

After you paste or import the `text_as_html` field contents, the HTML is rendered on-screen.

See also:

* [Table-specific metadata](/ui/document-elements#table-specific-metadata)
* [VLM generated HTML elements](/ui/document-elements#vlm-generated-html-elements)
* [Extract tables as HTML](/api-reference/legacy-api/partition/text-as-html) (Unstructured API Reference)
* [Extract tables as HTML](/open-source/how-to/text-as-html) (Unstructured Open Source Reference)

### View image\_base64 field contents as an image online

<Warning>
  Do not enter any sensitive information into this publicly-available website.
</Warning>

<Note>
  The following website is not owned or supported by Unstructured. Unstructured makes no guarantees about the accuracy,
  reliability, or security of this website. Use this website at your own risk.
</Note>

The contents of `image_base64` fields in Unstructured JSON output are in Base64-encoded format. To view the contents of an `image_base64` field as a human-viewable image,
do the following:

1. Copy to your system clipboard the contents of the `image_base64` field that you want to view. Do not copy the
   field name `"image_base64"` or the opening or closing quotation marks around the field's contents.
2. Go to the [Base64 to Image](https://base64.guru/converter/decode/image) page on the Base64.Guru website.
3. In the **Base64** box, paste the contents of the `image_base64` field that you copied.
4. Click **Decode Base64 to Image**. The human-viewable image appears on-screen.

See also:

* [Extract images and tables from documents](/api-reference/legacy-api/partition/extract-image-block-types) (Unstructured API Reference)
* [Extract images and tables from documents](/open-source/how-to/extract-image-block-types) (Unstructured Open Source Reference)

### View orig\_elements field contents as original JSON online

<Warning>
  Do not enter any sensitive information into this publicly-available website.
</Warning>

<Note>
  The following website is not owned or supported by Unstructured. Unstructured makes no guarantees about the accuracy,
  reliability, or security of this website. Use this website at your own risk.
</Note>

The contents of `orig_elements` fields in Unstructured JSON output are in gzipped Base64-encoded format. To view the contents of an `orig_elements` field in a human-readable format,
do the following:

1. Go to the [Gzip Decompress Online](https://codebeautify.org/gzip-decompress-online) page on the Code Beautify website.

2. Do one of the following:

   * Copy the contents of the `orig_elements` field to your system clipboard, and then paste it into the **Enter your Gzip Compress Text** box.
     This content must not contain the field name `"orig_elements"` or the opening or closing quotation marks around the field's contents.<br />
   * Click **File**, and then select a file containing only the contents of the `orig_elements` field. This file
     must not contain the field name `"orig_elements"` or the opening or closing quotation marks around the field's contents.<br />

3. If the **Auto** box is not checked, click **Gzip Decompress**.

4. The human-readable JSON appears on-screen.

See also:

* [Get chunked elements](/api-reference/legacy-api/partition/get-chunked-elements) (Unstructured API Reference)
* [Get chunked elements](/open-source/how-to/get-chunked-elements) (Unstructured Open Source Reference)

---

# Source: https://docs.unstructured.io/ui/sources/kafka.md

# Source: https://docs.unstructured.io/ui/destinations/kafka.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/kafka.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/kafka.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/kafka.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/kafka.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Kafka

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Kafka.

The requirements are as follows.

* A Kafka cluster in [Confluent Cloud](https://www.confluent.io/confluent-cloud).
  ([Create a cluster](https://docs.confluent.io/cloud/current/clusters/create-cluster.html#create-ak-clusters).)

  The following video shows how to set up a Kafka cluster in Confluent Cloud:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/zcKJ96J4Xvk" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The [hostname and port number](https://docs.confluent.io/cloud/current/clusters/create-cluster.html#view-a-ak-cluster) of the bootstrap Kafka cluster to connect to..

* The name of the topic to read messages from or write messages to on the cluster.
  [Create a topic](https://docs.confluent.io/cloud/current/client-apps/topics/index.html#create-topics).
  [Access available topics](https://docs.confluent.io/cloud/current/client-apps/topics/index.html#create-topics).

* For authentication, an [API key and secret](https://docs.confluent.io/cloud/current/security/authenticate/workload-identities/service-accounts/api-keys/manage-api-keys.html#add-an-api-key).

To create a Kafka destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="kafka-cloud",
                  config={
                      "bootstrap_servers": "<bootstrap-server>",
                      "port": <port>,
                      "group_id": "<group-id>",
                      "kafka_api_key": "<kafka-api-key>",
                      "secret": "<secret>",
                      "topic": "<topic>",
                      "batch_size": <batch-size>
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "kafka-cloud",
      "config": {
          "bootstrap_server": "<bootstrap-server>",
          "port": <port>,
          "group_id": "<group-id>",
          "kafka_api_key": "<kafka-api-key>",
          "secret": "<secret>",
          "topic": "<topic>",
          "num_messages_to_consume": <num-messages-to-consume>
      },
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<bootstrap-server>` - The hostname of the bootstrap Kafka cluster to connect to.
* `<port>` - The port number of the bootstrap Kafka cluster to connect to. The default is `9092` if not otherwise specified.
* `<group-id>` - The ID of the consumer group. A consumer group is a way to allow a pool of consumers to divide the consumption of data over topics and partitions. The default is `default_group_id` if not otherwise specified.
* `<kafka-api-key>` - For authentication, the API key for access to the cluster.
* `<secret>` - For authentication, the secret for access to the cluster.
* `<topic>` - The name of the topic to read messages from or write messages to on the cluster.
* `<batch-size>` (destination connector only) - The maximum number of messages to send in a single batch. The default is `100` if not otherwise specified.
* `<num-messages-to-consume>` (source connector only) - The maximum number of messages that the consumer will try to consume. The default is `100` if not otherwise specified.

## Learn more

* <Icon icon="blog" />  [Unstructured Platform Now Integrates with Apache Kafka in Confluent Cloud](https://unstructured.io/blog/unstructured-platform-now-integrates-with-apache-kafka-in-confluent-cloud)

---

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/kdbai.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# KDB.AI

Batch process all your records to store structured outputs in KDB.AI.

The requirements are as follows.

* A KDB.AI Cloud or server instance. [Sign Up for KDB.AI Cloud: Starter Edition](https://trykdb.kx.com/kdbai/signup/). [Set up KDB.AI Server](https://code.kx.com/kdbai/latest/gettingStarted/kdb-ai-server-setup.html).
* The instance's endpoint URL. [Get the KDB.AI Cloud endpoint URL](https://code.kx.com/kdbai/latest/gettingStarted/kdb-ai-cloud-setup.html#connect-to-your-database). [Get the KDB.AI Server endpoint URL](https://code.kx.com/kdbai/latest/gettingStarted/kdb-ai-server-setup.html).
* An API key. [Create the API key](https://code.kx.com/kdbai/latest/gettingStarted/kdb-ai-cloud-setup.html#create-an-api-key).
* The name of the target table to access. [Create the table](https://code.kx.com/kdbai/latest/gettingStarted/quickstart.html#create-a-new-table).

  KDB.AI requires the target table to have a defined schema before Unstructured can write to the table. The recommended table
  schema for Unstructured contains the fields `id`, `element_id`, `document`, `metadata`, and `embeddings`, as follows.
  This example code demonstrates the use of the [KDB.AI Client for Python](https://pypi.org/project/kdbai-client/) to create
  a table with this recommended schema, along with creating a vector index that contains 3072 dimensions:

  ```python Python theme={null}
  import kdbai_client as kdbai
  import os

  session = kdbai.Session(
      endpoint=os.getenv("KDBAI_ENDPOINT"),
      api_key=os.getenv("KDBAI_API_KEY")
  )

  db = session.database("default")

  schema = [
      {
          "name": "id",
          "type": "str"
      },
      {
          "name": "element_id",
          "type": "str"
      },
      {
          "name": "document",
          "type": "str"
      },
      {
          "name": "metadata", 
          "type": "general"
      },
      {
          "name": "embeddings",
          "type": "float32s"
      }
  ]

  indexes = [ 
      {
          "name": "vectorIndex",
          "type": "flat", 
          "params": {
              "dims": 3072,
              "metric": "CS"
          },
          "column": "embeddings"
      }
  ]

  table = db.create_table(
      table=os.getenv("KDBAI_TABLE"),
      schema=schema,
      indexes=indexes
  )

  print(f"The table named '{table.name}' now exists.")
  ```

The KDB.AI connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[kdbai]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `KDBAI_ENDPOINT` - The KDB.AI instance's endpoint URL, represented by `--endpoint` (CLI) or `endpoint` (Python).
* `KDBAI_API_KEY` - The KDB.AI API key, represented by `--api-key` (CLI) or `api_key` (Python).
* `KDBAI_TABLE` - The name of the target table, represented by `--table-name` (CLI) or `table_name` (Python).

Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector:

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --chunking-strategy by_title \
      --embedding-provider huggingface \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --strategy hi_res \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    kdbai \
      --endpoint $KDBAI_API_KEY \
      --api-key $KDBAI_API_KEY \
      --table-name $KDBAI_TABLE
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.kdbai import (
      KdbaiConnectionConfig,
      KdbaiAccessConfig,
      KdbaiUploadStagerConfig,
      KdbaiUploaderConfig
  )

  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=KdbaiConnectionConfig(
              access_config=KdbaiAccessConfig(
                  api_key=os.getenv("KDBAI_API_KEY")
              ),
              endpoint=os.getenv("KDBAI_ENDPOINT")
          ),
          stager_config=KdbaiUploadStagerConfig(),
          uploader_config=KdbaiUploaderConfig(table_name=os.getenv("KDBAI_TABLE"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/lancedb.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# LanceDB

Batch process all your records to store structured outputs in LanceDB.

The requirements are as follows.

* A [LanceDB open source software (OSS) installation](https://lancedb.github.io/lancedb/basic/#installation) on a local machine, a server, or a virtual machine.
  (LanceDB Cloud is not supported.)

* For LanceDB OSS with local data storage:

  * The local path to the folder where the LanceDB data is (or will be) stored.
    See [Connect to a database](https://lancedb.github.io/lancedb/basic/#connect-to-a-database) in the LanceDB documentation.
  * The name of the target [LanceDB table](https://lancedb.github.io/lancedb/basic/#create-an-empty-table) within the local data folder.

    <Note>
      Unstructured recommends that the target table have a field named `record_id` with a text string data type.
      Unstructured can use this field to do intelligent record overwrites. Without this field, duplicate records
      might be written to the table or, in some cases, the operation could fail altogether.
    </Note>

* For LanceDB OSS with data storage in an Amazon S3 bucket:

  * The URI for the target Amazon S3 bucket and any target folder path within that bucket. Use the format `s3://<bucket-name>[/<folder-name>]`.

  * The name of the target [LanceDB table](https://lancedb.github.io/lancedb/guides/storage/#object-stores) within the Amazon S3 bucket.

    <Note>
      Unstructured recommends that the target table have a field named `record_id` with a text string data type.
      Unstructured can use this field to do intelligent record overwrites. Without this field, duplicate records
      might be written to the table or, in some cases, the operation could fail altogether.
    </Note>

  * The AWS access key ID and AWS secret access key for the AWS IAM entity that has access to the Amazon S3 bucket.

  For more information, see [AWS S3](https://lancedb.github.io/lancedb/guides/storage/#aws-s3) in the LanceDB documentation, along with the following video:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/hyDHfhVVAhs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For LanceDB OSS with data storage in an Azure Blob Storage account:

  * The name of the target Azure Blob Storage account.
    \= The URI for the target container within that Azure Blob Storage account and any target folder path within that container. Use the format `az://<container-name>[/<folder-name>]`.

  * The name of the target [LanceDB table](https://lancedb.github.io/lancedb/guides/storage/#object-stores) within the Azure Blob Storage account.

    <Note>
      Unstructured recommends that the target table have a field named `record_id` with a text string data type.
      Unstructured can use this field to do intelligent record overwrites. Without this field, duplicate records
      might be written to the table or, in some cases, the operation could fail altogether.
    </Note>

  * The access key for the Azure Blob Storage account.

  For more information, see [Azure Blob Storage](https://lancedb.github.io/lancedb/guides/storage/#azure-blob-storage) in the LanceDB documentation, along with the following video:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/Vl3KCphlh9Y" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For LanceDB OSS with data storage in a Google Cloud Storage bucket:

  * The URI for the target Google Cloud Storage bucket and any target folder path within that bucket. Use the format `gs://<bucket-name>[/<folder-name>]`.

  * The name of the target [LanceDB table](https://lancedb.github.io/lancedb/guides/storage/#object-stores) within the Google Cloud Storage bucket.

    <Note>
      Unstructured recommends that the target table have a field named `record_id` with a text string data type.
      Unstructured can use this field to do intelligent record overwrites. Without this field, duplicate records
      might be written to the table or, in some cases, the operation could fail altogether.
    </Note>

  * A single-line string that contains the contents of the downloaded service account key file for the Google Cloud service account that has access to the
    Google Cloud Storage bucket.

  For more information, see [Google Cloud Storage](https://lancedb.github.io/lancedb/guides/storage/#google-cloud-storage) in the LanceDB documentation, along with the following video:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/HYaALQ0F-L4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

The LanceDB connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[lancedb]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* For LanceDB OSS with local data storage:

  * `LANCEDB_URI` - The local path to the folder where the LanceDB data is stored, represented by `--uri` (CLI) or `uri` (Python).
  * `LANCEDB_TABLE` - The name of the target LanceDB table within the local data folder, represented by `--table-name` (CLI) or `table_name` (Python).

* For LanceDB OSS with data storage in an Amazon S3 bucket:

  * `LANCEDB_URI` - The URI for the target Amazon S3 bucket and any target folder path within that bucket. Use the format `s3://<bucket-name>[/<folder-name>]`. This is represented by `--uri` (CLI) or `uri` (Python).
  * `LANCEDB_TABLE` - The name of the target LanceDB table within the Amazon S3 bucket, rrepresented by `--table-name` (CLI) or `table_name` (Python).
  * `AWS_ACCESS_KEY_ID` - The AWS access key ID for the AWS IAM entity that has access to the Amazon S3 bucket, represented by `--aws-access-key-id` (CLI) or `aws_access_key_id` (Python).
  * `AWS_SECRET_ACCESS_KEY` - The AWS secret access key for the AWS IAM entity that has access to the Amazon S3 bucket, represented by `--aws-secret-access-key` (CLI) or `aws_secret_access_key` (Python).

* For LanceDB OSS with data storage in an Azure Blob Storage account:

  * `LANCEDB_URI` - The URI for the target container within that Azure Blob Storage account and any target folder path within that container. Use the format `az://<container-name>[/<folder-name>]`. This is represented by `--uri` (CLI) or `uri` (Python).
  * `LANCEDB_TABLE` - The name of the target LanceDB table within the Azure Blob Storage account, represented by `--table-name` (CLI) or `table_name` (Python).
  * `AZURE_STORAGE_ACCOUNT_NAME` - The name of the target Azure Blob Storage account, represented by `--azure-storage-account-name` (CLI) or `azure_storage_account_name` (Python).
  * `AZURE_STORAGE_ACCOUNT_KEY` - The access key for the Azure Blob Storage account, represented by `--azure-storage-account-key` (CLI) or `azure_storage_account_key` (Python).

* For LanceDB OSS with data storage in a Google Cloud Storage bucket:

  * `LANCEDB_URI` - The URI for the target Google Cloud Storage bucket and any target folder path within that bucket. Use the format `gs://<bucket-name>[/<folder-name>]`. This is represented by `--uri` (CLI) or `uri` (Python).
  * `LANCEDB_TABLE` - The name of the target LanceDB table within the Google Cloud Storage bucket, represented by `--table-name` (CLI) or `table_name` (Python).
  * `GCS_SERVICE_ACCOUNT_KEY` - A single-line string that contains the contents of the downloaded service account key file for the Google Cloud service account
    that has access to the Google Cloud Storage bucket, represented by `--google-service-account-key` (CLI) or `google_service_account_key` (Python).

Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector:

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  # For LanceDB OSS with local data storage:
  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --chunking-strategy by_title \
      --embedding-provider huggingface \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    lancedb-local \
      --uri $LANCEDB_URI \
      --table-name $LANCEDB_TABLE

  # For LanceDB OSS with data storage in an Amazon S3 bucket:
  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --chunking-strategy by_title \
      --embedding-provider huggingface \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    lancedb-aws \
      --aws-access-key-id $AWS_ACCESS_KEY_ID \
      --aws-secret-access-key $AWS_SECRET_ACCESS_KEY \
      --uri $LANCEDB_URI \
      --table-name $LANCEDB_TABLE \
      --timeout 30s

  # For LanceDB OSS with data storage in an Azure Blob Storage account:
  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --chunking-strategy by_title \
      --embedding-provider huggingface \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    lancedb-azure \
      --azure-storage-account-name $AZURE_STORAGE_ACCOUNT_NAME \
      --azure-storage-account-key $AZURE_STORAGE_ACCOUNT_KEY \
      --uri $LANCEDB_URI \
      --table-name $LANCEDB_TABLE \
      --timeout 30s

  # For LanceDB OSS with data storage in a Google Cloud Storage bucket:
  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --chunking-strategy by_title \
      --embedding-provider huggingface \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    lancedb-gcs \
      --google-service-account-key $GCS_SERVICE_ACCOUNT_KEY \
      --uri $LANCEDB_URI \
      --table-name $LANCEDB_TABLE \
      --timeout 30s
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # For LanceDB OSS with local data storage:
  # from unstructured_ingest.processes.connectors.lancedb.local import (
  #     LanceDBLocalConnectionConfig,
  #     LanceDBLocalAccessConfig,
  #     LanceDBUploadStagerConfig,
  #     LanceDBUploaderConfig
  # )

  # For LanceDB OSS with data storage in an Amazon S3 bucket:
  from unstructured_ingest.processes.connectors.lancedb.aws import (
      LanceDBS3ConnectionConfig,
      LanceDBS3AccessConfig,
      LanceDBUploadStagerConfig,
      LanceDBUploaderConfig
  )

  # For LanceDB OSS with data storage in an Azure Blob Storage account:
  # from unstructured_ingest.processes.connectors.lancedb.azure import (
  #     LanceDBAzureConnectionConfig,
  #     LanceDBAzureAccessConfig,
  #     LanceDBUploadStagerConfig,
  #     LanceDBUploaderConfig
  # )

  # For LanceDB OSS with data storage in a Google Cloud Storage bucket:
  # from unstructured_ingest.processes.connectors.lancedb.gcp import (
  #     LanceDBGCSConnectionConfig,
  #     LanceDBGCSAccessConfig,
  #     LanceDBUploadStagerConfig,
  #     LanceDBUploaderConfig
  # )

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),

          # For LanceDB OSS with local data storage:
          # destination_connection_config=LanceDBLocalConnectionConfig(
          #     access_config=LanceDBLocalAccessConfig(),
          #     uri=os.getenv("LANCEDB_URI")
          # ),

          # For LanceDB OSS with data storage in an Amazon S3 bucket:
          destination_connection_config=LanceDBS3ConnectionConfig(
              access_config=LanceDBS3AccessConfig(
                  aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
                  aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
              ),
              uri=os.getenv("LANCEDB_URI"),
              timeout="30s"
          ),

          # For LanceDB OSS with data storage in an Azure Blob Storage account:
          # destination_connection_config=LanceDBAzureConnectionConfig(
          #     access_config=LanceDBAzureAccessConfig(
          #         azure_storage_account_name=os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
          #         azure_storage_account_key=os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
          #     ),
          #     uri=os.getenv("LANCEDB_URI"),
          #     timeout="30s"
          # ),

          # For LanceDB OSS with data storage in a Google Cloud Storage bucket:
          # destination_connection_config=LanceDBGCSConnectionConfig(
          #     access_config=LanceDBGCSAccessConfig(
          #         google_service_account_key=os.getenv("GCS_SERVICE_ACCOUNT_KEY")
          #     ),
          #     uri=os.getenv("LANCEDB_URI"),
          #     timeout="30s"
          # ),

          stager_config=LanceDBUploadStagerConfig(),
          uploader_config=LanceDBUploaderConfig(table_name=os.getenv("LANCEDB_TABLE"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/examplecode/tools/langflow.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Langflow

[Langflow](https://www.langflow.org/) is a visual framework for building multi-agent and RAG applications.
It is open-source, fully customizable, and works with most LLMs and many vector stores out of the box.

<img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/designer.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=852eec6f836e103a7b9bfa921c99385d" alt="Langflow designer" data-og-width="889" width="889" data-og-height="656" height="656" data-path="img/langflow/designer.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/designer.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=b7ae3409dad2733d010402245700ff87 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/designer.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=0f4a5ee139a6dd40fa83b7f6597831a8 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/designer.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=83c0c41faa2ef9d03a1df84841c61fce 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/designer.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=bdd5e3c6c79d9d6e764c61f50b53a9d9 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/designer.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=88e77caa1607538895371b43c6d74c22 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/designer.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=e0ef48c4169e045c2f5db95b2e0900c4 2500w" />

This no-code, hands-on demonstration walks you through creating a Langflow project that enables you to use GPT-4o-mini to chat
in real time with a PDF document that is processed by Unstructured and has its processed data stored in an
[Astra DB](https://www.datastax.com/products/datastax-astra) vector database.

## Prerequisites

<iframe width="560" height="315" src="https://www.youtube.com/embed/PMs1iwL52aM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* An IBM Cloud account or DataStax account.

  * For an IBM Cloud account, [sign up](https://cloud.ibm.com/registration) for an IBMid, and then [sign in](https://accounts.datastax.com/session-service/v1/login) to DataStax with your IBMid.
  * For a DataStax account, [sign up](https://astra.datastax.com/signup) for a DataStax account, and then [sign in](https://accounts.datastax.com/session-service/v1/login) to DataStax with your DataStax account.

* An Astra DB database in the DataStax account. To create a database:

  a. After you sign in to DataStax, click **Create database**.<br />
  b. Click the **Serverless (vector)** tile, if it is not already selected.<br />
  c. For **Database name**, enter some unique name for the database.<br />
  d. Select a **Provider** and a **Region**, and then click **Create database**.<br />

  [Learn more](https://docs.datastax.com/en/astra-db-classic/databases/manage-create.html).

* An application token for the database. To create an application token:

  a. After you sign in to DataStax, in the list of databases, click the name of the target database.<br />
  b. On the **Overview** tab, under **Database Details**, in the **Application Tokens** tile, click **Generate Token**.<br />
  c. Enter some **Token description** and select and **Expiration** time period, and then click **Generate token**.<br />
  d. Save the application token that is displayed to a secure location, and then click **Close**.<br />

  [Learn more](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html).

* A keyspace in the database. To create a keyspace:

  a. After you sign in to DataStax, in the list of databases, click the name of the target database.<br />
  b. On the **Data Explorer** tab, in the **Keyspace** list, select **Create keyspace**.<br />
  c. Enter some **Keyspace name**, and then click **Add keyspace**.<br />

  [Learn more](https://docs.datastax.com/en/astra-db-serverless/databases/manage-keyspaces.html#keyspaces).

* A collection in the keyspace.

  For the [Unstructured UI](/ui/overview) and [Unstructured API](/api-reference/overview):

  * An existing collection is not required. At runtime, the collection behavior is as follows:

    * If an existing collection name is specified, and Unstructured generates embeddings,
      but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
      You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
    * If a collection name is not specified, Unstructured creates a new collection in your keyspace. If Unstructured generates embeddings,
      the new collection's name will be `u<short-workflow-id>_<short-embedding-model-name>_<number-of-dimensions>`.
      If Unstructured does not generate embeddings, the new collection's name will be `u<short-workflow-id`.

  For [Unstructured Ingest](/open-source/ingestion/overview):

  * For the source connector only, an existing collection is required.
  * For the destination connector only, an existing collection is not required. At runtime, the collection behavior is as follows:

    * If an existing collection name is specified, and Unstructured generates embeddings,
      but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
      You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
    * If a collection name is not specified, Unstructured creates a new collection in your keyspace. The new collection's name will be `unstructuredautocreated`.

  To create a collection yourself:

  a. After you sign in to DataStax, in the list of databases, click the name of the target database.<br />
  b. On the **Data Explorer** tab, in the **Keyspace** list, select the name of the target keyspace.<br />
  c. In the **Collections** list, select **Create collection**.<br />
  d. Enter some **Collection name**.<br />
  e. Turn on **Vector-enabled collection**, if it is not already turned on.<br />
  f. Choose a mode for **Embedding generation method**. See [Astra DB generated embeddings](#astra-db-generated-embeddings).<br />
  g. If you chose **Bring my own**, enter the number of dimensions for the embedding model that you plan to use.<br />
  h. For **Similarity metric**, select **Cosine**.<br />
  i. Click **Create collection**.<br />

  [Learn more](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).

Also:

* [Sign up for an OpenAI account](https://platform.openai.com/signup), and [get your OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).
* [Sign up for a free Langflow account](https://astra.datastax.com/signup?type=langflow).
* Get your Unstructured account and Unstructured API key:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

## Create and run the demonstration project

<Steps>
  <Step title="Create the Langflow project">
    1. Sign in to your Langflow dashboard.
    2. From your dashboard, click **New Project**.
    3. Click **Blank Flow**.
  </Step>

  <Step title="Add the Unstructured component">
    In this step, you add a component that instructs the legacy Unstructured Partition Endpoint to process a local file that you specify.

    1. On the sidebar, expand **Experimental (Beta)**, and then expand **Loaders**.

    2. Drag the **Unstructured** component onto the designer area.

    3. In the **Unstructured** component, click the box or icon next to **File**, and then select a local file for Unstructured to process.

       This component works only with the file extensions `.pdf`, `.docx`, and `.txt`. Although you can use any local file with one of these extensions,
       this demonstration uses [the text of the United States Constitution in PDF format](https://constitutioncenter.org/media/files/constitution.pdf),
       saved to your local development machine.

       <Tip>
         To work with multiple local files, or to work with remote files, see the suggested options in [Next steps](#next-steps).
       </Tip>

    4. For **Unstructured.io Serverless API Key**, enter your Unstructured API key value.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/unstructured-component.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=cac667e124074c079b3b87c072e7d9bf" alt="Unstructured component" data-og-width="300" width="300" data-og-height="343" height="343" data-path="img/langflow/unstructured-component.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/unstructured-component.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=0b383dd3c65cad6c335c64212f4cb57c 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/unstructured-component.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=8e89ce38bd97785d16bd0b1e3fba10d7 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/unstructured-component.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=a883043e2fba8c64d5d582e46ece9efb 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/unstructured-component.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=8ceb03746b280e0d29e33492b3c3b631 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/unstructured-component.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=8255ce7a80bc7e8d78b30aaaef931869 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/unstructured-component.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=479da08f97af52bb6562f857ef006616 2500w" />

    5. Wait until **Saved** appears in the top navigation bar.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/saved.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=d29665b48a04781f5806efece7999de1" alt="Saved message" data-og-width="332" width="332" data-og-height="188" height="188" data-path="img/langflow/saved.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/saved.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=494a284ed6e18995895d9743af89bbac 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/saved.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=3ceeef577d16a7bf7e2251c30e773dbb 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/saved.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=06b62386025285585631f178bb132aa2 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/saved.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=e7f642b654dcdbd62fe5a731956e28e7 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/saved.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=aa82635399b55d39bb99f9404f50cab1 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/saved.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=ae37ef36e92f5b8c410d6a09a92b6b2d 2500w" />
  </Step>

  <Step title="Add the OpenAI Embeddings component">
    In this step, you add a component that generates vector embeddings for the processed data that Unstructured outputs.

    1. On the sidebar, expand **Embeddings**, and then drag the **OpenAI Embeddings** component onto the designer area.

    2. In the **OpenAI Embeddings** component, for **Model**, select `text-embedding-3-large`.

    3. For **OpenAI API Key**, enter your OpenAI API key's value.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-embeddings-component.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=f2023451994645df89c4d203e4fd8043" alt="OpenAI Embeddings component" data-og-width="302" width="302" data-og-height="302" height="302" data-path="img/langflow/openai-embeddings-component.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-embeddings-component.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=cade9dc87224b3c0450a81d06f963a5c 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-embeddings-component.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=3db0a42c7d71189099350e9cfc5b32e4 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-embeddings-component.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=0bc635b42442d5a29fa364e6e17ed90e 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-embeddings-component.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=deab3ada3bd71c66fe21dcb35937de57 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-embeddings-component.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=2f2931b79d99451e6c4726643272302e 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-embeddings-component.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=3a8b5a7aa8e77b75af43a3ab561d968b 2500w" />

    4. Wait until **Saved** appears in the top navigation bar.
  </Step>

  <Step title="Add the Astra DB components">
    In this step, you add two components. The first component instructs Astra DB to ingest into the specified Astra DB collection the processed data that Unstructured outputs along
    with the associated generated vector embeddings. The second component instructs Astra DB to take user-supplied chat messages and perform contextual
    searches over the ingested data in the specified Astra DB collection, outputting its search results.

    1. On the sidebar, expand **Vector Stores**, and then drag the **Astra DB** component onto the designer area.

    2. Double-click the **Astra DB** component's title bar, and rename the component to `Astra DB Ingest`.

    3. Repeat these previous two actions to add a second **Astra DB** component, renaming it to `Astra DB RAG`.

    4. In both of these **Astra DB** components, in the **Database** list, select the name of your Astra DB database. Make sure this is the same database name in both components.

    5. In the **Collection** list in both components, select the name of the collection in the database. Make sure this is the same collection name in both components.

    6. In the **Astra DB Application Token** box in both components, enter your Astra DB application token's value. Make sure this is the same application token value in both components.

    7. Connect the **Data** output from the **Unstructured** component to the **Ingest Data** input in the **Astra DB Ingest** component.

       To make the connection, click and hold your mouse pointer inside of the circle next to **Data** in the **Unstructured** component.
       While holding your mouse pointer, drag it over into the circle next to **Ingest Data** in the **Astra DB Ingest** component. Then
       release your mouse pointer. A line appears between these two circles.

    8. Connect the **Embeddings** output from the **OpenAI Embeddings** component to the **Embedding or Astra Vectorize** input in the  **Astra DB Ingest** component.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/astra-db-component.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=5ce7cda4d2ea04c9c910540880b04afd" alt="Astra DB component" data-og-width="641" width="641" data-og-height="662" height="662" data-path="img/langflow/astra-db-component.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/astra-db-component.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=029bf986244b66fc9047ba7a50ec72da 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/astra-db-component.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=acb001e7a4d7513f1077e7d8321ae505 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/astra-db-component.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=e18d697c90a0825a708ed5548c066f87 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/astra-db-component.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=b96ae14858773aef30e72475c085f0c3 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/astra-db-component.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=28e057a6635aecaf819d276fb9d60a1a 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/astra-db-component.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=7790fbd32e87bb6d98af782d169094d7 2500w" />

    9. Wait until **Saved** appears in the top navigation bar.

    10. In the title bar of the **Astra DB Ingest** component, click the play icon. This ingests the processed data
        from Unstructured and the associated generated vector embeddings into the specified Astra DB collection.

        <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/build.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=6bd04369d6ab5bb9acbd2edbfadfa18e" alt="Play icon" data-og-width="304" width="304" data-og-height="238" height="238" data-path="img/langflow/build.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/build.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=dcf55c3968bb840f9cf7c11d1e2291aa 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/build.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=f2d6fe2216cd4f17e2956e65b59b5875 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/build.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=77b15a34ac2665f1fb5bd1ef3d1e47ce 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/build.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=ef144153a824af1f5a7a6eb8d4f8d681 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/build.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=c275fcb8f78f3ffc253891714004aee1 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/build.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=c9a578853cab0252f7427067b00ea3a1 2500w" />

    11. Wait until **Building** disppears from the top navigation bar, and a green check mark appears next to this play icon. This could take several minutes.

        <Note>
          Each time you click the play icon in the **Astra DB Ingest** component, Unstructured reprocesses the specified local
          file. If this file does not change, this could result in multiple duplicate records
          being inserted into the specified Astra DB collection. You should only click the play icon in the **Astra DB Ingest** component when you want to insert new processed data into
          the specified Astra DB collection.
        </Note>
  </Step>

  <Step title="Add the Chat Input component">
    In this step, you add a component that takes user-supplied chat messages and sends them as input to Astra DB for contextual searching.

    1. On the sidebar, expand **Inputs**, and then drag the **Chat Input** component onto the designer area.

    2. Connect the **Message** output from the **Chat Input** component to the **Search Input** input in the **Astra DB RAG** component.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-input-component.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=2d30e0b85062cbcc258949a1eb17becc" alt="Chat Input component" data-og-width="612" width="612" data-og-height="555" height="555" data-path="img/langflow/chat-input-component.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-input-component.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=c9284a018b6323f4290693868ebf4539 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-input-component.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=13a39f15ad42a65cc06d464ba9a7695a 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-input-component.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=d327bc420919de51ae117ed2731f0e3c 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-input-component.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=a655ea34fd605b24293dba97bc75e046 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-input-component.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=4999760c72c26e5caeebbc6310f0303c 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-input-component.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=4f882ed9379a03a0b4be24a38f187678 2500w" />

    3. Wait until **Saved** appears in the top navigation bar.
  </Step>

  <Step title="Add the Parse Data component">
    In this step, you add a component that takes the Astra DB search results and converts them into plain text, suitable for inclusion in
    a prompt to a text-based LLM.

    1. On the sidebar, expand **Helpers**, and then drag the **Parse Data** component onto the designer area.

    2. Connect the **Search Results** output from the **Astra DB RAG** component to the **Data** input in the **Parse Data** component.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/parse-data-component.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=aee3ceade9844f8a8f79114451b4efff" alt="Parse Data component" data-og-width="631" width="631" data-og-height="558" height="558" data-path="img/langflow/parse-data-component.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/parse-data-component.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=4d4c76bcc18a5c173feed2b30b700d11 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/parse-data-component.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=53149d30f1774f40c349b5e8934c69c0 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/parse-data-component.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=ba2f80469e71f0f7097dedb95b9c8351 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/parse-data-component.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=8e16926076d1e43fe7c05979ac8ab8ff 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/parse-data-component.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=400d970fcdabe603b0bcb84b0dec9ec9 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/parse-data-component.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=1ba8e2431472250ad9f2c61a97f832b8 2500w" />

    3. Wait until **Saved** appears in the top navigation bar.
  </Step>

  <Step title="Add the Prompt component">
    In this step, you add a component that builds a prompt and then sends it to a text-based LLM.

    1. On the sidebar, expand **Prompts**, and then drag the **Prompt** component onto the designer area.

    2. In the **Prompt** component, next to **Template**, click the box or arrow icon.

    3. In the **Edit Prompt** window, enter the following prompt:

       ```text  theme={null}
       {context}

       ---

       Given the context above, answer the question as best as possible.

       Question: {question}

       Answer: 
       ```

       <Tip>
         To answer the question, the preceding prompt uses the context along with general information that the text-based LLM is
         trained on. To use only the context to answer the question, you can change the prompt, for example to something like this:

         ```text  theme={null}
         {context}

         ---

         Given the context above, answer the question as best as possible. Use only the context to answer the question. Do not use 
         any other sources of information. If the context does not provide enough information to answer the question, reply with 
         'I do not have enough context to answer the question.'

         Question: {question}

         Answer:
         ```
       </Tip>

    4. Click **Check & Save**.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/edit-prompt.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=6fb35fff77621f8a58ee36d7ad8e6918" alt="Prompt component" data-og-width="668" width="668" data-og-height="478" height="478" data-path="img/langflow/edit-prompt.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/edit-prompt.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=ff1a2f937929c1e200da935f9df84cfb 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/edit-prompt.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=848d15a2e48144bf23619bd3741e982f 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/edit-prompt.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=cadf0dcf8c220fbdf874c27e4e0b1774 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/edit-prompt.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=3df4ca6f04cbae6b61794bb526c589e1 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/edit-prompt.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=6b53a1bea4b64ab5ba19e42ec654cde9 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/edit-prompt.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=af93f60a7ea0f9408263e8615d7bd0c0 2500w" />

    5. Connect the **Text** output from the **Parse Data** component to the **context** input in the **Prompt** component.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/connect-prompt-component.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=b12062a4f475e7e4d8eb690eac7e0b09" alt="Connect Prompt component" data-og-width="570" width="570" data-og-height="346" height="346" data-path="img/langflow/connect-prompt-component.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/connect-prompt-component.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=9fd01c465685dc7aa68d60d2004ea442 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/connect-prompt-component.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=97e7f16d5a33337d6925cc70e54ce20e 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/connect-prompt-component.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=7ebc17f173b13993ad038cce38f5fae3 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/connect-prompt-component.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=abd799c03c311fe4ae78b9f34dce8590 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/connect-prompt-component.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=701a24419153a2733377faa46409b56a 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/connect-prompt-component.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=6a78b4faeae35296158e78deb9e692ed 2500w" />

    6. Connect the **Message** output from the **Chat Input** component to the **question** input in the **Prompt** component.

       <Note>
         You will now have two connections from the **Message** output in the **Chat Input** component:

         * One connection was already made to the **Search Input** input in the **Astra DB RAG** component.
         * Another connection has just now been made to the **question** input in the **Prompt** component.
       </Note>

    7. Wait until **Saved** appears in the top navigation bar.
  </Step>

  <Step title="Add the OpenAI component">
    In this step, you create a component that sends a prompt to a text-based LLM and outputs the LLM's response.

    1. On the sidebar, expand **Models**, and then drag the **OpenAI** component onto the designer area.

    2. In the **Model Name** list, select **gpt-4o-mini**.

    3. For **OpenAI API Key**, enter your OpenAI API key's value.

    4. For **Temperature**, enter `0.1`.

    5. Connect the **Prompt Message** output from the **Prompt** component to the **Input** input in the **OpenAI** component.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-component.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=01aaa221efac51f332fd15d67ad514ed" alt="OpenAI component" data-og-width="637" width="637" data-og-height="445" height="445" data-path="img/langflow/openai-component.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-component.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=d6cdd28db68fd6308796fb6b6c522572 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-component.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=d3a7ea38a2b7fff8e78159f834f28de3 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-component.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=630e9932e82212eb09209fa918ac0ca1 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-component.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=d49793a908f2e04f3ede62817fc1d2b3 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-component.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=2afd4d229f0be16f05888cd05673a8f2 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/openai-component.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=16df000fcba23aa59232dec0cd6e58ed 2500w" />

    6. Wait until **Saved** appears in the top navigation bar.
  </Step>

  <Step title="Add the Chat Output component">
    In this step, you create a component that returns the answer to the user's original chat message.

    1. On the sidebar, expand **Outputs**, and then drag the **Chat Output** component onto the designer area.

    2. Connect the **Text** output from the **OpenAI** component to the **Text** input in the **Chat Output** component.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-output-component.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=8b65e1dcb81c0c1d65735f8f7817f53e" alt="Chat Output component" data-og-width="563" width="563" data-og-height="403" height="403" data-path="img/langflow/chat-output-component.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-output-component.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=423c8059a05b4511e03d4a6cef4c5b3d 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-output-component.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=0978a1ce48ba8d7e341fca34807fcd3b 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-output-component.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=10ff261c061e0b7e17f21251604b3cc7 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-output-component.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=f4f2cf4be656460bdb1739aee426ca78 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-output-component.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=73ff59bfcbc4ab8e66a36ead62b86c0e 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/chat-output-component.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=a2ced3bf796befc226cdfdb7ae6ad14d 2500w" />

    3. Wait until **Saved** appears in the top navigation bar.

    The final project should look like this:

        <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/final-project.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=de1291d825959b4372c735d72a2b44e2" alt="Final project results" data-og-width="1281" width="1281" data-og-height="821" height="821" data-path="img/langflow/final-project.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/final-project.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=3ebb819762d5e0518dcb27f9497aa5eb 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/final-project.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=eda7ceb544c4c864ffc6d21511eb1e26 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/final-project.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=131300deec714573611ed0141395147a 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/final-project.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=0bb2fcb6c075cb5eb4fdfebe22bc5c32 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/final-project.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=45e8e8afd082d6fc6a2d88336cd1519e 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/final-project.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=d52323f672344dd09f355c7dbacfac6e 2500w" />
  </Step>

  <Step title="Run the project">
    1. In the designer area, click **Playground**.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/open-playground.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=c2e028f53b624a3796ef9aca63a15482" alt="Open Playground button" data-og-width="248" width="248" data-og-height="121" height="121" data-path="img/langflow/open-playground.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/open-playground.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=38d917b5cae796c3b86beeb8651b804c 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/open-playground.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=6e5f8397d7c1187217ef27fcec9e7ca4 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/open-playground.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=24f604cbe6848d68e11592f54b6ef714 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/open-playground.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=c6df8b9f561a989a56b787e1fcc73e35 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/open-playground.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=d93f6a23090ea8eca52fe574d0b9cba1 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/open-playground.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=a1a2a65eeba16231583d820c05b44093 2500w" />

    2. Enter a question into the chat box, for example, `What rights does the fifth amendment guarantee?` Then press the send button.

       <img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/playground.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=3fc1e1a430ab4bf1d0354df333a2d939" alt="Playground window" data-og-width="927" width="927" data-og-height="488" height="488" data-path="img/langflow/playground.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/playground.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=535a1caa67389b4e306dfebe3eeb2af3 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/playground.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=2b5366a9bc31d529fc3e5e432d1fb910 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/playground.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=c71b744f4a523b9979c6aae6ca770d76 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/playground.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=3a63f491a7fa7b5ed5086850e6feef85 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/playground.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=0e62204226159ad49b942153514ca6f1 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/langflow/playground.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=b72180ce106cc2ce7e7d340059969ef4 2500w" />

    3. Wait until the answer appears.

    4. Ask as many additional questions as you want to.
  </Step>
</Steps>

## Next steps

Now that you have your pipeline set up, here are just a few ways that you could modify it to support different requirements,
such as processing multiple files or using a different vector store.

### Process multiple files

In this demonstration, you pass to Unstructured a single local file. To pass multiple local or
non-local files to Unstructured instead, you can use the
[Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview) or
[Unstructured Ingest](/open-source/ingestion/overview) outside of Langflow.

To do this, you can:

* [Use the Unstructured UI to create a workflow](/ui/quickstart) that relies on any available
  [source connector](/ui/sources/overview) to connect to
  [Astra DB](/ui/destinations/astradb). Run this workflow outside of Langflow anytime you have new documents in that source location that
  you want Unstructured to process and then insert the new processed data into Astra DB. Then, back in the Langflow project,
  use the **Playground** to ask additional questions, which will now include the new data when generating answers.

* [Use Unstructured Ingest to create a pipeline](/open-source/ingestion/overview) that relies on any available
  [source connector](/open-source/ingestion/source-connectors/overview) to connect to
  [Astra DB](/open-source/ingestion/destination-connectors/astradb). Run this pipeline outside of Langflow anytime you have new documents in that non-local source location that
  you want Unstructured to process and then insert the new processed data into Astra DB. Then, back in the Langflow project,
  use the **Playground** to ask additonal questions, which will now include the new data when generating answers.

### Use a different vector store

In this demonstration, you use Astra DB as the vector store. Langflow and Unstructured support several vector stores in addition to Astra DB.

To do this, you can:

[Use the Unstructured UI to create a workflow](/ui/quickstart) that relies on any available
[source connector](/ui/sources/overview) to connect to
one of the following available vector stores that Langflow also supports:

* [Milvus](/ui/destinations/milvus)
* [MongoDB](/ui/destinations/mongodb)
* [Pinecone](/ui/destinations/pinecone)

Run this workflow outside of Langflow anytime you have new documents in the source location that
you want Unstructured to process and then insert the new processed data into the vector store. Then, back in the Langflow project,
swap out the **Astra DB RAG** component for the corresponding **Vector Stores** component that matches the new vector
store's name. Configure the new component, and then
use the **Playground** to ask additional questions, which will now use the new vector store when generating answers.

Or, [use Unstructured Ingest to create a pipeline](/open-source/ingestion/overview) that relies on any available
[source connector](/open-source/ingestion/source-connectors/overview) to connect to
one of the following available vector stores that Langflow also supports:

* [Chroma DB](/open-source/ingestion/destination-connectors/chroma)
* [Couchbase](/open-source/ingestion/destination-connectors/couchbase)
* [Elasticsearch](/open-source/ingestion/destination-connectors/elasticsearch)
* [Milvus](/open-source/ingestion/destination-connectors/milvus)
* [MongoDB](/open-source/ingestion/destination-connectors/mongodb)
* [OpenSearch](/open-source/ingestion/destination-connectors/opensearch)
* [Pinecone](/open-source/ingestion/destination-connectors/pinecone)
* [Qdrant](/open-source/ingestion/destination-connectors/qdrant)
* [Vectara](/open-source/ingestion/destination-connectors/vectara)
* [Weaviate](/open-source/ingestion/destination-connectors/weaviate)

Run this pipeline outside of Langflow anytime you have new documents in the source location that
you want Unstructured to process and then insert the new processed data into the vector store. Then, back in the Langflow project,
swap out the **Astra DB RAG** component for the corresponding **Vector Stores** component that matches the new vector
store's name. Configure the new component, and then
use the **Playground** to ask additional questions, which will now use the new vector store when generating answers.

## Learn more

* See the [Langflow documentation](https://docs.langflow.org/).
* <Icon icon="blog" />  [No-Code AI Assistant in No Time with Unstructured Platform, AstraDB, and Langflow](https://unstructured.io/blog/no-code-ai-assistant-in-no-time-with-unstructured-platform-astradb-and-langflow)

---

# Source: https://docs.unstructured.io/api-reference/sources/list-available-source-connectors.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# List available source connectors

> Retrieve a list of available source connectors.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/sources/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/sources/:
    get:
      tags:
        - sources
      summary: List available source connectors
      description: Retrieve a list of available source connectors.
      operationId: list_sources
      parameters:
        - name: source_type
          in: query
          required: false
          schema:
            anyOf:
              - $ref: '#/components/schemas/SourceConnectorType'
              - type: 'null'
            title: Source Type
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/SourceConnectorInformation'
                title: Response List Sources
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    SourceConnectorType:
      type: string
      enum:
        - azure
        - box
        - confluence
        - couchbase
        - databricks_volumes
        - dropbox
        - elasticsearch
        - gcs
        - google_drive
        - kafka-cloud
        - mongodb
        - onedrive
        - opensearch
        - outlook
        - postgres
        - s3
        - salesforce
        - sharepoint
        - slack
        - snowflake
        - teradata
        - jira
        - zendesk
      title: SourceConnectorType
    SourceConnectorInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/SourceConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureSourceConnectorConfig'
            - $ref: '#/components/schemas/BoxSourceConnectorConfig'
            - $ref: '#/components/schemas/ConfluenceSourceConnectorConfig'
            - $ref: '#/components/schemas/CouchbaseSourceConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfig'
            - $ref: '#/components/schemas/DropboxSourceConnectorConfig'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfig'
            - $ref: '#/components/schemas/GCSSourceConnectorConfig'
            - $ref: '#/components/schemas/GoogleDriveSourceConnectorConfig'
            - $ref: '#/components/schemas/KafkaCloudSourceConnectorConfig'
            - $ref: '#/components/schemas/MongoDBConnectorConfig'
            - $ref: '#/components/schemas/OneDriveSourceConnectorConfig'
            - $ref: '#/components/schemas/OpenSearchConnectorConfig'
            - $ref: '#/components/schemas/OutlookSourceConnectorConfig'
            - $ref: '#/components/schemas/PostgresSourceConnectorConfig'
            - $ref: '#/components/schemas/S3SourceConnectorConfig'
            - $ref: '#/components/schemas/SalesforceSourceConnectorConfig'
            - $ref: '#/components/schemas/SharePointSourceConnectorConfig'
            - $ref: '#/components/schemas/SnowflakeSourceConnectorConfig'
            - $ref: '#/components/schemas/TeradataSourceConnectorConfig'
            - $ref: '#/components/schemas/JiraSourceConnectorConfig'
            - $ref: '#/components/schemas/ZendeskSourceConnectorConfig'
            - additionalProperties: true
              type: object
          title: Config
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
      type: object
      required:
        - id
        - name
        - type
        - config
        - created_at
      title: SourceConnectorInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    AzureSourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - recursive
      title: AzureSourceConnectorConfig
    BoxSourceConnectorConfig:
      properties:
        box_app_config:
          type: string
          title: Box App Config
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - box_app_config
        - recursive
      title: BoxSourceConnectorConfig
    ConfluenceSourceConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        api_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Token
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          type: boolean
          title: Cloud
        extract_images:
          type: boolean
          title: Extract Images
          default: false
        extract_files:
          type: boolean
          title: Extract Files
          default: false
        max_num_of_spaces:
          type: integer
          title: Max Num Of Spaces
        max_num_of_docs_from_each_space:
          type: integer
          title: Max Num Of Docs From Each Space
        spaces:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Spaces
      type: object
      required:
        - url
        - username
        - cloud
        - max_num_of_spaces
        - max_num_of_docs_from_each_space
        - spaces
      title: ConfluenceSourceConnectorConfig
    CouchbaseSourceConnectorConfig:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        collection_id:
          type: string
          title: Collection Id
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
        - collection_id
      title: CouchbaseSourceConnectorConfig
    DatabricksVolumesConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfig
    DropboxSourceConnectorConfig:
      properties:
        token:
          type: string
          title: Token
        remote_url:
          type: string
          title: Remote Url
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - token
        - remote_url
        - recursive
      title: DropboxSourceConnectorConfig
    ElasticsearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfig
    GCSSourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - service_account_key
        - recursive
      title: GCSSourceConnectorConfig
    GoogleDriveSourceConnectorConfig:
      properties:
        drive_id:
          type: string
          title: Drive Id
        service_account_key:
          anyOf:
            - $ref: '#/components/schemas/SecretReference'
            - type: string
          title: Service Account Key
        extensions:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Extensions
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - drive_id
        - service_account_key
        - recursive
      title: GoogleDriveSourceConnectorConfig
    KafkaCloudSourceConnectorConfig:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          title: Port
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        num_messages_to_consume:
          type: integer
          title: Num Messages To Consume
      type: object
      required:
        - bootstrap_servers
        - port
        - topic
        - kafka_api_key
        - secret
        - num_messages_to_consume
      title: KafkaCloudSourceConnectorConfig
    MongoDBConnectorConfig:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfig
    OneDriveSourceConnectorConfig:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
        path:
          type: string
          title: Path
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - recursive
        - path
      title: OneDriveSourceConnectorConfig
    OpenSearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfig
      description: >-
        OpenSearch connector configuration.


        OpenSearch is a fork of Elasticsearch with similar functionality.

        Supports two authentication methods:

        1. Basic auth: username + password

        2. AWS IAM auth: aws_access_key_id + aws_secret_access_key (+ optional
        aws_session_token)


        For AWS OpenSearch Service or OpenSearch Serverless, region and service
        type

        are auto-detected from the host URL.
    OutlookSourceConnectorConfig:
      properties:
        authority_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Authority Url
        tenant:
          anyOf:
            - type: string
            - type: 'null'
          title: Tenant
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        outlook_folders:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Outlook Folders
        recursive:
          type: boolean
          title: Recursive
        user_email:
          type: string
          title: User Email
      type: object
      required:
        - client_id
        - client_cred
        - recursive
        - user_email
      title: OutlookSourceConnectorConfig
    PostgresSourceConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
        - id_column
      title: PostgresSourceConnectorConfig
    S3SourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - anonymous
        - recursive
      title: S3SourceConnectorConfig
    SalesforceSourceConnectorConfig:
      properties:
        username:
          type: string
          title: Username
        consumer_key:
          type: string
          title: Consumer Key
        private_key:
          type: string
          title: Private Key
        categories:
          items:
            type: string
          type: array
          title: Categories
      type: object
      required:
        - username
        - consumer_key
        - private_key
        - categories
      title: SalesforceSourceConnectorConfig
    SharePointSourceConnectorConfig:
      properties:
        site:
          type: string
          title: Site
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
          default: https://login.microsoftonline.com
        user_pname:
          type: string
          title: User Pname
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
        path:
          anyOf:
            - type: string
            - type: 'null'
          title: Path
      type: object
      required:
        - site
        - tenant
        - user_pname
        - client_id
        - client_cred
        - recursive
      title: SharePointSourceConnectorConfig
    SnowflakeSourceConnectorConfig:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
        - table_name
        - id_column
      title: SnowflakeSourceConnectorConfig
    TeradataSourceConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        id_column:
          type: string
          title: Id Column
          default: id
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - user
        - password
        - table_name
      title: TeradataSourceConnectorConfig
    JiraSourceConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Cloud
          default: false
        projects:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Projects
        boards:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Boards
        issues:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Issues
        status_filters:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Status Filters
        download_attachments:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Download Attachments
          default: false
      type: object
      required:
        - url
        - username
      title: JiraSourceConnectorConfig
    ZendeskSourceConnectorConfig:
      properties:
        subdomain:
          type: string
          title: Subdomain
        email:
          type: string
          title: Email
        api_token:
          type: string
          title: Api Token
        item_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Item Type
          default: tickets
        batch_size:
          anyOf:
            - type: integer
            - type: 'null'
          title: Batch Size
          default: 2
      type: object
      required:
        - subdomain
        - email
        - api_token
      title: ZendeskSourceConnectorConfig
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    SecretReference:
      properties:
        id:
          type: string
          title: Id
        type:
          $ref: '#/components/schemas/EncryptionType'
          default: rsa
      type: object
      required:
        - id
      title: SecretReference
    EncryptionType:
      type: string
      enum:
        - rsa
        - rsa_aes
      title: EncryptionType

````

---

# Source: https://docs.unstructured.io/api-reference/destinations/list-destination-connectors.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# List destination connectors

> Retrieve a list of available destination connectors.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/destinations/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/destinations/:
    get:
      tags:
        - destinations
      summary: List destination connectors
      description: Retrieve a list of available destination connectors.
      operationId: list_destinations
      parameters:
        - name: destination_type
          in: query
          required: false
          schema:
            anyOf:
              - $ref: '#/components/schemas/DestinationConnectorType'
              - type: 'null'
            title: Destination Type
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/DestinationConnectorInformation'
                title: Response List Destinations
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    DestinationConnectorType:
      type: string
      enum:
        - azure
        - astradb
        - azure_ai_search
        - couchbase
        - databricks_volumes
        - databricks_volume_delta_tables
        - delta_table
        - elasticsearch
        - gcs
        - kafka-cloud
        - milvus
        - mongodb
        - motherduck
        - neo4j
        - onedrive
        - opensearch
        - pinecone
        - postgres
        - redis
        - qdrant-cloud
        - s3
        - snowflake
        - teradata
        - weaviate-cloud
        - ibm_watsonx_s3
      title: DestinationConnectorType
    DestinationConnectorInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/DestinationConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureDestinationConnectorConfig'
            - $ref: '#/components/schemas/AstraDBConnectorConfig'
            - $ref: '#/components/schemas/AzureAISearchConnectorConfig'
            - $ref: '#/components/schemas/CouchbaseDestinationConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVDTDestinationConnectorConfig'
            - $ref: '#/components/schemas/DeltaTableConnectorConfig'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfig'
            - $ref: '#/components/schemas/GCSDestinationConnectorConfig'
            - $ref: '#/components/schemas/KafkaCloudDestinationConnectorConfig'
            - $ref: '#/components/schemas/MilvusDestinationConnectorConfig'
            - $ref: '#/components/schemas/MongoDBConnectorConfig'
            - $ref: '#/components/schemas/Neo4jDestinationConnectorConfig'
            - $ref: '#/components/schemas/OneDriveDestinationConnectorConfig'
            - $ref: '#/components/schemas/OpenSearchConnectorConfig'
            - $ref: '#/components/schemas/PineconeDestinationConnectorConfig'
            - $ref: '#/components/schemas/PostgresDestinationConnectorConfig'
            - $ref: '#/components/schemas/RedisDestinationConnectorConfig'
            - $ref: '#/components/schemas/QdrantCloudDestinationConnectorConfig'
            - $ref: '#/components/schemas/S3DestinationConnectorConfig'
            - $ref: '#/components/schemas/SnowflakeDestinationConnectorConfig'
            - $ref: '#/components/schemas/TeradataDestinationConnectorConfig'
            - $ref: '#/components/schemas/WeaviateDestinationConnectorConfig'
            - $ref: '#/components/schemas/IBMWatsonxS3DestinationConnectorConfig'
            - additionalProperties: true
              type: object
          title: Config
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
      type: object
      required:
        - id
        - name
        - type
        - config
        - created_at
      title: DestinationConnectorInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    AzureDestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
      type: object
      required:
        - remote_url
      title: AzureDestinationConnectorConfig
    AstraDBConnectorConfig:
      properties:
        collection_name:
          type: string
          title: Collection Name
        keyspace:
          anyOf:
            - type: string
            - type: 'null'
          title: Keyspace
        batch_size:
          type: integer
          title: Batch Size
        binary_encode_vectors:
          type: boolean
          title: Binary Encode Vectors
          default: true
        api_endpoint:
          type: string
          title: Api Endpoint
        token:
          type: string
          title: Token
      additionalProperties: true
      type: object
      required:
        - collection_name
        - batch_size
        - api_endpoint
        - token
      title: AstraDBConnectorConfig
    AzureAISearchConnectorConfig:
      properties:
        endpoint:
          type: string
          title: Endpoint
        index:
          type: string
          title: Index
        key:
          type: string
          title: Key
      type: object
      required:
        - endpoint
        - index
        - key
      title: AzureAISearchConnectorConfig
    CouchbaseDestinationConnectorConfig:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
      title: CouchbaseDestinationConnectorConfig
    DatabricksVolumesConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfig
    DatabricksVDTDestinationConnectorConfig:
      properties:
        server_hostname:
          type: string
          title: Server Hostname
        http_path:
          type: string
          title: Http Path
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        client_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Id
        client_secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Secret
        catalog:
          type: string
          title: Catalog
        database:
          type: string
          title: Database
          default: default
        table_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Table Name
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          anyOf:
            - type: string
            - type: 'null'
          title: Volume Path
      type: object
      required:
        - server_hostname
        - http_path
        - catalog
        - volume
      title: DatabricksVDTDestinationConnectorConfig
    DeltaTableConnectorConfig:
      properties:
        aws_access_key_id:
          type: string
          title: Aws Access Key Id
        aws_secret_access_key:
          type: string
          title: Aws Secret Access Key
        aws_region:
          type: string
          title: Aws Region
        table_uri:
          type: string
          title: Table Uri
      type: object
      required:
        - aws_access_key_id
        - aws_secret_access_key
        - aws_region
        - table_uri
      title: DeltaTableConnectorConfig
    ElasticsearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfig
    GCSDestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
      type: object
      required:
        - remote_url
        - service_account_key
      title: GCSDestinationConnectorConfig
    KafkaCloudDestinationConnectorConfig:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          title: Port
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - bootstrap_servers
        - port
        - topic
        - kafka_api_key
        - secret
        - batch_size
      title: KafkaCloudDestinationConnectorConfig
    MilvusDestinationConnectorConfig:
      properties:
        uri:
          type: string
          title: Uri
        user:
          anyOf:
            - type: string
            - type: 'null'
          title: User
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        db_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Db Name
        collection_name:
          type: string
          title: Collection Name
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - uri
        - collection_name
        - record_id_key
      title: MilvusDestinationConnectorConfig
    MongoDBConnectorConfig:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfig
    Neo4jDestinationConnectorConfig:
      properties:
        uri:
          type: string
          title: Uri
        database:
          type: string
          title: Database
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - uri
        - database
        - username
        - password
        - batch_size
      title: Neo4jDestinationConnectorConfig
    OneDriveDestinationConnectorConfig:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        remote_url:
          type: string
          title: Remote Url
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - remote_url
      title: OneDriveDestinationConnectorConfig
    OpenSearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfig
      description: >-
        OpenSearch connector configuration.


        OpenSearch is a fork of Elasticsearch with similar functionality.

        Supports two authentication methods:

        1. Basic auth: username + password

        2. AWS IAM auth: aws_access_key_id + aws_secret_access_key (+ optional
        aws_session_token)


        For AWS OpenSearch Service or OpenSearch Serverless, region and service
        type

        are auto-detected from the host URL.
    PineconeDestinationConnectorConfig:
      properties:
        index_name:
          type: string
          title: Index Name
        api_key:
          type: string
          title: Api Key
        namespace:
          type: string
          title: Namespace
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - index_name
        - api_key
        - namespace
        - batch_size
      title: PineconeDestinationConnectorConfig
    PostgresDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
      title: PostgresDestinationConnectorConfig
    RedisDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        port:
          type: integer
          title: Port
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        uri:
          anyOf:
            - type: string
            - type: 'null'
          title: Uri
        database:
          type: integer
          title: Database
        ssl:
          type: boolean
          title: Ssl
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - port
        - database
        - ssl
        - batch_size
      title: RedisDestinationConnectorConfig
    QdrantCloudDestinationConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        api_key:
          type: string
          title: Api Key
        collection_name:
          type: string
          title: Collection Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - url
        - api_key
        - collection_name
        - batch_size
      title: QdrantCloudDestinationConnectorConfig
    S3DestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
      type: object
      required:
        - remote_url
        - anonymous
      title: S3DestinationConnectorConfig
    SnowflakeDestinationConnectorConfig:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
      title: SnowflakeDestinationConnectorConfig
    TeradataDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - host
        - user
        - password
      title: TeradataDestinationConnectorConfig
    WeaviateDestinationConnectorConfig:
      properties:
        cluster_url:
          type: string
          title: Cluster Url
        api_key:
          type: string
          title: Api Key
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
      type: object
      required:
        - cluster_url
        - api_key
      title: WeaviateDestinationConnectorConfig
    IBMWatsonxS3DestinationConnectorConfig:
      properties:
        iam_api_key:
          type: string
          title: Iam Api Key
        access_key_id:
          type: string
          title: Access Key Id
        secret_access_key:
          type: string
          title: Secret Access Key
        iceberg_endpoint:
          type: string
          title: Iceberg Endpoint
        object_storage_endpoint:
          type: string
          title: Object Storage Endpoint
        object_storage_region:
          type: string
          title: Object Storage Region
        catalog:
          type: string
          title: Catalog
        max_retries_connection:
          type: integer
          title: Max Retries Connection
        namespace:
          type: string
          title: Namespace
        table:
          type: string
          title: Table
        max_retries:
          type: integer
          title: Max Retries
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - iam_api_key
        - access_key_id
        - secret_access_key
        - iceberg_endpoint
        - object_storage_endpoint
        - object_storage_region
        - catalog
        - max_retries_connection
        - namespace
        - table
        - max_retries
        - record_id_key
      title: IBMWatsonxS3DestinationConnectorConfig
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/jobs/list-jobs.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# List Jobs

> Retrieve a list of jobs with optional filtering by workflow ID or job status.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/jobs/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/jobs/:
    get:
      tags:
        - jobs
      summary: List Jobs
      description: >-
        Retrieve a list of jobs with optional filtering by workflow ID or job
        status.
      operationId: list_jobs
      parameters:
        - name: workflow_id
          in: query
          required: false
          schema:
            anyOf:
              - type: string
                format: uuid
              - type: 'null'
            title: Workflow Id
        - name: status
          in: query
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Status
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/JobInformation'
                title: Response List Jobs
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    JobInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        workflow_id:
          type: string
          format: uuid
          title: Workflow Id
        workflow_name:
          type: string
          title: Workflow Name
        status:
          $ref: '#/components/schemas/JobStatus'
        created_at:
          type: string
          format: date-time
          title: Created At
        runtime:
          anyOf:
            - type: string
              format: duration
            - type: 'null'
          title: Runtime
        input_file_ids:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Input File Ids
        output_node_files:
          anyOf:
            - items:
                $ref: '#/components/schemas/NodeFileMetadata'
              type: array
            - type: 'null'
          title: Output Node Files
        job_type:
          $ref: '#/components/schemas/WorkflowJobType'
          default: ephemeral
      type: object
      required:
        - id
        - workflow_id
        - workflow_name
        - status
        - created_at
      title: JobInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    JobStatus:
      type: string
      enum:
        - SCHEDULED
        - IN_PROGRESS
        - COMPLETED
        - STOPPED
        - FAILED
      title: JobStatus
    NodeFileMetadata:
      properties:
        node_id:
          type: string
          format: uuid
          title: Node Id
        file_id:
          type: string
          title: File Id
        node_type:
          type: string
          title: Node Type
        node_subtype:
          type: string
          title: Node Subtype
      type: object
      required:
        - node_id
        - file_id
        - node_type
        - node_subtype
      title: NodeFileMetadata
    WorkflowJobType:
      type: string
      enum:
        - ephemeral
        - persistent
        - scheduled
        - template
      title: WorkflowJobType
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/templates/list-templates.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# List Templates

> Retrieve a list of available templates with their metadata.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/templates/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/templates/:
    get:
      tags:
        - templates
      summary: List Templates
      description: Retrieve a list of available templates with their metadata.
      operationId: list_templates
      parameters:
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/TemplateListItem'
                title: Response List Templates
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    TemplateListItem:
      properties:
        id:
          type: string
          title: Id
        name:
          type: string
          title: Name
        version:
          type: string
          title: Version
        last_updated:
          type: string
          title: Last Updated
        description:
          type: string
          title: Description
      type: object
      required:
        - id
        - name
        - version
        - last_updated
        - description
      title: TemplateListItem
      description: Template metadata for list responses.
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/workflows/list-workflows.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# List Workflows

> Retrieve a list of workflows, optionally filtered by source, destination, state, name, date range, and supports pagination and sorting.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json get /api/v1/workflows/
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/workflows/:
    get:
      tags:
        - workflows
      summary: List Workflows
      description: >-
        Retrieve a list of workflows, optionally filtered by source,
        destination, state, name, date range, and supports pagination and
        sorting.
      operationId: list_workflows
      parameters:
        - name: dag_node_configuration_id
          in: query
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Dag Node Configuration Id
        - name: source_id
          in: query
          required: false
          schema:
            anyOf:
              - type: string
                format: uuid
              - type: 'null'
            title: Source Id
        - name: destination_id
          in: query
          required: false
          schema:
            anyOf:
              - type: string
                format: uuid
              - type: 'null'
            title: Destination Id
        - name: status
          in: query
          required: false
          schema:
            anyOf:
              - $ref: '#/components/schemas/WorkflowState'
              - type: 'null'
            title: Status
        - name: page
          in: query
          required: false
          schema:
            anyOf:
              - type: integer
              - type: 'null'
            default: 1
            title: Page
        - name: page_size
          in: query
          required: false
          schema:
            anyOf:
              - type: integer
              - type: 'null'
            default: 20
            title: Page Size
        - name: created_since
          in: query
          required: false
          schema:
            anyOf:
              - type: string
                format: date-time
              - type: 'null'
            title: Created Since
        - name: created_before
          in: query
          required: false
          schema:
            anyOf:
              - type: string
                format: date-time
              - type: 'null'
            title: Created Before
        - name: name
          in: query
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Name
        - name: sort_by
          in: query
          required: false
          schema:
            type: string
            default: id
            title: Sort By
        - name: sort_direction
          in: query
          required: false
          schema:
            $ref: '#/components/schemas/SortDirection'
            default: asc
        - name: show_only_soft_deleted
          in: query
          required: false
          schema:
            anyOf:
              - type: boolean
              - type: 'null'
            default: false
            title: Show Only Soft Deleted
        - name: show_recommender_workflows
          in: query
          required: false
          schema:
            anyOf:
              - type: boolean
              - type: 'null'
            default: false
            title: Show Recommender Workflows
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/WorkflowInformation'
                title: Response List Workflows
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    WorkflowState:
      type: string
      enum:
        - active
        - inactive
      title: WorkflowState
    SortDirection:
      type: string
      enum:
        - asc
        - desc
      title: SortDirection
    WorkflowInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        sources:
          items:
            type: string
            format: uuid
          type: array
          title: Sources
        destinations:
          items:
            type: string
            format: uuid
          type: array
          title: Destinations
        workflow_type:
          anyOf:
            - $ref: '#/components/schemas/WorkflowType'
            - type: 'null'
        workflow_nodes:
          items:
            $ref: '#/components/schemas/WorkflowNode'
          type: array
          title: Workflow Nodes
        schedule:
          anyOf:
            - $ref: '#/components/schemas/WorkflowSchedule'
            - type: 'null'
        status:
          $ref: '#/components/schemas/WorkflowState'
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
        reprocess_all:
          type: boolean
          title: Reprocess All
          default: false
      type: object
      required:
        - id
        - name
        - sources
        - destinations
        - workflow_nodes
        - status
        - created_at
      title: WorkflowInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    WorkflowType:
      type: string
      enum:
        - basic
        - advanced
        - platinum
        - custom
      title: WorkflowType
    WorkflowNode:
      properties:
        id:
          anyOf:
            - type: string
              format: uuid
            - type: 'null'
          title: Id
        name:
          type: string
          title: Name
        type:
          type: string
          title: Type
        subtype:
          type: string
          title: Subtype
        settings:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Settings
      type: object
      required:
        - name
        - type
        - subtype
      title: WorkflowNode
    WorkflowSchedule:
      properties:
        crontab_entries:
          items:
            $ref: '#/components/schemas/CronTabEntry'
          type: array
          title: Crontab Entries
      type: object
      required:
        - crontab_entries
      title: WorkflowSchedule
      examples:
        - crontab_entries:
            - cron_expression: 0 0 * * *
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    CronTabEntry:
      properties:
        cron_expression:
          type: string
          title: Cron Expression
      type: object
      required:
        - cron_expression
      title: CronTabEntry

````

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/local.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/local.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/local.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/local.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Local

Send processed data from Unstructured to local storage.

This local destination connector is supported only for REST API clients such as `curl` and Postman. It is currently not supported by the Unstructured user interface (UI) or the Unstructured Python SDK.

The local destination connector works only with the [local source connector](/api-reference/workflow/sources/local). You cannot use the local destination connector with any other source connector.

For details, see the following:

1. [Create a workflow](/api-reference/workflow/workflows#create-a-workflow) that uses a local destination and local source.
2. [Run a workflow as a job](/api-reference/workflow/overview#run-a-workflow) that uses a local destination and local source.
3. [Download a processed local file from the workflow's completed job run](/api-reference/workflow/overview#download-a-processed-local-file-from-a-job).

---

# Source: https://docs.unstructured.io/examplecode/tools/mcp-partition.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Model Context Protocol (MCP) Hands-On Walkthrough for the Unstructured Partition Endpoint

[Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is an open protocol that standardizes
how applications provide context to LLMs. From the MCP documentation:

> Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.

This article provides a hands-on, step-by-step walkthrough to build a simple example on your local development machine
that integrates the [Unstructured API](/api-reference/overview) with MCP. Specifically,
this walkthrough demonstrates how to run an MCP *server*
(a lightweight program that exposes specific capabilities through MCP); this MCP server uses the
[Unstructured Partition Endpoint](/api-reference/workflow/overview) to partition a single file, with limited support
for chunking as well, processing local files one file at a time.

<Note>
  For a hands-on walkthrough of the [Unstructured API's workflow operations](/api-reference/workflow/overview) instead,
  see the [MCP Hands-On Walkthrough for the Unstructured API's workflow operations](/examplecode/tools/mcp), which offers
  a full range of partitioning, chunking, embedding, and enrichment options for your files and data.
</Note>

You call this MCP server from an MCP *client* (an application that maintains a one-to-one connection with an
MCP server), which in this walkthrough is [Claude for Desktop](https://claude.ai/download), a chat-based interface, by asking it plain language questions.
Other MCP clients are also available. [Learn more](https://modelcontextprotocol.io/clients).

## Requirements

To complete this walkthrough, you must first have an Unstructured account, an Unstructured API key for that account,
the necessary Python toolchain and project setups, Claude for Desktop, and an input file for Unstructured to process, as follows.

### Unstructured account and API key

Before you begin, you must have an Unstructured account and an Unstructured API key, as follows:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

### Python toolchain and Claude for Desktop setup

Before you can start coding on your local machine, you must install the Python package and project manager [uv](https://docs.astral.sh/uv/),
and install Claude for Desktop, as follows.

<Steps>
  <Step>
    Install `uv`, as follows.

    By using `pip`:

    ```bash  theme={null}
    pip install uv
    ```

    Or, by using the following script (be sure to restart your terminal or Command Prompt after running the script):

    ```bash  theme={null}
    # For macOS and Linux:
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # For Windows:
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    ```

    (To learn more about `uv`, see the [uv documentation](https://docs.astral.sh/uv/).)
  </Step>

  <Step>
    [Install Claude for Desktop](https://claude.ai/download).
  </Step>
</Steps>

### Input file

You must have an input file for Unstructured to process. This file can be any file type that Unstructured supports, such as a `.pdf` file.
[See the list of supported file types](/ui/supported-file-types). If you do not have any files available, you can download some from the [example-docs](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/example-docs) folder in the Unstructured repo on GitHub.

To minimize the risk of Claude for Desktop's hard timeout limit of 60 seconds for MCP server responses, we recommend that you use small files
(less than 400 KB) for this walkthrough.

You are now ready to start coding.

## Step 1: Create the MCP server

<Steps>
  <Step>
    From your current working directory, create a project directory and then initialize it. This example creates a
    project directory named `mcp-unstructured-parition-demo`. Then go to the project directory:

    ```bash  theme={null}
    uv init mcp-unstructured-partition-demo
    cd mcp-unstructured-partition-demo
    ```
  </Step>

  <Step>
    Create a virtual environment in the root of the project directory. Then activate the virtual environment:

    ```bash  theme={null}
    # For macOS and Linux:
    uv venv
    source .venv/bin/activate

    # For Windows:
    uv venv
    .venv\Scripts\activate
    ```

    To deactivate and exit the virtual environment at any time, simply run the `deactivate` command:

    ```bash  theme={null}
    deactivate
    ```
  </Step>

  <Step>
    With the virtual environment activated, install the Python SDK for Model Context Protocol and additional code dependencies:

    ```bash  theme={null}
    uv add "mcp[cli]" "unstructured-client>=0.30.6" dotenv
    ```
  </Step>

  <Step>
    In the root of the project directory, create a new Python file. This example names the file `doc_processor.py`.
    Then add the following code to this new `doc_processor.py` file.

    In the following code, replace `<absolute-path-to-local-directory>` with the absolute path to the local directory where you want
    Unstructured to store the processed files' content. Be sure that this local directory exists before you run the code.

    ```python  theme={null}
    import os
    from dotenv import load_dotenv
    import json
    from unstructured_client import UnstructuredClient
    from typing import AsyncIterator
    from dataclasses import dataclass
    from contextlib import asynccontextmanager
    from mcp.server.fastmcp import FastMCP, Context
    from unstructured_client.models import operations, shared

    @dataclass
    class AppContext:
        client: UnstructuredClient

    @asynccontextmanager
    async def app_lifespan(server: FastMCP) -> AsyncIterator[AppContext]:
        """Manages the Unstructured API client lifecycle."""
        api_key = os.getenv("UNSTRUCTURED_API_KEY")
        if not api_key:
            raise ValueError("UNSTRUCTURED_API_KEY environment variable is required")

        client = UnstructuredClient(api_key_auth=api_key)
        try:
            yield AppContext(client=client)
        finally:
            # No cleanup needed for the API client.
            pass

    # Create the MCP server instance.
    mcp = FastMCP("unstructured-partition-mcp", lifespan=app_lifespan, dependencies=["unstructured-client", "python-dotenv"])
    # Specify the absolute path to the local directory to store processed files.
    PROCESSED_FILES_FOLDER = "<absolute-path-to-local-directory>"

    def load_environment_variables() -> None:
        """
        Load environment variables from .env file.
        Raises an error if critical environment variables are missing.
        """
        load_dotenv()
        required_vars = [
            "UNSTRUCTURED_API_KEY"
        ]

        for var in required_vars:
            if not os.getenv(var):
                raise ValueError(f"Missing required environment variable: {var}")

    def json_to_text(file_path) -> str:
        with open(file_path, 'r') as file:
            elements = json.load(file)

        doc_texts = []

        for element in elements:
            text = element.get("text", "").strip()
            element_type = element.get("type", "")
            metadata = element.get("metadata", {})

            if element_type == "Title":
                doc_texts.append(f"<h1> {text}</h1><br>")
            elif element_type == "Header":
                doc_texts.append(f"<h2> {text}</h2><br/>")
            elif element_type == "NarrativeText" or element_type == "UncategorizedText":
                doc_texts.append(f"<p>{text}</p>")
            elif element_type == "ListItem":
                doc_texts.append(f"<li>{text}</li>")
            elif element_type == "PageNumber":
                doc_texts.append(f"Page number: {text}")
            elif element_type == "Table":
                table_html = metadata.get("text_as_html", "")
                doc_texts.append(table_html)  # Keep the table as HTML.
            else:
                doc_texts.append(text)

        return " ".join(doc_texts)

    @mcp.tool()
    async def process_document(ctx: Context, filepath: str) -> str:
        """
        Sends the document to Unstructured for processing. 
        Returns the processed contents of the document
        
        Args:
            filepath: The local path to the document.
        """

        filepath = os.path.abspath(filepath)

        if not os.path.isfile(filepath):
            return "File does not exist"

        # Check whether Unstructured supports the file's extension.
        _, ext = os.path.splitext(filepath)
        supported_extensions = {".abw", ".bmp", ".csv", ".cwk", ".dbf", ".dif", ".doc", ".docm", ".docx", ".dot",
                                ".dotm", ".eml", ".epub", ".et", ".eth", ".fods", ".gif", ".heic", ".htm", ".html",
                                ".hwp", ".jpeg", ".jpg", ".md", ".mcw", ".mw", ".odt", ".org", ".p7s", ".pages",
                                ".pbd", ".pdf", ".png", ".pot", ".potm", ".ppt", ".pptm", ".pptx", ".prn", ".rst",
                                ".rtf", ".sdp", ".sgl", ".svg", ".sxg", ".tiff", ".txt", ".tsv", ".uof", ".uos1",
                                ".uos2", ".web", ".webp", ".wk2", ".xls", ".xlsb", ".xlsm", ".xlsx", ".xlw", ".xml",
                                ".zabw"}

        if ext.lower() not in supported_extensions:
            return "File extension not supported by Unstructured"

        client = ctx.request_context.lifespan_context.client
        file_basename = os.path.basename(filepath)

        req = operations.PartitionRequest(
            partition_parameters=shared.PartitionParameters(
                files=shared.Files(
                    content=open(filepath, "rb"),
                    file_name=filepath,
                ),
                strategy=shared.Strategy.AUTO,
            ),
        )

        try:
            res = client.general.partition(request=req)
            element_dicts = [element for element in res.elements]
            json_elements = json.dumps(element_dicts, indent=2)
            output_json_file_path = os.path.join(PROCESSED_FILES_FOLDER, f"{file_basename}.json")
            with open(output_json_file_path, "w") as file:
                file.write(json_elements)

            return json_to_text(output_json_file_path)
        except Exception as e:
            return f"The following exception happened during file processing: {e}"

    if __name__ == "__main__":
        load_environment_variables()
        # Initialize and run the server.
        mcp.run(transport='stdio')
    ```
  </Step>

  <Step>
    In the root of the project directory, create a new file named `.env`.
    Then add the following code to this new `.env` file, replacing `<your-unstructured-api-key>` with your Unstructured API key:

    ```text  theme={null}
    UNSTRUCTURED_API_KEY="<your-unstructured-api-key>"
    ```
  </Step>
</Steps>

## Step 2: Configure Claude for Desktop to use the MCP server

<Steps>
  <Step>
    Open the Claude for Desktop App configuration file from the following location, for example by using Visual Studio Code:

    ```text  theme={null}
    # For macOS or Linux:
    code ~/Library/Application\ Support/Claude/claude_desktop_config.json

    # For Windows:
    code $env:AppData\Claude\claude_desktop_config.json
    ```
  </Step>

  <Step>
    Add the following code to the `claude_desktop_config.json` file, and then save the file.

    In this file:

    * `"unstructured-partition-mcp"` must match the value that you set in `mcp = FastMCP("unstructured-partition-mcp" ...` in the `doc_processor.py` file.
    * "Absolute path to" `uv` must be the absolute path to the `uv` executable on your local machine.
    * "Absolute path to project directory" must be the absolute path to the project directory.
    * `mcp-unstructured-partition-demo` must match the name of the project directory that you created.
    * `"doc_processor.py"` must match the name of the Python file that you created for the MCP server.

    For macOS or Linux:

    ```json  theme={null}
    {
        "mcpServers": {
            "unstructured-partition-mcp": {
                "command": "/ABSOLUTE/PATH/TO/uv",
                "args": [
                    "--directory",
                    "/ABSOLUTE/PATH/TO/PROJECT/DIRECTORY/mcp-unstructured-partition-demo/",
                    "run",
                    "doc_processor.py"
                ]
            }
        }
    }
    ```

    For Windows:

    ```json  theme={null}
    {
        "mcpServers": {
            "unstructured-partition-mcp": {
                "command": "C:\\ABSOLUTE\\PATH\\TO\\uv",
                "args": [
                    "--directory",
                    "C:\\ABSOLUTE\\PATH\\TO\\PROJECT\\DIRECTORY\\mcp-unstructured-partition-demo\\",
                    "run",
                    "doc_processor.py"
                ]
            }
        }
    }
    ```
  </Step>
</Steps>

## Step 3: Run the MCP server and call it from Claude for Desktop

<Steps>
  <Step>
    Start Claude for Desktop. If it is already running, restart it.
  </Step>

  <Step>
    Ask the following question in the chat window, replacing `<absolute-path-to-local-file>` with the absolute path to the local file that you
    want Unstructured to process.

    <Note>
      This walkthrough has been tested with small files (less than 400 KB), due to Claude for Desktop's hard timeout limit of 60 seconds for MCP server responses.

      Larger files might take longer for Unstructured to process, and Claude for Desktop might time out before such files are fully processed.
    </Note>

    ```text  theme={null}
    Process the document at <absolute-path-to-local-file>
    ```

    Claude for Desktop responds with information about the processed document.

    You can find the processed file's content in the path that you specified for `PROCESSED_FILES_FOLDER` in the MCP server's `doc_processor.py` file.
  </Step>
</Steps>

---

# Source: https://docs.unstructured.io/examplecode/tools/mcp.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Model Context Protocol (MCP) Hands-On Walkthrough for the Unstructured API's workflow operations

[Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is an open protocol that standardizes
how applications provide context to LLMs. From the MCP documentation:

> Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.

This article provides a hands-on, step-by-step walkthrough to build a simple example on your local development machine
that integrates the [Unstructured API](/api-reference/overview) with the
[Python SDK for Model Context Protocol](https://github.com/modelcontextprotocol/python-sdk). Specifically,
this walkthrough demonstrates how to write and run Python code to create an MCP *server*
(a lightweight program that exposes specific capabilities through MCP); this MCP server uses the
[Unstructured API's workflow operations](/api-reference/workflow/overview) to list all of your available
[source connectors](/api-reference/workflow/sources/overview) and to get information about a specific source connector. The
Unstructured API's workflow operations offer a full range of partitioning, chunking, embedding, and enrichment options for your files and data.

<Note>
  For a hands-on walkthrough of the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) instead,
  see the [MCP Hands-On Walkthrough for the legacy Unstructured Partition Endpoint](/examplecode/tools/mcp-partition), which
  is intended for rapid prototyping of Unstructured's various partitioning strategies, with limited support for chunking,
  processing local files one file at a time.
</Note>

You call this MCP server from an MCP *client* (an application that maintains a one-to-one connection with an
MCP server), which in this walkthrough is [Claude for Desktop](https://claude.ai/download), a chat-based interface, by asking it plain language questions.
Other MCP clients are also available. [Learn more](https://modelcontextprotocol.io/clients).

At the end of this walkthrough, you can extend this example by using the Unstructured API's workflow operations to perform other tasks, such as creating, listing, changing, and
deleting various connectors, workflows, and jobs in your Unstructured account.

## Requirements

To complete this walkthrough, you must first have an Unstructured account, an Unstructured API key for that account,
the necessary Python toolchain and project setups, and Claude for Desktop, as follows.

### Unstructured account and API key

Before you begin, you must have an Unstructured account and an Unstructured API key, as follows:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

### Python toolchain and Claude for Desktop setup

Before you can start coding on your local machine, you must install the Python package and project manager [uv](https://docs.astral.sh/uv/),
and install Claude for Desktop, as follows.

<Steps>
  <Step>
    Install `uv`, as follows.

    By using `pip`:

    ```bash  theme={null}
    pip install uv
    ```

    Or, by using the following script (be sure to restart your terminal or Command Prompt after running the script):

    ```bash  theme={null}
    # For macOS and Linux:
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # For Windows:
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    ```

    (To learn more about `uv`, see the [uv documentation](https://docs.astral.sh/uv/).)
  </Step>

  <Step>
    [Install Claude for Desktop](https://claude.ai/download).
  </Step>
</Steps>

You are now ready to start coding.

## Step 1: Create the MCP server

<Tip>
  If you are not interested in building an MCP server from the beginning and want to use a pre-built MCP server instead,
  skip ahead to [Optional Step 4: Extend the MCP server](#optional-step-4%3A-extend-the-mcp-server).
</Tip>

<Steps>
  <Step>
    From your current working directory, create a project directory and then initialize it. This example creates a
    project directory named `mcp-unstructured-demo`. Then go to the project directory:

    ```bash  theme={null}
    uv init mcp-unstructured-demo
    cd mcp-unstructured-demo
    ```
  </Step>

  <Step>
    Create a virtual environment in the root of the project directory. Then activate the virtual environment:

    ```bash  theme={null}
    # For macOS and Linux:
    uv venv
    source .venv/bin/activate

    # For Windows:
    uv venv
    .venv\Scripts\activate
    ```

    To deactivate and exit the virtual environment at any time, simply run the `deactivate` command:

    ```bash  theme={null}
    deactivate
    ```
  </Step>

  <Step>
    With the virtual environment activated, install the Python SDK for Model Context Protocol and additional code dependencies:

    ```bash  theme={null}
    uv add "mcp[cli]" "unstructured-client>=0.30.6" dotenv
    ```
  </Step>

  <Step>
    In the root of the project directory, create a new Python file. This example names the file `workflow.py`.
    Then add the following code to this new `workflow.py` file:

    ```python  theme={null}
    from typing import Any, AsyncIterator, Optional
    from mcp.server.fastmcp import FastMCP, Context
    from dotenv import load_dotenv
    import os
    from dataclasses import dataclass
    from contextlib import asynccontextmanager
    from unstructured_client import UnstructuredClient
    from unstructured_client.models.operations import (
        ListSourcesRequest,
        GetSourceRequest
    )

    def load_environment_variables() -> None:
        """
        Loads environment variables from a .env file and checks for required variables. 
        Raises an error if any required variable is missing.

        Args:
            None

        Returns:
            None
        """
        load_dotenv()
        required_vars = [
            "UNSTRUCTURED_API_KEY"
        ]

        for var in required_vars:
            if not os.getenv(var):
                raise ValueError(f"Missing required environment variable: {var}")

    @dataclass
    class AppContext:
        client: UnstructuredClient

    @asynccontextmanager
    async def app_lifespan(server: FastMCP) -> AsyncIterator[AppContext]:
        """
        Manages application lifecycle with type-safe context
        for the FastMCP server. Initializes the UnstructuredClient.
        
        Args:
            server (FastMCP): The FastMCP server instance.
        
        Returns:
            AsyncIterator[AppContext]: An asynchronous context manager providing the 
            application context.
        """
        api_key = os.getenv("UNSTRUCTURED_API_KEY")
        if not api_key:
            raise ValueError("UNSTRUCTURED_API_KEY environment variable is required")
        client = UnstructuredClient(api_key_auth=api_key)
        try:
            yield AppContext(client=client)
        finally:
            pass

    # Initialize the FastMCP server.
    mcp = FastMCP("workflow", lifespan=app_lifespan)

    @mcp.tool()
    async def list_sources(ctx: Context, source_type: Optional[str] = None) -> str:
        """
        Lists available source connectors. Optionally filters by source type.
        
        Args:
            ctx (Context): The context for the request.
            source_type (Optional[str]): The type of source connector to filter by. 
                If None, all sources are listed.
        
        Returns:
            str: A formatted string listing available source connector names and IDs.
        """
        client = ctx.request_context.lifespan_context.client
        request = ListSourcesRequest()

        if source_type:
            try:
                request.source_type = [source_type]
            except KeyError:
                return f"Invalid source type: {source_type}"

        response = await client.sources.list_sources_async(
            request=request
        )
        sorted_sources = sorted(
            response.response_list_sources, 
            key=lambda source: source.name.lower()
        )
        result = ["Available source connectors:"]

        for source in sorted_sources:
            result.append(f"- {source.name} (ID: {source.id})")

        return "\n".join(result)

    @mcp.tool()
    async def get_source(ctx: Context, source_id: str) -> str:
        """
        Gets detailed information about a specific source connector by its ID.
        
        Args:
            ctx (Context): The context for the request.
            source_id (str): The ID of the source connector to retrieve.
            
        Returns: 
            str: A formatted string containing information about the source connector.
        """
        client = ctx.request_context.lifespan_context.client
        
        response = await client.sources.get_source_async(
            request=GetSourceRequest(source_id=source_id)
        )
        info = response.source_connector_information

        result = ["Source connector information:"]
        result.append(f"- name: {info.name}")
        
        for key, value in info.config:
            result.append(f"{key}: {value}")

        return "\n".join(result)
        
    if __name__ == "__main__":
        load_environment_variables()
        # Initialize and run the MCP server.
        mcp.run(transport='stdio')
    ```
  </Step>

  <Step>
    In the root of the project directory, create a new file named `.env`.
    Then add the following code to this new `.env` file, replacing `<your-unstructured-api-key>` with your Unstructured API key:

    ```text  theme={null}
    UNSTRUCTURED_API_KEY="<your-unstructured-api-key>"
    ```
  </Step>
</Steps>

## Step 2: Configure Claude for Desktop to use the MCP server

<Steps>
  <Step>
    Open the Claude for Desktop App configuration file from the following location, for example by using Visual Studio Code:

    ```text  theme={null}
    # For macOS or Linux:
    code ~/Library/Application\ Support/Claude/claude_desktop_config.json

    # For Windows:
    code $env:AppData\Claude\claude_desktop_config.json
    ```
  </Step>

  <Step>
    Add the following code to the `claude_desktop_config.json` file, and then save the file.

    In this file:

    * `"workflow"` must match the value that you set in `mcp = FastMCP("workflow" ...` in the `workflow.py` file.
    * "Absolute path to" `uv` must be the absolute path to the `uv` executable on your local machine.
    * "Absolute path to project directory" must be the absolute path to the project directory.
    * `mcp-unstructured-demo` must match the name of the project directory that you created.
    * `"workflow.py"` must match the name of the Python file that you created for the MCP server.

    For macOS or Linux:

    ```json  theme={null}
    {
        "mcpServers": {
            "workflow": {
                "command": "/ABSOLUTE/PATH/TO/uv",
                "args": [
                    "--directory",
                    "/ABSOLUTE/PATH/TO/PROJECT/DIRECTORY/mcp-unstructured-demo/",
                    "run",
                    "workflow.py"
                ]
            }
        }
    }
    ```

    For Windows:

    ```json  theme={null}
    {
        "mcpServers": {
            "workflow": {
                "command": "C:\\ABSOLUTE\\PATH\\TO\\uv",
                "args": [
                    "--directory",
                    "C:\\ABSOLUTE\\PATH\\TO\\PROJECT\\DIRECTORY\\mcp-unstructured-demo\\",
                    "run",
                    "workflow.py"
                ]
            }
        }
    }
    ```
  </Step>
</Steps>

## Step 3: Run the MCP server and call it from Claude for Desktop

<Steps>
  <Step>
    Start Claude for Desktop. If it is already running, restart it.
  </Step>

  <Step>
    Ask the following question in the chat window, replacing `S3` with the type of source connector that you want to list.
    For a list of available source connector types, see the [Unstructured API's workflow operations' sources overview](/api-reference/workflow/sources/overview).

    ```text  theme={null}
    List all available source connectors of type S3.
    ```

    Claude for Desktop responds with a list of all available source connectors for that source connector type.
  </Step>

  <Step>
    Ask another question in the chat window, replacing `<source-connector-id>` with one of the source connector IDs that
    Claude for Desktop returned in the previous step. For example:

    ```text  theme={null}
    Provide information about the source connector with ID <source-connector-id>.
    ```

    Claude for Desktop responds with information about that source connector.
  </Step>
</Steps>

## Optional Step 4: Extend the MCP server

Unstructured offers an initial experimental MCP server implementation, with much more functionality then the MCP server that you just created previously in this walkthrough.
This initial MCP server is intended only for experimentation purposes and is not recommended for production usage. A more complete, supported,
production-level implementation is coming soon.

To use Unstructured's initial experimental MCP server implementation, do the following:

<Steps>
  <Step>
    From a different working directory than the one you used in previously in this walkthrough, use Git to clone the
    [Unstructured API MCP Server](https://github.com/Unstructured-IO/UNS-MCP) repository in GitHub, and then switch to the cloned
    repository's root directory:

    ```bash  theme={null}
    git clone https://github.com/Unstructured-IO/UNS-MCP.git
    cd UNS-MCP
    ```
  </Step>

  <Step>
    In the cloned repository's root directory, create a new file named `.env`. Then copy the contents of the repository's
    `.env.template` file into the `.env` file that you just created.

    As an MCP security best practice, sensitive information such as API keys should be stored in environment variables or a
    secrets manager, not hard-coded in MCP server code or retrieved from the user by an MCP client.
    The `.env` file is a common way to store environment variables in Python projects outside of MCP server code and not
    visible to MCP clients.

    Having your own `.env` file helps ensure that you do
    not accidentally commit any sensitive information from this cloned repository into other repositories (as `.env` is an
    entry in this cloned repository's `.gitignore` file, but `.env.template` is not).

    In your `.env` file, replace `<key-here>` in the following line with your Unstructured API key:

    ```text  theme={null}
    UNSTRUCTURED_API_KEY="<key-here>"
    ```

    As needed, replace any of the other placeholders in the `.env` file with your own values. For example, if you want to
    create an Amazon S3 source or destination connector, you should provide your AWS credentials in the `.env` file by specifying values
    for the `AWS_KEY` and `AWS_SECRET` variables. For more information, see the usage comments in the original `.env.template`
    file or in the content that you copied into your `.env` file.

    The original `.env.template` file will be updated as more features are added to the repository. You should periodically
    pull down the source repository's contents into your cloned repository to get any new features, as well as the latest version
    of the `.env.template` file that you can copy any newly available required environment variable names over into your existing `.env` file.
  </Step>

  <Step>
    In the cloned repository's root directory, create a virtual environment. Then activate the virtual environment:

    ```bash  theme={null}
    # For macOS and Linux:
    uv venv
    source .venv/bin/activate

    # For Windows:
    uv venv
    .venv\Scripts\activate
    ```

    To deactivate and exit the virtual environment at any time, simply run the `deactivate` command:

    ```bash  theme={null}
    deactivate
    ```
  </Step>

  <Step>
    With the virtual environment activated, install the cloned repository's code dependencies:

    ```bash  theme={null}
    uv sync
    ```
  </Step>

  <Step>
    Open the Claude for Desktop App configuration file from the following location, for example by using Visual Studio Code:

    ```text  theme={null}
    # For macOS or Linux:
    code ~/Library/Application\ Support/Claude/claude_desktop_config.json

    # For Windows:
    code $env:AppData\Claude\claude_desktop_config.json
    ```
  </Step>

  <Step>
    Overwrite the contents of the `claude_desktop_config.json` file with the following content, and then save the file.

    For macOS or Linux:

    ```json  theme={null}
    {
        "mcpServers": {
            "UNS-MCP": {
                "command": "/ABSOLUTE/PATH/TO/uv",
                "args": [
                    "--directory",
                    "/ABSOLUTE/PATH/TO/PARENT/FOLDER/UNS-MCP",
                    "run",
                    "server.py"
                ]
            }
        }
    }
    ```

    For Windows:

    ```json  theme={null}
    {
        "mcpServers": {
            "workflow": {
                "command": "C:\\ABSOLUTE\\PATH\\TO\\uv",
                "args": [
                    "--directory",
                    "C:\\ABSOLUTE\\PATH\\TO\\PARENT\\FOLDER\\UNS-MCP",
                    "run",
                    "server.py"
                ]
            }
        }
    }
    ```
  </Step>

  <Step>
    Start Claude for Desktop. If it is already running, restart it.
  </Step>

  <Step>
    Here are some questions that you can ask in the chat window to test the MCP server. Replace the placeholders with the appropriate values.

    For a list of available source connector types, see the [Unstructured API's workflow operations sources overview](/api-reference/workflow/sources/overview).

    For a list of available destination connector types, see the [Unstructured API's workflow operations destinations overview](/api-reference/workflow/destinations/overview).

    * `List all available source connectors of type <source-connector-type>.`
    * `Provide information about the source connector with ID <source-connector-id>.`
    * `List all available source connectors of type <destination-connector-type>.`
    * `Provide information about the source connector with ID <destination-connector-id>.`
    * `List all available destination connectors of type <destination-connector-type>.`
    * `Provide information about the destination connector with ID <destination-connector-id>.`
    * `Create a new workflow.`
    * `Provide information about the workflow with ID <workflow-id>.`
    * `Run the workflow with ID <workflow-id>.`
    * `Update the workflow with ID <workflow-id>.`
    * `List all available workflows.`
    * `Delete the worklow with ID <workflow-id>.`
  </Step>

  <Step>
    New features are being added to the source repository over time, which will increase the types of questions that
    you can ask the MCP server. You should periodically
    pull down the source repository's contents into your cloned repository to get any new features as they are added.
  </Step>
</Steps>

## Learn more

* <Icon icon="blog" />  [Building an MCP Server with Unstructured API](https://unstructured.io/blog/building-an-mcp-server-with-unstructured-api)
* <Icon icon="blog" />  [Building an End-to-End Data Pipeline with Custom NER on Unstructured using MCP](https://unstructured.io/blog/building-an-end-to-end-data-pipeline-with-custom-ner-on-unstructured-using-mcp)
* <Icon icon="blog" />  Uses this MCP server: [All Your Unstructured Data in a Databricks Delta Table. Just Say the Word.](https://unstructured.io/blog/all-your-unstructured-data-in-a-databricks-delta-table-just-say-the-word)

---

# Source: https://docs.unstructured.io/api-reference/workflow/migration.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Migration

This page contains information about how to migrate from previous versions of the
Unstructured API's [workflow operations](/api-reference/workflow/overview).

## ConnectorConfigInput and ConnectorType classes are deprecated

**Applies to**: The [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk) only.

**Issue**: Referencing classes that end in `ConnectorConfigInput` such as `S3SourceConnectorConfigInput`
and `S3DestinationConnectorConfigInput`, and referencing classes that end in `ConnectorType` such as
`S3SourceConnectorType` and `S3DestinationConnectorType`, produce warnings at design time and errors at run time.

**Cause**: The preceding classes have been deprecated and are no longer supported.

**Solution**:

* Remove references to classes that end in `ConnectorConfigInput` from your code. Replace these references with a
  dictionary. This dictionary must contain the same fields as the
  class that you removed, with dictionary key/value pairs instead of object parameter/argument pairs.
* Remove references to classes that end in `ConnectorType` from your code. Replace these references with a string reference instead. This
  string corresponds to Unstructured's programmatic identifier for the connector type, for example `s3` for Amazon S3.

For example, the following code template shows the deprecated approach for programmatically creating an S3 source connector.

```python Python SDK theme={null}
# Deprecated code example. Do not use.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
# The SourceConnectorType and S3SourceConnectorConfigInput classes are deprecated.
from unstructured_client.models.shared import (
    CreateSourceConnector,
    SourceConnectorType,         # <- Remove. Deprecated.
    S3SourceConnectorConfigInput # <- Remove. Deprecated.
)

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type=SourceConnectorType.S3, # <- Replace SourceConnectorType.S3 with a string reference to "s3".
                # Replace S3SourceConnectorConfigInput with a typed dictionary.
                config=S3SourceConnectorConfigInput(
                    # For anonymous authentication:
                    anonymous=True,

                    # For AWS access key ID with AWS secret access key authentication:
                    key="<key>", 
                    secret="<secret>", 

                    # For AWS STS token authentication:
                    token="<token>", 
                    key="<key>", 
                    secret="<secret>", 

                    remote_url="<remote_url>",
                    endpoint_url="<endpoint-url>", 
                    recursive=<True|False>
                )
            )
        )
    )

    print(response.source_connector_information)
```

To address this issue, use the following code template instead, which removes the
`SourceConnectorType` and `S3SourceConnectorConfigInput` class references and replaces them
with the correct substitutions.

```python Python SDK theme={null}
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import CreateSourceConnector

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type="s3",
                config={
                    # For anonymous authentication:
                    "anonymous": True,

                    # For AWS access key ID with AWS secret access key authentication:
                    "key": "<key>", 
                    "secret": "<secret>", 

                    # For AWS STS token authentication:
                    "token": "<token>", 
                    "key": "<key>", 
                    "secret": "<secret>", 

                    "remote_url": "<remote_url>",
                    "endpoint_url": "<endpoint-url>", 
                    "recursive": <True|False>
                }
            )
        )
    )

    print(response.source_connector_information)
```

---

# Source: https://docs.unstructured.io/ui/destinations/milvus.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/milvus.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/milvus.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Milvus

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Milvus.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Milvus cloud-based instances (such as Milvus on IBM watsonx.data, or Zilliz Cloud) are supported.

* For [Unstructured Ingest](/open-source/ingestion/overview), Milvus local and cloud-based instances are supported.

* For Milvus on IBM watsonx.data, you will need:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/hLCwoe2fCnc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * An [IBM Cloud account](https://cloud.ibm.com/registration).

  * An IBM watsonx.data [Lite plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_prov_lite_1)
    or [Enterprise plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started_1) within your IBM Cloud account.

    * If you are provisioning a Lite plan, be sure to choose the **Generative AI** use case when prompted, as this is the only use case offered that includes Milvus.

  * A [Milvus service instance in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service).

    * If you are creating a Milvus service instance within a watsonx.data Lite plan, when you are prompted to choose a Milvus instance size, you can only select **Lite**. Because the Lite
      Milvus instance size is recommended only for 384 dimensions, you should also use an embedding model that uses 384 dimensions only.
    * If you are creating a Milvus service instance within a watsonx.data Enterprise plan, you can choose any available Milvus instance size. However, all Milvus instance sizes other than
      **Custom** are recommended only for 384 dimensions, which means you should use an embedding model that uses 384 dimensions only.
      The **Custom** Milvus instance size is recommended for any number of dimensions.

  * The URI of the instance, which takes the format of `https://`, followed by the instance's **GRPC host**, followed by a colon and the **GRPC port**.
    This takes the format of `https://<host>:<port>`. To get this information, do the following:

    a. Sign in to your IBM Cloud account.<br />
    b. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.<br />
    c. Expand **Databases**, and then click the name of the target **watsonx.data** plan.<br />
    d. Click **Open web console**.<br />
    e. On the sidebar, click **Infrastructure manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the title bar.<br />
    f. Click the target Milvus service instance.<br />
    g. On the **Details** tab, under **Type**, click **View connect details**.<br />
    h. Under **Service details**, expand **GRPC**, and note the value of **GRPC host** and **GRPC port**.<br />

  * The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.

  * The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.

  * The username and password to access the instance.

    * The username for Milvus on IBM watsonx.data is typically `ibmlhapikey`.

      <Note>
        More recent versions of Milvus on IBM watsonx.data require `ibmlhapikey_<your-IBMid>` instead, where `<your-IBMid>` is
        your IBMid, for example `me@example.com`. To get your IBMid, do the following:

        1. Sign in to your IBM Cloud account.
        2. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.
        3. In the sidebar, expand **Manage identities**, and then click **Users**.
        4. In the list of users, click your user name.
        5. On the **User details** tab, in the **Details** tile, note the value of **IBMid**.
      </Note>

    * The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key. To create an IBM Cloud user API key:

      a. Sign in to your IBM Cloud account.<br />
      b. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.<br />
      c. On the sidebar, under **Manage identities**, click **API keys**. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.<br />
      d. Click **Create**.<br />
      e. Enter some **Name** for the API key.<br />
      f. Optionally, enter some **Description** for the API key.<br />
      g. For **Leaked action**, leave **Disable the leaked key** selected.<br />
      h. For **Session management**, leave **No** selected.<br />
      i. Click **Create**.<br />
      j. Click **Download** (or **Copy**), and then download the API key to a secure location (or paste the copied API key into a secure location). You won't be able to access this API key from this dialog again. If you lose this API key, you can create a new one (and you should then delete the old one).<br />

* For Zilliz Cloud, you will need:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/vwWudGvBEKQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * A [Zilliz Cloud account](https://cloud.zilliz.com/signup).

  * A [Zilliz Cloud cluster](https://docs.zilliz.com/docs/create-cluster).

  * The URI of the cluster, also known as the cluster's *public endpoint*, which takes a format such as
    `https://<cluster-id>.<cluster-type>.<cloud-provider>-<region>.cloud.zilliz.com`. To get this public endpoint value, do the following:

    1. After you sign in to your Zilliz Cloud account, on the sidebar, in the list of available projects, select the project that contains the cluster.
    2. On the sidebar, click **Clusters**.
    3. Click the tile for the cluster.
    4. On the **Cluster Details** tab, on the **Connect** subtab, copy the **Public Endpoint** value.

  * The username and password to access the cluster, as follows:

    1. After you sign in to your Zilliz Cloud account, on the sidebar, in the list of available projects, select the project that contains the cluster.
    2. On the sidebar, click **Clusters**.
    3. Click the tile for the cluster.
    4. On the **Users** tab, copy the name of the user.
    5. Next to the user's name, under **Actions**, click the ellipsis (three dots) icon, and then click **Reset Password**.
    6. Enter a new password for the user, and then click **Confirm**. Copy this new password.

  * The name of the [database](https://docs.zilliz.com/docs/database#create-database) in the instance.

  * The name of the [collection](https://docs.zilliz.com/docs/manage-collections-console#create-collection) in the database.

    The collection must have a defined schema before Unstructured can write to the collection. The minimum viable
    schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows.
    `type` is an optional field, but highly recommended. For settings for additional Unstructured-produced fields,
    such as the ones within `metadata`, see the usage notes toward the end of this section and adapt them to your specific needs.

    | Field Name                       | Field Type        | Max Length | Dimension |
    | -------------------------------- | ----------------- | ---------- | --------- |
    | `element_id` (primary key field) | **VARCHAR**       | `200`      | --        |
    | `embeddings` (vector field)      | **FLOAT\_VECTOR** | --         | `384`     |
    | `record_id`                      | **VARCHAR**       | `200`      | --        |
    | `text`                           | **VARCHAR**       | `65536`    | --        |
    | `type`                           | **VARCHAR**       | `200`      | --        |

    In the **Create Index** area for the collection, next to **Vector Fields**, click **Edit Index**. Make sure that for the
    `embeddings` field, the **Field Type** is set to **FLOAT\_VECTOR** and the **Metric Type** is set to **Cosine**.

    <Warning>
      The number of dimensions for the `embeddings` field must match the number of dimensions for the embedding model that you plan to use.
    </Warning>

    <Note>
      Fields with a **VARCHAR** data type are limited to a maximum length of 65,535 characters. Attempting to exceed this character count
      will cause Unstructured to throw errors when attempting to write to a Milvus collection, and the associated Unstructured job could fail.
      For example, `metadata` fields that typically exceed these character counts include `image_base64` and `orig_elements`.
    </Note>

* For Milvus local, you will need:

  * A [Milvus instance](https://milvus.io/docs/install-overview.md).
  * The [URI](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Client/MilvusClient.md) of the instance.
  * The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
  * The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database.
    Note the collection requirements at the end of this section.
  * The [username and password, or token](https://milvus.io/docs/authenticate.md) to access the instance.

All Milvus instances require the target collection to have a defined schema before Unstructured can write to the collection. The minimum viable
schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows.
`type` is an optional field, but highly recommended.

This example code demonstrates the use of the
[Python SDK for Milvus](https://pypi.org/project/pymilvus/) to create a collection with this schema,
targeting Milvus on IBM watsonx.data. For the `MilvusClient` arguments to connect to other types of Milvus deployments, see your Milvus provider's documentation:

```python Python theme={null}
import os

from pymilvus import (
    MilvusClient,
    FieldSchema,
    DataType,
    CollectionSchema
)

DATABASE_NAME   = "default"
COLLECTION_NAME = "my_collection"

client = MilvusClient(
    uri="https://" +
        os.getenv("MILVUS_USER") + 
        ":" + 
        os.getenv("MILVUS_PASSWORD") + 
        "@" + 
        os.getenv("MILVUS_GRPC_HOST") + 
        ":" + 
        os.getenv("MILVUS_GRPC_PORT"),
    db_name=DATABASE_NAME
)

# IMPORTANT: The number of dimensions for the "embeddings" field that
# follows must match the number of dimensions for the embedding model 
# that you plan to use.
fields = [
    FieldSchema(name="element_id", dtype=DataType.VARCHAR, is_primary=True, max_length=200),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=384),
    FieldSchema(name="record_id", dtype=DataType.VARCHAR, max_length=200),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="type", dtype=DataType.VARCHAR,max_length=200, nullable=True) # Optional, but highly recommended.
]

schema = CollectionSchema(fields=fields)

client.create_collection(
    collection_name=COLLECTION_NAME,
    schema=schema,
    using=DATABASE_NAME
)

index_params = client.prepare_index_params()

index_params.add_index(
    field_name="embeddings",
    metric_type="COSINE",
    index_type="IVF_FLAT",
    params={"nlist": 1024}
)

client.create_index(
    collection_name=COLLECTION_NAME,
    index_params=index_params
)

client.load_collection(collection_name=COLLECTION_NAME)
```

For objects in the `metadata` field that Unstructured produces and that you want to store in Milvus, you must create fields in your Milvus collection schema that
follows Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
child objects:

```json  theme={null}
"metadata": {
  "is_extracted": "true",
  "coordinates": {
    "points": [
      [
        134.20055555555555,
        241.36027777777795
      ],
      [
        134.20055555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        241.36027777777795
      ]
    ],
    "system": "PixelSpace",
    "layout_width": 1654,
    "layout_height": 2339
  },
  "filetype": "application/pdf",
  "languages": [
    "eng"
  ],
  "page_number": 1,
  "image_mime_type": "image/jpeg",
  "filename": "realestate.pdf",
  "data_source": {
    "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
    "record_locator": {
      "protocol": "file",
      "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
    }
  },
  "entities": {
    "items": [
      {
        "entity": "HOME FOR FUTURE",
        "type": "ORGANIZATION"
      },
      {
        "entity": "221 Queen Street, Melbourne VIC 3000",
        "type": "LOCATION"
      }
    ],
    "relationships": [
      {
        "from": "HOME FOR FUTURE",
        "relationship": "based_in",
        "to": "221 Queen Street, Melbourne VIC 3000"
      }
    ]
  }
}
```

You could create corresponding fields in your Milvus collection schema by using the following field names, data types, and related settings for the
Python SDK for Milvus:

```python  theme={null}
# ...
fields = [
    # Define minimum viable required fields: "element_id", "embeddings", "record_id", and "text".
    # "type" is an optional field, but highly recommended.
    FieldSchema(name="element_id", dtype=DataType.VARCHAR, is_primary=True, max_length=200),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="record_id", dtype=DataType.VARCHAR, max_length=200),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="type",dtype=DataType.VARCHAR,max_length=200, nullable=True),
    # Define optional fields, such as these ones related to the "metadata" field.
    FieldSchema(name="is_extracted", dtype=DataType.VARCHAR, max_length=5, nullable=True),
    FieldSchema(name="coordinates_points", dtype=DataType.JSON, nullable=True),
    FieldSchema(name="coordinates_system", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="coordinates_layout_width", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="coordinates_layout_height", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="filetype", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="languages", dtype=DataType.ARRAY, element_type=DataType.VARCHAR, max_length=64, max_capacity=10, nullable=True),
    FieldSchema(name="page_number", dtype=DataType.INT32, nullable=True),
    FieldSchema(name="image_mime_type", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="filename", dtype=DataType.VARCHAR, max_length=256, nullable=True),
    FieldSchema(name="data_source_url", dtype=DataType.VARCHAR, max_length=1024, nullable=True),
    FieldSchema(name="data_source_record_locator_protocol", dtype=DataType.VARCHAR, max_length=64, nullable=True),
    FieldSchema(name="data_source_record_locator_remote_file_path", dtype=DataType.VARCHAR, max_length=1024, nullable=True),
    FieldSchema(name="entities_items", dtype=DataType.JSON, nullable=True),
    FieldSchema(name="entities_relationships", dtype=DataType.JSON, nullable=True)
]
# ...
```

<Note>
  Fields with a `DataType.VARCHAR` data type are limited to a maximum length of 65,535 characters. Attempting to exceed this character count
  will cause Unstructured to throw errors when attempting to write to a Milvus collection, and the associated Unstructured job could fail.
  For example, `metadata` fields that typically exceed these character counts include `image_base64` and `orig_elements`.
</Note>

To create a Milvus destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="milvus",
                  config={
                      "user": "<user>",
                      "uri": "<uri>",
                      "db_name": "<db-name>",
                      "password": "<password>",
                      "collection_name": "<collection-name>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "milvus",
      "config": {
          "user": "<user>",
          "uri": "<uri>",
          "db_name": "<db-name>",
          "password": "<password>",
          "collection_name": "<collection-name>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<user>` (*required*) - The username to access the Milvus instance.
* `<uri>` (*required*) - The URI of the instance, for example: `https://12345.serverless.gcp-us-west1.cloud.zilliz.com.`
* `<db-name>` (*required*) - The name of the database in the instance.
* `<password>` (*required*) - The password corresponding to the username to access the instance.
* `<collection-name>` (*required*) - The name of the collection in the database.

---

# Source: https://docs.unstructured.io/examplecode/codesamples/api/mlk-research.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Historical research about Dr. Martin Luther King, Jr.

The legacy of the [Reverend Dr. Martin Luther King, Jr.](https://thekingcenter.org/about-tkc/martin-luther-king-jr/)
is one of courage, justice, and transformation. Historical records surrounding the assassination of Dr. King on
April 4, 1968 were declassified on July 21, 2025 by the
[U.S. National Archives and Records Administration](https://www.archives.gov/research/mlk). These documents provide
important insights into a pivotal moment in American history and the civil rights movement.

Unstructured has published several resources that allow you to use techniques such as retrieval-augmented generation (RAG) to
explore these documents and use them in your own applications. These resources include:

* The [MLK Archive Research Assistant](https://mlk-public.demo.unstructured.io/) website, hosted by Unstructured,
  which allows you to ask natural-language questions about these documents and get back answers in natural language, all in chatbot-style format.
  To support these answers, source citations are also provided, including confidence scores and page references.
* A [notebook](https://colab.research.google.com/github/Unstructured-IO/unstructured-mlk-archive-public/blob/main/MLK_Archive_RAG_Application.ipynb),
  developed by Unstructured and hosted in [Google Colab](https://colab.google/), which demonstrates how to use a
  [Jupyter notebook](https://jupyter-notebook.readthedocs.io/en/latest/notebook.html#notebook-documents) to access the same
  resources as the preceding MLK Archive Research Assistant website.
* A [GitHub repository](https://github.com/Unstructured-IO/unstructured-mlk-archive-public) that contains additional details and
  source code for the preceding resources.
* A [raw dataset](http://example-transformations-mlk-archive.s3-website-us-east-1.amazonaws.com/) website, also hosted by Unstructured,
  that contains links to all of the released 6,301 PDF source files spanning 243,496 pages of content, and 1 MP3 audio source file, for your own use. Additionally,
  Unstructured has processed these PDF source files and made the processed output available as a single, downloadable 12.9 GB
  [JSON Lines](https://jsonlines.org/) file through this website.

For questions, feedback, or bug reports about any of the Unstructured-provided resources,
[post an issue](https://github.com/Unstructured-IO/unstructured-mlk-archive-public/issues/new) on GitHub. You can also
[view and track open issues](https://github.com/Unstructured-IO/unstructured-mlk-archive-public/issues).

For questions or feedback about the historical records themselves,
[contact the National Archives](https://www.archives.gov/research/mlk#contact).

---

# Source: https://docs.unstructured.io/open-source/concepts/models.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Models

> Depending on your need, `Unstructured` provides OCR-based and Transformer-based models to detect elements in the documents. The models are useful to detect the complex layout in the documents and predict the element types.

**Basic usage:**

```python  theme={null}
elements = partition(filename=filename,
                     strategy="hi_res",
                     hi_res_model_name="yolox")
```

<Note>
  * To use any model with the partition, set the `strategy` to `hi_res` as shown above.

  * To maintain the consistency between the `unstructured` and `unstructured-api` libraries, we are deprecating the `model_name` parameter. Please use `hi_res_model_name` parameter when specifying a model.
</Note>

The `hi_res_model_name` parameter supports the `yolox` and `detectron2_onnx` arguments.

## Using a Non-Default Model

`Unstructured` will download the model specified in `UNSTRUCTURED_HI_RES_MODEL_NAME` environment variable. If not defined, it will download the default model.

There are three ways you can use the non-default model as follows:

1. Store the model name in the environment variable

```python  theme={null}
import os
from unstructured.partition.pdf import partition_pdf

os.environ["UNSTRUCTURED_HI_RES_MODEL_NAME"] = "yolox"

out_yolox = partition_pdf("example-docs/pdf/layout-parser-paper-fast.pdf", strategy="hi_res")

```

2. Pass the model name in the `partition` function.

```python  theme={null}
filename = "example-docs/pdf/layout-parser-paper-fast.pdf"

elements = partition(filename=filename,
                     strategy="hi_res",
                     hi_res_model_name="yolox")

```

3. Use [unstructured-inference](https://github.com/Unstructured-IO/unstructured-inference) library.

```python  theme={null}
from unstructured_inference.models.base import get_model
from unstructured_inference.inference.layout import DocumentLayout

model = get_model("yolox")
layout = DocumentLayout.from_file("sample-docs/layout-parser-paper.pdf", detection_model=model)

```

---

# Source: https://docs.unstructured.io/ui/sources/mongodb.md

# Source: https://docs.unstructured.io/ui/destinations/mongodb.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/mongodb.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/mongodb.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/mongodb.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/mongodb.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# MongoDB

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to MongoDB.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/8YBVHt5spIQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

The MongoDB requirements for a MongoDB Atlas deployment include:

<Warning>
  For MongoDB Atlas, SCRAM-SHA-1 is not supported for authentication. This means that cluster types that only
  include SCRAM-SHA-1, such as **Free**, **M0**, **Flex**, and **Serverless**, are **not** supported.
  Unstructured only supports SCRAM-SHA-256 for MongoDB Atlas, which is cryptographically stronger than SCRAM-SHA-1.

  If you try to test or use a connector that refers to a cluster type that only includes SCRAM-SHA-1, the
  operation will fail, and you will get an error message similar to the following:
  `[digital envelope routines] unsupported`.
</Warning>

* A MongoDB Atlas account. [Create an account](https://www.mongodb.com/cloud/atlas/register).

* A MongoDB Atlas cluster. [Create a cluster](https://www.mongodb.com/docs/atlas/tutorial/create-new-cluster/). Be sure to **not**
  select a cluster type that only includes SCRAM-SHA-1, such as **Free**, **M0**, **Flex**, or **Serverless**.

* The cluster must be reachable from your application environment, for example by adding IP addresses to your IP access list. [Learn more](https://www.mongodb.com/docs/atlas/setup-cluster-security/#network-and-firewall-requirements).

* The cluster must be configured to allow IP address. [Learn how](https://www.mongodb.com/docs/atlas/security/ip-access-list/#add-ip-access-list-entries).

  To get Unstructured's IP address ranges, go to
  [https://assets.p6m.u10d.net/publicitems/ip-prefixes.json](https://assets.p6m.u10d.net/publicitems/ip-prefixes.json)
  and allow all of the `ip_prefix` fields' values that are listed.

  <Note>These IP address ranges are subject to change. You can always find the latest ones in the preceding file.</Note>

* The cluster must have at least one database. [Create a database](https://www.mongodb.com/docs/compass/current/databases/#create-a-database).

* The database must have at least one user, and that user must have sufficient access to the database. [Create a database user](https://www.mongodb.com/docs/atlas/security-add-mongodb-users/#add-database-users). [Give the user database access](https://www.mongodb.com/docs/manual/core/authorization/).

* The database must have at least one collection. [Create a collection](https://www.mongodb.com/docs/compass/current/collections/#create-a-collection).

  <Note>
    For the destination connector, Unstructured recommends that all documents in the target collection have a field
    named `record_id` with a `String` data type.
    Unstructured can use this field to do intelligent document overwrites. Without this field, duplicate documents
    might be written to the collection or, in some cases, the operation could fail altogether.
  </Note>

* The connection string for the cluster. For MongoDB Atlas, this connection string must include the protocol, username, password, host, and cluster name. For example:

  ```text  theme={null}
  mongodb+srv://<db_user>:<db_password>@<host>/?retryWrites=true&w=majority&appName=<cluster>
  ```

  To get the connection string in MongoDB Atlas, do the following:

  1. Log in to your MongoDB Atlas console.
  2. In the sidebar, under **Databases**, click **Clusters**.
  3. Click on the cluster you want to connect to.
  4. Click **Connect**.
  5. Click **Drivers**.
  6. Under **Add your connection string into your application code**, copy the connection string.
     You can then close the **Connect** dialog in MongoDB Atlas.

     Before you use this connection string, be sure to fill in any placeholders in the string, such as your MongoDB Atlas database user's password value.

  [Learn more](https://www.mongodb.com/resources/products/fundamentals/mongodb-connection-string).

To create a MongoDB destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="mongodb",
                  config={
                      "database": "<database>",
                      "collection": "<collection>",
                      "uri": "<uri>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "mongodb",
      "config": {
          "database": "<database>",
          "collection": "<collection>",
          "uri": "<uri>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<database>` (*required*) - The name of the database on the target MongoDB instance.
* `<collection>` (*required*) - The name of the collection within the database.
* `<uri>` (*required*) - The instance connection string.

## Learn more

* <Icon icon="blog" />  [How to go from S3 to MongoDB with no code using Unstructured](https://unstructured.io/blog/how-to-go-from-s3-to-mongodb-with-no-code-using-unstructured)

---

# Source: https://docs.unstructured.io/ui/destinations/motherduck.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/motherduck.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/motherduck.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# MotherDuck

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to MotherDuck.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/tj_0qmvPpJQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A [MotherDuck account](https://app.motherduck.com).

* A [MotherDuck access token](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token) for the account.

* A database in the account.

  * [Create a database](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-database/).
  * [List available databases](https://motherduck.com/docs/key-tasks/database-operations/basics-operations/#listing-databases).

  You can run commands to manage MotherDuck databases, schemas, tables, and more in the
  [MotherDuck UI](https://motherduck.com/docs/getting-started/motherduck-quick-tour/) or for example by connecting to MotherDuck with the
  [DuckDB CLI](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/).

* A schema in the target database.

  * [Create a schema](https://duckdb.org/docs/sql/statements/create_schema.html).
  * You can list available schemas and their parent catalogs by running the following command in the MotherDuck UI or the DuckDB CLI:

    ```sql  theme={null}
    SELECT * FROM information_schema.schemata;
    ```

  The MotherDuck connector uses the default schema name of `main` if not otherwise specified.

* A table in the target schema.

  * [Create a table](https://duckdb.org/docs/sql/statements/create_table).
  * You can list available tables in a schema by running the following commands in the MotherDuck UI or the DuckDB CLI, replacing the target catalog and schema names:

    ```sql  theme={null}
    USE <catalog_name>.<schema_name>;
    SHOW TABLES;
    ```

  The MotherDuck connector uses the default table name of `elements` if not otherwise specified.

  For maximum compatibility, Unstructured recommends the following table schema:

  ```sql  theme={null}
  CREATE TABLE elements (
      id VARCHAR,
      element_id VARCHAR,
      text TEXT,
      embeddings FLOAT[],
      type VARCHAR,
      system VARCHAR,
      layout_width DECIMAL,
      layout_height DECIMAL,
      points TEXT,
      url TEXT,
      version VARCHAR,
      date_created INTEGER,
      date_modified INTEGER,
      date_processed DOUBLE,
      permissions_data TEXT,
      record_locator TEXT,
      category_depth INTEGER,
      parent_id VARCHAR,
      attached_filename VARCHAR,
      filetype VARCHAR,
      last_modified TIMESTAMP,
      file_directory VARCHAR,
      filename VARCHAR,
      languages VARCHAR[],
      page_number VARCHAR,
      links TEXT,
      page_name VARCHAR,
      link_urls VARCHAR[],
      link_texts VARCHAR[],
      sent_from VARCHAR[],
      sent_to VARCHAR[],
      subject VARCHAR,
      section VARCHAR,
      header_footer_type VARCHAR,
      emphasized_text_contents VARCHAR[],
      emphasized_text_tags VARCHAR[],
      text_as_html TEXT,
      regex_metadata TEXT,
      detection_class_prob DECIMAL,
      partitioner_type VARCHAR
  );
  ```

  You can list the schema of a table by running the following commands in the MotherDuck UI or the DuckDB CLI, replacing the target catalog, schema, and table names:

  ```sql  theme={null}
  USE <catalog_name>.<schema_name>;
  DESCRIBE TABLE <table_name>;
  ```

To create a MotherDuck destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="motherduck",
                  config={
                      "database": "<database>",
                      "db_schema": "<db-schema>",
                      "table": "<table>",
                      "md_token": "<md-token>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "motherduck",
      "config": {
          "database": "<database>",
          "db_schema": "<db-schema>",
          "table": "<table>",
          "md_token": "<md-token>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*): A unique name for this connector.
* `<database>` (*required*): The name of the target MotherDuck database.
* `<db-schema>` (*required*): The name of the target schema within the database.
* `<table>` The name of the target table within the schema. By default, this table is named `elements` if not otherwise specified.
* `<md-token>` (*required*): The access token value within the MotherDuck account that has the appropriate access to the target database, schema, and table.

## Learn more

* <Icon icon="blog" />  [Unstructured's New MotherDuck Integration](https://unstructured.io/blog/unstructured-s-new-motherduck-integration)

---

# Source: https://docs.unstructured.io/examplecode/codesamples/oss/multi-files-api-processing.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Multi-file API processing

<Note>
  This sample code utilizes the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview).
</Note>

## Introduction

This guide demonstrates how to process multiple files using the Unstructured API and S3 Connector and implement context-aware chunking. The process involves installing dependencies, configuring settings, and utilizing Python scripts to manage and chunk data effectively.

## Prerequisites

Ensure you have Unstructured API key and access to an S3 bucket containing the target files.

## Step-by-Step Process

### Step 1: Install Unstructured and S3 Dependency

Install the unstructured package with S3 support.

```bash  theme={null}
pip install "unstructured-ingest[s3]"

```

### Step 2: Import Libraries

Import necessary libraries from the unstructured package for chunking and S3 processing.

```python  theme={null}
from unstructured_ingest.interfaces import (
    FsspecConfig,
    PartitionConfig,
    ProcessorConfig,
    ReadConfig,
)
from unstructured_ingest.runner import S3Runner

from unstructured.chunking.title import chunk_by_title
from unstructured.staging.base import dict_to_elements

```

### Step 3: Configuration

Set up the API key and S3 URL for accessing the data.

```python  theme={null}
UNSTRUCTURED_API_KEY = os.getenv('UNSTRUCTURED_API_KEY')
S3_URL = "s3://rh-financial-reports/world-development-bank-2023/"

```

### Step 4: Python Runner

Configure and run the S3Runner for processing the data.

```python  theme={null}
runner = S3Runner(
     processor_config=ProcessorConfig(
         verbose=True,
         output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"),
         num_processes=8,
     ),
     read_config=ReadConfig(),
     partition_config=PartitionConfig(
         partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
         partition_by_api=True,
         api_key=os.getenv("UNSTRUCTURED_API_KEY"),
         strategy="hi_res",
         hi_res_model_name="layout_v1.1.0",
     ),
     fsspec_config=FsspecConfig(
         remote_url=S3_URL,
     ),
 )

runner.run(anonymous=True)

```

### Step 5: Combine JSON Files from Multi-files Ingestion

Combine JSON files into a single dataset for further processing.

```json  theme={null}
combined_json_data = read_and_combine_json("Connector-Output/world-development-bank-2023")

```

### Step 6: Convert into Unstructured Elements for Chunking

Convert the combined JSON data into Unstructured Elements and apply chunking by title.

```python  theme={null}
elements = dict_to_elements(combined_json_data)
chunks = chunk_by_title(elements)

```

## Conclusion

Following these steps allows for efficient processing of multiple files using the Unstructured S3 Connector. The context-aware chunking helps in organizing and analyzing the data effectively.

---

# Source: https://docs.unstructured.io/ui/destinations/neo4j.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/neo4j.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/neo4j.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Neo4j

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Neo4j.

The requirements are as follows.

* A [Neo4j deployment](https://neo4j.com/deployment-center/).

  * For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), local Neo4j deployments are not supported.
  * For [Unstructured Ingest](/open-source/ingestion/overview), local and non-local Neo4j deployments are supported.

  The following video shows how to set up a Neo4j Aura deployment:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/fo8uDIm1zCE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The username and password for the user who has access to the Neo4j deployment. The default user is typically `neo4j`.

  * For a Neo4j Aura instance, the defaut user's is typically set when the instance is created.
  * For an AWS Marketplace, Microsoft Azure Marketplace, or Google Cloud Marketplace deployment of Neo4j, the default user is typically set during the deployment process.
  * For a local Neo4j deployment, you can [set the default user's initial password](https://neo4j.com/docs/operations-manual/current/configuration/set-initial-password/) or [recover an admin user and its password](https://neo4j.com/docs/operations-manual/current/authentication-authorization/password-and-user-recovery/).

* The connection URI for the Neo4j deployment, which starts with `neo4j://`, `neo4j+s://`, `bolt://`, or `bolt+s://`; followed by `localhost` or the host name; and sometimes ending with a colon and the port number (such as `:7687`). For example:

  * For a Neo4j Aura deployment, browse to the target Neo4j instance in the Neo4j Aura account and click **Connect > Drivers** to get the connection URI, which follows the format `neo4j+s://<host-name>`. A port number is not used or needed.
  * For an AWS Marketplace, Microsoft Azure Marketplace, or Google Cloud Marketplace deployment of Neo4j, see
    [Neo4j on AWS](https://neo4j.com/docs/operations-manual/current/cloud-deployments/neo4j-aws/),
    [Neo4j on Azure](https://neo4j.com/docs/operations-manual/current/cloud-deployments/neo4j-azure/), or
    [Neo4j on GCP](https://neo4j.com/docs/operations-manual/current/cloud-deployments/neo4j-gcp/)
    for details about how to get the connection URI.
  * For a local Neo4j deployment, the URI is typically `bolt://localhost:7687`
  * For other Neo4j deployment types, see the deployment provider's documentation.

  [Learn more](https://neo4j.com/docs/browser-manual/current/operations/dbms-connection).

* The name of the target database in the Neo4j deployment. A default Neo4j deployment typically contains two standard databases: one named `neo4j` for user data and another
  named `system` for system data and metadata. Some Neo4j deployment types support more than these two databases per deployment;
  Neo4j Aura instances do not.

  * [Create additional databases](https://neo4j.com/docs/operations-manual/current/database-administration/standard-databases/create-databases/)
    for a local Neo4j deployment that uses Enterprise Edition; or for Neo4j on AWS, Neo4j on Azure, or Neo4j on GCP deployments.
  * [Get a list of additional available databases](https://neo4j.com/docs/operations-manual/current/database-administration/standard-databases/listing-databases/)
    for a local Neo4j deployment that uses Enterprise Edition; or for Neo4j on AWS, Neo4j on Azure, or Neo4j on GCP deployments.

## Graph Output

The graph output of the Neo4j destination connector is represented in the following diagram:

```mermaid  theme={null}
graph BT
    subgraph dn [Document Node]
    D[Document]
    end
    style dn stroke-dasharray: 5
    
    subgraph en [Element Nodes]
    UE1[UnstructuredElement]
    UE2[UnstructuredElement]
    UE3[UnstructuredElement]
    UE4[UnstructuredElement]
    UE5[UnstructuredElement]
    UE6[UnstructuredElement]
    end
    style en stroke-dasharray: 5
    
    UE1 -->|PART_OF_DOCUMENT| D
    UE2 -->|PART_OF_DOCUMENT| D
    UE3 -->|PART_OF_DOCUMENT| D
    UE4 -->|PART_OF_DOCUMENT| D
    UE5 -->|PART_OF_DOCUMENT| D
    UE6 -->|PART_OF_DOCUMENT| D

    subgraph cn [Chunk Nodes]
    C1[Chunk]
    C2[Chunk]
    C3[Chunk]
    C4[Chunk]
    end
    style cn stroke-dasharray: 5
    
    C1 -->|NEXT_CHUNK| C2
    C2 -->|NEXT_CHUNK| C3
    C3 -->|NEXT_CHUNK| C4

    C1 -->|PART_OF_DOCUMENT| D
    C2 -->|PART_OF_DOCUMENT| D
    C3 -->|PART_OF_DOCUMENT| D
    C4 -->|PART_OF_DOCUMENT| D

    UE1 -.->|PART_OF_CHUNK| C1
    UE2 -.->|PART_OF_CHUNK| C1
    UE3 -.->|PART_OF_CHUNK| C2
    UE4 -.->|PART_OF_CHUNK| C3
    UE5 -.->|PART_OF_CHUNK| C4
    UE6 -.->|PART_OF_CHUNK| C4
```

[View the preceding diagram in full-screen mode](https://mermaid.live/view#pako:eNqFlN9vgjAQx_-Vps-6REEfeFiyFZYli7hskCyTxXS0ihFaU9oHo_7vq_IjgIzyxN330157d70TjDmh0IFbgQ8JeA4iBvSXq9_CQRhYuTxWGWUS-Br9KQC39pYOyki5VB5Tel2XS8H3dExwnmAh8NEBs4LohKA6hJfSOkJe7hh6k1XI9C4qlkpQUjK1Oh1UrUHVHlRng-p8QO1kgRqzoC8JxuPH8_vTR7BevqzdJQoXnh-cgVvf0wRYJsA2ATMTMP8f6FQz1tVEiWL7Vi3RpHBW5rRtWm3TbpmdnMbGnKIipb73FazRa-i_nXXAKvC9ZFWHuJfs6nrIUCVkKBIy1AjZpgTfGuWhwVRnnDT6ZFC3-vVpo0v6dKvRJH263eiRXh2OYEZFhndEj5nTlY6gTPSriaCjfwndYJXKCEbsolGsJP88shg6-onRERRcbRPobHCaa0sdCJbU3WHdbFmFHDD75jyrIUp2kotFMddu4-3yB3k-fcg).

In the preceding diagram:

* The `Document` node represents the source file.
* The `UnstructuredElement` nodes represent the source file's Unstructured `Element` objects, before chunking.
* The `Chunk` nodes represent the source file's Unstructured `Element` objects, after chunking.
* Each `UnstructuredElement` node has a `PART_OF_DOCUMENT` relationship with the `Document` node.
* Each `Chunk` node also has a `PART_OF_DOCUMENT` relationship with the `Document` node.
* Each `UnstructuredElement` node has a `PART_OF_CHUNK` relationship with a `Chunk` element.
* Each `Chunk` node, except for the "last" `Chunk` node, has a `NEXT_CHUNK` relationship with its "next" `Chunk` node.

Learn more about [document elements](/ui/document-elements) and [chunking](/ui/chunking).

Some related example Neo4j graph queries include the following.

Query for all available nodes and relationships:

```text  theme={null}
MATCH path=(source)-[relationship]->(target)
RETURN path
```

Query for `Chunk` to `Document` relationships:

```text  theme={null}
MATCH (chunk:Chunk)-[relationship:PART_OF_DOCUMENT]->(doc:Document)
RETURN chunk, relationship, doc
```

Query for `UnstructuredElement` to `Document` relationships:

```text  theme={null}
MATCH (element:UnstructuredElement)-[relationship:PART_OF_DOCUMENT]->(doc:Document)
RETURN element, relationship, doc
```

Query for `UnstructuredElement` to `Chunk` relationships:

```text  theme={null}
MATCH (element:UnstructuredElement)-[relationship:PART_OF_CHUNK]->(chunk:Chunk)
RETURN element, relationship, chunk
```

Query for `Chunk` to `Chunk` relationships:

```text  theme={null}
MATCH (this:Chunk)-[relationship:NEXT_CHUNK]->(previous:Chunk)
RETURN this, relationship, previous
```

Query for `UnstructuredElement` to `Chunk` to `Document` relationships:

```text  theme={null}
MATCH (element:UnstructuredElement)-[ecrelationship:PART_OF_CHUNK]-(chunk:Chunk)-[cdrelationship:PART_OF_DOCUMENT]->(doc:Document)
RETURN element, ecrelationship, chunk, cdrelationship, doc
```

Query for `UnstructuredElements` containing the text `jury`, and show their `Chunk` relationships:

```text  theme={null}
MATCH (element:UnstructuredElement)-[relationship:PART_OF_CHUNK]->(chunk:Chunk)
WHERE element.text =~ '(?i).*jury.*'
RETURN element, relationship, chunk
```

Query for the `Chunk` with the specified `id`, and show its `UnstructuredElement` relationships:

```text  theme={null}
MATCH (element:UnstructuredElement)-[relationship:PART_OF_CHUNK]->(chunk:Chunk)
WHERE chunk.id = '731508bf53637ce4431fe93f6028ebdf'
RETURN element, relationship, chunk
```

Additionally, for the [Unstructured UI](/ui/overview) and [Unstructured API's workflow operations](/api-reference/workflow/),
when a [Named entity recognition (NER)](/ui/enriching/ner) DAG node is added to a custom workflow,
any recognized entities are output as `Entity` nodes in the graph.

This additional graph output of the Neo4j destination connector is represented in the following diagram:

```mermaid  theme={null}
graph TD
    Chunk -->|HAS_ENTITY| Entity
    Entity -->|ENTITY_TYPE| Entity
```

In the preceding diagram:

* The `Chunk` node represents one of the source file's Unstructured `Element` objects, after chunking.
* The `Entity` node represents a recognized entity.
* A `Chunk` node can have `HAS_ENTITY` relationships with `Entity` nodes.
* An `Entity` node can have `ENTITY_TYPE` relationships with other `Entity` nodes.

Some related example Neo4j graph queries include the following.

Query for all available nodes and relationships:

```text  theme={null}
MATCH path=(source)-[relationship]->(target)
RETURN path
```

Query for `Entity` to `Entity` relationships:

```text  theme={null}
MATCH (child:Entity)-[relationship:ENTITY_TYPE]->(parent:Entity)
RETURN child, relationship, parent
```

Query for `Entity` nodes containing the text `PERSON`, and show their `Entity` relationships:

```text  theme={null}
MATCH (child:Entity)-[relationship:ENTITY_TYPE]->(parent:Entity)
WHERE parent.id = 'PERSON'
RETURN child, relationship, parent
```

Query for `Entity` nodes containing the text `amendment`, and show their `Chunk` relationships:

```text  theme={null}
MATCH (element:Chunk)-[relationship:HAS_ENTITY]->(entity:Entity)
WHERE entity.id =~ '(?i).*amendment.*'
RETURN element, relationship, entity
```

QUERY FOR `Entity` nodes containing the text `PERSON`, and show their `Entity` to `Entity` to `Chunk` relationships:

```text  theme={null}
MATCH (chunk:Chunk)-[ccrelationship:HAS_ENTITY]-(child:Entity)-[cprelationship:ENTITY_TYPE]->(parent:Entity)
WHERE parent.id =~ 'PERSON'
RETURN chunk, ccrelationship, child, cprelationship, parent
```

To create a Neo4j destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="neo4j",
                  config={
                      "uri": "<uri>",
                      "database": "<database>",
                      "username": "<username>",
                      "password": "<password>",
                      "batch_size": <batch-size>
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "neo4j",
      "config": {
          "uri": "<uri>",
          "database": "<database>",
          "username": "<username>",
          "password": "<password>",
          "batch_size": <batch-size>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<uri>` (*required*) - The connection URI for the Neo4j deployment, which typically starts with `neo4j://`, `neo4j+s://`, `bolt://`, or `bolt+s://`; is followed by the host name; and ends with a colon and the port number (such as `:7473`, `:7474`, or `:7687`).
* `<database>` (*required*) - The name of the target database in the Neo4j deployment. A default Neo4j deployment typically contains a standard database named neo4j for user data.
* `<username>` (*required*) - The name of the user who has access to the Neo4j deployment. A default Neo4j deployment typically contains a default user named `neo4j`.
* `<password>` (*required*) - The password for the user.
* `<batch-size>` - The maximum number of nodes or relationships to be transmitted per batch. The default is `100` if not otherwise specified.

---

# Source: https://docs.unstructured.io/ui/enriching/ner.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Named entity recognition (NER)

<iframe width="560" height="315" src="https://www.youtube.com/embed/ZEMegRGjCR0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

After partitioning, you can have Unstructured generate a list of recognized entities and their types (such as the names of organizations, products, and people) in the content, through a process known as *named entity recognition* (NER).
You can also have Unstructured generate a list of relationships between the entities that are recognized.

This NER is done by using models offered through various model providers.

Here is an example of a list of recognized entities and their entity types, along with a list of relationships between those
entities and their relationship types, using GPT-4o. Note specifically the `entities` field that is added to the `metadata` field.

```json  theme={null}
{
    "type": "CompositeElement",
    "element_id": "bc8333ea0d374670ff0bd03c6126e70d",
    "text": "SECTION. 3\n\nThe Senate of the United States shall be composed of two Senators from each State, 
        [chosen by the Legislature there- of,]* for six Years; and each Senator shall have one Vote.\n\n
        Immediately after they shall be assembled in Consequence of the first Election, they shall be divided
        as equally as may be into three Classes. The Seats of the Senators of the first Class shall be vacated
        at the Expiration of the second Year, of the second Class at the Expiration of the fourth Year, and of
        the third Class at the Expiration of the sixth Year, so that one third may be chosen every second Year;
        [and if Vacan- cies happen by Resignation, or otherwise, during the Recess of the Legislature of any
        State, the Executive thereof may make temporary Appointments until the next Meeting of the Legislature,
        which shall then fill such Vacancies.]*\n\nC O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
    "metadata": {
        "filename": "constitution.pdf",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 2,
        "entities": {
            "items": [
                {
                    "entity": "Senate",
                    "type": "ORGANIZATION"
                },
                {
                    "entity": "United States",
                    "type": "LOCATION"
                },
                {
                    "entity": "Senator",
                    "type": "ROLE"
                },
                {
                    "entity": "State",
                    "type": "LOCATION"
                },
                {
                    "entity": "Legislature",
                    "type": "ORGANIZATION"
                },
                {
                    "entity": "Executive",
                    "type": "ROLE"
                },
                {
                    "entity": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
                    "type": "DOCUMENT"
                }
            ],
            "relationships": [
                {
                    "from": "Senate",
                    "relationship": "based_in",
                    "to": "United States"
                },
                {
                    "from": "Senator",
                    "relationship": "has_role",
                    "to": "Senate"
                },
                {
                    "from": "Legislature",
                    "relationship": "has_office_in",
                    "to": "State"
                },
                {
                    "from": "Executive",
                    "relationship": "has_role",
                    "to": "State"
                },
                {
                    "from": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
                    "relationship": "dated",
                    "to": "DATE"
                }
            ]
        }
    }
}
```

Here is another example of some of the entities, their entity types, and relationships that are recognized for a given paragraph of text. This information
is generated by GPT-4o by OpenAI:

<img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/NER-Example.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=b7272e9da38ed62eb938183a27a5a245" alt="Named entity recognition for information in a paragraph of text" data-og-width="2338" width="2338" data-og-height="1484" height="1484" data-path="img/enriching/NER-Example.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/NER-Example.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=f13538d3701cae0477ae27802cf79c52 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/NER-Example.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=131350f9af835c330a50a9ee5f389564 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/NER-Example.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=03d860f02b2d3c216a492d5dcc1d4db3 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/NER-Example.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=9b1bb4c8b4fa904996d0235d683666a6 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/NER-Example.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=d127222a75f9842cb9c68e514707473d 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/NER-Example.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=0a25ff77062961200f6e2b0b5c8ee379 2500w" />

By default, the following entity types are supported for NER:

* `PERSON`
* `ORGANIZATION`
* `LOCATION`
* `DATE`
* `TIME`
* `EVENT`
* `MONEY`
* `PERCENT`
* `FACILITY`
* `PRODUCT`
* `ROLE`
* `DOCUMENT`
* `DATASET`

By default, the following entity relationships are supported for NER:

* `PERSON` - `ORGANIZATION`: `works_for`, `affiliated_with`, `founded`
* `PERSON` - `LOCATION`: `born_in`, `lives_in`, `traveled_to`
* `ORGANIZATION` - `LOCATION`: `based_in`, `has_office_in`
* Entity - `DATE`: `occurred_on`, `founded_on`, `died_on`, `published_in`
* `PERSON` - `PERSON`: `married_to`, `parent_of`, `colleague_of`
* `PRODUCT` - `ORGANIZATION`: `developed_by`, `owned_by`
* `EVENT` - `LOCATION`: `held_in`, `occurred_in`
* Entity - `ROLE`: `has_title`, `acts_as`, `has_role`
* `DATASET` - `PERSON`: `mentions`
* `DATASET` - `DOCUMENT`: `located_in`
* `PERSON` - `DATASET`: `published`
* `DOCUMENT` - `DOCUMENT`: `referenced_in`, `contains`
* `DOCUMENT` - `DATE`: `dated`
* `PERSON` - `DOCUMENT`: `published`

You can add, rename, or delete items in this list of default entity types and default entity relationship types.
You can also add any clarifying instructions to the
prompt that is used to run NER. To do this, see the next section.

# Generate a list of entities and their relationships

To generate a list of recognized entities and their relationships, in an **Enrichment** node in a workflow, specify the following:

<Note>
  You can change a workflow's NER settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings.
</Note>

<Warning>
  The following models are no longer available as of the following dates:

  * Amazon Bedrock Claude Sonnet 3.5: October 22, 2025
  * Anthropic Claude Sonnet 3.5: October 22, 2025

  Unstructured recommends the following actions:

  * For new workflows, do not use any of these models.
  * For any workflow that uses any of these models, update that workflow as soon as possible to use a different model.

  Workflows that attempt to use any of these models on or after its associated date will return errors.
</Warning>

1. Select **Text**.

2. For **Model**, select one of the available models that are shown.

3. The selected model will follow a default set of instructions (called a *prompt*) to perform NER using a set of predefined entity types and relationships. To experiment
   with running the default prompt against some sample data, click **Edit**, and then click **Run Prompt**. The selected **Model** uses the
   **Prompt** to run NER on the **Input sample** and shows the results in the **Output**. Look specifically at the `response_json` field for the
   entities that were recognized and their relationships.

4. To customize the prompt, change the contents of **Prompt**.

   <Note>
     For best results, Unstructured strongly recommends that you limit your changes only to certain portions of the default prompt, specifically:

     * Adding, renaming, or deleting items in the list of predefined types (such as `PERSON`, `ORGANIZATION`, `LOCATION`, and so on).

     * Adding, renaming, or deleting items in the list of predefined relationships (such as `works_for`, `based_in`, `has_role`, and so on).

     * As needed, adding any clarifying instructions only between these two lines:

       ```text  theme={null}
       ...
       Provide the entities and their corresponding types as a structured JSON response.

       (Add any clarifying instructions here only.)

       [START OF TEXT]
       ...
       ```

     * Changing any other portions of the default prompt could produce unexpected results.
   </Note>

5. To experiment with different data, change the contents of **Input sample**. For best results, Unstructured strongly recommends that the JSON structure in **Input sample** be preserved.

6. When you are satisfied with the **Model** and **Prompt** that you want to use, click **Save**.

---

# Source: https://docs.unstructured.io/support/issues/network-connection-timeout.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Network, connection, and timeout issues

## Issues

When you try to establish or maintain a network connection between Unstructured and an external system, you get one of the following error types:

* `ReadTimeout`
* `TimeoutError`
* `ServerDisconnectedError`
* `ClientConnectorError`: For example, a `cannot connect to localhost` error.
* `ConnectionError`: For example, a `Failed to resolve 'model-registry-api...` error.
* `ServiceResponseError`: For example, for Azure, a `Timeout on reading data from socket` error.
* `ClientPayloadError`: For example, a `Response payload not completed` or `Connection reset by peer` error.
* `OSError`: For example, for Amazon S3 during a file upload, an `[Errno 22] The request body terminated unexpectedly` error.

## Possible causes

* Unstructured failed to connect to a required system.
* The connection timed out while waiting for a response.
* An established connection was unexpectedly closed.
* Transient network issues were encountered with the local network, with the Internet, or with a cloud provider.
* Firewalls are blocking connections.
* Some services are temporarily unavailable, slow, or unresponsive.
* The processing of very large files is causing long-running operations, which lead to exceeding default timeouts.
* Issues with DNS resolutions were encountered.

## Possible solutions

* **Retry the job**: Many network errors are transient. Wait a few minutes and then retry the job.
* **Check network connectivity**: Ensure stable network connectivity with the related source and destination systems.
* **Check firewalls**: Verify that any firewalls are not blocking necessary connections to source or destination APIs, blob storage, databases, vector stores, or ports.
* **Process smaller batches or files**: If timeouts occur during processing or uploading very large files, try using smaller files or batches.
* **Contact Unstructured Support**: For persistent `ClientConnectorError`, `ConnectionError`, or other frequent timeouts or disconnects that aren't resolved by retries, [contact Unstructured Support](/support/request). These issues can sometimes point to internal Unstructured system problems.

## Additional resources

* [Unstructured Status](https://status.unstructured.io) dashboard
* [Request Unstructured support](/support/request)

---

# Source: https://docs.unstructured.io/support/issues/no-fast-partitioning-for-images.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# The Fast strategy is not available for image files

## Issue

When you try to run a workflow in Unstructured with the following settings:

* The workflow contains a **Partitioner** node set to use the **Fast** partitioning strategy.
* The workflow is configured to process an image file, such as a PNG or JPG file.

You get an error that contains text similar to the following, and the workflow fails to run:

```text  theme={null}
422: Error in partition - {"detail":"The fast strategy is not available for image files"}
```

## Cause

Unstructured is trying to process an image file—such as a PNG or JPG file—by using the **Fast** partitioning strategy, but
this strategy is designed for text-based documents only and cannot analyze the layout of images.

## Solution

1. Change the workflow's **Partitioner** node's **Partition Strategy** setting to **Auto**, **High Res**, or **VLM**.
2. Re-run the workflow.

## Additional resources

* If your workflow is configured to process a mixture of file types, you should use the **Auto** partitioning strategy. Unstructured will then
  automatically choose the best partitioning strategy for each file.
* Learn more about the various [partitioning](/ui/partitioning) strategies.
* To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/examplecode/notebooks.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Notebooks

> Notebooks contain complete working sample code for end-to-end solutions.

<CardGroup cols={2}>
  <Card title="Unstructured API: Convert Documents to Stylized HTML" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Convert_Documents_to_Stylized_HTML_using_the_Unstructured_API.ipynb">
    <br />

    Partition documents into structured elements using our VLM-based partitioning strategy, then convert the extracted HTML metadata into beautifully stylized web documents for RAG pipelines, knowledge bases, and AI-powered applications.

    <br />

    `Unstructured API` `On-demand jobs` `Local files`

    <br />
  </Card>

  <Card title="Unstructured API: On-Demand Jobs Quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_On_Demand_Jobs_Quickstart.ipynb">
    <br />

    Learn the basics of how to use the Unstructured API to run an on-demand job with a short-lived workflow that takes one or more local files as input.

    <br />

    `Unstructured API` `On-demand jobs` `Local files` `Workflow operations` `Workflow templates`

    <br />
  </Card>

  <Card title="Unstructured API: On-Demand Jobs Walkthrough" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_On_Demand_Jobs_Walkthrough.ipynb">
    <br />

    Learn how to use the Unstructured API to run an on-demand job with a short-lived workflow that takes one or more local files as input and applies basic partitioning and a few enrichments. Then, extend your knowledge of and skills
    with Unstructured by adding chunking, more enrichments, and embeddings to your on-demand jobs.

    <br />

    `Unstructured API` `On-demand jobs` `Local files` `Workflow operations` `Workflow templates`

    <br />
  </Card>

  <Card title="Adding a Memory Layer to RAG with Unstructured and Mem0" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/MemoryLayer_for_RAG.ipynb">
    <br />

    Learn how to build a personalized RAG system using Unstructured's document processing and Mem0's memory layer to create an AI assistant that remembers user preferences and adapts responses across sessions.

    <br />

    `Unstructured API` `Workflows` `RAG` `S3` `Weaviate` `Memory`

    <br />
  </Card>

  <Card title="Agentic Weekly AI News TL;DR" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Agentic-Weekly-AI-News-TLDR.ipynb">
    <br />

    Build an automated pipeline that scrapes the last 7 days of AI papers & posts (ArXiv, OpenAI, Anthropic, Hugging Face, DeepLearning.AI), processes them with Unstructured’s Hi Res partitioner for clean text, stores structured chunks in MongoDB, and generates both detailed summaries and an executive brief for a weekly newsletter.

    <br />

    `Unstructured API` `Workflows` `hi_res` `Scraping` `ArXiv` `OpenAI` `Anthropic` `Hugging Face` `DeepLearning.AI` `S3` `MongoDB` `Summarization`

    <br />
  </Card>

  <Card title="Graph RAG for Academic Papers" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/GraphRAG_for_Academic_Papers.ipynb">
    <br />

    Learn how to build a GraphRAG system for research papers using Unstructured's Named Entity Recognition to extract custom entities and relationships, then query them with Neo4j to answer complex questions that require understanding connections between models, datasets, and metrics.

    <br />

    `Unstructured API` `Workflows` `Graph RAG` `S3` `Neo4j`

    <br />
  </Card>

  <Card title="Unstructured API Walkthrough" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_Walkthrough.ipynb">
    <br />

    This walkthrough provides you with deep, hands-on experience with the Unstructured API. As you follow along, you will learn how to use many of the Unstructured API's features for partitioning, enriching, chunking, and embedding.

    <br />

    `Unstructured API` `Workflows` `Workflow Operations` `Local File`

    <br />
  </Card>

  <Card title="Everything (from) Everywhere All at Once" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_From_Multiple_Data_Sources_FileTypes.ipynb">
    <br />

    Set up an AI assistant that answers questions by querying your company's scattered data. Retrieve context from contracts in Azure, sales decks in OneDrive, and emails in Outlook through a single RAG pipeline.

    <br />

    `Unstructured API` `Workflows` `RAG` `Azure Blob Storage` `Outlook` `OneDrive` `AstraDB`

    <br />
  </Card>

  <Card title="Agentic RAG with Visually Grounded Answers and Visual Citations" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Agentic-RAG-with-Visually-Grounded-Answers-and-Visual-Citations.ipynb">
    <br />

    Learn how to build an AI-powered document processing system that extracts both text and images from PDFs in S3, generates intelligent descriptions for visual elements, and enables a searchable knowledge base that can answer questions about charts, diagrams, and product visuals.

    <br />

    `Unstructured API` `Workflows` `S3` `Image Processing` `Visual RAG` `Enterprise AI`

    <br />
  </Card>

  <Card title="Building a Hybrid RAG System: From Fragmented Data to Unified Intelligence" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Hybrid_RAG_Pipeline_with_ElasticSearch.ipynb">
    <br />

    Learn how to build a comprehensive hybrid RAG system that processes multiple data sources simultaneously - combining S3 PDFs and Elasticsearch records into a unified knowledge base for enterprise AI applications.

    <br />

    `Unstructured API` `Workflows` `S3` `Elasticsearch` `Hybrid RAG` `Enterprise AI`

    <br />
  </Card>

  <Card title="Dropbox-to-Pinecone Connector API Quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb">
    <br />

    Learn how to set up and run a custom workflow that uses a free Dropbox storage location as a source and a free Pinecone serverless index as a destination, suitable for powering RAG applications.

    <br />

    `Unstructured API` `Workflows` `Dropbox` `Pinecone`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and PostgreSQL" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_PostgreSQL.ipynb">
    <br />

    Learn how to build data processing workflows using the Unstructured API and Python SDK to preprocess unstructured files from S3 and store the structured outputs in PostgreSQL for retrieval.

    <br />

    `Unstructured API` `Workflows` `S3` `PostgreSQL`

    <br />
  </Card>

  <Card title="Unstructured Partition Endpoint Quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb">
    <br />

    This notebook calls the Unstructured Python SDK to have Unstructured process a local file by using the legacy Unstructured Partition Endpoint.

    <br />

    `Unstructured API` `Partition Endpoint` `Local file`

    <br />
  </Card>

  <Card title="Preserving Table Structure for Better Retrieval" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Preserving_Table_Structure_for_Better_Retrieval.ipynb">
    <br />

    This notebook explores using Unstructured API to process financial documents while preserving tabular structure in a way that's usable by downstream applications.

    <br />

    `Unstructured API` `Workflows` `S3`  `Astra DB`

    <br />
  </Card>

  <Card title="Historical research about MLK with the Unstructured API" href="https://colab.research.google.com/github/Unstructured-IO/unstructured-mlk-archive-public/blob/main/MLK_Archive_RAG_Application.ipynb">
    <br />This notebook explores how you can use Unstructured to gather and process declassified historical records surrounding the assassination of Dr. Martin Luther King, Jr. These processed documents can then be analyzed by using Elasticsearch and RAG.

    <br />

    `Unstructured API` `Workflows` `S3` `VLM` `NER` `Elasticsearch` `MLK` `National Archives`

    <br />
  </Card>

  <Card title="RAG without Embeddings" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Rag_without_Embeddings.ipynb">
    <br />

    Learn how to build a RAG pipeline without any embedding models. Use Unstructured to preprocess documents, index them into Elasticsearch, and retrieve using classic BM25 scoring.

    <br />

    `Unstructured API` `Workflows` `Elasticsearch` `BM25`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and Redis" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Redis.ipynb">
    <br />

    Learn how to build data processing workflows using the Unstructured API and Python SDK to preprocess unstructured files from S3 and store the structured outputs in Redis Cloud for retrieval.

    <br />

    `Unstructured API` `Workflows` `S3` `Redis`

    <br />
  </Card>

  <Card title="Create a S3 to Qdrant Pipeline using the Unstructured API" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/S3_to_Qdrant_Workflow_using_Unstructured_API.ipynb">
    <br />

    This notebook walks through using the Unstructured API's workflow operations to set up a complete pipeline that pulls documents from S3, processes them using Unstructured, and stores the resulting embeddings in Qdrant for fast vector search and retrieval.

    <br />

    `Unstructured API` `Workflows` `S3` `Qdrant` `VLM` `Embeddings`

    <br />
  </Card>

  <Card title="Two-stage retrieval: similarity search + rerankers" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Rag_with_Reranking.ipynb">
    <br />

    Improve RAG precision with a two-stage retrieval pipeline: fast vector search followed by reranking using Cohere’s re-ranker models.

    <br />

    `Unstructured API` `Workflows` `Cohere` `Pinecone`

    <br />
  </Card>

  <Card title="Create a S3 to MongoDB Pipeline using the Unstructured API" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/S3_to_MongoDB_Workflow_using_Unstructured_API.ipynb">
    <br />

    Learn how to build an end-to-end document processing pipeline that processes PDFs from S3 and stores structured results in MongoDB. Features VLM-powered partitioning, semantic chunking, and vector embeddings using the Unstructured Workflows API.

    <br />

    `Unstructured API` `Workflows` `S3` `MongoDB` `VLM` `Embeddings`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and IBM watsonx.data" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Azure_to_IBM_WatsonX.ipynb">
    <br />

    Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data from your Azure Blob Storage into your IBM watsonx.data instance.

    <br />

    `Unstructured API` `Workflows` `Azure Blob Storage` `IBM watsonx.data`

    <br />
  </Card>

  <Card title="Using Unstructured with Snowflake Cortex Search for RAG" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Use_Unstructured_with_Snowflake_Cortex_for_RAG_Search.ipynb">
    <br />

    Use Snowflake Cortex and RAG to do natural-language searches across a Snowflake table that contains data provided by Unstructured. Additional Snowflake Cortex functions are also explored.

    <br />

    `Unstructured API` `Snowflake Cortex` `RAG Search` `Workflows` `S3`

    <br />
  </Card>

  <Card title="Agentic RAG with LangGraph and Together AI" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/AgenticRAG_with_LangGraph,TogetherAI.ipynb">
    <br />

    Build Agentic RAG with `LangGraph` and `Together AI` and compare the results with Vanilla RAG in pure Python

    <br />

    `Unstructured API` `Workflows` `Agents` `LangGraph` `Together AI` `Astra DB`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and Snowflake" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Snowflake.ipynb">
    <br />

    Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data from your Azure Blob Storage into your Snowflake Table.

    <br />

    `Unstructured API` `Workflows` `Azure Blob Storage` `Snowflake`

    <br />
  </Card>

  <Card title="Building Graph-Based RAG Applications" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Building_Graph_Based_RAG_Applications_with_Unstructured_and_AstraDB.ipynb">
    <br />

    Learn how to use the Unstructured API to create a Graph RAG-based workflow that writes data with named entity recognition (NER) to your Astra DB.

    <br />

    `Unstructured API` `Workflows` `Graph RAG` `NER` `Astra DB`

    <br />
  </Card>

  <Card title="Getting Started with Unstructured API and Delta Tables in Databricks" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Delta_Tables_in_Databricks.ipynb">
    <br />

    Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data into your Delta Table.

    <br />

    `Unstructured API` `Workflows` `Databricks` `S3`

    <br />
  </Card>

  <Card title="RAG for Online Documentation" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_for_documentation.ipynb">
    <br />

    Crawl websites with Firecrawl and build a RAG workflow powered by Unstructured and MongoDB Atlas vector search.

    <br />

    `Unstructured API` `Workflows` `MongoDB`

    <br />
  </Card>

  <Card title="Unstructured API Workflow Operations Quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_Workflow_Operations_Quickstart.ipynb">
    <br />

    Build an end-to-end workflow in Unstructured programmatically by using the Unstructured API's workflow operations.

    <br />

    `Unstructured API` `Workflows` `S3`

    <br />
  </Card>

  <Card title="RAG with Databricks Vector Search with Context from Multiple Sources" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Delta_Tables_Databricks_Multiple_Sources.ipynb">
    <br />

    Build RAG with Databricks Vector Search with context preprocessed from multiple sources by Unstructured.

    <br />

    `Databricks` `Introductory notebook`

    <br />
  </Card>

  <Card title="Agentic RAG with Hugging Face smolagents vs Vanilla RAG" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Agentic_RAG_with_HuggingFace_smolagents.ipynb">
    <br />

    Build Agentic RAG with `smolagents` library and compare the results with Vanilla RAG in pure Python

    <br />

    `GPT-4o` `smolagents` `Agents` `DataStax` `S3` `Advanced notebook`

    <br />
  </Card>

  <Card title="LLama3.2 RAG evaluation on unstructured text" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Llama3_2_RAG_evaluation_on_Unstructured_Text_via_VLM.ipynb">
    <br />

    Evaluate Llama3.2 for your RAG system with Unstructured, GPT-4o, Ragas, and LangChain

    <br />

    `GPT-4o` `Ragas` `LangChain` `Llama3.2` `Pinecone` `S3` `Advanced notebook`

    <br />
  </Card>

  <Card title="Multimodal RAG: Enhancing RAG outputs with image results" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Multimodal_RAG_with_image_results.ipynb">
    <br />

    Process a file in S3 with Unstructured and return images in your RAG output

    <br />

    `S3` `FAISS` `GPT-4o-mini` `Advanced notebook`

    <br />
  </Card>

  <Card title="Quantitative Reasoning with tables inside PDFs" href="https://app.hex.tech/5e6b6e24-dead-4d3b-b9da-a9a7ad587b96/hex/cb595d8a-6eac-4e19-96ed-b1540e5c031c/draft/logic">
    <br />

    From Pixels to Insights: Seamlessly Extracting and Visualizing Table Data with Unstructured and Hex

    <br />

    `Unstructured API` `Hex` `Advanced notebook`

    <br />
  </Card>

  <Card title="PII removal with GLiNER in unstructured data ETL" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/PII_removal_in_unstructured_data_ETL.ipynb">
    <br />

    Remove Personally Identifiable Information (PII) as a part of unstructured data preprocessing.

    <br />

    `Unstructured API`  `PII` `GLiNER` `Advanced notebook`

    <br />
  </Card>

  <Card title="Custom metadata extraction and self-querying retrieval" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/custom_metadata_self_querying_rag_mongodb_unstructured_langgraph.ipynb">
    <br />

    Extract custom metadata, and enable metadata pre-filtering in your RAG.

    <br />

    `Unstructured API` `MongoDB`  `Metadata` `Advanced notebook`

    <br />
  </Card>

  <Card title="Selecting an embedding model for custom data" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Selecting_an_embedding_model_for_custom_data.ipynb">
    <br />

    End-to-end data processing pipeline using Unstructured Serverless API.

    <br />

    `Unstructured API` `Hugging Face` `Advanced notebook`

    <br />
  </Card>

  <Card title="RAG with PDFs, LangChain and Llama 3" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_Llama3_Unstructured_LangChain.ipynb">
    <br />

    A RAG system with the Llama 3 model from Hugging Face.

    <br />

    `Unstructured API`  `🤗 Hugging Face` `LangChain` `Llama 3` `Introductory notebook`
  </Card>

  <Card title="Unstructured data ETL from S3 to SingleStore DB" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_data_ETL_from_S3_to_SingleStore.ipynb">
    <br />

    Learn to ingest, partition, chunk, embed and load data from an S3 bucket into SingleStore DB.

    <br />

    `Unstructured API`  `SingleStoreDB` `AWS S3` `Introductory notebook`
  </Card>

  <Card title="Google Drive to DataStax Astra DB" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Google_Docs_to_Astra.ipynb">
    <br />

    Embed your Google Drive Docs in an Astra Vector Database with Unstructured Serverless API

    <br />

    `Unstructured API` `Google` `DataStax` `Introductory notebook`

    <br />
  </Card>

  <Card title="Weaviate RAG quickstart" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Weaviate_Quickstart_OpenAI.ipynb">
    <br />

    Embed your local documents in an Weaviate Vector Database with Unstructured Serverless API

    <br />

    `Unstructured API` `OpenAI` `Weaviate` `Introductory notebook`

    <br />
  </Card>

  <Card title="Preprocess PDFs in AWS S3, load into Elasticsearch" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/S3_to_Elasticsearch_with_Unstructured.ipynb">
    <br />

    Ingest PDF documents from an S3 bucket, transform them into a normalized JSON with Unstructured Serverless API, chunk, embed and load into Elasticsearch.

    <br />

    `Unstructured API` `AWS S3` `Elasticsearch` `Introductory notebook`

    <br />
  </Card>

  <Card title="Preprocess documents in Google Drive, load into Databricks Volume" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/GoogleDrive_to_Databricks_Connector.ipynb">
    <br />

    Preprocess documents from a Google Drive Unstructured Serverless API and load them into Databricks Volume.

    <br />

    `Unstructured API` `Google Drive` `Databricks` `Introductory notebook`

    <br />
  </Card>

  <Card title="Source references in RAG responses" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_on_arXiv_papers_with_source_references.ipynb">
    <br />

    Add document source references to RAG responses based on documents metadata.

    <br />

    `Unstructured API` `RAG` `LangChain` `Intermediate notebook`

    <br />
  </Card>

  <Card title="Query processed PDF with HuggingChat" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/PDF_with_Unstructured_and_HuggingChat.ipynb">
    <br />

    Send a PDF to Unstructured for processing, and send a subset of the returned PDF's processed text to [HuggingChat](https://huggingface.co/chat/) for chatbot-style querying.

    <br />

    `Unstructured API`  `🤗 Hugging Face` `🤗 HuggingChat` `Introductory notebook`
  </Card>

  <Card title="Llama 3 Local RAG with emails" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Local_RAG_with_emails.ipynb">
    <br />

    Build a local RAG app for your emails with Unstructured, LangChain and Ollama.

    <br />

    `Unstructured API` `LangChain` `Ollama` `Llama 3` `Introductory notebook`
  </Card>

  <Card title="Building RAG With PowerPoint presentations" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Building_RAG_with_Powerpoint_presentations.ipynb">
    <br />

    A RAG solution that is based on PowerPoint files.

    <br />

    `Unstructured API`  `🤗 Hugging Face` `LangChain` `Llama 3` `Introductory notebook`
  </Card>

  <Card title="Synthetic test dataset generation" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_synthetic_test_data_with_Unstructured_GPT_4o_and_Ragas.ipynb">
    <br />

    Build a Synthetic Test Dataset for your RAG system in 5 easy steps

    <br />

    `Unstructured API` `GPT-4o` `Ragas` `LangChain` `Advanced notebook`

    <br />
  </Card>
</CardGroup>

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/notion.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Notion

Connect Notion to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

* A Notion access token for an
  [internal integration](https://developers.notion.com/docs/authorization#internal-integration-auth-flow-set-up) or
  [public integration](https://developers.notion.com/docs/authorization#public-integration-auth-flow-set-up).

* The IDs of any specific target Notion [databases](https://developers.notion.com/docs/working-with-databases). To get a database's ID:

  1. Open the database as a full page in Notion.

  2. Click **Share**, and then click **Copy link**.

  3. Paste the link in your text editor. The URL uses the following format:

     ```text  theme={null}
     https://www.notion.so/{workspace_name}/{database_id}?v={view_id}
     ```

  4. Find the part that corresponds to `{database_id}` in the URL you pasted. It is a 36-character string. This value is the database's ID.

     Note that if you receive a database ID by calling the Notion API, for example from the Notion search endpoint, it will contain hyphens in the UUIDv4 format. You may use either the hyphenated or un-hyphenated ID.

* The IDs of any specific target Notion [pages](https://developers.notion.com/docs/working-with-page-content). To get a page's ID:

  1. Open the page in Notion.
  2. Click **Share**, and then click **Copy link**.
  3. Paste the link in your text editor. The URL ends in a page ID. It should be a 32-character string.
  4. Format this value by inserting hyphens (`-`) in the following pattern: `8-4-4-4-12` (each number is the length of characters between the hyphens).

     Example: `1429989fe8ac4effbc8f57f56486db54` becomes `1429989f-e8ac-4eff-bc8f-57f56486db54`.

     This value is the page's ID.

The Notion connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[notion]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `NOTION_ACCESS_TOKEN` - The API key for the Notion integration, represented by `--notion-api-key` (CLI) or `notion_api_key` (Python).

Now call the Unstructured CLI or Python. The destination connector can be any of the ones supported. This example uses the local destination connector.

This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional. 

  unstructured-ingest \
    notion \
      --notion-api-key $NOTION_ACCESS_TOKEN \
      --page-ids 4b1a4fff-d740-4e59-aa9f-cc5c7c2b4789,47dcbd6d-13f1-4fc1-af7d-5fc147697c00 \
      --database-ids 1382c376-5a0a-8034-a3c3-d33402979df2,f2a60ad9-c466-4b06-859d-1ab176dea8c8 \
      --recursive \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.notion.connector import (
      NotionIndexerConfig,
      NotionDownloaderConfig,
      NotionConnectionConfig,
      NotionAccessConfig
  )

  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  from unstructured_ingest.processes.connectors.local import LocalUploaderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=NotionIndexerConfig(
              page_ids=[
                  "4b1a4fff-d740-4e59-aa9f-cc5c7c2b4789",
                  "47dcbd6d-13f1-4fc1-af7d-5fc147697c00"
              ],
              database_ids=[
                  "1382c376-5a0a-8034-a3c3-d33402979df2", 
                  "f2a60ad9-c466-4b06-859d-1ab176dea8c8"
              ],
              recursive=True
          ),
          downloader_config=NotionDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
          source_connection_config=NotionConnectionConfig(
              access_config=NotionAccessConfig(notion_api_key=os.getenv("NOTION_ACCESS_TOKEN"))
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/business/gcp/onboard.md

# Source: https://docs.unstructured.io/business/azure/onboard.md

# Source: https://docs.unstructured.io/business/aws/onboard.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unstructured Business in-VPC on Amazon Web Services (AWS) - onboarding checklist

<Note>
  The following information applies only to in-VPC deployments of [Unstructured Business](/business/overview).

  For dedicated instance deployments of Unstructured Business, contact your Unstructured sales representative,
  or email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

  For dedicated instance deployments of Unstructured Business that also use PrivateLink, see
  the [onboarding process for AWS PrivateLink](/business/aws/dedicated-instance-privatelink).
</Note>

After your organization has signed the **Business** account agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the
deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options:

* [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure.
* [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your AWS account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure.

## Questions? Need help?

If you have questions or need help as you go, contact your Unstructured sales representative or technical enablement contact. If you do not know who they are,
email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io), and a member of the Unstructured sales or technical enablement teams
will get back to you as soon as possible.

## Do it all for me

If you want Unstructured to set up the required infrastructure for you in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with
the access credentials for an IAM user or service principal in your AWS account that has the following required permissions.

### Core networking permissions

For VPC and subnet management:

* `ec2:CreateVpc`
* `ec2:CreateSubnet`
* `ec2:CreateRouteTable`
* `ec2:CreateInternetGateway`
* `ec2:CreateNatGateway`
* `ec2:ModifyVpcAttribute` (for DNS settings)
* `ec2:AssociateRouteTable`, `ec2:CreateRoute` (for public and private route tables)
* `ec2:AllocateAddress` (for Elastic IP assignment to the NAT Gateway)

For security group rules:

* `ec2:AuthorizeSecurityGroupIngress/Egress` (to configure cluster and node security groups to allow VPC CIDR traffic)

### EKS permissions

For the cluster role:

* Attach the managed policies `AmazonEKSClusterPolicy` and `AmazonEKSVPCResourceController` to a role with `sts:AssumeRole` trust for `eks.amazonaws.com`

For the node group role:

Attach these managed policies:

* `AmazonEKSWorkerNodePolicy` (for node operations)
* `AmazonEKS_CNI_Policy` (for networking)
* `AmazonEC2ContainerRegistryReadOnly` (for ECR access)

For OIDC integration:

* `iam:CreateOpenIDConnectProvider` (to associate the EKS cluster with IAM OIDC)
* `iam:CreateRole` + `iam:AttachRolePolicy` (for service accounts in the `recommender`, `etl-operator`, and `data-broker` namespaces)

### Storage and database

These permissions:

* `s3:CreateBucket`
* `s3:PutBucketVersioning`
* `s3:PutBucketEncryption`

For these S3 buckets:

* `u10d-*-etl-blob-cache`
* `u10d-*-etl-job-db`
* `u10d-*-etl-job-status`
* `u10d-*-job-files`

For RDS:

* `rds:CreateDBInstance`
* `rds:CreateDBSubnetGroup`
* `rds:CreateDBSecurityGroup` + `ec2:AuthorizeSecurityGroupIngress` (to allow VPC CIDR access)

### Add-ons and utilities

For the EBS CSI Driver:

* `eks:CreateAddon` with IAM role attachment permissions for the `ebs.csi.aws.com` service account

For the SSH Key:

* `ec2:CreateKeyPair` + `ec2:ExportKeyPair` (for node group remote access)

### Cross-service requirements

* For IAM: `iam:PassRole` (to assign roles to EKS, RDS, and S3)
* For KMS: `kms:CreateKey` (if using CMK for S3 and RDS encryption)
* For CloudFormation: `cloudformation:*`

For least privilege, scope resource ARNs in policies (for example, restrict S3 bucket names with wildcards such as `u10d-*-etl*`).
The EKS Pod Identity Agent requires `eks-auth:AssumeRoleForPodIdentity` permission on node roles when used with IRSA.

## Bring my own infrastructure

If you want to set up the required infrastructure yourself, set things up as follows within your AWS account for Unstructured to deploy the Unstructured UI and API into.

You must also provide your Unstructured sales representative or technical enablement contact with
the access credentials for an IAM user or service principal in your AWS account that has access to the target Amazon Elastic Kubernetes Service (EKS) cluster to deploy the
Unstructured UI and API into.

### VPC and networking

* **VPC**

  * CIDR: `10.0.0.0/16` - Any CIDR should work, but make sure it has enough space.
  * DNS Hostnames: Enabled
  * DNS Support: Enabled

* **Internet Gateway**

  * Attached to the VPC

* **Public Subnet**

  * CIDR: `10.0.0.0/24`
  * Public IP on launch: true
  * Availability Zone: `${region}a`

* **NAT Gateway + Elastic IP**

  * Lives in the public subnet

* **Private Subnets (x2)**

  * CIDRs: `10.0.1.0/24`, `10.0.2.0/24`
  * AZs: `${region}a` and `${region}b`

* **Route Tables**

  * Public: default route (`0.0.0.0/0`) via IGW
  * Private (x2): default route via NAT Gateway

### **IAM roles and policies**

* **EKS Cluster Role**

  * Trusts: `eks.amazonaws.com`
  * Attached policies:

    * `AmazonEKSClusterPolicy`
    * `AmazonEKSVPCResourceController`

* **EKS Node Group Role**

  * Trusts: `ec2.amazonaws.com`, `eks.amazonaws.com`
  * Attached policies:

    * `AmazonEKSWorkerNodePolicy`
    * `AmazonEKS_CNI_Policy`
    * `AmazonEC2ContainerRegistryReadOnly`

* **OIDC Service Account IAM Roles (x3)**

  * Namespaces: `recommender`, `etl-operator`, `data-broker`
  * Each role assumes via `sts:AssumeRoleWithWebIdentity` with OIDC provider
  * Each has an S3 policy allowing access to specific buckets (see below)

### **EKS cluster**

* **EKS Control Plane**

  * Version: `1.31` or greater
  * Subnet: Private subnets only
  * Public endpoint access: Enabled
  * Private endpoint access: Disabled

* **Node Group**

  * Instance type: `c5.4xlarge` (or larger, depending on cost factors)
  * Disk size: 100 GB
  * Desired size: 2 (min 2, max 5)
  * Remote SSH access: Enabled (with generated SSH key)
  * SSH key: Key pair created and exported

* **Security Groups**

  * EKS Cluster SG (implicitly created by AWS)
  * Node SG: Allows all traffic within cluster CIDR (`10.0.0.0/16`), self, and metadata IP
  * Egress: Allows all

#### **Kubernetes add-ons**

Installed via `aws.eks.Addon`:

* **EKS Pod Identity Agent**

  * Version: `v1.3.4-eksbuild.1`

* **Metrics Server**

  * Version: `v0.7.2-eksbuild.1`

* **EBS CSI Driver**

  * Version: `v1.38.1-eksbuild.2`
  * Configured with:

    * Service account annotation: `eks.amazonaws.com/role-arn`
    * Pod identity access annotation

#### **Storage class**

* Name: `ebs-sc`
* Default: Yes
* Provisioner: `ebs.csi.aws.com`
* Parameters: `type=gp3`, `encrypted=true`
* Volume Binding Mode: `WaitForFirstConsumer`

### **RDS**

* **RDS Subnet Group**

  * Uses the private subnets

* **RDS Instance**

  * Engine: Postgres 16
  * Size: `db.t3.micro`
  * Allocated storage: 20 GB
  * Auth: Setup a Username and Password, keep secure.
  * Security group: Allows all traffic from `10.0.0.0/16` (keep in mind your CIDR group from the VPC)
  * DB name: `postgres`

### **S3 buckets**

* `u10d-{stack_name}-etl-blob-cache`
* `u10d-{stack_name}-etl-job-db`
* `u10d-{stack_name}-etl-job-status`
* `u10d-{stack_name}-job-files`

All created with:

* Versioning enabled
* Server-side encryption (AES256)
* Force destroy: true

### **Keys**

* **SSH Key Pair** (RSA 4096-bit)

  * Key exported as `private_key` (PEM)

### Secrets and ConfigMaps

After your infrastructure is set up, but before Unstructured can deploy the Unstructured UI and API into your insfrastructure,
Unstructured will need to know the values of the following Secrets and ConfigMaps. These must be provided to Unstructured as a
set of YAML files in Kubernetes [Secret](https://kubernetes.io/docs/concepts/configuration/secret/) and
[ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) format.

The Secrets are as follows.

#### **Blob storage credentials**

* `BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`
* `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`
* `BLOB_STORAGE_ADAPTER_REGION_NAME`

#### **Database credentials**

* `DB_USERNAME`
* `DB_PASSWORD`
* `DB_HOST`
* `DB_NAME`
* `DB_DATABASE` (used in `platform-api` only)

#### **Authentication**

* `JWT_SECRET_KEY`
* `AUTH_STRATEGY` (sometimes encoded, sometimes not)
* `SESSION_SECRET`
* `SHARED_SECRET`
* `KEYCLOAK_CLIENT_SECRET`
* `KEYCLOAK_ADMIN_SECRET`
* `KEYCLOAK_ADMIN`
* `KEYCLOAK_ADMIN_PASSWORD`
* `API_BEARER_TOKEN`

The ConfigMaps are as follows.

#### **Blob storage settings**

* `BLOB_STORAGE_ADAPTER_TYPE` (always `s3` for AWS)
* `BLOB_STORAGE_ADAPTER_BUCKET`
* `ETL_BLOB_CACHE_BUCKET_NAME`
* `ETL_API_BLOB_STORAGE_ADAPTER_BUCKET`
* `ETL_API_BLOB_STORAGE_ADAPTER_TYPE`
* `ETL_API_DB_REMOTE_BUCKET_NAME`
* `ETL_API_JOB_STATUS_DEST_BUCKET_NAME`
* `JOB_STATUS_BUCKET_NAME`
* `JOB_DB_BUCKET_NAME`

#### **Environment**

* `ENV`
* `ENVIRONMENT`
* `JOB_ENV`
* `JOB_ENVIRONMENT`

#### **Observability and OpenTelemetry (OTel)**

* `JOB_OTEL_EXPORTER_OTLP_ENDPOINT`
* `JOB_OTEL_METRICS_EXPORTER`
* `JOB_OTEL_TRACES_EXPORTER`
* `OTEL_EXPORTER_OTLP_ENDPOINT`
* `OTEL_METRICS_EXPORTER`
* `OTEL_TRACES_EXPORTER`

#### **Unstructured API and authentication**

* `UNSTRUCTURED_API_URL`
* `JWKS_URL`
* `JWT_ISSUER`
* `JWT_AUDIENCE`
* `SINGLE_PLANE_DEPLOYMENT`

#### **Front end and dashboard**

* `API_BASE_URL`
* `API_CLIENT_BASE_URL`
* `API_URL`
* `APM_SERVICE_NAME`
* `APM_SERVICE_NAME_CLIENT`
* `AUTH_STRATEGY`
* `FRONTEND_BASE_URL`
* `KEYCLOAK_CALLBACK_URL`
* `KEYCLOAK_CLIENT_ID`
* `KEYCLOAK_DOMAIN`
* `KEYCLOAK_REALM`
* `KEYCLOAK_SSL_ENABLED`
* `KEYCLOAK_TRUST_ISSUER`
* `PUBLIC_BASE_URL`
* `PUBLIC_RELEASE_CHANNEL`

#### **Sentry and feature flags**

* `SENTRY_DSN`
* `SENTRY_SAMPLE_RATE`
* `WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM`
* `CUSTOM_WORKFLOW_FF_REQUEST_FORM`

#### **Redis**

* `REDIS_DSN`

#### **Other**

* `IMAGE_PULL_SECRETS`
* `PRIVATE_KEY_SECRETS_ADAPTER_TYPE`
* `PRIVATE_KEY_SECRETS_ADAPTER_AWS_REGION`
* `SECRETS_ADAPTER_TYPE`
* `SECRETS_ADAPTER_AWS_REGION`

The preceding Secrets and ConfigMaps must be added to the following files:

| File name                             | Type      | Resource name              | Namespace      | Data keys                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| ------------------------------------- | --------- | -------------------------- | -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `data-broker-env-cm.yaml`             | ConfigMap | `data-broker-env`          | `api`          | `JOB_STATUS_BUCKET_NAME`, `JOB_DB_BUCKET_NAME`, `BLOB_STORAGE_ADAPTER_TYPE`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `data-broker-env-secret.yaml`         | Secret    | `data-broker-env`          | `api`          | ` BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`, `BLOB_STORAGE_ADAPTER_REGION_NAME`, `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `dataplane-api-env-cm.yaml`           | Secret    | `dataplane-api-env`        | `api`          | ` DB_PASSWORD`, `DB_USERNAME`, `DB_HOST`, `DB_NAME`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `etl-operator-env-cm.yaml`            | ConfigMap | `etl-operator-env`         | `etl-operator` | ` BLOB_STORAGE_ADAPTER_BUCKET`, `JOB_STATUS_BUCKET_NAME`, `JOB_DB_BUCKET_NAME`, `BLOB_STORAGE_ADAPTER_TYPE`, `ENV`, `ENVIRONMENT`, `REDIS_DSN`, `ETL_API_BLOB_STORAGE_ADAPTER_BUCKET`, `ETL_API_BLOB_STORAGE_ADAPTER_TYPE`, `ETL_API_DB_REMOTE_BUCKET_NAME`, `ETL_API_JOB_STATUS_DEST_BUCKET_NAME (x2)`, `ETL_BLOB_CACHE_BUCKET_NAME`, `IMAGE_PULL_SECRETS`, `JOB_ENV`, `JOB_ENVIRONMENT`, `JOB_OTEL_EXPORTER_OTLP_ENDPOINT`, `JOB_OTEL_METRICS_EXPORTER`, `JOB_OTEL_TRACES_EXPORTER`, `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_METRICS_EXPORTER`, `OTEL_TRACES_EXPORTER`, `UNSTRUCTURED_API_URL` |
| `etl-operator-env-secret.yaml`        | Secret    | `etl-operator-env`         | `etl-operator` | ` BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`, `BLOB_STORAGE_ADAPTER_REGION_NAME`, `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `frontend-env-cm.yaml`                | ConfigMap | `frontend-env`             | `www`          | `API_BASE_URL`, `API_CLIENT_BASE_URL`, `API_URL`, `APM_SERVICE_NAME`, `APM_SERVICE_NAME_CLIENT`, `AUTH_STRATEGY`, `ENV`, `FRONTEND_BASE_URL`, `KEYCLOAK_CALLBACK_URL`, `KEYCLOAK_CLIENT_ID`, `KEYCLOAK_DOMAIN`, `KEYCLOAK_REALM`, `KEYCLOAK_SSL_ENABLED`, `KEYCLOAK_TRUST_ISSUER`, `PUBLIC_BASE_URL`, `PUBLIC_RELEASE_CHANNEL`, `SENTRY_DSN`, `SENTRY_SAMPLE_RATE`, `WORKFLOW_NODE_EDITOR_FF_REQUEST_FORM`, `CUSTOM_WORKFLOW_FF_REQUEST_FORM`                                                                                                                                                 |
| `frontend-env-secret.yaml`            | Secret    | `frontend-env`             | `www`          | `API_BEARER_TOKEN`, `KEYCLOAK_ADMIN_SECRET`, `KEYCLOAK_CLIENT_SECRET`, `SESSION_SECRET`, `SHARED_SECRET`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `keycloak-secret.yaml`                | Secret    | `phasetwo-keycloak-env`    | `www`          | `KEYCLOAK_ADMIN`, `KEYCLOAK_ADMIN_PASSWORD`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `platform-api-env-cm.yaml`            | ConfigMap | `platform-api-env`         | `api`          | `JWKS_URL`, `JWT_ISSUER`, `JWT_AUDIENCE`, `SINGLE_PLANE_DEPLOYMENT`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `platform-api-env-secret.yaml`        | Secret    | `platform-api-env`         | `api`          | `DB_PASSWORD`, `DB_USERNAME`, `DB_HOST`, `DB_NAME`, `DB_DATABASE`, `JWT_SECRET_KEY`, `AUTH_STRATEGY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `recommender-env-cm.yaml`             | ConfigMap | `recommender-env`          | `recommender`  | `BLOB_STORAGE_ADAPTER_TYPE`, `ETL_BLOB_CACHE_BUCKET_NAME`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `recommender-env-secret.yaml`         | Secret    | `recommender-env`          | `recommender`  | `BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`, `BLOB_STORAGE_ADAPTER_REGION_NAME`, `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| `secret-provider-api-env-cm.yaml`     | ConfigMap | `secrets-provider-api-env` | `secrets`      | `ENV`, `ENVIRONMENT`, `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_METRICS_EXPORTER`, `OTEL_TRACES_EXPORTER`, `PRIVATE_KEY_SECRETS_ADAPTER_AWS_REGION`, `PRIVATE_KEY_SECRETS_ADAPTER_TYPE`, `SECRETS_ADAPTER_AWS_REGION`, `SECRETS_ADAPTER_TYPE`                                                                                                                                                                                                                                                                                                                                                      |
| `secret-provider-api-env-secret.yaml` | Secret    | `secrets-provider-api-env` | `secrets`      | `BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID`, `BLOB_STORAGE_ADAPTER_REGION_NAME`, `BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY`                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| `usage-collector-env-secret.yaml`     | Secret    | `usage-collector-env`      | `api`          | `DB_PASSWORD`, `DB_USERNAME`, `DB_HOST`, `DB_NAME`, `BLOB_STORAGE_ADAPTER_TYPE`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |

For example, for the `etl-operator-env-cm.yaml` [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) file, the contents would look like this:

```yaml  theme={null}
apiVersion: v1
kind: ConfigMap
metadata:
  name: data-broker-env
  namespace: api
data:
  JOB_STATUS_BUCKET_NAME: "<your-value>"
  JOB_DB_BUCKET_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_TYPE: "<your-value>"
```

For the `etl-operator-env-secret.yaml` [Secret](https://kubernetes.io/docs/concepts/configuration/secret/) file, the contents would look like this:

```yaml  theme={null}
apiVersion: v1
kind: Secret
metadata:
  name: data-broker-env
  namespace: api
type: Opaque
stringData:
  BLOB_STORAGE_ADAPTER_ACCESS_KEY_ID: "<your-value>"
  BLOB_STORAGE_ADAPTER_REGION_NAME: "<your-value>"
  BLOB_STORAGE_ADAPTER_SECRET_ACCESS_KEY: "<your-value>"
```

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/one-drive.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# OneDrive

Connect OneDrive to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

* A OneDrive for business plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes OneDrive.
  [Learn more](https://www.microsoft.com/microsoft-365/onedrive/compare-onedrive-plans).
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).
  OneDrive personal accounts, and Microsoft 365 Free, Basic, Personal, and Family plans are not supported.

* A SharePoint Online plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes SharePoint Online.
  (Even if you only plan to use OneDrive, you still need a plan that includes SharePoint Online, because OneDrive is built on SharePoint technology.)
  [Learn more](https://www.microsoft.com/en-us/microsoft-365/SharePoint/compare-SharePoint-plans).
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).

* The OneDrive and SharePoint Online plans must share the same Microsoft Entra ID tenant.
  [Learn more](https://learn.microsoft.com/microsoft-365/enterprise/subscriptions-licenses-accounts-and-tenants-for-microsoft-cloud-offerings?view=o365-worldwide).

* The path to the target OneDrive folder, starting from the OneDrive account's root folder, for example `my-folder/my-subfolder`.

  The following video shows how to get a path:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/SfUA1IwtI5U" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* Two types of authentication are supported: client credentials, and a username and password. Both authentication types require a Microsoft Entra ID app registration.

  <Note>
    The OneDrive connector does not support any other authentication methods, such as multifactor (MFA) or passwordless authentication.
  </Note>

  You will need to provide the **Application (client) ID**, **Directory (tenant) ID**, and **Client secret** for the Entra ID app registration that has access to the target OneDrive account, and
  the app registration must have the correct set of Microsoft Graph access permissions. These permissions include:

  * `Files.ReadWrite.All` (if both reading and writing are needed)

  * `Sites.ReadWrite.All` (if both reading and writing are needed)

  * `User.Read.All`

  * `Directory.Read.All`

  1. [Create an Entra ID app registration](https://learn.microsoft.com/entra/identity-platform/quickstart-register-app?pivots=portal).
  2. [Add Graph access permissions to an app registration](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#add-permissions-to-an-application).
  3. [Grant consent for the added Graph permissions](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#grant-consent-for-the-added-permissions-for-the-enterprise-application).

  The following video shows how to create an Entra ID app registration:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/aBAY-LKLPSo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  The following video shows how to add the correct set of Graph access permissions to the Entra ID app registration:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/jBJsrSkpClo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* Both authentication types also require the User Principal Name (UPN) for the OneDrive account. This is typically the OneDrive account user's email address. To find a UPN:

  1. Depending on your plan, sign in to your Microsoft 365 admin center (typically [https://admin.microsoft.com](https://admin.microsoft.com)) using your administrator credentials,
     or sign in to your Office 365 portal (typically [https://portal.office.com](https://portal.office.com)) using your credentials.
  2. In the **Users** section, click **Active users**.
  3. Locate the user account in the list of active users.
  4. The UPN is displayed in the **Username** column.

  The following video shows how to get a UPN:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/H0yYfhfyCE0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For username and password authentication, you will also need the password for the target UPN.

The OneDrive connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[onedrive]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `ONEDRIVE_PATH` - The path to the target folder in the OneDrive account, starting with the account's root folder (for example `my-folder/my-subfolder`), represented by `--path` (CLI) or `path` (Python).
* `ONEDRIVE_CLIENT_ID` - The application (client) ID of the Microsoft Entra ID app registration that has access to the OneDrive account, represented by `--client-id` (CLI) or `client_id` (Python).
* `ONEDRIVE_CLIENT_CRED` - The client secret for the Entra ID app registration, represented by `--client-cred` (CLI) or `client_cred` (Python).
* `ONEDRIVE_TENANT` - The directory (tenant) ID of the Entra ID app registration, represented by `--tenant` (CLI) or `tenant` (Python).
* `ONEDRIVE_USER_PNAME` - The User Principal Name (UPN) for the OneDrive user account in Entra ID (typically the user's email address), represented by `--user-pname` (CLI) or `user_pname` (Python).
* `ONEDRIVE_USER_PASSWORD` - The password for the target UPN, represented by `--password` (CLI) or `password` (Python).
* `ONEDRIVE_AUTHORITY_URL` - The authentication token provider URL for the Entra ID app registration (the default is `https://login.microsoftonline.com` if not otherwise specified), represented by `--authority-url` (CLI) or `authority_url` (Python).

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

This example sends data to Unstructured for processing by default. To process data locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  unstructured-ingest \
    onedrive \
      --client-id $ONEDRIVE_CLIENT_ID \
      --client-cred $ONEDRIVE_CLIENT_CRED \
      --authority-url $ONEDRIVE_AUTHORITY_URL \
      --tenant $ONEDRIVE_TENANT \
      --user-pname $ONEDRIVE_USER_PNAME \
      --password $ONEDRIVE_USER_PASSWORD \ # For username and password authentication.
      --path $ONEDRIVE_PATH \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --num-processes 2 \
      --verbose \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --strategy hi_res \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.onedrive import (
      OnedriveIndexerConfig,
      OnedriveDownloaderConfig,
      OnedriveConnectionConfig,
      OnedriveAccessConfig
  )

  from unstructured_ingest.processes.connectors.local import LocalConnectionConfig
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=OnedriveIndexerConfig(path=os.getenv("ONEDRIVE_PATH")),
          downloader_config=OnedriveDownloaderConfig(
              download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")
          ),
          source_connection_config=OnedriveConnectionConfig(
              access_config=OnedriveAccessConfig(
                  client_cred=os.getenv("ONEDRIVE_CLIENT_CRED"),
                  password=os.getenv("ONEDRIVE_USER_PASSWORD") # For username and password authentication.
              ),
              client_id=os.getenv("ONEDRIVE_CLIENT_ID"),
              tenant=os.getenv("ONEDRIVE_TENANT"),
              user_pname=os.getenv("ONEDRIVE_USER_PNAME"),
              authority_url=os.getenv("ONEDRIVE_AUTHORITY_URL")
          ),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=LocalConnectionConfig()
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/examplecode/tools/onedrive-events.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# OneDrive event triggers

You can use OneDrive events, such as adding new files to—or updating existing files within—OneDrive folders, to automatically run Unstructured ETL+ workflows
that rely on those folders as sources. This enables a no-touch approach to having Unstructured automatically process new and updated files in OneDrive folders as they are added or updated.

This example shows how to automate this process by adding two custom [Power Automate](https://www.microsoft.com/power-platform/products/power-automate) flows to your Microsoft 365 or Office 365 account. One flow runs
whenever a new file is detected in the specified OneDrive folder. The other flow runs whenever an existing file is updated within that folder. These flows then call the [Unstructured API's workflow operations](/api-reference/workflow/overview) to automatically run the
specified corresponding Unstructured ETL+ workflow within your Unstructured account.

<Note>
  This example uses custom Power Automate flows that you create and maintain.
  Any issues with file detection, timing, or flow invocation could be related to your custom flows,
  rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom
  flows' run histories first for any informational and error messages.
</Note>

## Requirements

To use this example, you will need the following:

* A Microsoft 365 or Office 365 Business or enterprise plan that includes SharePoint Online, OneDrive, and Power Automate.
  (Even if you only plan to use OneDrive, you still need a plan that includes SharePoint Online, because OneDrive is built on SharePoint technology.)
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).

* A Power Automate Premium license. [Learn how](https://learn.microsoft.com/power-platform/admin/power-automate-licensing/buy-licenses).

* An Unstructured account, and an Unstructured API key for your account, as follows:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

* The Unstructured API's workflow operations URL for your account, as follows:

  1. In the Unstructured UI, click **API Keys** on the sidebar.<br />
  2. Note the value of the **Unstructured API's workflow operations** field.

* A OneDrive source connector in your Unstructured account. [Learn how](/ui/sources/onedrive).

* Some available [destination connector](/ui/destinations/overview) in your Unstructured account.

* A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).

## Step 1: Create an app registration

In this step, you create an app registration in Microsoft Entra ID to access the Azure key vault that you will create in the next step.

1. Sign in to your [Azure portal](https://portal.azure.com).
2. In the **Search resources, services, and docs** box, enter **App registrations**, and then in the list of **Services**, click **App registrations**.
3. Click **+ New registration**.
4. Enter some **Name** for the app registration.
5. For **Supported account types**, select **Accounts in this organizational directory only**.
6. For **Redirect URI**, select **Web**, and then enter `https://localhost`.
7. Click **Register**.
8. In the sidebar, click **Overview**.
9. Expand **Essentials**, and note the values of the **Application (client) ID** and **Directory (tenant) ID** fields.
10. Next to **Client credentials**, click the **Add a certificate or secret** link.
11. On the **Client secrets** tab, click **+ New client secret**.
12. (Optional) Enter some **Description** for the client secret.
13. (Optional) Enter some non-default **Expires** value.
14. Click **Add**.
15. Note the value of the **Value** field. This is the app registration's client secret value. (Do not use the **Secret ID** value.)

## Step 2: Create an Azure key vault

In this step, you store your Unstructured API key in an Azure key vault for your Power Automate flow to access later.

1. With your Azure portal still open from the previous step, in the **Search resources, services, and docs** box, enter **Key vaults**, and then in the list of **Services**, click **Key vaults**.

2. Click **Create**.

3. On the **Basics** tab, set the following key vault settings:

   | Setting            | Suggested value           | Description                                                                                                           |
   | ------------------ | ------------------------- | --------------------------------------------------------------------------------------------------------------------- |
   | **Subscription**   | Your subscription         | The Azure subscription within which to create your new key vault.                                                     |
   | **Resource Group** | **Create new**            | After you click **Create new**, enter some name for the new resource group within which to create your new key vault. |
   | **Key vault name** | Some globally unique name | Some name that identifies your new key vault. Valid characters are `A`-`Z`, `a`-`z`, `0`-`9`, and `-`.                |
   | **Region**         | Your preferred region     | Select a region that's near you or near other services that your key vault can access.                                |
   | **Pricing tier**   | **Standard**              | Choose the pricing tier for your key vault.                                                                           |

4. Click **Review + create**.

5. Click **Create**.

6. After the deployment is complete, click **Go to resource**.

7. In the sidebar, click **Access control (IAM)**.

8. Click **+ Add > Add role assignment**.

9. On the **Role** tab, in the **Job function roles** list, select **Key Vault Administrator**.

10. Click **Next**.

11. On the **Members** tab, with **Assign access to** set to **User, group, or service principal**, click **+ Select members**.

12. Search for and select the name of the app registration that you created earlier in Step 1.

13. Search also for and select your user account.

14. Click **Select**.

15. Click **Review + assign** twice.

16. In the sidebar, expand **Objects**, and then click **Secrets**.

17. Click **+ Generate/Import**.

18. Enter some **Name** for the secret.

19. For **Secret value**, enter your Unstructured API key value.

20. Click **Create**.

## Step 3: Create the first Power Automate flow

1. Sign in to [Power Automate](https://make.powerautomate.com).

2. On the sidebar, click **Create**.

3. In the **Start from blank** area, click **Automated cloud flow**.

4. Enter some **Flow name** for your new flow.

5. For **Search all triggers**, enter `When a file is created`, and then click
   **When a file is created (OneDrive for Business)**.

6. Click **Create**.

7. In the flow designer, click the **When a file is created** node.

8. When prompted, sign in to OneDrive.

9. On the **Parameters** tab, set the following node properties:

   * For **Folder**, click the folder icon, and the select the folder that matches the value of the **Path** value for the OneDrive source connector for your workflow.
   * For **Include subfolders**, select **Yes** only if you also checked the **Recursive** checkbox for the OneDrive source connector for your workflow.

10. Below the **When a file is created** node, click the plus (**Insert a new action...**) button.

11. In the **Search for an action or connector** box, enter `Get secret`, and then under **Azure Key Vault**, select **Get secret**.

12. Set the following node properties:

    * Enter some **Connection name** for the connection.
    * For **Authentication type**, select **Service principal authentication**.
    * For **Client ID**, enter the **Application (client) ID** value of the app registration that you created earlier in Step 1.
    * For **Client secret**, enter the client secret value of the app registration that you created earlier in Step 1.
    * For **Tenant ID**, enter the **Directory (tenant) ID** value of the app registration that you created earlier in Step 1.
    * For **Key vault name**, enter the name of the key vault that you created earlier in Step 2.

13. Click **Create new**.

14. In the **Name of the secret** list, select the name of the secret that you created earlier in Step 2.

15. Below the **Get secret** node, click the plus (**Insert a new action...**) button.

16. In the **Search for an action or connector** box, enter `HTTP`, and then under **HTTP**, select **HTTP**.

17. On the **Parameters** tab, set the following node properties:

    * For **URI**, enter `<unstructured-api-url>/workflows/<workflow-id>/run`, and replace the following placeholders:

      * Replace `<unstructured-api-url>` with your Unstructured API's workflow operations value.
      * Replace `<workflow-id>` with the ID of your Unstructured workflow.

      The **URI** should now look similar to the following:

      ```text  theme={null}
      https://platform.unstructuredapp.io/api/v1/workflows/11111111-1111-1111-1111-111111111111/run
      ```

    * For **Method**, select **POST**.

    * For **Headers**, enter the following:

      * `accept`, set to `application/json`
      * `unstructured-api-key`, set to your key vault value. To do this, type `/` and select **Insert dynamic content**.
        Under **Get secret**, select **body/value**.

18. In the top navigation bar, click the disk (**Save**) button.

19. Click **Back** to go to the flow's home page.

20. In the top navigation bar, if the power button shows **Turn on**, click it to change it to **Turn off** (which means that the flow is on and active).

## Step 4: Create the second Power Automate flow

1. With the flow's home page still open, on the sidebar, click **My flows**.
2. In the list of flows, rest your mouse pointer on the flow that you created earlier in Step 3, and then click the ellipsis (three dots, **More commands**) icon.
3. Click **Save as**.
4. Give the new flow a **Flow name** that is different from the original flow's name.
5. Click **Save**.
6. In the list of flows, rest your mouse pointer on the flow that you just created, and then click the pencil (**Edit**) icon.
7. In the flow designer, right click the **When a file is created** node, and then click **Delete**.
8. Click the **Add a trigger** node.
9. In the **Search for an action or connector** box, enter `When a file is modified`, and then click
   **When a file is modified (OneDrive for Business)**.
10. On the **Parameters** tab, set the following node properties:

* For **Folder**, click the folder icon, and the select the folder that matches the value of the **Path** value for the OneDrive source connector for your workflow.
* For **Include subfolders**, select **Yes** only if you also checked the **Recursive** checkbox for the OneDrive source connector for your workflow.

11. In the top navigation bar, click the disk (**Save**) button.
12. Click **Back** to go to the flow's home page.
13. In the top navigation bar, if the power button shows **Turn on**, click it to change it to **Turn off** (which means that the flow is on and active).

## Step 5: Trigger the flows and view results

1. On a separate tab in your web browser, go to the OneDrive folder that corresponds to both your Unstructured workflow's
   OneDrive source connector and the Power Automate flows' **When a file is created** and **When a file is modified** nodes.
2. Add a file to the OneDrive folder.
3. On a separate tab in your web browser, in the Unstructured user interface for your account, click **Jobs** on the sidebar.
4. In the list of jobs, click the newly running job for your workflow. It might take a minute or so for the new job to appear in the list.
5. After the job status shows **Finished**, go to your destination location to see the results.
6. Repeat this step, but this time, update an existing file in the OneDrive folder.

## Step 6 (Optional): Turn off the flows

To stop the Power Automate flows from running automatically, go back to the web browser tab from earlier in Steps 3 and 4. Then, do the following:

1. On the sidebar, click **My flows**.
2. In the list of flows, click the name of the flow that you created earlier in Step 3.
3. With the flow's home page still open, in the top navigation bar, click the power (**Turn off**) button
   to change it to **Turn on** (which means that the flow is off and inactive).
4. Repeat this step, but this time, click the name of the flow that you created earlier in Step 4.

---

# Source: https://docs.unstructured.io/ui/sources/onedrive.md

# Source: https://docs.unstructured.io/ui/destinations/onedrive.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/onedrive.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/onedrive.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/onedrive.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# OneDrive

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to OneDrive.

The requirements are as follows.

* A OneDrive for business plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes OneDrive.
  [Learn more](https://www.microsoft.com/microsoft-365/onedrive/compare-onedrive-plans).
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).
  OneDrive personal accounts, and Microsoft 365 Free, Basic, Personal, and Family plans are not supported.

* A SharePoint Online plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes SharePoint Online.
  (Even if you only plan to use OneDrive, you still need a plan that includes SharePoint Online, because OneDrive is built on SharePoint technology.)
  [Learn more](https://www.microsoft.com/en-us/microsoft-365/SharePoint/compare-SharePoint-plans).
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).

* The OneDrive and SharePoint Online plans must share the same Microsoft Entra ID tenant.
  [Learn more](https://learn.microsoft.com/microsoft-365/enterprise/subscriptions-licenses-accounts-and-tenants-for-microsoft-cloud-offerings?view=o365-worldwide).

* The path to the target OneDrive folder, starting from the OneDrive account's root folder, for example `my-folder/my-subfolder`.

  The following video shows how to get a path:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/SfUA1IwtI5U" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* Two types of authentication are supported: client credentials, and a username and password. Both authentication types require a Microsoft Entra ID app registration.

  <Note>
    The OneDrive connector does not support any other authentication methods, such as multifactor (MFA) or passwordless authentication.
  </Note>

  You will need to provide the **Application (client) ID**, **Directory (tenant) ID**, and **Client secret** for the Entra ID app registration that has access to the target OneDrive account, and
  the app registration must have the correct set of Microsoft Graph access permissions. These permissions include:

  * `Files.ReadWrite.All` (if both reading and writing are needed)

  * `Sites.ReadWrite.All` (if both reading and writing are needed)

  * `User.Read.All`

  * `Directory.Read.All`

  1. [Create an Entra ID app registration](https://learn.microsoft.com/entra/identity-platform/quickstart-register-app?pivots=portal).
  2. [Add Graph access permissions to an app registration](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#add-permissions-to-an-application).
  3. [Grant consent for the added Graph permissions](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#grant-consent-for-the-added-permissions-for-the-enterprise-application).

  The following video shows how to create an Entra ID app registration:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/aBAY-LKLPSo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  The following video shows how to add the correct set of Graph access permissions to the Entra ID app registration:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/jBJsrSkpClo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* Both authentication types also require the User Principal Name (UPN) for the OneDrive account. This is typically the OneDrive account user's email address. To find a UPN:

  1. Depending on your plan, sign in to your Microsoft 365 admin center (typically [https://admin.microsoft.com](https://admin.microsoft.com)) using your administrator credentials,
     or sign in to your Office 365 portal (typically [https://portal.office.com](https://portal.office.com)) using your credentials.
  2. In the **Users** section, click **Active users**.
  3. Locate the user account in the list of active users.
  4. The UPN is displayed in the **Username** column.

  The following video shows how to get a UPN:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/H0yYfhfyCE0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For username and password authentication, you will also need the password for the target UPN.

To create a OneDrive destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="onedrive",
                  config={
                      "client_id": "<client-id>",
                      "user_pname": "<user-pname>",
                      "password": "<password>",  # For username and password authentication
                      "tenant": "<tenant>",
                      "authority_url": "<authority-url>",
                      "client_cred": "<client-cred>",
                      "remote_url": "<remote-url>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "onedrive",
      "config": {
          "client_id": "<client-id>", 
          "user_pname": "<user-pname>",
          "password": "<password>", # For username and password authentication.
          "tenant": "<tenant>", 
          "authority_url": "<authority-url>",
          "client_cred": "<client-cred>",
          "remote_url": "<remote-url>" 
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<client-id>` (*required*) - The application (client) ID of the Microsoft Entra ID app registration that has access to the OneDrive account.
* `<user-pname>` (*required*) - The User Principal Name (UPN) for the OneDrive user account in Entra ID. This is typically the user's email address.
* `<password>` (*required* for username and password authentication): The password for the target UPN.
* `<tenant>` (*required*) - The directory (tenant) ID of the Entra ID app registration.
* `<authority-url>` (*required*) - The authentication token provider URL for the Entra ID app registration. The default is [https://login.microsoftonline.com](https://login.microsoftonline.com).
* `<client-cred>` (*required*) - The client secret for the Entra ID app registration.
* `<path>` (source connector only) - The path to the target folder in the OneDrive account, starting with the account's root folder, for example `my-folder/my-subfolder`.
* For `recursive` (source connector only), set to `true` to recursively access files from subfolders within the specified OneDrive `<path>`. The default is `false` if not otherwise specified.
* `<remote-url>` (destination connector only) - `onedrive://`, followed by the path to the target folder in the OneDrive account, starting with the account's root folder, for example `onedrive://my-folder/my-subfolder`.

---

# Source: https://docs.unstructured.io/ui/sources/opensearch.md

# Source: https://docs.unstructured.io/ui/destinations/opensearch.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/opensearch.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/opensearch.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/opensearch.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/opensearch.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# OpenSearch

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to OpenSearch.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), local OpenSearch instances are not supported.

* For [Unstructured Ingest](/open-source/ingestion/overview), local and non-local OpenSearch instances are supported.

  For example, to set up an [AWS OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html) instance, complete steps similar to the following:

  1. Sign in to your AWS account, and then open your AWS Management Console.

  2. Open your Amazon OpenSearch Service console.

  3. On the sidebar, expand **Managed clusters**, and then click **Dashboard**.

  4. Click **Create domain**.

  5. In the **Name** tile, for **Domain name**, enter some unique domain name for your new OpenSearch instance.

  6. In the **Domain creation method** tile, select a method for creating the domain. For faster setup with sensible
     default settings, this example uses the **Easy create** method.
     [Learn more about the Standard create method](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html).

  7. In the **Engine options** tile, for **Version**, AWS recommends that you select the latest version.

  8. In the **Network** tile, for **Network**, select a network access method.
     For faster setup, this example uses the **Public access** method.
     [Learn more about the VPC access method](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/vpc.html#prerequisites-vpc-endpoints).

  9. **For IP address type**, select **Dual-stack mode**.

  10. In the **Fine-grained access control** tile, do one of the following:

      * If you want to use an existing AWS IAM user in the AWS account as the domain's master user, then for **Master user**, select **Set IAM ARN as master user**. Then enter the IAM ARN for the master user in the **IAM ARN** box.
      * If you want to create a master user and password as the domain's master user instead, then for **Master user**, select **Create master user**. Then specify some username and password for this
        new master user by filling in the **Master username**, **Master password**, and **Confirm master password** fields. Make
        sure to save the master user's password in a secure location.

  11. Click **Create**.

  12. After the domain is created, you must allow Unstructured to access the domain, as follows:

      a. If the new domain's settings page is not already showing, open it as follows:
      in your Amazon Open Search Service console, on the sidebar, expand **Managed clusters**, and then click **Domains**. Then,
      in the list of available domains, click the name of the newly created domain.<br />
      b. On the **Security configuration** tab, click **Edit**.<br />
      c. In the **Access policy** tile, for **Domain access policy**, select **Only use fine-grained access control**.<br />
      d. Click **Clear policy**.<br />
      e. Click **Save changes**.

  The following video shows how to set up a [local OpenSearch](https://opensearch.org/downloads.html) instance.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/Rew3_pNnYIs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The instance's host URL, as follows:

  * For an AWS OpenSearch Service instance, do the following:

    1. Sign in to your AWS account, and then open your AWS Management Console.
    2. Open your Amazon OpenSearch Service console.
    3. On the sidebar, expand **Managed clusters**, and then click **Dashboard**.
    4. In the list of available domains, click the name of your domain.
    5. In the **General information** tile, copy the value of **Domain endpoint v2 (dual stack)**.

  * For a local OpenSearch instance, see [Communicate with OpenSearch](https://opensearch.org/docs/latest/getting-started/communicate/).

* The name of the search index on the instance.

  For the destination connector, if you need to create an index and you're using a master user and password as the domain's master user, you can use for example the following `curl` command. Replace the following placeholders:

  * Replace `<host>` with the instance's host URL.
  * Replace `<port>` with the instance's port number, which is typically `443` (for encrypted connections, and less commonly `9200` for unencrypted connections).
  * Replace `<master-username>` with the master user's name, and replace `<master-password>` with the master user's password.
  * Replace `<index-name>` with the name of the new search index on the instance.
  * Replace `<index-schema>` with the schema for the new search index on the instance. A schema is optional; see the explanation
    following this `curl` command for more information.

  ```bash  theme={null}
  curl --request PUT "<host>:<port>/<index-name>" \
  --user "<master-username>:<master-password>" \
  [--header "Content-Type: application/json" \
  --data '<index-schema>']
  ```

  If you're using an existing AWS IAM user as the domain's master user instead, you should use the AWS Command Line Interface (CLI) to create the index instead of using the preceding`curl` command. To learn how, see [create-index](https://docs.aws.amazon.com/cli/latest/reference/opensearch/create-index.html) in the AWS CLI Command Reference.

  For the destination connector, the index does not need to contain a schema beforehand. If Unstructured encounters an index without a schema,
  Unstructured will automatically create a compatible schema for you before inserting items into the index. Nonetheless,
  to reduce possible schema compatibility issues, Unstructured recommends that you create a schema that is compatible with Unstructured's schema.
  Unstructured cannot provide a schema that is guaranteed to work in all
  circumstances. This is because these schemas will vary based on your source files' types; how you
  want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

  For objects in the `metadata` field that Unstructured produces and that you want to store in an OpenSearch index, you must create fields in your index's schema that
  follows Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
  child objects:

  ```json  theme={null}
  "metadata": {
    "is_extracted": "true",
    "coordinates": {
      "points": [
        [
          134.20055555555555,
          241.36027777777795
        ],
        [
          134.20055555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          241.36027777777795
        ]
      ],
      "system": "PixelSpace",
      "layout_width": 1654,
      "layout_height": 2339
    },
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_mime_type": "image/jpeg",
    "filename": "realestate.pdf",
    "data_source": {
      "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
      "record_locator": {
        "protocol": "file",
        "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
      }
    },
    "entities": {
      "items": [
        {
          "entity": "HOME FOR FUTURE",
          "type": "ORGANIZATION"
        },
        {
          "entity": "221 Queen Street, Melbourne VIC 3000",
          "type": "LOCATION"
        }
      ],
      "relationships": [
        {
          "from": "HOME FOR FUTURE",
          "relationship": "based_in",
          "to": "221 Queen Street, Melbourne VIC 3000"
        }
      ]
    }
  }
  ```

  You can adapt the following index schema example for your own needs. Note that outside of `metadata`, the following fields are
  required by Unstructured whenever you create your own index schema:

  * `element_id`
  * `record_id`, which is required by Unstructured for intelligent record updates.
  * `type`, which is not required, but highly recommended.
  * `text`
  * `embeddings` if embeddings are generated; make sure to set `dimension` to the same number of dimensions as the embedding model generates.

  ```json  theme={null}
  {
    "settings": {
      "index": {
        "knn": true,
        "knn.algo_param.ef_search": 100
      }
    },
    "mappings": {
      "properties": {
        "element_id": {
          "type": "keyword"
        },
        "record_id": {
          "type": "text"
        },
        "text": {
          "type": "text"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "embeddings": {
          "type": "knn_vector",
          "dimension": 1536
        },
        "metadata": {
          "properties": {
            "is_extracted": {
              "type": "boolean"
            },
            "coordinates-points": {
              "type": "float"
            },
            "coordinates-system": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "coordinates-layout_width": {
              "type": "long"
            },
            "coordinates-layout_height": {
              "type": "long"
            },
            "filetype": {
              "type": "keyword"
            },
            "languages": {
              "type": "keyword"
            },
            "page_number": {
              "type": "integer"
            },
            "image_mime_type": {
              "type": "keyword"
            },
            "filename": {
              "type": "keyword"
            },
            "data_source-url": {
              "type": "keyword"
            },
            "data_source-record_locator-protocol": {
              "type": "keyword"
            },
            "data_source-record_locator-remote_file_path": {
              "type": "keyword"
            },
            "entities-items": {
              "properties": {
                "entity": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "type": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            },
            "entities-relationships": {
              "properties": {
                "from": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "relationship": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "to": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  ```

  See also:

  * [Create an index](https://opensearch.org/docs/latest/api-reference/index-apis/create-index/)
  * [Mappings and field types](https://opensearch.org/docs/latest/field-types/)
  * [Explicit mapping](https://opensearch.org/docs/latest/field-types/#explicit-mapping)
  * [Dynamic mapping](https://opensearch.org/docs/latest/field-types/#dynamic-mapping)
  * [Unstructured document elements and metadata](/api-reference/partition/document-elements)

* For non-local OpenSearch instances, or if you're using basic authentication to a local OpenSearch instance, the master user's name and password.

* For local OpenSearch instances, if you're using certificates for authentication instead of basic authentication:

  * The path to the Certificate Authority (CA) bundle, if you use intermediate CAs with your root CA.
  * The path to the combined private key and certificate file, or
  * The paths to the separate private key and certificate file.

  To learn more, see [Authentication backends](https://opensearch.org/docs/latest/security/authentication-backends/authc-index/), [HTTP basic authentication](https://opensearch.org/docs/latest/security/authentication-backends/basic-authc/), and [Client certificate authentication](https://opensearch.org/docs/latest/security/authentication-backends/client-auth/).

To create an OpenSearch destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="opensearch",
                  config={
                      "hosts": ["https://<host>:<port>"],
                      "index_name": "<index-name>",
                      "username": "<username>",
                      "password": "<password>",
                      "aws_access_key_id": "<aws-access-key-id>",
                      "aws_secret_access_key": "<aws-secret-access-key>",
                      "aws_session_token": "<aws-session-token>",
                      "use_ssl": <True|False>
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "opensearch",
      "config": {
      "hosts": ["https://<host>:<port>"],
          "index_name": "<index-name>",
          "username": "<username>",
          "password": "<password>",
          "aws_access_key_id": "<aws-access-key-id>",
          "aws_secret_access_key": "<aws-secret-access-key>",
          "aws_session_token": "<aws-session-token>",
          "use_ssl": "true|false"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.

* `https://<host>:<port>` (*required*) - The OpenSearch instance's host URL, which typically takes the form of `https://<host>:<port>`.

* `<index-name>` (*required*) - The name of the search index on the instance.

* `<username>` - If you're using basic authentication to the instance, the domain's master user's name.

* `<password>` - If you're using basic authentication to the instance, the domain's master user's password.

* `<aws-access-key-id>` - If you're using an existing AWS IAM user as the domain's master user, the AWS access key ID for the AWS IAM user. If you're also using AWS STS for authentication, this will be a temporary AWS access key ID.

* `<aws-secret-access-key>` - If you're using an existing AWS IAM user as the domain's master user, the AWS secret access key for the AWS IAM user. If you're also using AWS STS for authentication, this will be a temporary AWS secret access key.

* `<aws-session-token>` - If you're using AWS STS for authentication, the temporary AWS STS session token.

  <Warning>
    AWS STS credentials (consisting of a temporary AWS access key, temporary AWS secret access key, and temporary AWS STS session token) can be valid for as little as 15 minutes or as long as 36 hours, depending on how the credentials were initially
    generated. After the expiry time, the credentials are no longer valid and will no longer work with the corresponding OpenSearch connector.
    You must get a new set of AWS STS credentials to replace the expired ones by, which produces
    a new, refreshed temporary AWS access key, temporary AWS secret access key, and temporary AWS STS session token. For more information, see
    [Request temporary security credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).

    After you generate refreshed temporary AWS STS credentials, you must update the OpenSearch connector's settings with the new, refreshed AWS STS credentials.
  </Warning>

* `<field-name>` (*source connectors only*) - Any specific fields to be accessed in the index.

* `<use-ssl>` (*required*) - True if the OpenSearch instance requires an SSL connection; otherwise, false.

---

# Source: https://docs.unstructured.io/ui/account/organizations.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Organizational accounts

<Note>
  The following information applies only to Unstructured **Business** accounts. To upgrade from an Unstructured **Let's Go** or **Pay-As-You-Go** account
  to a **Business** account, contact your Unstructured sales representative, or
  email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).
</Note>

An *organizational account* is an Unstructured account that allows multiple users to access Unstructured resources and share billing.
An organizational account contains a collection of Unstructured [workspaces](/ui/account/workspaces) that can share resources
with subsets of users within the account. An organizational account manages access its organization and the organization's workspaces by inviting Unstructured
*personal account* users. Any usage that occurs in an organizational account by any of its invited users is applied to the shared billing for that
organizational account.

A *workspace* is a collection of Unstructured resources (such as connectors, workflows, jobs, and API keys) within an organization.
This is different from a *personal workspace*, which is a collection of resources within an Unstructured personal account. An organizational account
can have multiple workspaces.

## Organizational account architecture

Each organizational account operates independently of all other organizational accounts.

Each organizational account can have multiple workspaces. Each workspace operates independently of all other workspaces.

```mermaid  theme={null}
graph LR
    Org1[Organization 1]
    Org2[Organization 2]
    Org3[Organization 3]
    WS1A[Workspace 1A]
    WS2A[Workspace 2A]
    WS2B[Workspace 2B]
    WS3A[Workspace 3A]
    WS3B[Workspace 3B]
    WS3C[Workspace 3C]

    Org1 --> WS1A
    
    Org2 --> WS2A
    Org2 --> WS2B

    Org3 --> WS3A
    Org3 --> WS3B
    Org3 --> WS3C
```

Each organizational account can have multiple users. Users can have access to multiple organizational accounts.

```mermaid  theme={null}
graph LR
    User1[User 1]
    User2[User 2]
    User3[User 3]
    Org1[Organization 1]
    Org2[Organization 2]
    Org3[Organization 3]

    User1 -.-> Org1
    User2 -.-> Org1
    User2 -.-> Org2
    User3 -.-> Org1
    User3 -.-> Org2
    User3 -.-> Org3
```

Each user also has a *personal account*, which operates independently of all other organizational accounts and personal accounts. Each personal account has a
*personal workspace*, which operates independently of all other workspaces. Each user can add a seprate billing method to their personal account that
will be applied only to usage in their personal account.

```mermaid  theme={null}
graph LR
    User1[User 1]
    User2[User 2]
    User3[User 3]
    PA1[Personal Account for User 1]
    PA2[Personal Account for User 2]
    PA3[Personal Account for User 3]
    Org1[Organization 1]
    Org2[Organization 2]
    Org3[Organization 3]

    User1 --> PA1
    User2 --> PA2
    User3 --> PA3
    User1 -.-> Org1
    User2 -.-> Org1
    User2 -.-> Org2
    User3 -.-> Org1
    User3 -.-> Org2
    User3 -.-> Org3
```

Within an organizational account, an invited user (also known as a *member*) can have access to multiple workspaces within that organizational account.

```mermaid  theme={null}
graph LR
    User1[User 1]
    User2_Org1[User 2]
    User2_Org2[User 2]
    User3_Org1[User 3] 
    User3_Org2[User 3]
    User3_Org3[User 3]

    Org1[Organization 1]
    Org2[Organization 2]
    Org3[Organization 3]

    WS1A[Workspace 1A]
    WS2A[Workspace 2A]
    WS2B[Workspace 2B]
    WS3A[Workspace 3A]
    WS3B[Workspace 3B]
    WS3C[Workspace 3C]
    
    Org1 --> WS1A
    Org1 -.-> User1
    Org1 -.-> User2_Org1
    Org1 -.-> User3_Org1
    User1 -.-> WS1A
    User2_Org1 -.-> WS1A
    User3_Org1 -.-> WS1A
    
    Org2 --> WS2A
    Org2 --> WS2B
    Org2 -.-> User2_Org2
    Org2 -.-> User3_Org2
    User2_Org2 -.-> WS2A
    User2_Org2 -.-> WS2B
    User3_Org2 -.-> WS2B
    
    Org3 --> WS3A
    Org3 --> WS3B
    Org3 --> WS3C
    Org3 -.-> User3_Org3
    User3_Org3 -.-> WS3A
    User3_Org3 -.-> WS3B
    User3_Org3 -.-> WS3C
```

Each user must be explicitly added to the organizational accounts—and workspaces within those organizational accounts—that they are allowed to access.

## Create an organizational account

Only authorized Unstructured representatives can create organizational accounts. To request an organizational account to be created, contact your Unstructured
sales representative, or email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

## Access an organizational account

1. Sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that you want to access.

   If the organizational account selector is not shown, or if the selector does not show the organizational account you want to access, then
   your user has not been added to that organizational account. To be added to the organizational account, contact one of the organizational account's administrators.

## Add a member to an organizational account

<Info>
  Your user must have the **Super Administrator** [role](/ui/account/roles) in the organizational account to add members to that organizational account.
</Info>

To add a user to an organizational account as a member:

1. Sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that you want to add the member to.
3. In the sidebar, above your user icon, click the **Settings** (gear) icon.
4. Click **Manage Account**.
5. In the top navigation bar, click **Members**.
6. Click **New Member**.
7. In the **Add New Member** dialog, enter the email address for the organizational account's new member.
8. In the role selector, select the new member's [role](/ui/account/roles) for this organizational account.
9. Click **Continue**.

## Change an organizational account role for a member

<Info>
  Your user must have the **Super Administrator** role in the organizational account to change its members' organizational account roles.
</Info>

1. Sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that contains the member you want to change roles for.
3. In the sidebar, above your user icon, click the **Settings** (gear) icon.
4. Click **Manage Account**.
5. In the top navigation bar, click **Members**.
6. Click the member you want to change roles for.
7. Next to **Role** click the **Edit** (pencil) icon.
8. Select the new role.
9. Click the **Accept** (check mark) icon.

## Remove a member from an organizational account

<Warning>
  Removing a member from an organizational account also removes the member from all of the organizational account's workspaces. The member will no longer
  have access to the organizational account or any of its workspaces. The member will continue to be able to use their personal account and personal workspace.

  Any connectors, workflows, jobs, or other resources that were created by the member within the organizational accounts' workspaces will not be removed.

  The only way to reverse removing a member from an organizational account is to add the member back to the organizational account—and to each of the workspaces that the member might have previously
  been a member of in that organizational account.
</Warning>

<Info>
  Your user must have the **Super Administrator** [role](/ui/account/roles) in the organizational account to remove members from that organizational account.
</Info>

1. Sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that you want to remove the member from.
3. In the sidebar, above your user icon, click the **Settings** (gear) icon.
4. Click **Manage Account**.
5. In the top navigation bar, click **Members**.
6. In the list of members, click the member you want to remove.
7. In the member's settings pane, click **Remove Member**.
8. Confirm the removal by clicking **Continue**.

## Permanently delete an organizational account

<Warning>
  Deleting an organizational account is a permanent action and is not recoverable. Once an organizational account is deleted, all workspaces associated with that
  organizational account are also deleted and are not recoverable.

  The organizational account's members will no longer have access to the deleted organizational account or any of its workspaces.
  Each member will continue to be able to use their personal account and personal workspace.

  Any code or scripts that reference connectors, workflows, jobs, or workflow API keys that are associated with workflows in the deleted
  organizational account will fail.
</Warning>

Only authorized Unstructured representatives can delete organizational accounts. To request an organizational account to be permanently deleted, contact your Unstructured
sales representative, or email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

---

# Source: https://docs.unstructured.io/ui/sources/outlook.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/outlook.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/outlook.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Outlook

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Outlook.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/9yESRp9pzv0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The Outlook user's email address.
* A Microsoft Entra ID app registration in the same Azure account as the Outlook account. You will need
  this app registration's application (client) ID, client secret, and directory (tenant) ID. [Learn how](https://learn.microsoft.com/entra/identity-platform/quickstart-register-app).
* The Entra ID app registration must have the following Graph API permission levels of the application (not delegated) type:

  * `Mail.Read`
  * `Mail.ReadBasic`
  * `User.Read.All`

  [Learn how](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions).

To create an Outlook source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="outlook",
                  config={
                      "client_id": "<client-id>",
                      "authority_url": "<authority-url>",
                      "tenant": "<tenant>",
                      "client_cred": "<client-cred>",
                      "user_email": "<user-email>",
                      "outlook_folders": ["<folder-name>", "<folder-name>"],
                      "recursive": <True|False>
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "type": "outlook",
      "name": "<name>",
      "config": {
          "client_id": "<client-id>",
          "authority_url": "<authority-url>",
          "tenant": "<tenant>",
          "client_cred": "<client-cred>",
          "user_email": "<user-email>",
          "outlook_folders": ["<folder-name>","<folder-name>"],
          "recursive": <true|false>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<client-id>` (*required*) - The application (client) ID of the Microsoft Entra ID app registration that has access to the Outlook account.
* `<authority-url>` - The authentication token provider URL for the Entra ID app registration. The default is `https://login.microsoftonline.com` if not otherwise specified.
* `<tenant>` (*required*) - The directory (tenant) ID of the Entra ID app registration.
* `<client-cred>` (*required*) - The client secret for the Entra ID app registration.
* `<user-email>` (*required*) - The user's email address for the target Outlook account.
* `outlook_folders` is a *required* array of strings, with each `<folder-name>` specifying the name of a folder to access, for example: `["Inbox","Sent"]`. Do not specify any subfolder names.
* For `recursive` (source connector only), set to `true` to access any and all subfolders within each of the specified top-level folders. The default is `false` if not otherwise specified.

---

# Source: https://docs.unstructured.io/ui/sources/overview.md

# Source: https://docs.unstructured.io/ui/overview.md

# Source: https://docs.unstructured.io/ui/enriching/overview.md

# Source: https://docs.unstructured.io/ui/destinations/overview.md

# Source: https://docs.unstructured.io/ui/account/overview.md

# Source: https://docs.unstructured.io/support/issues/overview.md

# Source: https://docs.unstructured.io/support/how-to/overview.md

# Source: https://docs.unstructured.io/open-source/introduction/overview.md

# Source: https://docs.unstructured.io/open-source/installation/overview.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/overview.md

# Source: https://docs.unstructured.io/open-source/ingestion/overview.md

# Source: https://docs.unstructured.io/open-source/ingestion/ingest-configuration/overview.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/overview.md

# Source: https://docs.unstructured.io/open-source/core-functionality/overview.md

# Source: https://docs.unstructured.io/business/security-compliance/overview.md

# Source: https://docs.unstructured.io/business/overview.md

# Source: https://docs.unstructured.io/business/gcp/overview.md

# Source: https://docs.unstructured.io/business/azure/overview.md

# Source: https://docs.unstructured.io/business/aws/overview.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/overview.md

# Source: https://docs.unstructured.io/api-reference/workflow/overview.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/overview.md

# Source: https://docs.unstructured.io/api-reference/overview.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/overview.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/overview.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Overview

Unstructured has deprecated the following APIs:

* The [Unstructured API on AWS](/api-reference/legacy-api/aws) is deprecated. It is no longer supported and is not being actively updated.
  Unstructured is now available on the AWS Marketplace as a private offering. To explore supported options
  for running Unstructured within your virtual private cloud (VPC), email Unstructured Sales at
  [sales@unstructured.io](mailto:sales@unstructured.io).
* The [Unstructured API on Azure](/api-reference/legacy-api/azure) is deprecated. It is no longer supported and is not being actively updated.
  Unstructured is now available on the AWS Marketplace as a private offering. To explore supported options
  for running Unstructured within your virtual private cloud (VPC), email Unstructured Sales at
  [sales@unstructured.io](mailto:sales@unstructured.io).

Unstructured considers the following APIs to be in a legacy state:

* The [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) is in a legacy state. Although it
  is still supported, Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead.

These pages are not being actively updated. They might contain out-of-date information. These pages are provided for legacy reference purposes only.

---

# Source: https://docs.unstructured.io/open-source/ingestion/ingest-configuration/partition-configuration.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Partition configuration

A standard partition configuration is a collection of parameters designed to oversee document partitioning, whether executed through API integration or by the unstructured library on a local system. These parameters serve a dual role, encompassing those passed to the partition method for the initial segmentation of documents and those responsible for coordinating data after processing, including the dynamic metadata associated with each element.

## Configs for Partitioning

* `additional_partition_args`: A JSON string representation of any values to pass through to the `partition` function.

* `encoding`: The encoding method used to decode the text input. By default, UTF-8 will be used.

* `ocr_languages`: The languages present in the document, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages.

* `skip_infer_table_types`: List of document types that you want to skip table extraction with.

* `strategy`: Default: `auto`. The strategy to use for partitioning PDF and image files. Uses a layout detection model if set to `hi_res`. Otherwise, partitioning simply extracts the text from the document and processes it.

## Configs for the Process

* `api_key`: If `partition_by_api` is set to `True`, requests that are sent to the Unstructured API will use this Unstructured API key to make authenticated calls.

* `fields_include`: Fields to include in the output JSON. By default, the following fields are included: `element_id`, `text`, `type`, `metadata`, and `embeddings`.

* `flatten_metadata`: Default: `False`. If set to `True`, the hierarchical metadata structure is flattened to have all values exist at the top level.

* `hi_res_model_name`: The model to use when `strategy` is set to `hi_res`. Available values are `layout_v1.0.0` (the default) and `yolox`.

* `metadata_exclude`: Values from the `metadata` field to exclude from the output.

* `metadata_include`: If provided, only the specified fields are preserved in the `metadata` output.

* `partition_by_api`: Default: `False`. If set to `True`, uses Unstructured to run partitioning. If set to `False`, runs partitioning locally.

* `partition_endpoint`: If `partition_by_api` is set to `True`, partitioning requests are sent to this Unstructured API URL.

---

# Source: https://docs.unstructured.io/open-source/concepts/partitioning-strategies.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Partitioning strategies

For certain document types, such as images and PDFs, for example, Unstructured products offer a variety of different
ways to preprocess them, controlled by the `strategy` parameter.

PDF documents, for example, vary in quality and complexity. In simple cases, traditional NLP extraction techniques may
be enough to extract all the text out of a document. In other cases, advanced image-to-text models are required
to process a PDF. You can think of the strategies as being "rule-based" workflows (thus they are "fast"), or
"model-based" workflows (slower workflow because it requires model inference, but you get "higher resolution", thus "hi\_res").
When choosing a partitioning strategy for your files, you have to be mindful of the quality/speed trade-off.
To give you an example, the `fast` strategy is roughly 100x faster than leading image-to-text models.

**Available options:**

* `auto` (default strategy): The "auto" strategy will choose the partitioning strategy based on document characteristics and the function kwargs.
* `fast`:  The "rule-based" strategy leverages traditional NLP extraction techniques to quickly pull all the text elements. "Fast" strategy is not recommended for image-based file types.
* `hi_res`: The "model-based" strategy identifies the layout of the document. The advantage of "hi\_res" is that it uses the document layout to gain additional information about document elements. We recommend using this strategy if your use case is highly sensitive to correct classifications for document elements.
* `ocr_only`: Another "model-based" strategy that leverages Optical Character Recognition to extract text from the image-based files.

**These strategies are available on the following partition functions:**

| Document Type            | Partition Function | Strategies                             | Table Support | Options                                                                                      |
| ------------------------ | ------------------ | -------------------------------------- | ------------- | -------------------------------------------------------------------------------------------- |
| Images (.png/.jpg/.heic) | partition\_image   | “auto”, “hi\_res”, “ocr\_only”         | Yes           | Encoding; Include Page Breaks; Infer Table Structure; OCR Languages, Strategy                |
| PDFs (.pdf)              | partition\_pdf     | “auto”, “fast”, “hi\_res”, “ocr\_only” | Yes           | Encoding; Include Page Breaks; Infer Table Structure; Max Partition; OCR Languages, Strategy |

---

# Source: https://docs.unstructured.io/ui/partitioning.md

# Source: https://docs.unstructured.io/open-source/core-functionality/partitioning.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/partitioning.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Partitioning strategies

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

For certain document types, such as images and PDFs, for example, Unstructured products offer a variety of different
ways to preprocess them, controlled by the `strategy` parameter.

PDF documents, for example, vary in quality and complexity. In simple cases, traditional NLP extraction techniques may
be enough to extract all the text out of a document. In other cases, advanced image-to-text models are required
to process a PDF. You can think of the strategies as being "rule-based" workflows (thus they are "fast"), or
"model-based" workflows (slower workflow because it requires model inference, but you get "higher resolution", thus "hi\_res").
When choosing a partitioning strategy for your files, you have to be mindful of the quality/speed trade-off.
To give you an example, the `fast` strategy is roughly 100x faster than leading image-to-text models.

**Available options:**

* `auto` (default strategy): The "auto" strategy will choose the partitioning strategy based on document characteristics and the function kwargs.
* `fast`:  The "rule-based" strategy leverages traditional NLP extraction techniques to quickly pull all the text elements. "Fast" strategy is not recommended for image-based file types.
* `hi_res`: The "model-based" strategy identifies the layout of the document. The advantage of "hi\_res" is that it uses the document layout to gain additional information about document elements. We recommend using this strategy if your use case is highly sensitive to correct classifications for document elements.
* `ocr_only`: Another "model-based" strategy that leverages Optical Character Recognition to extract text from the image-based files.
* `vlm`: Uses a vision language model (VLM) to extract text from these file types: `.bmp`, `.gif`, `.heic`, `.jpeg`, `.jpg`, `.pdf`, `.png`, `.tiff`, and `.webp`.

---

# Source: https://docs.unstructured.io/examplecode/tools/pii.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# PII detection

Personally identifiable information (PII) detection is important as part of an organization's overall strategy for
minimizing potential harm from misuse or unauthorized access to individuals' private data. Detecting PII is a first step
toward helping avoid identity theft, maintaining privacy, and building trust with customers and users. Organizations
might also need to follow various data protection regulations, making PII detection a crucial part of the
organization's legal compliance framework.

This hands-on example walkthrough demonstrates how to use the [Microsoft Presidio SDK](https://microsoft.github.io/presidio/)
to identify and then redact PII in Unstructured JSON output files. Presidio can identify
and redact or anonymize entities in text and images such as credit card numbers, names, locations, social security numbers,
bitcoin wallets, US phone numbers, financial data, and more.

In this walkthrough, you will use Python code to connect to a folder within an Amazon S3 bucket that already contains a
collection of Unstructured JSON output files. For each file, your code will use Presidio to identify PII that matches
specific patterns and then redact the identified PII. Your code will then write the Unstrucutred JSON output files' contents
that contain the redacted PII to a separate folder within an S3 bucket. You can then compare the output JSON files generated by
Unstructured to the redacted data generated by Presidio and see the impact of using Presidio for PII detection.

*Redaction* directly removes or obscures PII, such as replacing names with placeholders, or blacking out sensitive text.
The example code from this walkthrough replaces the detected PII characters within text strings with placeholder text. Another technique, *anonymization*,
can involve redaction, but also includes techniques such as *generalization* (for example, replacing specific dates
with age ranges), *suppression* (removing entire data fields), and *data masking* (replacing data with random values
while preserving the format). Presidio supports both redaction and anonymization. Although the term "anonymize" is
visible throughout this walkthrough's example code, Presidio's available anonymization techniques are not explored here—only redaction.

<Note>
  The Microsoft Presidio SDK is not offered, maintained, or supported by Unstructured. For questions or issues
  related to Presidio, see the following resources:

  * For general discussions, use the [discussion board](https://github.com/microsoft/presidio/discussions) in the Presidio repository on GitHub.
  * For questions or issues, file an [issue](https://github.com/microsoft/presidio/issues) in the Presidio repository on GitHub.
  * For other matters, email [presidio@microsoft.com](mailto:presidio@microsoft.com).

  The example code in this walkthrough is provided as a general reference only. It is not intended to substitute
  for a complete PII detection strategy.

  The example code in this walkthrough is not guaranteed to detect and redact all possible PII. For instance, the code
  looks for PII only in text strings. It does not look for PII in non-text fields such as `image_base64` and `orig_elements`
  within Unstructured metadata, and it does not look for PII in images.
</Note>

## Requirements

To use this example, you will need:

* An Unstructured account, as follows:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/ui/overview#how-am-i-billed%3F).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

* A set of one or more Unstructured JSON output files that have been generated by Unstructured and stored in a folder within an
  Amazon S3 bucket that you have access to. One way to generate these files is to use an Unstructured workflow that
  relies on an S3 destination connector to store these Unstructured JSON output files. Learn how to [create an S3 destination connector](/ui/destinations/s3) and
  [create a custom workflow](/ui/workflows#create-a-custom-workflow) that uses your S3 destination connector.

* Python installed on your local development machine.

## Create and run the Python code

1. In your local Python virtual environment, install the following libraries:

   * `boto3`
   * `presidio_analyzer`
   * `presidio_anonymizer`

   For example, if you are using [uv](https://docs.astral.sh/uv/), you can install these libraries into your local `uv`
   virtual environment with the following command:

   ```bash  theme={null}
   uv add boto3 presidio_analyzer presidio_anonymizer
   ```

2. In your local Python virtual environment, install the appropriate natural language processing (NLP) models for
   [spaCy](https://spacy.io/), which Presidio relies on for various internal tasks related to named entity recognition (NER) and
   PII identification.

   To find the appropriate model for your use case, do the following:

   a. Go to [spaCy Trained Models & Pipelines](https://spacy.io/models).<br />
   b. On the sidebar, click your target language, for example **English**.<br />
   c. Click the model you want to use, for example **en\_core\_web\_lg**.<br />
   d. Click **Release details**.<br />
   e. At the bottom of the release details page, in the **Assets** section, right-click the filename ending in `.whl`, for
   example **en\_core\_web\_lg-3.8.0-py3-none-any.whl**, and select **Copy Link Address** from the context menu.<br />
   f. From the release details page, copy the URL from your web browser's address bar.<br />
   g. Install the model into your local Python virtual environment by using the model's name and the URL that you just copied.
   For example, if you are using `uv`, you can install the preceding model with a command such as the following:<br />

   ```bash  theme={null}
   uv pip install en_core_web_lg@https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0-py3-none-any.whl
   ```

3. [Set up Boto3 credentials](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html) for your AWS account.
   The following steps assume you have set up your Boto3 credentials from outside of the following code, such as setting
   [environment variables](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables) or
   configuring a [shared credentials file](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#shared-credentials-file),

   One approach to getting and setting up Boto3 credentials is to [create an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey)
   and then use the [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
   to [set up your credentials](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-methods) on your local development machine.

   <iframe width="560" height="315" src="https://www.youtube.com/embed/MoFTaGJE65Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

4. Add the following code to a Python script file in your virtual environment, replacing the following placeholders:

   * Replace `<input-bucket-name>` with the name of the Amazon S3 bucket that contains your original Unstructured JSON files. This is the same
     bucket that you used for your S3 destination connector.
   * Replace `<input-folder-prefix>` with the path to the folder within the input bucket that contains your original Unstructured JSON files.
   * Replace `<output-bucket-name>` with the name of the S3 bucket that will contain copies of the contents of your Unstructured JSON files,
     with the redacted content within those files' copies. This can be the same bucket as the input bucket, or a different bucket.
   * Replace `<output-folder-prefix>` with the path to the folder within the output bucket that will contain copies of the contents of your Unstructured
     JSON files, with the redacted content within those files' copies. This must not be the same folder as the input folder.
   * Replace `<bucket-region-short-id>` with the short ID of the region where your buckets are located, for example `us-east-1`.

   For the `operators` variable, a list of operators for built-in Presidio entities is specified. These operators look for common entities such as
   credit card numbers, email addresses, phone numbers, and more. You can remove any entities from this list that you do not
   want your code to look for. You can also add operators to this list for additional
   [built-in entities](https://microsoft.github.io/presidio/supported_entities/) that you want your code to also look for. And you can also
   [add your own custom entities](https://microsoft.github.io/presidio/analyzer/adding_recognizers/) to this list.

   ```python  theme={null}
   import boto3
   import json

   from presidio_analyzer import AnalyzerEngine
   from presidio_anonymizer import AnonymizerEngine
   from presidio_anonymizer.entities import OperatorConfig

   operators={
       "CREDIT_CARD": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_CREDIT_CARD>'}),
       "CRYPTO": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_CRYPTO>'}),
       "EMAIL_ADDRESS": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_EMAIL_ADDRESS>'}),
       "IBAN_CODE": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_IBAN_CODE>'}),
       "IP_ADDRESS": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_IP_ADDRESS>'}),
       "NRP": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_NRP>'}),
       "LOCATION": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_LOCATION>'}), 
       "PERSON": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_PERSON>'}),
       "PHONE_NUMBER": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_PHONE_NUMBER>'}),
       "MEDICAL_LICENSE": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_MEDICAL_LICENSE>'}),
       "URL": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_URL>'}),
       "US_BANK_NUMBER": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_US_BANK_NUMBER>'}),
       "US_DRIVER_LICENSE": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_US_DRIVER_LICENSE>'}),
       "US_ITIN": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_US_ITIN>'}),
       "US_PASSPORT": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_US_PASSPORT>'}),
       "US_SSN": OperatorConfig(operator_name="replace", params={'new_value': '<REDACTED_US_SSN>'})
   }

   # Recursively check for string values in the provided JSON object (in this case, 
   # the "metadata" field of the JSON object) and redact them 
   # as appropriate.
   def check_string_values(obj, analyzer, anonymizer):
       if isinstance(obj, dict):
           for key, value in obj.items():
               # Skip analyzing Base64-encoded fields.
               if key == 'image_base64' or key == 'orig_elements':
                   pass
               elif isinstance(value, str):
                   anonymized_results = anonymizer.anonymize(
                       text=value,
                       analyzer_results=analyzer.analyze(text=value, language="en"),
                       operators=operators
                   )
                   value = anonymized_results.text
               # Recurse through nested "metadata" fields.
               elif isinstance(value, dict):
                   check_string_values(value, analyzer, anonymizer)
               # Skip analyzing non-string fields.
               else:
                   pass
       return obj

   def main():
       s3_input_bucket_name    = '<input-bucket-name>'
       s3_input_folder_prefix  = '<input-folder-prefix>'
       s3_output_bucket_name   = '<output-bucket-name>'
       s3_output_folder_prefix = '<output-folder-prefix>'
       s3_bucket_region        = '<bucket-region-short-id>'

       s3_client = boto3.client('s3')

       # Load the JSON files from the input folder.
       # Normalize the input folder prefix to ensure it ends with '/'.
       if not s3_input_folder_prefix.endswith('/'):
           s3_input_folder_prefix += '/'

       paginator = s3_client.get_paginator('list_objects_v2')
       page_iterator = paginator.paginate(
           Bucket=s3_input_bucket_name, 
           Prefix=s3_input_folder_prefix
       )
       files = []

       # Get the list of file keys from the input folder to analyze.
       # A file's key is the full path to the file within the bucket.
       # For example, if the input folder's name is "original" and the 
       # input file's name is "file1.json", the file's key is 
       # "original/file1.json".
       # There could be multiple "pages" of file listings available, 
       # so each of these "pages" must be looped through, so that 
       # no files are missed.
       for page in page_iterator:
           # "Contents" is missing if the folder is empty or the 
           # intended prefix is not found.
           if 'Contents' in page:
               for obj in page['Contents']:
                   key = obj['Key']
                   if not key.endswith('/'):  # Skip if it's a folder placeholder.
                       files.append(key)
                       print(f"Found file: {s3_input_bucket_name}/{key}")

       analyzer = AnalyzerEngine()
       anonymizer = AnonymizerEngine()
       s3_resource = boto3.resource('s3')

       # For each JSON file to analyze, load the JSON data.
       for key in files:
           print(f"Analyzing file: {s3_input_bucket_name}/{key}")
           content_object = s3_resource.Object(
               bucket_name=s3_input_bucket_name, 
               key=key 
           )

           file_content = content_object.get()['Body'].read().decode('utf-8')  # Bytes to text.
           json_data = json.loads(file_content) # Text to JSON.

           # For each element in the JSON data...
           for element in json_data:
               print(f"    Analyzing element with ID: {element['element_id']} in file {s3_input_bucket_name}/{key}")
               # If there is a "text" field...
               if 'text' in element:
                   # ...get the text content...
                   text_element = element['text']
                   # ...and analyze and redact the text content as appropriate.
                   anonymized_results = anonymizer.anonymize(
                       text=text_element,
                       analyzer_results=analyzer.analyze(text=text_element, language="en"),
                       operators=operators
                   )
                   element['text'] = anonymized_results.text
               # If there is a "metadata" field...
               if 'metadata' in element:
                   # ...get the metadata content...
                   metadata_element = element['metadata']
                   # ...and analyze and redact the metadata content as appropriate.
                   element['metadata'] = check_string_values(metadata_element, analyzer, anonymizer)

           # Get the filename from the key.
           filename = key.split(s3_input_folder_prefix)[1]

           # Normalize the output folder prefix to ensure it ends with '/'.
           if not s3_output_folder_prefix.endswith('/'):
               s3_output_folder_prefix += '/'
           
           # Then save the JSON data with its redactions to the output folder.
           print(f"Saving file: {s3_output_bucket_name}/{s3_output_folder_prefix}{filename}")
           s3_client.put_object(
               Bucket=s3_output_bucket_name,
               Key=f"{s3_output_folder_prefix}{filename}",
               Body=json.dumps(obj=json_data, indent=4).encode('utf-8')
           )

   if __name__ == "__main__":
       main()
   ```

5. Run the Python script.

6. Go to the output folder in S3 and explore the generated files, searching for the `<REDACTED_` placeholders in the generated files' contents.

---

# Source: https://docs.unstructured.io/ui/destinations/pinecone-destination-quickstart.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Pinecone destination connector quickstart

Unstructured can connect to several types of [destinations](/ui/connectors#destinations). In this quickstart, you create a [Pinecone](https://www.pinecone.io) destination connector that you can add to your Unstructured [workflows](/ui/workflows).
This destination connector enables your Unstructured workflows to send processed data about your documents into a [Pinecone dense serverless index](https://docs.pinecone.io/guides/index-data/indexing-overview#dense-indexes) in your Pinecone account.

In this quickstart, you will:

* Create a free Pinecone account.
* Create a Pinecone destination connector in your Unstructured account.
* Add the Pinecone destination connector to a workflow in your Unstructured account.

If you are not able to complete the following steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

## Step 1: Create your Pinecone account

In this step, you create a [free Pinecone Starter account](https://www.pinecone.io/pricing/#plan-comparison-table). This account is limited to up to 2 users, up to 2 GB of total storage, and has rate-limited reads and writes.

If you already have a Pinecone account and want to use it instead, then skip ahead to Step 2.

1. Go to the Pinecone home page, at [https://www.pinecone.io](https://www.pinecone.io).

2. Click **Sign up**.

3. Enter your email address, and then click **Continue**.

4. Leaving this tab open in your web browser, go to your email inbox, and open the email messagefrom Pinecone. Copy the verification code that appears in this email message.

5. On the tab that you left open in your web browser, paste the verification code into the **your code** box, and then click **Continue**.

6. Select the radio button labelled **I'm building a small or personal project**, and then click **Start for free**.

7. On the next page, enter the following information to continue creating your Pinecone account:

   * Enter your **First Name** and **Last Name**.
   * For **Purpose of use**, select **Personal**.
   * For **Preferred coding language**, select **Python**.

8. Click **Continue**.

9. On the next page, enter the following information to finish creating your Pinecone account:

   * For **Use case**, select **RAG**.
   * For **What does your data look like**, select **I have raw documents**.
   * For **What is your goal**, select **Build an AI feature/product**.

10. Click **Get started**.

11. In the **API key generated** dialog that appears, click the copy icon to copy your API key's value. Save this key's value to a secure location, as you will need it later in Step 2.

    <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/API-Key.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=3b0f8d3f9cb2df8a719333c4502ccb3f" alt="Pinecone API key" data-og-width="1506" width="1506" data-og-height="756" height="756" data-path="img/pinecone-quickstart/API-Key.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/API-Key.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=350b8ca12c20d8529b24aa90c727cd9c 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/API-Key.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=65fae8f9a1932ba2caf56362af95473f 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/API-Key.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=6613315a8715e909fd9c5b94130d9865 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/API-Key.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=8663673a6c5066591492fc638900633c 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/API-Key.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=fef4ba84fea942af5777aa8a893b2e83 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/API-Key.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=011a445f94ef1c5acac11e28de0c4d3b 2500w" />

    <Tip>
      If you lose this API key's value later, you will not be able to get it back. However, you can [create another API key](https://docs.pinecone.io/guides/projects/manage-api-keys#create-an-api-key).
      Be sure also to [delete the entry for the lost API key](https://docs.pinecone.io/guides/projects/manage-api-keys#delete-an-api-key).
    </Tip>

12. Click **Close**. The Pinecone user interface (UI) appears.

    <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-UI.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=97e192f941a85598d43e65e9d3fecb43" alt="Pinecone user interface" data-og-width="3412" width="3412" data-og-height="1824" height="1824" data-path="img/pinecone-quickstart/Pinecone-UI.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-UI.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=a16e089163ee9f878039be06f1911db5 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-UI.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=aed42502ae3f0da03185330d25cbb01b 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-UI.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=73255fd71f0fdbc24dc862ea541d4371 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-UI.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=a365a538a79259eb81b6be7be0df9914 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-UI.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ac71e72ea4352d76730b95ba6c050c5a 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-UI.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=725003ce2ed53a85fa1b82a22a9b780d 2500w" />

## Step 2: Create the Pinecone destination connector

In this step, you create a Pinecone destination connector in your Unstructured account. This destination connector
is used by Unstructured to send processed data about your documents into a Pinecone dense serverless index in your Pinecone account.
Unstructured will create this index for you in your Pinecone account. You can then use this index later as input to your RAG applications
and agentic AI workflows.

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/ui/overview#how-am-i-billed%3F).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. On the sidebar, click **Connectors**.

   <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=29c1925c8205f7def1b77f776601210e" alt="Connectors sidebar icon" data-og-width="77" width="77" data-og-height="216" height="216" data-path="img/pinecone-quickstart/Sidebar-Connectors.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=5b7a984fb1714e72b88d121e099d9f35 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=3a7f0ec96732df8359a12208d233d106 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=49570ed905dffc9daaa1e65bd248b6c8 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0ee5999f1e96df45a913f3c469d29b9b 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=653470445340a30af6d60f1c2caf4a8e 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=c8e87db8e7ffbe63ee4d81305b2efd82 2500w" />

4. Click **+ New**.

5. Enter some unique name for this connector, for example `pinecone-destination-connector`.

6. For **Type**, click **Destination**.

7. For **Provider**, click **Pinecone**.

8. Click **Continue**.

   <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Connector-Type.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=643b5dcf0c1f54b0e3792b01e70fcefa" alt="Selecting the connector type" data-og-width="1402" width="1402" data-og-height="1472" height="1472" data-path="img/pinecone-quickstart/Select-Connector-Type.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Connector-Type.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=7c05242922c58a0f68527184aca12648 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Connector-Type.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=fff349c6e853de4bd2382d91ee30d407 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Connector-Type.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9ca5e3749375c5a5785772e62d5af52a 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Connector-Type.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ed888f4c324bea047aefa74f64352697 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Connector-Type.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=bb317ebd2d3992ac47751c8fdac32578 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Connector-Type.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=2eb5f9d73ee89c71076715c090cdeeb3 2500w" />

9. Leave **Index Name** blank.

10. Leave **Namespace** set to the default value of `default`.

11. Leave **Batch Size** set to the default value of `50`.

12. For **API Key**, enter the **API key** you noted in Step 1.

13. Click **Save and Test**, and wait while Unstructured tests the connector.

    <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Connector-Settings.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=4143dbf1cbdba0ebd45636c56ae4888f" alt="Entering connector settings" data-og-width="1426" width="1426" data-og-height="1322" height="1322" data-path="img/pinecone-quickstart/Source-Connector-Settings.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Connector-Settings.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=337f91269f0e8c26bd8bd588e96b874d 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Connector-Settings.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0705b7cd498e0587b7f6a60ecaecb3cc 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Connector-Settings.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=046954ed110e13d02dd34ebfbe155fd8 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Connector-Settings.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=4f7424e0f841bea53d5fe6ebd282007b 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Connector-Settings.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=7961b0c2f6d595a11ac56a8a7a25b471 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Connector-Settings.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=86ae1376e171e4ed7daf3b4f0b288b93 2500w" />

14. If a green **Successful** message appears, then you have successfully created the connector.

    <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Connector-Test-Success.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=c7024fb4d95f9d4a34b6ffc3d3ffa52e" alt="Successful connector test" data-og-width="1418" width="1418" data-og-height="462" height="462" data-path="img/pinecone-quickstart/Connector-Test-Success.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Connector-Test-Success.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9013661890cbd2cb246536b139b60e98 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Connector-Test-Success.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=e6113a91e7af8ec4035feeb95f456cfc 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Connector-Test-Success.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=cdd12a38a75666f3d6e433562b057769 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Connector-Test-Success.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ec44e1763de65d58b7f9a9bc67567fc9 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Connector-Test-Success.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=c5a4821e83b4878090157556d7aaaa01 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Connector-Test-Success.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ec4247dea373a2cba62ce8c5995e8578 2500w" />

    If, however, a red error message appears, fix the issue, and try this step again.

    If you cannot fix the issue, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

Congratulations! You have successfully created a Pinecone destination connector in your Unstructured account.

If you are not able to complete these steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

## Next steps

* If you do not have a source connector in your Unstructured account, then complete the [Dropbox source connector quickstart](/ui/sources/dropbox-source-quickstart).

  If you're not sure if you have a source connector, click **Connectors** in your Unstructured account's sidebar, and then click **Sources** to see if there are any listed.

  <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=29c1925c8205f7def1b77f776601210e" alt="Connectors sidebar icon" data-og-width="77" width="77" data-og-height="216" height="216" data-path="img/pinecone-quickstart/Sidebar-Connectors.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=5b7a984fb1714e72b88d121e099d9f35 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=3a7f0ec96732df8359a12208d233d106 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=49570ed905dffc9daaa1e65bd248b6c8 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0ee5999f1e96df45a913f3c469d29b9b 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=653470445340a30af6d60f1c2caf4a8e 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Connectors.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=c8e87db8e7ffbe63ee4d81305b2efd82 2500w" />

* If you already have a source connector, then you can add this Pinecone destination connector as well as your source connector to a workflow in your Unstructured account. To do this:

  1. Click **Workflows** in your Unstructured account's sidebar.

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Workflows.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=e63a7cbe16e089305aa286175bf8a026" alt="Workflows sidebar icon" data-og-width="75" width="75" data-og-height="279" height="279" data-path="img/pinecone-quickstart/Sidebar-Workflows.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Workflows.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=cb129e285aa2d7caf5d38a3ac7a508d5 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Workflows.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=fe345f06c0a230a39a9d9a4d6b63925c 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Workflows.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=348ab7e07dec89460da08c3b7ecac8bd 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Workflows.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=f64507736dc21f683d74f3a13d59c251 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Workflows.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=2ef4bc2b4ef2e12fc48d68ef3ddca640 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Workflows.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=7bd1e439dea472e6ce646292dd1a8900 2500w" />

  2. Click **New Workflow +**.

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/New-Workflow.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=8b68cb8e6ac99c07360fdcfb3a81d2b9" alt="Begin creating a workflow" data-og-width="2612" width="2612" data-og-height="250" height="250" data-path="img/pinecone-quickstart/New-Workflow.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/New-Workflow.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=14ce6961b116e796f98cf851b69c60e5 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/New-Workflow.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=b0c110164d3e3d2c5203beb8308682ca 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/New-Workflow.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=bb80f757d0086476787af36498e19811 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/New-Workflow.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=30543b697a3bba77c94b2aac3712e193 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/New-Workflow.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9e21450fabddb485b72df0ba7daf9ac0 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/New-Workflow.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0735ed0559e4380580126b1a383c09e2 2500w" />

  3. With **Build it Myself** already selected, click **Continue**.

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Build-It-Myself.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=b4876eda02c2db324f45582b174a3826" alt="Build it myself workflow type" data-og-width="1162" width="1162" data-og-height="846" height="846" data-path="img/pinecone-quickstart/Build-It-Myself.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Build-It-Myself.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ba3aad68b4163007d94863b7f3b694db 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Build-It-Myself.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=1a13cb6532b6c1c2cd4ba01d04cadbe6 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Build-It-Myself.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=aba41bea007e803e552ad5e1714d2ad4 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Build-It-Myself.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=e8e19bd2383126cac87f1d70e6d4fa90 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Build-It-Myself.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=c6a3d15f47de6e8f5e8dbdd461155728 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Build-It-Myself.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=eebbcbba0bf0e35e84b0eb3fbdeeba76 2500w" />

  4. Click the **Source** node. (Do not click **Drop file to test**.)

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Node.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=76a59eb49f4eea9f81fb55d78432b86b" alt="Workflow source node" data-og-width="237" width="237" data-og-height="182" height="182" data-path="img/pinecone-quickstart/Source-Node.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Node.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=5e245d24f4ceb4ef138f8a12154b5d3b 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Node.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=71dd5e484c15768cb7246bfa84360adc 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Node.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0b96f97f8501103075defbeacdb59b96 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Node.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=1dbdbfa916c5e5b45b883bd45c864060 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Node.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=eb7eec315aa670fb8c52a16893d20fb1 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Source-Node.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=1c9b219dd9dde7658b0888a4ea0cb4d3 2500w" />

  5. On the **Details** tab, click **Connectors**, and then click the name of your source connector.

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Source-Connector.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=fbd748619cbb8b98a669e8e7757ac2d8" alt="Selecting the source connector" data-og-width="530" width="530" data-og-height="311" height="311" data-path="img/pinecone-quickstart/Select-Source-Connector.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Source-Connector.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=686c4f83680d4b9fd84bd60036801968 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Source-Connector.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=430d4e0e219fc8a95193f17ce52f36e8 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Source-Connector.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=e682b465638fd4351c9edf3bb66c077b 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Source-Connector.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ea7ea9ac70254c2800e2cee19e4d58fb 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Source-Connector.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=4839b8497de3a7d25528ce800ddce879 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Source-Connector.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=881793585fcb54fd086e61fde6c46988 2500w" />

  6. Click the **Destination** node.

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Destination-Node.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ab21f07d55b31573e8a8b6cd551074d2" alt="Workflow destination node" data-og-width="287" width="287" data-og-height="118" height="118" data-path="img/pinecone-quickstart/Destination-Node.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Destination-Node.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=e6cd658a5b2575ff91cbe8f0058c80d8 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Destination-Node.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=85e9aa62156d9f726516d380f5118945 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Destination-Node.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=6d1e9b07af745882a8cba741d9b7d161 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Destination-Node.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=23722c02c7e31d8f4de1b10b7f607c32 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Destination-Node.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9696793dee02b0e6abe588c39f43b515 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Destination-Node.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=475e9bdd5290ed2be8b6280f7d7ddeb4 2500w" />

  7. On the **Details** tab, click the name of your Pinecone destination connector.

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Destination-Connector.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=40034064aad554eb6da3f1b22d7d2e0f" alt="Selecting the destination connector" data-og-width="562" width="562" data-og-height="252" height="252" data-path="img/pinecone-quickstart/Select-Destination-Connector.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Destination-Connector.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=1327dfedc7445cbae5a295041feb699b 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Destination-Connector.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=38bf2d74bd29ac8083e686eee154f736 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Destination-Connector.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=1152c59038c9fc02510e68799c427bc2 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Destination-Connector.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=5edf84b834736e6054387e22b8e4fea5 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Destination-Connector.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=c35af12d9991aab114fb64069d332149 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Destination-Connector.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=45349db56886c76efca4321d43b94a87 2500w" />

  8. Between the **Partitioner** and **Destination** nodes, click **+**.

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Node.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=d804d9688a24cd2328ea7af2ef744752" alt="Adding a workflow node" data-og-width="1350" width="1350" data-og-height="228" height="228" data-path="img/pinecone-quickstart/Add-Node.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Node.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=467449a31c91286ed99c8ce90883ff2a 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Node.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=3089e7b181471e9f12523e56b04ec0de 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Node.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=a5f4196555da02d8c74d06c43a8465fd 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Node.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=90c9ed79c3e9171da5005bdbdbe2edcb 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Node.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=d8874b5ac2f8582f108cc15350b1de16 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Node.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=942ae6ee0cc005663f0535342c4a899e 2500w" />

  9. Click **Transform > Embedder**.

     <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Embedder-Node.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=483f7528360d11d2971378da68871592" alt="Adding an embedder node" data-og-width="1352" width="1352" data-og-height="510" height="510" data-path="img/pinecone-quickstart/Add-Embedder-Node.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Embedder-Node.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ef6aea7d9541cb83bdb14a53c05851e3 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Embedder-Node.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=02073ce24ff8919031958e8cbbd11407 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Embedder-Node.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=1564bdbbb971e3254e53b3264cbacc0f 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Embedder-Node.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=fc4a5bce9c797ac779cceb9a5a53d2da 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Embedder-Node.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=c39b67f8fb235511e70f6af2c51859ad 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Add-Embedder-Node.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=1e83f9d180a2108c9a8a896734b019c5 2500w" />

  10. On the **Details** tab, under **Azure OpenAI**, click the radio button labelled **text-embedding-3-small \[dim 1536]**.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Embedding-Model.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=524aad67d5eb1db7f252ffa6aabf10a3" alt="Selecting the embedding model" data-og-width="1008" width="1008" data-og-height="740" height="740" data-path="img/pinecone-quickstart/Select-Embedding-Model.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Embedding-Model.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ef7b7ffc40a7e9fa147acb9e0bcf3d4b 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Embedding-Model.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=474574dccb86bcce2b084ef657c7e43a 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Embedding-Model.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=35dff0b305717e849352493d4bdddb8b 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Embedding-Model.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=80a386cee6099a5ea984e15bc9e63b8b 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Embedding-Model.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=a4acb4bd7c3b4c9f2dcf3f97a7a8b1ba 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Select-Embedding-Model.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0a7541bf9b4ac7fc2228d6990802f3cf 2500w" />

  11. Switch **Active** to on, and then click **Save**.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Make-Workflow-Active.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=8bc4c3fc5d17011e995235acc9de8496" alt="Making the workflow active" data-og-width="192" width="192" data-og-height="45" height="45" data-path="img/pinecone-quickstart/Make-Workflow-Active.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Make-Workflow-Active.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=7f4eab69d17de4a3b1997536fadb7dd8 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Make-Workflow-Active.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=a30a3570beb4176b57aaaa1b00e2da84 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Make-Workflow-Active.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=f852c9aac9bfd84680667c0eba11ba12 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Make-Workflow-Active.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=472c159db7ce9e99ecb9a52e16c40b1c 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Make-Workflow-Active.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=2df9ca39138f43e7d3a2f169c4f5d4bc 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Make-Workflow-Active.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=8823d4fb2d771016efca44ff3aadcafb 2500w" />

  12. Next to your workflow's name, click **Run**.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Run-Workflow.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=85ffe1343d5932b90cb9c87616a65491" alt="Running the workflow" data-og-width="1457" width="1457" data-og-height="176" height="176" data-path="img/pinecone-quickstart/Run-Workflow.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Run-Workflow.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=68d620ccaecc839d01f9449e07b70c2d 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Run-Workflow.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=231db45c921068ba434a16c08bec816b 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Run-Workflow.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=93a13e185cf1a95f473f0335c6a102ba 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Run-Workflow.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=69eed8e4adbc75f26ed23b67168069ca 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Run-Workflow.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9b310a09effbab66d64f841eca301342 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Run-Workflow.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=592e301c6de6d4270983367daabf7537 2500w" />

  13. Click **Jobs** in your Unstructured account's sidebar.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Jobs.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=b7c0a6b86153af71d3a5330c123aedda" alt="Jobs sidebar icon" data-og-width="78" width="78" data-og-height="364" height="364" data-path="img/pinecone-quickstart/Sidebar-Jobs.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Jobs.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=74aa2ebb38e1e926c2fa154e9134d549 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Jobs.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=cfc7f2a09b704794b965b077dfbe2c67 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Jobs.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=04a8b782441b9552b4d843fd398aa813 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Jobs.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=5bd5ad30f73416df30d7a80bc4d2e75a 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Jobs.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0a02f492021fa8d64ed9647511eee9df 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Sidebar-Jobs.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=59908c1603e5e6af7087764bce584a55 2500w" />

  14. Wait until the job shows **Finished** with a green checkmark.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Finished-Job.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=2a47623b82a40e6cfdda4a000642e6b5" alt="Finished job status" data-og-width="786" width="786" data-og-height="382" height="382" data-path="img/pinecone-quickstart/Finished-Job.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Finished-Job.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=87c71920c42c158849aa57b99272d85f 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Finished-Job.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=884362e6e51aee98f85fdfcafe0e2bf1 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Finished-Job.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=a9b97cf28738d59740fa5173b4a325b3 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Finished-Job.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=2723ce1aed04aeda8e76dcda6491b577 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Finished-Job.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=66b830865a856006d2e06039f1a089d1 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Finished-Job.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=57f9cf3e51a1901ea1ebd3f343e81c99 2500w" />

  15. Log in to your Pinecone account, if you are not already logged in. To do this, go to [https://pinecone.io](https://pinecone.io), click
      **Log in**, and follow the on-screen instructions to finish logging in.

  16. On the sidebar, under **Database**, click **Indexes**.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Indexes.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9ba25b295cc99d52f8cc0ee608cd416f" alt="View Pinecone indexes" data-og-width="562" width="562" data-og-height="772" height="772" data-path="img/pinecone-quickstart/Pinecone-Indexes.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Indexes.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=838e97c369b0c6b58f7a5b403e71853f 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Indexes.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ef17a14db84c94e58ca259650e0cc755 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Indexes.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9e36bcd22197ccb5bf3a61fdb61bceee 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Indexes.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=2c82019186eb54532ec0e207a8d0fd65 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Indexes.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=33d59a5bf674aeb3f79326de447b2e02 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Indexes.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=8f1c761a241461079f948558c8227b77 2500w" />

  17. Click the name of the index that Unstructured created. The name should look like this: `u<random-number>-textembedding3-small-1536`.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Index.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=fa2fd745e7c05e5d8ebd650e295fa5f6" alt="Opening a Pinecone index" data-og-width="1534" width="1534" data-og-height="668" height="668" data-path="img/pinecone-quickstart/Pinecone-Index.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Index.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ef429c2bcde8734bf5ccdf22cac51b56 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Index.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9af9b9708f3035d691054b9f823ea17b 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Index.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0136c51a7309215f52d916fdd6aa4cc6 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Index.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=38abdbbfd5b6dd98a747534b32e39700 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Index.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0edccd8902f5977492b93cea20a7d37a 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Pinecone-Index.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=9bcff049b44f0023f68b3b6f51ce476e 2500w" />

  18. On the **Browser** tab, click **List/Fetch**.

  19. For **Namespace**. select **default**.

  20. For **Operation**, select **List IDs**.

  21. For **Limit**, enter \`10.

  22. Click **List**.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/List-Index-Items.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=cccbeb01eadfed493fcce8faea748e15" alt="Listing Pinecone index items" data-og-width="2804" width="2804" data-og-height="1368" height="1368" data-path="img/pinecone-quickstart/List-Index-Items.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/List-Index-Items.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=06bbd34ebf890a744db4cc03e9bdfa43 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/List-Index-Items.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=d2bf8e645a96e4a9e11be66549d6da37 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/List-Index-Items.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=debea7587c8b4150772adc968934ca36 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/List-Index-Items.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ba67e842936b2120ad4c6c7f6c016d48 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/List-Index-Items.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=5ff2dada2e3df154d68ded3ac0de42a2 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/List-Index-Items.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=45fd416dc20ee34154652bb5a6a4d13d 2500w" />

  23. To look at the data and metadata for any of the records that are in the list, click the ellipsis (three dots) icon, and then click **Fetch**.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/View-Item-Data.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ef8ccdcca33bdd38da98db5a9ecf92c5" alt="Viewing Pinecone index item data" data-og-width="2792" width="2792" data-og-height="658" height="658" data-path="img/pinecone-quickstart/View-Item-Data.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/View-Item-Data.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=b07c76b33873496dc371a6c3eff3ae1f 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/View-Item-Data.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=85c7d701b2771f362d3d855eecb01b26 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/View-Item-Data.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=f48631ac5acbf5b14800d17dded46608 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/View-Item-Data.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=92c66e79b10cfb42a9d847a58a5ad2da 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/View-Item-Data.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=d19c10299e5f2caae3826dc7ee881f39 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/View-Item-Data.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=a868f5eb91ad6cbae6eac29d6f9014c6 2500w" />

  24. When you are done looking at the record, click **Close**.

      <img src="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Close-Item.png?fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=ade1e0f8ce4c31e738e42b9aa0e30dd4" alt="Closing a Pinecone index item" data-og-width="1360" width="1360" data-og-height="944" height="944" data-path="img/pinecone-quickstart/Close-Item.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Close-Item.png?w=280&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=d8cdccff87369664085e0434a0e34f93 280w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Close-Item.png?w=560&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=45241c4ba3b59f915b044ea935b48fb7 560w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Close-Item.png?w=840&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=24333209c43071cf642552b022864d00 840w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Close-Item.png?w=1100&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=3566afe6b94d8f918c29ffbf6a48adde 1100w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Close-Item.png?w=1650&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=0045a5f68e21a21570e8362624772e54 1650w, https://mintcdn.com/unstructured-53/lzJ4hi3NwruEhqBQ/img/pinecone-quickstart/Close-Item.png?w=2500&fit=max&auto=format&n=lzJ4hi3NwruEhqBQ&q=85&s=03ece79c25c90b2d4b652722bcf795cc 2500w" />

  25. To learn how to use this Pinecone index as input to your RAG applications and agentic AI workflows, see resources such as the following:

      * [Semantic search](https://docs.pinecone.io/guides/search/semantic-search) in the Pinecone documentation
      * [Filter by metadata](https://docs.pinecone.io/guides/search/filter-by-metadata) in the Pinecone documentation
      * [Rerank results](https://docs.pinecone.io/guides/search/rerank-results) in the Pinecone documentation
        \= [Build a RAG chatbot](https://docs.pinecone.io/guides/get-started/build-a-rag-chatbot) in the Pinecone documentation
      * [Pinecone integration with LangChain](https://python.langchain.com/docs/integrations/vectorstores/pinecone/) on the LangChain website

If you are not able to complete these steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

---

# Source: https://docs.unstructured.io/ui/destinations/pinecone.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/pinecone.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/pinecone.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Pinecone

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Pinecone.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/Ydj74uBnJyA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A Pinecone account. [Get an account](https://app.pinecone.io/).

* A Pinecone API key in your Pinecone account. To create an API key, do the following:

  1. [Sign in to your Pinecone account and open the API keys page](https://app.pinecone.io/organizations/-/projects/-/keys).
  2. Click **+ API key**.
  3. For **API key name**, enter some descriptive name for the API key.
  4. Click **Create key**.
  5. Copy the generated API key to some secure location. You will not be able to access this API key again after you close the dialog.

* A Pinecone serverless index in your Pinecone account.

  Creating a serverless index on your own is optional.
  An index is not required to exist in advance.

  When you set up the connector, at runtime, the index behavior is as follows:

  For the [Unstructured UI](/ui/overview) and [Unstructured API](/api-reference/overview):

  * Your workflow must contain an embedder node, and the embedder node must specify the embedding model that Unstructured will use to generate the embeedings.
  * If an existing index name is specified,
    and the number of dimensions that Unstructured generates does not match the number of dimensions that are specified in the existing index's embedding settings, the run will fail.
    You must change the number of dimensions in your workflow's embedder node or your existing index's embedding settings to match, and try the run again.
  * If an index name is not specified, Unstructured creates a new index in your Pinecone account. The
    new index's name will be `u<short-workflow-id>-<short-embedding-model-name>-<number-of-dimensions>`.

  For [Unstructured Ingest](/open-source/ingestion/overview):

  * If an existing index name is specified, and Unstructured generates embeddings,
    but the number of dimensions that are generated does not match the existing index's embedding settings, the run will fail.
    You must change your Unstructured embedding settings or your existing index's embedding settings to match, and try the run again.
  * If an index name is not specified, Unstructured creates a new index in your Pinecone account. The new index's name will be `unstructuredautocreated`.

  <Note>
    If you create a new index or use an existing one, Unstructured recommends that all records in the target index have a field
    named `record_id` with a string data type.
    Unstructured can use this field to do intelligent document overwrites. Without this field, duplicate documents
    might be written to the index or, in some cases, the operation could fail altogether.
  </Note>

  To create a serverless index on your own, do the following:

  1. [Sign in to your Pinecone account and open the Create a new index page](https://app.pinecone.io/organizations/-/projects/-/create-index/serverless).

  2. For **Enter index name**, enter some descriptive name for the index.

  3. For **Configuration**, select the check box labelled **Custom settings**, or click the tile labelled **Manual configuration**.

     <Warning>
       Do not click any of the other tiles, such as **text-embedding-3-large**. Clicking any of these other tiles will cause Pinecone to generate embeddings instead of
       having Unstructured generate them. If Pinecone generates embeddings instead of Unstructured, this could cause any related Unstructured workflows to fail.
     </Warning>

  4. For **Vector type**, select **Dense**.

  5. For **Dimension**, enter the number of dimensions for the embeddings that Unstructured will generate.

     <Warning>
       The number of dimensions that you enter here must match the number of dimensions for the embedding model that you use in any related Unstructured workflows or ingestion pipelines. If these numbers do not
       match in both places, this could cause any related Unstructured workflows or ingestion pipelines to fail.
     </Warning>

  6. For **Metric**, select **cosine**.

  7. Leave **Capacity mode** set to **Serverless**.

  8. You can leave **Cloud provider** and **Region** set to their default values, or you can select a cloud provider and region that is closest to you, if available.

  9. Click **Create index**.

* Within a Pinecone serverless index, custom [namespaces](https://docs.pinecone.io/guides/index-data/indexing-overview#namespaces) are supported but are not required.

To create a Pinecone destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="pinecone",
                  config={
                      "index_name": "<index-name>",
                      "namespace": "<namespace-name>",
                      "api_key": "<api-key>",
                      "batch_size": <batch-size>
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "pinecone",
      "config": {
          "index_name" "<index-name>", 
          "namespace": "<namespace-name>",
          "api_key": "<api-key>",
          "batch_size" <batch-size>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (required) - A unique name for this connector.
* `<index-name>` - The name of the index in the Pinecone database. If no value is provided, see the beginning of this article for the behavior at run time.
* `<namespace-name>` - The name of any custom namespace in the Pinecone index to upsert data into. If no value is provided, the default namespace of `default` will be used.
* `<api-key>` (required) - The Pinecone API key.
* `<batch-size>` - The maximum number of records to transmit in a single batch. The default is `50` unless otherwise specified.

---

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/post-requests.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Process an individual file by making a direct POST request

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

Watch the following 4-minute video to learn how to make POST requests to the Unstructured Partition Endpoint to process individual files:

<iframe width="560" height="315" src="https://www.youtube.com/embed/fU080EahKwc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

Open the related [notebook](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_Partition_endpoint.ipynb) that is shown in the preceding video.

To make POST requests to the Unstructured Partition Endpoint, you will need:

These environment variables:

* `UNSTRUCTURED_API_KEY` - Your Unstructured API key value.
* `UNSTRUCTURED_API_URL` - Your Unstructured API URL.

To get your API key, do the following:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

The API URL was provided to you when your Unstructured account was created.
If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

<Note>
  The default URL for the Unstructured Partition Endpoint is `https://api.unstructuredapp.io/general/v0/general`.
  However, you should always use the API URL that was provided to you when your Unstructured account was created.
</Note>

Let's start with a simple example in which you use [curl](https://curl.se/) to send a local PDF file (`*.pdf`) to partition via the Unstructured Partition Endpoint.

In this command, be sure to replace `<path/to/file>` with the path to your local PDF file.

```bash  theme={null}
curl --request 'POST' \
"$UNSTRUCTURED_API_URL" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--header 'content-Type: multipart/form-data' \
--form 'content_type=string' \
--form 'strategy=vlm' \
--form 'vlm_model_provider=openai' \
--form 'vlm_model=gpt-4o' \
--form 'output_format=application/json' \
--form 'files=@<path/to/file>;type=application/pdf'
```

In the example above we're representing the API endpoint with the environment variable `UNSTRUCTURED_API_URL`. Note, however, that you also need to authenticate yourself with
your individual API Key, represented by the environment variable `UNSTRUCTURED_API_KEY`. Learn how to obtain an API URL and API key in the [Unstructured Partition Endpoint guide](/api-reference/legacy-api/partition/overview).

## Parameters & examples

The API parameters are the same across all methods of accessing the Unstructured Partition Endpoint.

* Refer to the [API parameters](/api-reference/legacy-api/partition/api-parameters) page for the full list of available parameters.
* Refer to the [Examples](/api-reference/legacy-api/partition/examples) page for some inspiration on using the parameters.

[//]: # "TODO: when we have the concepts page shared across products, link it from here for the users to learn about partition strategies, chunking strategies and other important shared concepts"

## Postman collection

Unstructured offers a [Postman collection](https://learning.postman.com/docs/collections/collections-overview/) that you can import into Postman to make POST requests through a graphical user interface.

1. [Install Postman](https://learning.postman.com/docs/getting-started/installation/installation-and-updates/).

2. [Sign in to Postman](https://learning.postman.com/docs/getting-started/installation/postman-account/#signing-in-to-postman).

3. In your workspace, click **Import**.

   <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/post/import.png?fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=75b5e5c9bedfc173f80852e45bc2086a" alt="Import a Postman collection" data-og-width="374" width="374" data-og-height="95" height="95" data-path="img/api/post/import.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/post/import.png?w=280&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=04bb0bcbdfe8e173423a8694cb2fd9de 280w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/post/import.png?w=560&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=712a18b8d6843fc0a57961facd927478 560w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/post/import.png?w=840&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=03191eebb7f94b687e91053793a3a789 840w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/post/import.png?w=1100&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=9dee61ef12f6045a745eb1b9c85104b5 1100w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/post/import.png?w=1650&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=dd0124c64017d4e2363fb730e6788d4e 1650w, https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/api/post/import.png?w=2500&fit=max&auto=format&n=vKFDfUfAWhz_siB3&q=85&s=fe3c5fd1741ff86d7248dc66c96145a7 2500w" />

4. In the **Paste cURL, Raw text or URL** box, enter the following URL, and then press `Enter`:

   ```
   https://raw.githubusercontent.com/Unstructured-IO/docs/main/examplecode/codesamples/api/Unstructured-POST.postman_collection.json
   ```

5. On the sidebar, click **Collections**.

6. Expand **Unstructured POST**.

7. Click **(Partition Endpoint) Basic Request**.

8. On the **Headers** tab, next to `unstructured-api-key`, enter your Unstructured API key in the **Value** column.

9. On the **Body** tab, next to `files`, click the **Select files** box in the **Value** column.

10. Click **New file from local machine**.

11. Browse to and select the file that you want Unstructured to process.

12. Click **Send**. Processing could take several minutes.

To download the processed data to your local machine, in the response area, click the ellipses, and then click **Save response to file**.

---

# Source: https://docs.unstructured.io/ui/sources/postgresql.md

# Source: https://docs.unstructured.io/ui/destinations/postgresql.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/postgresql.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/postgresql.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/postgresql.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/postgresql.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# PostgreSQL

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to PostgreSQL.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), local PostgreSQL installations are not supported.
* For [Unstructured Ingest](/open-source/ingestion/overview), local and non-local PostgreSQL installations are supported.

The following video shows how to set up [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/):

<iframe width="560" height="315" src="https://www.youtube.com/embed/QuIlEimejDs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

The following video shows how to set up [Azure Database for PostgreSQL](https://azure.microsoft.com/products/postgresql):

<iframe width="560" height="315" src="https://www.youtube.com/embed/6lvtBUFI7eQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A PostgreSQL instance.

  * [Create an Amazon RDS for PostgreSQL instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.PostgreSQL.html).
  * [Create an Azure Database for PostgreSQL server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/how-to-deploy-on-azure-free-account).
  * [Install PostgreSQL locally](https://www.postgresql.org/docs/current/tutorial-install.html).

* The host name and port number for the instance.

  * For Amazon RDS for PostgreSQL, learn how to [get the host name and port number](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ConnectToPostgreSQLInstance.html#postgresql-endpoint).
  * For Azure Database for PostgreSQL, learn how to [get the host](https://learn.microsoft.com/azure/postgresql/flexible-server/quickstart-create-server#get-the-connection-information). The port number is `5432`.
  * For local PostgreSQL installations, these values are in the `postgresql.conf` file's `listen_addresses` and `port` settings. This file should be on the same machine as the instance. These values might also already be set as environment variables named `PGHOST` and `PGPORT` on the same machine as the instance.
  * For other installation types, see your PostgreSQL provider's documentation.

* Depending on your network security requirements, you might need to allow access to your instance only from specific IP addresses.

  To get Unstructured's IP address ranges, go to
  [https://assets.p6m.u10d.net/publicitems/ip-prefixes.json](https://assets.p6m.u10d.net/publicitems/ip-prefixes.json)
  and allow all of the `ip_prefix` fields' values that are listed.

  <Note>These IP address ranges are subject to change. You can always find the latest ones in the preceding file.</Note>

  To learn how to allow these IP address ranges, see your PostgreSQL provider's documentation, for example with
  [Amazon RDS for PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html) or
  [Azure Database for PostgreSQL](https://learn.microsoft.com/azure/postgresql/flexible-server/how-to-manage-firewall-portal#create-a-firewall-rule-after-server-is-created).

  <Note>
    For Amazon RDS for PostgreSQL, Amazon recommends that you set the instance's **Public access** setting to **No** by default, as this
    approach is more secure. This means that no
    resources can connect to the instance outside of the instance's associated Virtual Private Cloud (VPC) without extra configuration.
    [Learn more](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.WorkingWithRDSInstanceinaVPC.html#USER_VPC.Hiding).
    [Access an Amazon RDS instance in a VPC](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html).

    If you must enable public access, set the instance's **Public access** setting to **Yes**, and then adjust the instance's related
    security group to allow this access.
    [Learn how](https://repost.aws/en/questions/QUxemKa9u5TV6CmLiO-r5prg/lost-public-access-to-aws-rds-postgresql-instance).

    [Troubleshoot issues with connecting to Amazon RDS instances](https://repost.aws/knowledge-center/rds-connectivity-instance-subnet-vpc).
  </Note>

* A database in the instance.

  * For Amazon RDS for PostgreSQL and Azure Database for PostgreSQL, the default database name is `postgres` unless a custom database name was specified during the instance creation process.
  * For local PostgreSQL installations, learn how to [create a database](https://www.postgresql.org/docs/current/tutorial-createdb.html).
  * For other installation types, see your PostgreSQL provider's documentation.

* A table in the database. Learn how to [create a table](https://www.postgresql.org/docs/current/tutorial-table.html).

  For the destination connector, the table must have a defined schema before Unstructured can write to the table. The minimum viable
  schema for Unstructured contains only the fields `id`, `element_id`, `record_id`, `text`, (and `embeddings`, if you are using `pgvector` and generating vector embeddings), as follows.
  `type` is an optional field, but highly recommended.

  If you are using `pgvector` and generating vector embeddings, the number of dimensions (in this example, `1536`) must match the number of dimensions for the associated embedding model that you use in any related Unstructured workflows or ingestion pipelines.

  <CodeGroup>
    ```sql PostgreSQL theme={null}
    CREATE TABLE elements (
        id UUID PRIMARY KEY,
        element_id TEXT,
        record_id TEXT,
        text TEXT,
        type TEXT
    );
    ```

    ```sql PostgreSQL with pgvector theme={null}
    CREATE EXTENSION vector;

    CREATE TABLE elements (
        id UUID PRIMARY KEY,
        element_id TEXT,
        record_id TEXT,
        text TEXT,
        type TEXT,
        embeddings vector(1536)
    );
    ```
  </CodeGroup>

  For objects in the `metadata` field that Unstructured produces and that you want to store in PostgreSQL, you must create fields in your table's schema that
  follows Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
  child objects:

  ```json  theme={null}
  "metadata": {
    "is_extracted": "true",
    "coordinates": {
      "points": [
        [
          134.20055555555555,
          241.36027777777795
        ],
        [
          134.20055555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          241.36027777777795
        ]
      ],
      "system": "PixelSpace",
      "layout_width": 1654,
      "layout_height": 2339
    },
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_mime_type": "image/jpeg",
    "filename": "realestate.pdf",
    "data_source": {
      "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
      "record_locator": {
        "protocol": "file",
        "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
      }
    }
  }
  ```

  You could create corresponding fields in your table's schema by using the following field names and data types:

  <CodeGroup>
    ```sql PostgreSQL theme={null}
    -- The fields "id", "element_id", "record_id", and "text" are required.
    -- "type" is an optional field, but highly recommended.
    -- All other "metadata" fields are optional.
    CREATE TABLE elements (
        id UUID PRIMARY KEY,
        element_id TEXT,
        record_id TEXT,
        text TEXT,
        type TEXT,
        is_extracted TEXT,
        points JSONB,
        system TEXT,
        layout_width INTEGER,
        layout_height INTEGER,
        filetype TEXT,
        languages TEXT[],
        page_number TEXT,
        image_mime_type TEXT,
        url TEXT,
        record_locator JSONB
    );
    ```

    ```sql PostgreSQL with pgvector  theme={null}
    -- The fields "id", "element_id", "record_id", and "text" are required.
    -- "embeddings" is required if you are generating vector embeddings.
    --   If you are generating embeddings, the number of dimensions in "embeddings" 
    --   must match the number of dimensions for the associated embedding model 
    --   that you use in any related Unstructured workflows or ingestion pipelines.
    -- "type" is an optional field, but highly recommended.
    -- All other "metadata" fields are optional.
    CREATE EXTENSION vector;

    CREATE TABLE elements (
        id UUID PRIMARY KEY,
        element_id TEXT,
        record_id TEXT,
        text TEXT,
        type TEXT,
        embeddings vector(1536),
        is_extracted TEXT,
        points JSONB,
        system TEXT,
        layout_width INTEGER,
        layout_height INTEGER,
        filetype TEXT,
        languages TEXT[],
        page_number TEXT,
        image_mime_type TEXT,
        url TEXT,
        record_locator JSONB
    );
    ```
  </CodeGroup>

  Unstructured cannot provide a schema that is guaranteed to work in all
  circumstances. This is because these schemas will vary based on your source files' types; how you
  want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

  See also:

  * [CREATE TABLE](https://www.postgresql.org/docs/current/sql-createtable.html) for PostgreSQL
  * [CREATE TABLE](https://github.com/pgvector/pgvector) for PostrgreSQL with pgvector
  * [Unstructured document elements and metadata](/api-reference/legacy-api/partition/document-elements)

  The following video shows how to use the `psql` utility to connect to PostgreSQL, list databases, and list and create tables:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/IKo-4QHdNF4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A user in the database, and a password for the user.

  * For Amazon RDS for PostgreSQL, learn how to [create a user](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Roles.html).
  * For Azure Database for PostgreSQL, learn how to [create a user](https://learn.microsoft.com/azure/postgresql/flexible-server/how-to-create-users).
  * For local PostgreSQL installations, learn how to [create a user](https://www.postgresql.org/docs/current/sql-createuser.html).
  * For other installation types, see your PostgreSQL provider's documentation.

* Database access for the user.

  * For Amazon RDS for PostgreSQL, learn how to [control user access](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Access.html).
  * For Azure Database for PostgreSQL, learn how to [control user access](https://www.postgresql.org/docs/current/sql-createuser.html).
  * For local PostgreSQL installations, learn how to [give database access to a user](https://www.postgresql.org/docs/current/sql-grant.html).
  * For other installation types, see your PostgreSQL provider's documentation.

To create a PostgreSQL destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="postgres",
                  config={
                      "host": "<host>",
                      "database": "<database>",
                      "port": <port>
                      "username": "<username>",
                      "password": "<password>",
                      "table_name": "<table_name>",
                      "batch_size": <batch-size>
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "postgres",
      "config": {
          "host": "<host>",
          "database": "<database>",
          "port": "<port>",
          "username": "<username>",
          "password": "<password>",
          "table_name": "<table_name>",
          "batch_size": <batch-size>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (required) - A unique name for this connector.
* `<host>` (required) - The host name.
* `<database>` (required) - The name of the database.
* `<port>` (required) - The port number.
* `<username>` (required) - The username.
* `<password>` (required) - The user's password.
* `<table_name>` (required) - The name of the table in the database.
* `<batch-size>` - The maximum number of rows to transmit at a time. The default is `100` if not otherwise specified.
* `<id-column>` (required, source connector only) - The name of the ID column in the table.
* For `fields` (source connector only), set one or more `<field>` values, with each value representing the name of a column to process (including the specified `<id-column>` column). The default is all columns if not otherwise specified.

## Learn more

* <Icon icon="blog" />  [PostgreSQL Integration in the Unstructured Platform](https://unstructured.io/blog/postgresql-integration-in-the-unstructured-platform)

---

# Source: https://docs.unstructured.io/open-source/ingestion/ingest-configuration/processor-configuration.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Processor configuration

A common process configuration plays a pivotal role in overseeing the entire ingest process, encompassing various aspects to ensure efficiency and reliability. This configuration includes parameters for managing a pool of workers, which allows for parallelization of the ingest process to maximize throughput. Additionally, it addresses the storage and retrieval of intermediate results, supporting caching strategies that enable replayability in case of errors or interruptions. By configuring worker pools, the process can efficiently handle multiple tasks concurrently, enhancing performance. Furthermore, saving intermediate results allows for the resumption of the process from a known state, reducing data loss and ensuring robustness in the face of unexpected errors or system failures. This comprehensive configuration promotes an organized and resilient approach to data ingestion, improving overall system reliability and efficiency.

## Configs

* `disable_parallelism`: `True` if the `INGEST_DISABLE_PARALLELISM` environment variable is set to `True` (case-insensitive), otherwise `False` (the default).

* `download_only`: Default: `False`. If set to `True`, downloads any files that are not already present in the connector's specified download directory (`download_dir`), or `work_dir` if `download_dir` is not specified, or the default file path for `work_dir` if `work_dir` is not specified.

* `max_connections`: The maximum of connections allowed when running an asynchronous step in the ingest pipeline.

* `max_docs`: The maximum number of documents to be pulled in for the entire ingest process.

* `num_processes`: Default: `2`. For every step that can use a pool of workers to increase throughput, how many workers to configure in the pool.

* `output_dir`: Where the final results will be located when the process is finished. This is regardless of whether a destination is configured. If a directory is not specified, by default a folder named `structured-output`, relative to the current working directory, is used.

* `preserve_downloads`: When set to `False` (the default), will remove downloaded files after they are successfully processed.

* `raise_on_error`: `False `(default) for any single document that fails in the process, causes the error to be logged but allows for all other documents to proceed in the process. If `True`, causes the entire process to fail and raise the error if any one document fails.

* `re_download`: When set to `False` (the default), will not re-download files if they are already present in the download directory. Setting this to `True` downloads the files even if they exist in the download directory. `True` is useful in cases where for some reason you suspect that the files in the remote location might have changed, and you want to be sure that the file copies in the download directory are the most current ones.

* `reprocess`: Default: `False`. If set to `True`, will ignore all content that may have been cached and rerun each step.

* `tqdm`: When set to `False` (the default), will not show a progress bar.

* `uncompress`: When set to `False` (the default), will not uncompress any archived ZIP or TAR files. If set to `True`, and the specified partitioner provides support for compressed ZIP and TAR files, these files will be uncompressed.

* `verbose`: Default: `False`. If set to `True`, debug logging should be included in the output.

* `work_dir`: The file path for where intermediate results should be saved. If a file path is not set, a default file path will be used relative to the user's home directory (`/home/username/.cache/unstructured/ingest/pipeline` or `C:\Users\username\.cache\unstructured\ingest\pipeline`).

---

# Source: https://docs.unstructured.io/open-source/ingestion/python-ingest.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Process files in batches by using the Unstructured Ingest Python library

The Unstructured Ingest Python library enables you to use Python code to send files in batches to Unstructured for processing, and to tell Unstructured where to deliver the processed data.

<Note>
  The Unstructured Ingest Python library does not work with the Unstructured API.

  For information about the Unstructured API, see the [Unstructured API Overview](/api-reference/workflow/overview).
</Note>

The following 3-minute video shows how to use the Unstructured Ingest Python library to send multiple PDFs from a local directory in batches to be ingested by Unstructured for processing:

<iframe width="560" height="315" src="https://www.youtube.com/embed/tSKHFXsBQ-c" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

## Getting started

You can use the Unstructured Ingest Python library to process files locally, or you can use the Ingest Python library to send files in batches to Unstructured for processing.

Local processing does not use an Unstructured API key or API URL.

Using the Ingest Python library to send files in batches to Unstructured for processing is more robust, and usage is billed to you on a pay-as-you-go basis. Usage requires an Unstructured API key and API URL, as follows:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

3) The Unstructured API URL for Unstructured Ingest was provided to you when your Unstructured account was created.
   If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

<Note>
  The default URL for Unstructured Ingest is the same as the default URL for the legacy Unstructured Partition Endpoint: `https://api.unstructuredapp.io/general/v0/general`.
  However, you should always use the URL that was provided to you when your Unstructured account was created.
</Note>

## Installation

One approach to get started quickly with the Unstructured Ingest Python library is to install Python and then run the following command:

```bash  theme={null}
pip install unstructured-ingest
```

This default installation option enables the ingestion of plain text files, HTML, XML, JSON and emails that do not require any extra dependencies. This default option also enables you to specify local source and destination locations.

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

For additional installation options, see the [Unstructured Ingest Python library](/open-source/ingestion/python-ingest) in the [Ingest](/open-source/ingestion/overview) section.

<Info>To migrate from older, deprecated versions of the Ingest Python library that used `pip install unstructured`, see the [migration guide](/open-source/ingestion/overview#migration-guide).</Info>

## Usage

For example, to use the Unstructured Ingest Python library to ingest files from a local source (input) location and to deliver the processed data to an Azure Storage account destination (output) location:

<CodeGroup>
  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.fsspec.azure import (
      AzureConnectionConfig,
      AzureAccessConfig,
      AzureUploaderConfig
  )
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=AzureConnectionConfig(
              access_config=AzureAccessConfig(
                  account_name=os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
                  account_key=os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
              )
          ),
          uploader_config=AzureUploaderConfig(remote_url=os.getenv("AZURE_STORAGE_REMOTE_URL"))
      ).run()
  ```
</CodeGroup>

To learn how to use the Unstructured Ingest Python library to work with a specific source (input) and destination (output) location, see the Python code examples for the [source](/open-source/ingestion/source-connectors/overview) and [destination](/open-source/ingestion/destination-connectors/overview) connectors that are available for you to choose from.

See also the [ingest configuration](/open-source/ingestion/ingest-configuration/overview) settings that enable you to further control how batches are sent and processed.

## Migration guide

The Unstructured Ingest Python library introduced breaking changes in version 0.7.0 as follows:

* The library has moved the calling pattern (formerly commonly referred to as "v2") up one level in the package. If you are using this calling pattern
  in your existing code, beginning with version 0.7.0 you must remove `.v2` from your related `from...import` statements. For example, if you previously had statments
  such as these:

  ```python  theme={null}
  from unstructured_ingest.v2.pipeline.pipeline import Pipeline
  from unstructured_ingest.v2.interfaces import ProcessorConfig
  ```

  Beginning with version 0.7.0, you must change to these by removing `.v2`:

  ```python  theme={null}
  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig
  ```

  If you do not make these changes beginning with version 0.7.0, your existing code will no longer work. To run your existing code
  without making these changes, you must pin your installed library version to 0.6.4 or earlier.

* The earlier calling pattern (commonly referred to as "v1") has been deprecated and removed from the library and will no longer work beginning with version 0.7.0. If you are using this earlier "v1" calling pattern in your code, you must
  either manually migrate your code to the new calling pattern (formerly commonly referred to as "v2"), or pin your installed library version to 0.6.4 or earlier to keep using this earlier "v1" calling pattern.

  To learn how to use the new calling pattern (formerly commonly referred to as "v2"), see the code examples in the
  [source connectors](/open-source/ingestion/source-connectors/overview) and [destination connectors](/open-source/ingestion/destination-connectors/overview) documentation.

---

# Source: https://docs.unstructured.io/ui/destinations/qdrant.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/qdrant.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/qdrant.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Qdrant

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Qdrant.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only [Qdrant Cloud](https://qdrant.tech/documentation/cloud-intro/) is supported.
* For [Unstructured Ingest](/open-source/ingestion/overview), Qdrant Cloud,
  [Qdrant local](https://github.com/qdrant/qdrant), and [Qdrant client-server](https://qdrant.tech/documentation/quickstart/) are supported.

The following video shows how to set up Qdrant Cloud:

<iframe width="560" height="315" src="https://www.youtube.com/embed/730jcEAJUG8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* For Qdrant local, the path to the local Qdrant installation, for example: `/qdrant/local`

* For Qdrant client-server, the Qdrant server URL, for example: `http://localhost:6333`

* For Qdrant Cloud:

  * A [Qdrant account](https://cloud.qdrant.io/login).

  * A [Qdrant cluster](https://qdrant.tech/documentation/cloud/create-cluster/).

  * The cluster's URL. To get this URL, do the following:

    1. Sign in to your Qdrant Cloud account.
    2. On the sidebar, under **Dashboard**, click **Clusters**.
    3. Click the cluster's name.
    4. Note the value of the **Endpoint** field, for example: `https://<random-guid>.<region-id>.<cloud-provider>.cloud.qdrant.io`.

  * A [Qdrant API key](https://qdrant.tech/documentation/cloud/authentication/#create-api-keys).

* The name of the target [collection](https://qdrant.tech/documentation/concepts/collections) on the Qdrant local installation,
  Qdrant server, or Qdrant Cloud cluster.

  Qdrant requires the target collection to exist before Unstructured can write to the collection.
  The following example code demonstrates the use of the [Python Qdrant Client](https://pypi.org/project/qdrant-client/) to create
  a collection on a Qdrant Cloud cluster, configuring the collection for vectors with 3072 dimensions:

  ```python Python theme={null}
  from qdrant_client import QdrantClient, models
  import os

  client = QdrantClient(
      url=os.getenv("QDRANT_URL"),
      api_key=os.getenv("QDRANT_API_KEY")
  )

  client.create_collection(
      collection_name=os.getenv("QDRANT_COLLECTION"),
      vectors_config=models.VectorParams(
          size=3072,
          distance=models.Distance.COSINE
      )
  )

  collection = client.get_collection(
                   collection_name=os.getenv("QDRANT_COLLECTION")
               )

  print(f"The collection named '{os.getenv("QDRANT_COLLECTION")}' exists and " +
        f"has a status of '{collection.status}'.")
  ```

To create a Qdrant destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="qdrant-cloud",
                  config={
                      "url": "<url>",
                      "collection_name": "<collection-name>",
                      "batch_size": <batch-size>,
                      "api_key": "<api-key>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "qdrant-cloud",
      "config": {
          "url": "<url>",
          "collection_name": "<collection-name>",
          "batch_size": "<batch-size>",
          "api_key": "<api-key>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (required) - A unique name for this connector.
* `<url>` (required) - The Qdrant cluster's URL.
* `<collection-name>` (required) - The name of the target collection on the Qdrant cluster.
* `<batch-size>` - The maximum number of records to transmit at a time. The default is `50` if not otherwise specified.
* `<api-key>` (required) - The Qdrant API key.

## Learn more

* <Icon icon="blog" />  [Qdrant Integration in the Unstructured Platform](https://unstructured.io/blog/streamlining-rag-pipeline-development-qdrant-integration-in-the-unstructured-platform)
* <Icon icon="blog" />  [How to go from S3 to Qdrant with no code using Unstructured](https://unstructured.io/blog/how-to-go-from-s3-to-qdrant-with-no-code-using-unstructured)

---

# Source: https://docs.unstructured.io/open-source/introduction/quick-start.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Quickstart

<Note>This quickstart uses the Unstructured open source library, which is designed as a starting point for quick prototyping and has [limits](#limits). For production scenarios, use the [Unstructured user interface (UI)](/ui/overview) or the [Unstructured API](/api-reference/overview) instead.</Note>

In this quickstart, you use the [Unstructured open source library](/open-source/introduction/overview)
([GitHub](https://github.com/Unstructured-IO/unstructured),
[PyPI](https://pypi.org/project/unstructured/)) along with Python on your local development machine to partition a PDF file into a standard set of
[Unstructured document elements and metadata](/open-source/concepts/document-elements). You can use these elements and
metadata as input into your RAG applications, AI agents, model fine-tuning tasks, and more.

<Steps>
  <Step title="Prerequisites">
    To complete this quickstart, you need:

    * A Python virtual environment manager is recommended to manage your Python code dependencies.
      This quickstart uses [uv](https://docs.astral.sh/uv/) for managing virtual environments and
      [venv](https://docs.python.org/3/library/venv.html) as the virtual environment type. Installation and
      use of `uv` and `venv` are described in the following steps.
      However, `uv` and `venv` are not required to use the Unstructured open source library.
    * Python 3.9 or higher. You can use `uv` to install Python if needed, as described in the following steps.
    * A PDF file on your local machine. If you do not have a PDF file available, this quickstart provides a sample PDF file named
      `layout-parser-paper.pdf` that you can download in a later step. (The Unstructured open source library provides
      [support for additional file types](/open-source/introduction/supported-file-types) as well.)
  </Step>

  <Step title="Install uv">
    <Tabs>
      <Tab title="macOS, Linux">
        To use `curl` with `sh`:

        ```bash  theme={null}
        curl -fsSL https://get.uv.dev | bash
        ```

        To use `wget` with `sh` instead:

        ```bash  theme={null}
        wget -qO- https://astral.sh/uv/install.sh | sh
        ```
      </Tab>

      <Tab title="Windows">
        To use PowerShell with `irm` to download the script and run it with `iex`:

        ```powershell  theme={null}
        powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
        ```
      </Tab>
    </Tabs>

    To install `uv` by using other approaches such as PyPI, Homebrew, or WinGet,
    see [Installing uv](https://docs.astral.sh/uv/getting-started/installation/).
  </Step>

  <Step title="Install Python">
    `uv` will detect and use Python if you already have it installed.
    To view a list of installed Python versions, run the following command:

    ```bash  theme={null}
    uv python list
    ```

    If, however, you do not already have Python installed, you can install a version of Python for use with `uv`
    by running the following command. For example, this command installs Python 3.12 for use with `uv`:

    ```bash  theme={null}
    uv python install 3.12
    ```
  </Step>

  <Step title="Create a uv project">
    Use `uv` to create a project by switching to the directory on your development machine where you want to
    create the project and then running the following command:

    ```bash  theme={null}
    uv init
    ```
  </Step>

  <Step title="Create a venv virtual environment">
    To isolate and manage your project's code dependencies, from your project directory, use `uv` to create a virtual environment with `venv` by running the following command:

    ```bash  theme={null}
    # Create the virtual environment by using the current Python version:
    uv venv

    # Or, if you want to use a specific Python version:
    uv venv --python 3.12
    ```
  </Step>

  <Step title="Activate the virtual environment">
    To activate the `venv` virtual environment, run one of the following commands:

    <Tabs>
      <Tab title="macOS, Linux">
        * For `bash` or `zsh`, run `source .venv/bin/activate`
        * For `fish`, run `source .venv/bin/activate.fish`
        * For `csh` or `tcsh`, run `source .venv/bin/activate.csh`
        * For `pwsh`, run `.venv/bin/Activate.ps1`
      </Tab>

      <Tab title="Windows">
        * For `cmd.exe`, run `.venv\Scripts\activate.bat`
        * For `PowerShell`, run `.venv\Scripts\Activate.ps1`
      </Tab>
    </Tabs>

    To deactivate the virtual environment at any time, run `deactivate`.
  </Step>

  <Step title="Install the Unstructured open source library">
    With the virtual environment activated to enable code dependency isolation and management, use `uv` to install the Unstructured open source library by running the following command:

    ```bash  theme={null}
    uv add unstructured
    ```

    The preceding command supports plain text files (`.txt`), HTML files (`.html`), XML files (`.xml`), and emails (`.eml`, `.msg`, and `.p7s`) without any additional dependencies.

    To work with other file types, you must also install these dependencies, as follows, replacing `<extra>` with the appropriate extra for the target file type:

    ```bash  theme={null}
    uv add "unstructured[<extra>]"
    ```

    The following file type extras are available:

    * `all-docs` (for all supported file types in this list)
    * `csv` (for `.csv` files only)
    * `docx` (for `.doc` and `.docx` files only)
    * `epub` (for `.epub` files only)
    * `image` (for all supported image file types: `.bmp`, `.heic`, `.jpeg`, `.png`, and `.tiff`)
    * `md` (for `.md` files only)
    * `odt` (for `.odt` files only)
    * `org` (for `.org` files only)
    * `pdf` (for `.pdf` files only)
    * `pptx` (for `.ppt` and `.pptx` files only)
    * `rst` (for `.rst` files only)
    * `rtf` (for `.rtf` files only)
    * `tsv` (for `.tsv` files only)
    * `xlsx` (for `.xls` and `.xlsx` files only)

    As this quickstart uses a sample PDF file, run the following command:

    ```bash  theme={null}
    uv add "unstructured[pdf]"
    ```

    Note that you can install multiple extras at the same time by separating them with commas, for example:

    ```bash  theme={null}
    uv add "unstructured[pdf,docx]"
    ```
  </Step>

  <Step title="Install system dependencies">
    You maximum compatibility, you should also install the following system dependencies:

    * [libmagic-dev](https://man7.org/linux/man-pages/man3/libmagic.3.html) (for filetype detection)
    * [poppler-utils](https://poppler.freedesktop.org/) and [tesseract-ocr](https://github.com/tesseract-ocr/tesseract) (for images and PDFs), and `tesseract-lang` (for additional language support)
    * [libreoffice](https://www.libreoffice.org/discover/libreoffice/) (for Microsoft Office documents)
    * [pandoc](https://pandoc.org/) (for `.epub`, `.odt`, and `.rtf` files. For `.rtf` files, you must have version 2.14.2 or newer. Running [this script](https://github.com/Unstructured-IO/unstructured/blob/main/scripts/install-pandoc.sh) will install the correct version for you.)

    Installation instructured for these system dependencies vary by operating system type. For details, follow the preceding links or see your
    operating system's documentation.
  </Step>

  <Step title="Download the sample PDF file">
    Download the sample PDF file named `layout-parser-paper.pdf` from the following location to your local development machine:

    [https://github.com/Unstructured-IO/unstructured/tree/main/example-docs/pdf](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs/pdf)

    (You can also use any other PDF file that you want to work with instead of this sample file, if you prefer.)
  </Step>

  <Step title="Add the Python code">
    In the project's `main.py` file, add the following Python code, replacing `<path/to>` with the
    path to the `layout-parser-paper.pdf` file that you downloaded to your local development machine.

    (If you want to use a different PDF file, replace `layout-parser-paper` with the name of that PDF file instead.)

    ```python  theme={null}
    from unstructured.partition.pdf import partition_pdf
    from unstructured.staging.base import elements_to_json

    file_path = "<path/to>"
    base_file_name = "layout-parser-paper"

    def main():
        elements = partition_pdf(filename=f"{file_path}/{base_file_name}.pdf")
        elements_to_json(elements=elements, filename=f"{file_path}/{base_file_name}-output.json")

    if __name__ == "__main__":
        main()
    ```
  </Step>

  <Step title="Run the Python code">
    Use `uv` to run the preceding Python code by running the following command:

    ```bash  theme={null}
    uv run main.py
    ```

    It might take a few minutes for the command to finish.
  </Step>

  <Step title="View the output">
    After the command finishes running successfully, view the Unstructured elements and metadata that were generated by opening the `layout-parser-paper-output.json` file in your editor. This file will be in
    the location as the original `layout-parser-paper.pdf` file.

    (If you used a different PDF file, the output file will be named `<your-file-name>-output.json` instead.)
  </Step>
</Steps>

## Next steps

* Learn more about the [available partition functions](/open-source/core-functionality/partitioning) in addition to `partition_pdf` for converting other types of files into standard [Unstructured document elements and metadata](/open-source/concepts/document-elements).
* By default, the preceding example uses the `auto` partitioning strategy. Learn about other [available partitioning strategies](/open-source/concepts/partitioning-strategies) for fine-tuned approaches to converting different types of files into Unstructured document elements.
* Learn about [available chunking functions](/open-source/core-functionality/chunking) for splitting up the text in your document elements into manageable chunks as needed to fit into your models' limited context windows.
* Learn about [available cleaning functions](/open-source/core-functionality/cleaning) for cleaning up your document elements' data as needed.
* Learn about [available extraction functions](/open-source/core-functionality/extracting) for getting precise information out of your document elements as needed.
* Learn about how to [generate vector embeddings](/open-source/core-functionality/embedding) for the text in your document elements for use in RAG applications, AI agents, model fine-tuning tasks, and more.
* For an additional code example, see the [Unstructured Quick Tour](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Quick_Tour.ipynb) Google Colab notebook.
* The Unstructured open source library is also available as a [Docker container](/open-source/installation/docker-installation).
* The [Unstructured Ingest CLI and Unstructured Ingest Python library](/open-source/ingestion/overview) build upon the Unstructured open source library by providing additional functionality such as batch file processing,
  ingesting files from remote source locations and sending the processed files' data to remote destination locations, creating programmatic ETL pipelines, optionally processing files on Unstructured-hosted compute resource instead of locally for improved performance and quality on a pay-as-you-go basis, and more.
* The [Unstructured user interface (UI)](/ui/overview) and [Unstructured API](/api-reference/overview) are superior to the Unstructured open source library, the
  Unstructured Ingest CLI, and the Unstructured Ingest Python library. The Unstructured UI and API are designed for production scenarios, with significantly increased performance and quality,
  the latest OCR and vision language models, advanced chunking strategies, security compliance, multi-user account management, job scheduling and monitoring, self-hosted deployment options, and more on a pay-as-you-go or subscription basis.

## Need help?

* Join the [Unstructured Slack community](https://short.unstructured.io/pzw05l7) and post your
  questions in the **# ask-for-help-open-source-library** channel.
* Post your bug reports and feature requests in the [Unstructured open source library GitHub repository](https://github.com/Unstructured-IO/unstructured/issues). These bug reports and feature requests are evaluated and addressed\
  based on the interest and availability of the open source community.

---

# Source: https://docs.unstructured.io/ui/quickstart.md

# Source: https://docs.unstructured.io/api-reference/quickstart.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/quickstart.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unstructured API Quickstart

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

<Tip>
  Do you want to run this quickstart without modifying your local machine?
  [Skip ahead](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Partition_Endpoint_Quickstart.ipynb) to run this quickstart as a notebook on Google Colab now!

  Do you want to just copy the sample code for use on your local machine? [Skip ahead](#sample-code) to the code now!

  This quickstart uses the Unstructured Partition Endpoint and focuses on a single, local file for ease-of-use demonstration purposes. This quickstart also
  focuses only on a limited set of Unstructured's full capabilities. To unlock the full feature set, as well as use Unstructured to do
  large-scale batch processing of multiple files and semi-structured data that are stored in remote locations,
  [skip over](/api-reference/workflow/overview#quickstart) to an expanded, advanced version of this quickstart that uses the
  Unstructured API's workflow operations instead.
</Tip>

<iframe width="560" height="315" src="https://www.youtube.com/embed/0EogKNU_BPU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

The following code shows how to use the [Unstructured Python SDK](/api-reference/legacy-api/partition/sdk-python)
to have Unstructured process one or more local files by using
the [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview).

To run this code, you will need the following:

* An Unstructured account and an Unstructured API key for your account. [Learn how](/api-reference/legacy-api/partition/overview#get-started).

* Python 3.9 or higher installed on your local machine.

* A Python virtual environment is recommended for isolating and versioning Python project code dependencies, but this is not required.
  To create and activate a virtual environment, you can use a framework such as
  [uv](https://docs.astral.sh/uv/) (recommended). Python provides a built-in framework named
  [venv](https://docs.python.org/3/library/venv.html).

* You must install the Unstructured Python SDK on your local machine, for example by running one of the
  following commands:

  * For `uv`, run `uv add unstructured-client`
  * For `venv` (or for no virtual environment), run `pip install unstructured-client`

* Add the following code to a Python file on your local machine; make the following code changes; and then run the code file to see the results.

  * Replace `<unstructured-api-key>` with your Unstructured API key.

  * To process all files within a directory, change `None` for `input_dir` to a string that contains the path to the directory on your local machine. This can be a relative or absolute path.

  * To process specific files within a directory or across multiple directories, change `None` for `input_file` to a string that contains
    a comma-separated list of filepaths on your local machine, for example `"./input/2507.13305v1.pdf,./input2/table-multi-row-column-cells.pdf"`. These filepaths
    can be relative or absolute.

    <Note>
      If `input_dir` and `input_file` are both set to something other than `None`, then the `input_dir` setting takes precedence, and the `input_file` setting is ignored.
    </Note>

  * For the `output_dir` parameter, specify a string that contains the path to the directory on your local machine that you want Unstructured to send its JSON output files. If the specified directory does not exist at that location, the code will create the missing directory for you. This path can be relative or absolute.

  <Note>
    If you choose to run this code in a notebook (such as a Google Colab notebook), you must do the following first to avoid
    nested event loop errors:

    1. Install the `nest_asyncio` Python package, by running the following command from a notebook cell:

       ```python  theme={null}
       !pip install nest_asyncio
       ```

    2. Import the `nest_asyncio` package and then enable nested event loops, by running the following code from a notebook cell:

       ```python  theme={null}
       import nest_asyncio

       nest_asyncio.apply()
       ```

    After completing these two steps, you can then run the following code as normal.
  </Note>

## Sample code

```python Python SDK theme={null}
import asyncio
import os
import json
import unstructured_client
from unstructured_client.models import shared, errors

client = unstructured_client.UnstructuredClient(
    api_key_auth="<unstructured-api-key>"
)

async def partition_file_via_api(filename):
    req = {
        "partition_parameters": {
            "files": {
                "content": open(filename, "rb"),
                "file_name": os.path.basename(filename),
            },
            "strategy": shared.Strategy.AUTO,
            "vlm_model": "gpt-4o",
            "vlm_model_provider": "openai",
            "languages": ['eng'],
            "split_pdf_page": True, 
            "split_pdf_allow_failed": True,
            "split_pdf_concurrency_level": 15
        }
    }

    try:
        res = await client.general.partition_async(request=req)
        return res.elements
    except errors.UnstructuredClientError as e:
        print(f"Error partitioning {filename}: {e.message}")
        return []

async def process_file_and_save_result(input_filename, output_dir):
    elements = await partition_file_via_api(input_filename)

    if elements:
        results_name = f"{os.path.basename(input_filename)}.json"
        output_filename = os.path.join(output_dir, results_name)

        with open(output_filename, "w") as f:
            json.dump(elements, f)

def load_filenames_in_directory(input_dir):
    filenames = []
    for root, _, files in os.walk(input_dir):
        for file in files:
            if not file.endswith('.json'):
                filenames.append(os.path.join(root, file))

    return filenames

async def process_files():
    # Initialize with either a directory name, to process everything in the dir,
    # or a comma-separated list of filepaths.
    input_dir = None   # "path/to/input/directory"
    input_files = None # "path/to/file,path/to/file,path/to/file"

    # Set to the directory for output json files. This dir 
    # will be created if needed.
    output_dir = "./output/"

    if input_dir:
        filenames = load_filenames_in_directory(input_dir)
    else:
        filenames = input_files.split(",")

    os.makedirs(output_dir, exist_ok=True)

    tasks = []
    for filename in filenames:
        tasks.append(
            process_file_and_save_result(filename, output_dir)
        )

    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(process_files())
```

## Next steps

This quickstart shows how to use the Unstructured Partition Endpoint, which is intended for rapid prototyping of
some of Unstructured's [partitioning](/api-reference/legacy-api/partition/partitioning) strategies, with limited support for [chunking](/api-reference/legacy-api/partition/chunking).
It is designed to work only with processing of local files.

Take your code to the next level by switching over to the [Unstructured API's workflow operations](/api-reference/workflow/overview)
for production-level scenarios, file processing in batches, files and data in remote locations, full support for [chunking](/ui/chunking),
generating [embeddings](/ui/embedding), applying post-transform [enrichments](/ui/enriching/overview),
using the latest and highest-performing models, and much more.
[Get started](/api-reference/workflow/overview).

---

# Source: https://docs.unstructured.io/support/issues/quota-billing-rate-limiting.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Quota, billing, and rate limiting issues

## Issues

When you try to use Unstructured, you get one of the following error types:

* `ClientResponseError`: For example, an `HTTP 402 Payment Required` error.
* `SDKError`: For example, an `Insufficient API quota` error.
* `PineconeApiException`: For example, an `HTTP 429 Too Many Requests` or `write unit limit reached` error.
* `DestinationConnectionError`: For example, an error that wraps an `HTTP 429 Too Many Requests` error.
* `HTTPError`: For example, an `HTTP 429 Too Many Requests` error.
* `Error`: For example, a `RESOURCE_EXHAUSTED` or a `reduce request rate` error.

## Possible causes

* Unstructured or the source's or destination's system imposes usage limits such as the number of API calls or data processed or stored over a specified time period or a specified rate of requests. These limits have been reached, or payment is required to continue.
* A daily or monthly API quota for Unstructured, the source's or destination's API, or a specified model, has been reached.
* An API rate limit of requests per second or minute has been reached on the source's or destination's system.
* A subscription issue or required payment blocker has been encountered by Unstructured or for a related service.

## Possible solutions

Some Unstructured **Business** account plans for dedicated instances or in-VPC deployments require the customer to set up or pay for related
services, features, and models outside of Unstructured, in order to achieve volume discounts and to better manage overall usage costs.
If you are on one of these account plans, check the following:

* **Check your usage and quotas**: Review your current usage against your budget limits in your Unstructured account dashboard and any relevant third-party services—such as Pinecone, Azure OpenAI, or Databricks.
* **Verify your billing and subscription details**: Ensure any billing agreements and subscriptions are active and their payments are up-to-date.

Other possible solutions include the following:

* **Reduce the rate of requests if you can**.
* **Run processing jobs less frequently if possible**.
* **Process data in smaller batches if you can**.
* **If configurable, reduce the concurrency settings in your Unstructured ingestion pipelines**.
* **Try processing jobs again later**. Some rate limits are often time-based; waiting and retrying later might work.
* **Upgrade your plan**. If you are consistently hitting limits, consider upgrading your plan for Unstructured or the third-party service.
* **Contact Unstructured Support**. If you are consistently hitting limits, consider [contacting Unstructured Support](/support/request) to discuss your options.

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/ui/destinations/redis.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/redis.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/redis.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Redis

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Redis.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/_zpzzr2VtoM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A [Redis](https://redis.io) database, for example in [Redis Cloud](https://redis.io/cloud/).
* The target database's hostname and port number. [Create a database in Redis Cloud](https://redis.io/docs/latest/operate/rc/rc-quickstart/#create-an-account).
* The username and password for the target database. [Get the username and password in Redis Cloud](https://redis.io/docs/latest/operate/rc/rc-quickstart/#connect-to-a-database).
* The database number for the target database. Redis databases are typically numbered from 0 to 15, with the default database number typically being 0.

To create a Redis destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="redis",
                  config={
                      "database": "<database>",
                      "ssl": <True|False>,
                      "batch_size": <batch-size>,
                      "key_prefix": "<key-prefix>",

                      # For URI authentication, include only "uri" key:
                      # "uri": "<uri>",

                      # For password authentication, include these keys instead:
                      # "host": "<host>",
                      # "port": <port>,
                      # "username": "<username>",
                      # "password": "<password>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "redis",
      "config": {
          "database": <database>,
          "ssl": <true|false>,
          "batch_size": <batch-size>,
          "key_prefix": "<key-prefix>",

          # For URI authentication:
          "uri": "<uri>"
        
          # For password authentication:
          "host": "<host>",
          "port": <port>,
          "username": "<username>",
          "password": "<password>"   
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (required) - A unique name for this connector.
* `<uri>` For URI authentication, the connection URI for the target database user and password, hostname, and port number. Use the format of `redis://<username>:<password>@<hostname>:<port-number>`. If SSL encryption is enabled for the database, use `rediss://` instead of `redis://`.
* `<host>` (*required* for password authentication): The target database's hostname.
* `<port>`: For password authentication, the database's port number. The default is `6379` unless otherwise specified.
* `<username>` (*required* for password authentication): The name of the database user (not the logged in user).
* `<password>` (*required* for password authentication): The database user's password.
* `<database>`: The number (index) for the target database. The default is `0` unless otherwise specified.
* For `ssl`, `true` to enable SSL encryption for the connection. Set to `true` only if SSL is already set up and enabled for the target database. The default is `true` unless otherwise specified.
* `<key-prefix>`: A string to prepend to each element's `element_id`. This is useful for distinguishing between different data sets in the same Redis instance. The default is an empty string.
* `<batch-size>`: The maximum number of records to upload in a single batch. The default is `100` unless otherwise specified.

## Learn more

* <Icon icon="blog" />  [Redis Support in Unstructured Platform: Supercharging Your RAG Pipeline](https://unstructured.io/blog/redis-support-in-unstructured-platform-supercharging-your-rag-pipeline)
* <Icon icon="blog" />  [Getting Started with Unstructured and Redis](https://unstructured.io/blog/getting-started-with-unstructured-and-redis)

---

# Source: https://docs.unstructured.io/support/request.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Requesting support

<Tip>
  If you are having an issue with Unstructured, try checking the [Issues](/support/issues/overview) section first
  to see if your issue has already been addressed.
</Tip>

To request support, do the following:

* If you are using the [Unstructured open source library](/open-source/introduction/overview),
  request support by creating a new message in the **#general**
  channel in the [Unstructured Slack community](https://short.unstructured.io/pzw05l7).
* For Unstructured software-as-a-service (Saas) accounts, including legacy pay-as-you-go,
  [Let's Go](https://unstructured.io/pricing) and [Pay-As-You-Go](https://unstructured.io/pricing) accounts,
  email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).<sup>\*</sup>
* For Unstructured [Business](/business/overview) accounts,
  contact your assigned Unstructured Account Executive (AE) or Customer Success Manager (CSM) for access to your
  dedicated Customer Support Portal.
  If you do not know who your assigned AE or CSM is,
  email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).<sup>\*</sup>

It is critical that you route your support requests appropriately, as follows. Incorrect routing can result in delayed or no response, and
any related service level agreement (SLA) commitments by Unstructured will not be honored.

* **Support requests for paid Unstructured accounts (except for **Business** accounts) must only be emailed to Unstructured Support at
  [support@unstructured.io](mailto:support@unstructured.io)**. These kinds of support requests, if submitted by any other means,
  might take longer to receive a response from Unstructured Support—or they might not be responded to at all.
* **Support requests for Unstructured **Business** accounts must only be directed to your
  assigned Unstructured Account Executive (AE) or Customer Success Manager (CSM).** These kinds of support requests, if submitted by any other means,
  might take longer to receive a response from Unstructured Support—or they might not be responded to at all.
* **Support requests for the Unstructured open source library must only be posted in the #general channel in the Unstructured Slack community**.
  These kinds of support requests, if submitted by any other means, will not be responded to.

<sup>\*</sup> Before contacting Unstructured Support, please ensure that you consult the
latest version of the [Unstructured Shared Responsibility Model](/support/shared-responsibility) to check
who owns resolution of the issue you are experiencing. If you contact Unstructured Support about an issue for
which the Model states that the customer is ultimately responsible, Support could return the unresolved issue
back to you and take no further action.

---

# Source: https://docs.unstructured.io/api-reference/workflow/retries.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Retries

For the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk),
the [Unstructured API's workflow operations](/api-reference/workflow/overview) allows for retries with the
`RetryConfig` class when initializing the client, through the client's `retry_config` parameter.
If a request to the Unstructured API's workflow operations fails, the client will retry the
request with an exponential backoff strategy up to a maximum interval of one minute. The
function keeps retrying until the total elapsed time exceeds `max_elapsed_time`[\*](#parameter-names),
which defaults to one hour. However, you can override these defaults, for example as follows:

```python Python SDK theme={null}
# ...
from unstructured_client.utils import RetryConfig, BackoffStrategy

client = UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    retry_config=RetryConfig(
        strategy="backoff",
        retry_connection_errors=True,
        backoff=BackoffStrategy(
            initial_interval=500,     # 0.5 seconds: 500 milliseconds
            max_interval=60000,       # 1 minute: 60 seconds * 1000 milliseconds
            exponent=1.5,
            max_elapsed_time=900000,  # 15 minutes: 15 * 60 seconds * 1000 milliseconds
        ),
    )
)
```

Available `RetryConfig` settings include:

| Setting                            | Description                                                                       |
| ---------------------------------- | --------------------------------------------------------------------------------- |
| `strategy`                         | The strategy to use for retries. The only supported value is `backoff`.           |
| `retry_connection_errors`          | True to retry on connection errors.                                               |
| `BackoffStrategy.initial_interval` | After the first error, wait the specified number of milliseconds before retrying. |
| `BackoffStrategy.max_interval`     | The maximum wait time, specified in milliseconds between retries.                 |
| `BackoffStrategy.exponent`         | After each retry, increase the wait time exponentially by the specified factor.   |
| `BackoffStrategy.max_elapsed_time` | Stop retrying after the specified number of milliseconds.                         |

---

# Source: https://docs.unstructured.io/ui/account/roles.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Role-based access control (RBAC)

<Note>
  The following information applies only to Unstructured [organizational accounts](/ui/account/organizations), which are
  available only for Unstructured **Business** accounts. To upgrade from an Unstructured **Let's Go** or **Pay-As-You-Go** account
  to a **Business** account, contact your Unstructured sales representative, or
  email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).
</Note>

*Roles* in Unstructured are part of the *role-based access control* (RBAC) system that manages permissions for members of
organizational accounts and their workspaces. (Roles are not used in personal accounts.)

Any member with the **Super Administrator** role in an organizational account can manage the roles of that organizational account's members and the roles of the
members of the organizational account's workspaces.

Any member with the **Super Administrator** role in an organizational account or the **Workspace Admin** role in a workspace within an
organizational account can mange the roles of that workspace's members.

A **Super Administrator** member assigns an organizational account member's initial role when they are [added to the organizational account](/ui/account/organizations#add-a-member-to-an-organizational-account). This member's
initial role can be [changed](/ui/account/organizations#change-an-organizational-account-role-for-a-member) later.

A **Super Administrator** or **Workspace Admin** member assigns a workspace member's initial role when they are [added to the workspace](/ui/account/workspaces#add-a-member-to-a-workspace). This
member's initial role can be [changed](/ui/account/workspaces#change-a-workspace-role-for-a-member) later.

## Organizational account roles

Organizational account roles include:

* **Super Administrator**: Has access to all permissions, and has access to all resources created in an organization.
* **Account Member**: Able to be added to workspaces with a workspace role.
* **Billing Administrator**: Only able to view billing information and usage.

These roles include the following permissions:

| Permission                                                      | Super Administrator                | Account Member                     | Billing Administrator              |
| --------------------------------------------------------------- | ---------------------------------- | ---------------------------------- | ---------------------------------- |
| Organizational account management                               |                                    |                                    |                                    |
| Can add members to the account                                  | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="no-label">No</span>   |
| Can remove members from the account                             | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="no-label">No</span>   |
| Can view members of the account                                 | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="no-label">No</span>   |
| Can view an account member's details                            | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="no-label">No</span>   |
| Can change an account member's role                             | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="no-label">No</span>   |
| Billing                                                         |                                    |                                    |                                    |
| Can view usage and costs                                        | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> |
| Can edit payment information                                    | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> |
| Workspaces                                                      |                                    |                                    |                                    |
| Can create workspaces                                           | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="no-label">No</span>   |
| Can add members to a workspace                                  | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   | <span class="no-label">No</span>   |
| Can view members of a workspace                                 | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   |
| Can be added to a workspace                                     | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
| Can take actions in a workspace (based on their workspace role) | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="no-label">No</span>   |

## Workspace roles

Workspace roles include:

* **Viewer**: Ability to view all connectors and workflows that exist in the workspace in a read-only capacity.
* **Operator**: Ability to create, run, schedule, and delete any workflows that exist in the workspace.
  Can view connectors but cannot create or edit them.
* **Developer**: Ability to create and edit all connectors and workflows that exist in the workspace.
* **Workspace Administrator**: Ability to manage users on the workspace (invite, remove or change roles) as well as edit the workspace.

These roles include the following permissions:

| Resource          | Action                              | Viewer                             | Operator                           | Developer                          | Workspace Administrator            |
| ----------------- | ----------------------------------- | ---------------------------------- | ---------------------------------- | ---------------------------------- | ---------------------------------- |
| Workflows         | Read                                | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Create                              | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Edit                                | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Delete                              | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Run                                 | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Schedule                            | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Save                                | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Duplicate                           | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Activate and deactivate             | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
| Connectors        | Read                                | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Create                              | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Edit                                | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Delete                              | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
| Workspace members | Add members to the workspace        | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> |
|                   | View the workspace's members        | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> | <span class="yes-label">Yes</span> |
|                   | Remove members from the workspace   | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> |
|                   | Change the workspace members' roles | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="no-label">No</span>   | <span class="yes-label">Yes</span> |

Super Administrators in an organizational account have complete access to all of the organizational account's workspaces, regardless of whether they
are a member of those workspaces.

---

# Source: https://docs.unstructured.io/api-reference/workflows/run-workflow.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Run Workflow

> Run a workflow by triggering a new job if none is currently active.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json post /api/v1/workflows/{workflow_id}/run
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/workflows/{workflow_id}/run:
    post:
      tags:
        - workflows
      summary: Run Workflow
      description: Run a workflow by triggering a new job if none is currently active.
      operationId: run_workflow
      parameters:
        - name: workflow_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Workflow Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/Body_run_workflow'
      responses:
        '202':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    Body_run_workflow:
      properties:
        input_files:
          anyOf:
            - items:
                type: string
                format: binary
              type: array
            - type: 'null'
          title: Input Files
      type: object
      title: Body_run_workflow
    JobInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        workflow_id:
          type: string
          format: uuid
          title: Workflow Id
        workflow_name:
          type: string
          title: Workflow Name
        status:
          $ref: '#/components/schemas/JobStatus'
        created_at:
          type: string
          format: date-time
          title: Created At
        runtime:
          anyOf:
            - type: string
              format: duration
            - type: 'null'
          title: Runtime
        input_file_ids:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Input File Ids
        output_node_files:
          anyOf:
            - items:
                $ref: '#/components/schemas/NodeFileMetadata'
              type: array
            - type: 'null'
          title: Output Node Files
        job_type:
          $ref: '#/components/schemas/WorkflowJobType'
          default: ephemeral
      type: object
      required:
        - id
        - workflow_id
        - workflow_name
        - status
        - created_at
      title: JobInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    JobStatus:
      type: string
      enum:
        - SCHEDULED
        - IN_PROGRESS
        - COMPLETED
        - STOPPED
        - FAILED
      title: JobStatus
    NodeFileMetadata:
      properties:
        node_id:
          type: string
          format: uuid
          title: Node Id
        file_id:
          type: string
          title: File Id
        node_type:
          type: string
          title: Node Type
        node_subtype:
          type: string
          title: Node Subtype
      type: object
      required:
        - node_id
        - file_id
        - node_type
        - node_subtype
      title: NodeFileMetadata
    WorkflowJobType:
      type: string
      enum:
        - ephemeral
        - persistent
        - scheduled
        - template
      title: WorkflowJobType
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/examplecode/tools/s3-events.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Amazon S3 event triggers

You can use Amazon S3 events, such as adding new files to—or updating existing files within—S3 buckets, to automatically run Unstructured ETL+ workflows
that rely on those buckets as sources. This enables a no-touch approach to having Unstructured automatically process new and updated files in S3 buckets as they are added or updated.

This example shows how to automate this process by adding an [AWS Lambda function](https://docs.aws.amazon.com/lambda/latest/dg/concepts-basics.html#gettingstarted-concepts-function) to your AWS account. This function runs
whenever a new file is added to—or an existing file is updated within—the specified S3 bucket. This function then calls the [Unstructured API's workflow operations](/api-reference/workflow/overview) to automatically run the
specified corresponding Unstructured ETL+ workflow in your Unstructured account.

<Note>
  This example uses a custom AWS Lambda function that you create and maintain.
  Any issues with file detection, timing, or function invocation could be related to your custom function,
  rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom
  function's Amazon CloudWatch logs first for any informational and error messages.
</Note>

## Requirements

To use this example, you will need the following:

* An Unstructured account, and an Unstructured API key for your account, as follows:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

* The Unstructured API's workflow operations URL for your account, as follows:

  1. In the Unstructured UI, click **API Keys** on the sidebar.<br />
  2. Note the value of the **Unstructured API's workflow operations** field.

* An S3 source connector in your Unstructured account. [Learn how](/ui/sources/s3).

* Some available [destination connector](/ui/destinations/overview) in your Unstructured account.

* A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).

## Step 1: Create the Lambda function

1. Sign in to the AWS Management Console for your account.
2. Browse to and open the **Lambda** console.
3. On the sidebar, click **Functions**.
4. Click **Create function**.
5. Select **Author from scratch**.
6. For **Function name**, enter a name for your function, such as `RunUnstructuredWorkflow`.
7. For **Runtime**, select **Node.js 22.x**.
8. For **Architecture**, select **x86\_64**.
9. Under **Permissions**, expand **Change default execution role**, and make sure **Create a new role with basic Lambda permissions** is selected.
10. Click **Create function**. After the function is created, the function's code and configuration settings page appears.

## Step 2: Add code to the function

1. With the function's code and configuration settings page open from the previous step, click the **Code** tab.

2. In the **Code source** tile, replace the contents of the `index.mjs` file with the following code instead.

   If the `index.mjs` file is not visible, do the following:

   1. Show the **Explorer**: on the sidebar, click **Explorer**.
   2. In the **Explorer** pane, expand the function name.
   3. Click to open the **index.mjs** file.

   Here is the code for the `index.mjs` file:

   ```javascript  theme={null}
   import https from 'https';

   export const handler = async (event) => {
     const apiUrl = process.env.UNSTRUCTURED_API_URL;
     const apiKey = process.env.UNSTRUCTURED_API_KEY;

     if (!apiUrl || !apiKey) {
       throw new Error('Missing UNSTRUCTURED_API_URL or UNSTRUCTURED_API_KEY environment variable or both.');
     }

     const url = new URL(apiUrl);

     const options = {
       hostname: url.hostname,
       path: url.pathname,
       method: 'POST',
       headers: {
         'accept': 'application/json',
         'unstructured-api-key': apiKey
       }
     };

     const postRequest = () => new Promise((resolve, reject) => {
       const req = https.request(options, (res) => {
         let responseBody = '';
         res.on('data', (chunk) => { responseBody += chunk; });
         res.on('end', () => {
           resolve({ statusCode: res.statusCode, body: responseBody });
         });
       });
       req.on('error', reject);
       req.end();
     });

     try {
       const response = await postRequest();
       console.log(`POST status: ${response.statusCode}, body: ${response.body}`);
     } catch (error) {
       console.error('Error posting to endpoint:', error);
     }

     return {
       statusCode: 200,
       body: JSON.stringify('Lambda executed successfully')
     };
   };
   ```

3. In **Explorer**, expand **Deploy (Undeployed Changes)**.

4. Click **Deploy**.

5. Click the **Configuration** tab.

6. On the sidebar, click **Environment variables**.

7. Click **Edit**.

8. Click **Add environment variable**.

9. For **Key**, enter `UNSTRUCTURED_API_URL`.

10. For **Value**, enter `<unstructured-api-url>/workflows/<workflow-id>/run`. Replace the following placeholders:

    * Replace `<unstructured-api-url>` with your Unstructured API's workflow operations value.
    * Replace `<workflow-id>` with the ID of your Unstructured workflow.

    The **Value** should now look similar to the following:

    ```text  theme={null}
    https://platform.unstructuredapp.io/api/v1/workflows/11111111-1111-1111-1111-111111111111/run
    ```

11. Click **Add environment variable** again.

12. For **Key**, enter `UNSTRUCTURED_API_KEY`.

13. For **Value**, enter your Unstructure API key value.

14. Click **Save**.

## Step 3: Create the function trigger

1. Browse to and open the S3 console.

2. Browse to and open the S3 bucket that corresponds to your S3 source connector. The bucket's settings page appears.

3. Click the **Properties** tab.

4. In the **Event notifications** tile, click **Create event notification**.

5. In the **General configuration** tile, enter a name for your event notification, such as `UnstructuredWorkflowNotification`.

6. (Optional) For **Prefix**, enter any prefix to limit the Lambda function's scope to only the specified prefix. For example, to limit the scope to only
   the `input/` folder within the S3 bucket, enter `input/`.

   <Warning>
     AWS does not recommend reading from and writing to the same S3 bucket because of the possibility of accidentally running Lambda functions in loops.
     However, if you must read from and write to the same S3 bucket, AWS strongly recommends specifying a **Prefix** value. [Learn more](https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html).
   </Warning>

7. (Optional) For **Suffix**, enter any file extensions to limit the Lambda function's scope to only the specified file extensions. For example, to limit the scope to only
   files with the `.pdf` extension, enter `.pdf`.

8. In the **Event types** tile, check the box titled **All object create events (s3:ObjectCreated:\*)**.

9. In the **Destination** tile, select **Lambda function** and **Choose from your Lambda functions**.

10. In the **Lambda function** tile, select the Lambda function that you created earlier in Step 1.

11. Click **Save changes**.

## Step 4: Trigger the function

1. With the S3 bucket's settings page open from the previous step, click the **Objects** tab.
2. If you specified a **Prefix** value earlier in Step 3, then click to open the folder that corresponds to your **Prefix** value.
3. Click **Upload**, and then follow the on-screen instructions to upload a file to the bucket's root. If, however, you clicked to open the folder that corresponds to your **Prefix** value instead, then follow the on-screen instructions to upload a file to that folder instead.

## Step 5: View the trigger results

1. In the Unstructured user interface for your account, click **Jobs** on the sidebar.
2. In the list of jobs, click the newly running job for your workflow.
3. After the job status shows **Finished**, go to your destination location to see the results.

## Step 6 (Optional): Delete the trigger

1. To stop the function from automatically being triggered whenever you add new files to—or update existing files within—the S3 bucket, browse to and open the S3 console.
2. Browse to and open the bucket that corresponds to your S3 source connector. The bucket's settings page appears.
3. Click the **Properties** tab.
4. In the **Event notifications** tile, check the box next to the name of the event notification that you added earlier in Step 3.
5. Click **Delete**, and then click **Confirm**.

---

# Source: https://docs.unstructured.io/examplecode/tools/s3-vectors.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Amazon S3 Vectors

[Amazon S3 Vectors](https://aws.amazon.com/s3/features/vectors/) is a durable vector storage solution that can greatly reduce the total cost of uploading, storing,
and querying vectors. S3 Vectors is a cloud object store with native support to store large vector datasets and
provide subsecond query performance. This makes it more affordable for businesses to store AI-ready data at massive scale.

This hands-on example walkthrough demonstrates how to use Amazon S3 Vectors with Unstructured. In this walkthrough, you will:

1. Create an S3 vector bucket.
2. Create a vector index in the bucket.
3. Add the contents of one or more source JSON output files that have been generated by Unstructured to the vector index.
4. Query the vector index against the contents of the source JSON output files that were added.

## Requirements

To use this example, you will need:

* A set of one or more JSON output files that have been generated by Unstructured and stored somewhere on your local development machine. For maximum compatibility with this example, these files must contain vector embeddings that were
  generated by Amazon Bedrock, by using the **Titan Text Embeddings V2** (`amazon.titan-embed-text-v2:0`) embedding model, with 1024 dimensions. To get these files, you will need:

  * An Unstructured account, as follows:

    1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
       After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

       <Note>
         To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/ui/overview#how-am-i-billed%3F).
       </Note>

    2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

       <Note>
         For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
         or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
       </Note>

  * A workflow that generates vector embeddings and adds them to the JSON output files. Learn how to [create a custom workflow](/ui/workflows#create-a-custom-workflow) and [add an Embedder node](/ui/embedding#generate-embeddings) to that workflow.

    <Warning>
      The [destination connector](/ui/destinations/overview) for your worklow must generate JSON output files. These include destination connectors for file storage connectors such as
      Databricks Volumes, Google Cloud Storage, OneDrive, and S3. Destination connectors for databases such as Elasticsearch, Kafka, and MongoDB, and vector stores such as Astra DB, Pinecone, and Weaviate, do not generate JSON output files.
    </Warning>

  * After your workflow generates the JSON output files, you must copy them from your workflow's destination location over to some location on your local development machine for access.

* Python installed on your local development machine.

* An AWS account. [Create an AWS account](https://aws.amazon.com/free).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/lIdh92JmWtg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

## Step 1: Create the S3 vector bucket

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/).
2. Open the [Amazon S3 Console](https://console.aws.amazon.com/s3/home).
3. On the sidebar, click **Vector bucket**.
4. Click **Create vector bucket**.
5. For **Vector bucket name**, enter some name for your bucket.
6. For **Encryption**, select an encryption method, or leave the default.
7. Click **Create vector bucket**.

## Step 2: Add the vector index to the bucket

1. With the list of vector buckets showing from the previous step, click the name of the bucket that you just created.
2. Click **Create vector index**.
3. For **Vector index name**, enter some name for your index.
4. For **Dimension** enter the number of dimensions that Unstructured generated for your vector embeddings. For example,
   for the **Titan Text Embeddings V2** (`amazon.titan-embed-text-v2:0`) embedding model, enter `1024`. If you are not sure how many dimensions to enter,
   see your workflow's **Embedder** node settings.
5. Select the appropriate **Distance metric** for your embedding model. For example, for the
   **Titan Text Embeddings V2** (`amazon.titan-embed-text-v2:0`) embedding model, select **Cosine**. If you are not sure which distance metric to use,
   see your embedding model's documentation.
6. Expand **Additional settings**.
7. Within **Metadata configuration**, under **Non-filterable metadata**, click **Add key**.
8. For **Key**, enter `text`. This allows you to query the vector index by the `text` field within each object that will be coming over into the index from the JSON output files.
9. Click **Create vector index**.
10. After the vector index is created, copy the value of the index's **Amazon Resource Name (ARN)**, as you will need it in
    later steps. This ARN takes the format `arn:aws:s3vectors:<region-id>:<account>:bucket/<bucket-name>/index/<index-name>`.

## Step 3: Add the source JSON output files' contents to the vector index

1. In your local Python virtual environment, install the `boto3` and `uuid` libraries.

2. [Set up Boto3 credentials](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html) for your AWS account.
   The following steps assume you have set up your Boto3 credentials from outside of the following code, such as setting
   [environment variables](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables) or
   configuring a [shared credentials file](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#shared-credentials-file),

   One approach to getting and setting up Boto3 credentials is to [create an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey)
   and then use the [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
   to [set up your credentials](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-methods) on your local development machine.

   <iframe width="560" height="315" src="https://www.youtube.com/embed/MoFTaGJE65Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

3. Add the following code to a Python script file in your virtual environment, replacing the following placeholders:

   * Replace `<source-json-file-path>` with the path to the directory that contains your JSON output files.
   * Replace `<index-arn>` with the ARN of the vector index that you created previously in Step 2.
   * Replace `<index-region-short-id>` with the short ID of the region where your vector index is located, for example `us-east-1`.

   ```python  theme={null}
   import boto3
   import os
   import json
   import uuid

   source_json_file_path = '<source-json-file-path>'
   index_arn             = '<index-arn>'
   index_region_short_id = '<index-region-short-id>'
   s3vectors             = boto3.client('s3vectors', region_name=index_region_short_id)
   num_vectors           = 0
   verbose               = True # Set to False to only print final results.

   # For each JSON file in the source directory...
   for filename in os.listdir(source_json_file_path):
       if filename.endswith('.json'):
           # ...read the JSON file, and then...
           with open(source_json_file_path + filename, 'r') as f:
               file = json.load(f)
               # ...for each object in the Unstructured-formatted JSON array...
               for object in file:
                   # ...add the object's text and vectors to the S3 vector index.
                   # Use the following format:

                   # {
                   #     "key": "<random-uuid>",
                   #     "data": {
                   #       "float32": <embedding-array>
                   #     }, 
                   #     "metadata": {
                   #         "text": "<text-field>"
                   #     }
                   # }
                   json_object = {}
                   json_object['key'] = str(uuid.uuid4())
                   json_object['data'] = {}
                   json_object['metadata'] = {}

                   # If the object has no text, do not add it to the 
                   # vector index, and move on to the next object.
                   if 'text' in object:
                       json_object['metadata']['text'] = object['text']
                   else:
                       if verbose:
                           print(f"Skipping object with source element ID {object['id']} as it has no text")
                       continue

                   # If the object has no embeddings, do not add it to the 
                   # vector index either, and move on to the next object.
                   if 'embeddings' in object:
                       json_object['data']['float32'] = object['embeddings']
                   else:
                       if verbose:
                           print(f"Skipping object with source element ID {object['element_id']} as it has no embeddings")
                       continue

                   # Add the object to the vector index.
                   s3vectors.put_vectors(
                       indexArn=index_arn,
                       vectors=[json_object]
                   )

                   if verbose:
                       print(f"Added a vector entry and assigned it the internal ID {json_object['key']}. " + 
                             f"First 20 characters of the text: {object['text'][:20]}")
                   
                   num_vectors += 1

   print(f"Added {num_vectors} vector entries to the vector index.")
   ```

4. Run the script to add the JSON output files' contents to the vector index. Each object in each JSON output file
   is added as a vector entry in the vector index.

## Step 4: Query the vector index

1. In your local Python virtual environment, install the `numpy` library.

2. Add the following code to another Python script file in your virtual environment, replacing the following placeholders:

   * Replace `<index-arn>` with the ARN of the vector index that you created previously in Step 2.
   * Replace `<index-region-short-id>` with the short ID of the region where your vector index is located, for example `us-east-1`.
   * Replace `<sentence-to-embed>` with the search text that you want to embed for the query.

   ```python  theme={null}
   import boto3
   import json
   import numpy as np

   index_arn             = '<index-arn>'
   index_region_short_id = '<index-region-short-id>'
   client                = boto3.client('s3vectors', region_name=index_region_short_id)

   # The sentence to embed.
   sentence = '<sentence-to-embed>'

   # Generate embeddings for the sentence to embed.
   model_id    = 'amazon.titan-embed-text-v2:0'
   bedrock     = boto3.client('bedrock-runtime', region_name=index_region_short_id)
   body        = {'inputText': f"{sentence}"}
   json_string = json.dumps(body)
   json_bytes  = json_string.encode('utf-8')

   response = bedrock.invoke_model(
       modelId=model_id,
       body=json_bytes,
       contentType='application/json',
       accept='application/json'
   )

   # Get the embeddings for the sentence and prepare them for the query.
   response_body = json.loads(response['body'].read().decode())
   embedding     = response_body['embedding']
   embedding     = np.array(embedding, dtype=np.float32).tolist()

   # Run the query.
   response = client.query_vectors(
       indexArn=index_arn,
       topK=5,
       queryVector={'float32': embedding}
   )

   print(f"Original search query: {sentence}")
   print(f"\nTop 5 results by similarity search...")
   print('-----')

   for vector in response['vectors']:
       response = client.get_vectors(
           indexArn=index_arn,
           keys=[vector['key']],
           returnData=True,
           returnMetadata=True
       )

       for vector in response['vectors']:
           print(vector['metadata']['text'])
           print('-----')
   ```

3. Run the script to query the vector index and see the query results.

## Appendix: Additional operations

Use the following code examples to perform additional vector index and vector bucket operations.

### List all entries in a vector index

Replace the following placeholders:

* Replace `<index-arn>` with the ARN of the vector index that you created earlier in Step 2.
* Replace `<index-region-short-id>` with the short ID of the region where your vector index is located, for example `us-east-1`.

```python  theme={null}
import boto3

index_arn             = '<index-arn>'
index_region_short_id = '<index-region-short-id>'
client                = boto3.client('s3vectors', region_name=index_region_short_id)
num_vectors           = 0
next_token            = None
verbose               = True # Set to False to only print final results.

# List all of the vectors in the S3 vector index.
# Vectors are fetched by "page", so loop through the index in pages.
while True:
    kwargs = {
        'indexArn': index_arn,
        'returnData': True,
        'returnMetadata': True
    }

    if next_token:
        kwargs['nextToken'] = next_token

    response = client.list_vectors(**kwargs)

    for vector in response['vectors']:
        if verbose:
            print(f"Found vector entry with internal ID {vector['key']}. " +
                  f"First 20 characters of the text: {vector['metadata']['text'][:20]}")
        num_vectors += 1
    
    if 'nextToken' in response:
        next_token = response['nextToken']
    else:
        break

print(f"Total number of vector entries found: {num_vectors}")
```

### Delete all entries from a vector index

<Warning>
  This operation will permanently delete all vector entries in the vector index. This operation cannot be undone.
</Warning>

Replace the following placeholders:

* Replace `<index-arn>` with the ARN of the vector index that you created earlier in Step 2.
* Replace `<index-region-short-id>` with the short ID of the region where your vector index is located, for example `us-east-1`.

```python  theme={null}
import boto3

index_arn             = '<index-arn>'
index_region_short_id = '<index-region-short-id>'
client                = boto3.client('s3vectors', region_name=index_region_short_id)
num_vectors           = 0
next_token            = None
verbose               = True # Set to False to only print final results.

# Delete all of the vectors in the S3 vector index.
# Vectors are deleted by "page", so loop through the index in pages.
while True:
    kwargs = {
        'indexArn': index_arn,
        'returnData': True,
        'returnMetadata': True
    }

    if next_token:
        kwargs['nextToken'] = next_token

    response = client.list_vectors(**kwargs)

    for vector in response['vectors']:
        # Delete each vector by its key.
        for vector in response['vectors']:
            if verbose:
                print(f"Deleting vector entry with internal ID {vector['key']}. " + 
                    f"First 20 characters of the text: {vector['metadata']['text'][:20]}")
                
            client.delete_vectors(
                indexArn=index_arn,
                keys=[vector['key']]
            )

            num_vectors += 1

    if 'nextToken' in response:
        next_token = response['nextToken']
    else:
        break

print(f"Deleted {num_vectors} vector entries from the vector index.")
```

### Delete a vector index

<Warning>
  This operation will permanently delete a vector index. This operation cannot be undone.
</Warning>

Replace the following placeholders:

* Replace `<index-arn>` with the ARN of the vector index that you created earlier in Step 2.
* Replace `<index-region-short-id>` with the short ID of the region where your vector index is located, for example `us-east-1`.

```python  theme={null}
import boto3

index_arn             = '<index-arn>'
index_region_short_id = '<index-region-short-id>'
client                = boto3.client('s3vectors', region_name=index_region_short_id)

client.delete_index(
    indexArn=index_arn
)
```

### Delete a vector bucket

<Warning>
  This operation will permanently delete a vector bucket. This operation cannot be undone.
</Warning>

Replace the following placeholders:

* Replace `<bucket-arn>` with the ARN of the vector bucket that you created earlier in Step 1. To get the ARN, do the following:

  1. In the Amazon S3 console, on the sidebar, click **Vector buckets**.
  2. Next to the name of the vector bucket that you want to delete, click the copy button next to the bucket's **Amazon Resource Name (ARN)**.

* Replace `<index-region-short-id>` with the short ID of the region where your vector bucket is located, for example `us-east-1`.

```python  theme={null}
import boto3

bucket_arn            = '<bucket-arn>'
index_region_short_id = '<index-region-short-id>'
client                = boto3.client('s3vectors', region_name=index_region_short_id)

client.delete_vector_bucket(
    vectorBucketArn=bucket_arn
)
```

---

# Source: https://docs.unstructured.io/ui/sources/s3.md

# Source: https://docs.unstructured.io/ui/destinations/s3.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/s3.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/s3.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/s3.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/s3.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# S3

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Amazon S3.

The requirements are as follows.

The following video shows how to fulfill the minimum set of Amazon S3 requirements:

<iframe width="560" height="315" src="https://www.youtube.com/embed/hyDHfhVVAhs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

<Warning>
  If you are experiencing S3 connector or workflow failures after adding a new S3 bucket or updating an existing S3 bucket,
  it could be due to S3 latency issues. You might need to wait up to a few hours before any related S3 connectors
  and workflows begin working without failures.

  Various Amazon S3 operations such as propagating DNS records for new buckets, updating bucket access policies and
  permissions, reusing bucket names after deletion, and using AWS Regions that are not geographically closer
  to your users or applications, can take a few minutes to hours to fully propagate across the Amazon network.
</Warning>

The preceding video does not show how to create an AWS account; enable anonymous access to the bucket (which is supported but
not recommended); or generate AWS STS temporary access credentials if required by your organization's security
requirements. For more information about requirements, see the following:

* An AWS account. [Create an AWS account](https://aws.amazon.com/free).

  <iframe width="560" height="315" src="https://www.youtube.com/embed/lIdh92JmWtg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* An S3 bucket. You can create an S3 bucket by using the S3 console, following the steps [in the S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html) or in the following video.
  Additional approaches that use AWS CloudFormation or the AWS CLI are in the how-to sections later on this page.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/e6w9LwZJFIA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* Anonymous access to the bucket is supported but not recommended. (Use authenticated bucket read or write access or both instead.) To enable anonymous access, follow the steps
  [in the S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html#example-bucket-policies-anonymous-user) or in the following animation.

  <img src="https://mintcdn.com/unstructured-53/vKFDfUfAWhz_siB3/img/connectors/s3-anon-access.gif?s=9d25aa55a82ce91a577a10d28b57043b" alt="Enable anonymous bucket access" data-og-width="1386" width="1386" data-og-height="830" height="830" data-path="img/connectors/s3-anon-access.gif" data-optimize="true" data-opv="3" />

* For authenticated bucket read or write access or both, which is recommended over anonymous access, you should first
  [block all public access to the bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/configuring-block-public-access-bucket.html).

  After blocking all public access to the bucket, for read access, the authenticated AWS IAM user must have at minimum the permissions of `s3:ListBucket` and `s3:GetObject` for that bucket.
  For write access, the authenticated AWS IAM user must have at minimum the permission of `s3:PutObject` for that bucket. Permissions
  can be granted in one of the following ways:

  * Attach the appropriate bucket policy to the bucket. See the policy examples later on this page, and [learn about bucket policies for S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html). These permissions remain in effect until the bucket policy is removed from the bucket.
    To apply a bucket policy by using the S3 console, follow the steps [in the S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/add-bucket-policy.html) or in the following video.
    Additional approaches that use AWS CloudFormation or the AWS CLI are in the how-to sections later on this page.

    <iframe width="560" height="315" src="https://www.youtube.com/embed/y4SfQoJpipo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * Have the IAM user temporarily assume an IAM role that contains the appropriate user policy. See the policy examples later on this page, and [learn about user policies for S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html). These permission remain in effect until the assumed role's time period expires.
    Learn how to use the IAM console to [create a policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html),
    [create a role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html) that references this policy, and then
    [have the user temporarily assume the role by using the AWS CLI or an AWS SDK](https://docs.aws.amazon.com/code-library/latest/ug/sts_example_sts_AssumeRole_section.html), which produces
    a temporary AWS access key (`AccessKeyId`), AWS secret access key (`SecretAccessKey`), and AWS STS session token (`SessionToken`).

    <Warning>
      AWS STS credentials (consisting of an AWS access key, AWS secret access key, and AWS STS session token) can be valid for as little as 15 minutes or as long as 36 hours, depending on how the credentials were initially
      generated. After the expiry time, the credentials are no longer valid and will no longer work with the corresponding S3 connector.
      You must get a new set of credentials to replace the expired ones by [having the user temporarily assume the role again by using the AWS CLI or an AWS SDK](https://docs.aws.amazon.com/code-library/latest/ug/sts_example_sts_AssumeRole_section.html), which produces
      a new, refreshed temporary AWS access key, AWS secret access key, and AWS STS session token.

      To overwrite the expired credentials with the new set:

      * For the Unstructured user interface (UI), manually update the AWS Key, AWS Secret Key, and STS Token fields in the Unstructured UI
        for the corresponding S3 [source](/ui/sources/s3) or [destination](/ui/destinations/s3) connector.
      * For the Unstructured API, use the Unstructured API's workflow operations to call the
        [update source](/api-reference/workflow/overview#update-a-source-connector) or
        [update destination](/api-reference/workflow/overview#update-a-destination-connector) connector operation
        for the corresponding S3 [source](/api-reference/workflow/sources/s3) or
        [destination](/api-reference/workflow/destinations/s3) connector.
      * For Unstructured Ingest, change the values of `--key`, `--secret`, and `--token` (CLI) or `key`, `secret`, and `token` (Python) in your command or code for the
        corresponding S3 [source](/open-source/ingestion/source-connectors/s3) or [destination](/open-source/ingestion/destination-connectors/s3) connector.
    </Warning>

* If you used a bucket policy intead of having the IAM user temporarily assume an IAM role for authenticated bucket access, you must provide a long-term AWS access key and secret access key for the authenticated AWS IAM user in the account.
  Create an AWS access key and secret access key by following the steps [in the IAM documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey) or in the following video.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/MoFTaGJE65Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* If the target files are in the root of the bucket, you will need the path to the bucket, formatted as `protocol://bucket/` (for example, `s3://my-bucket/`).
  If the target files are in a folder, the path to the target folder in the S3 bucket, formatted as `protocol://bucket/path/to/folder/` (for example, `s3://my-bucket/my-folder/`).

* If the target files are in a folder, and authenticated bucket access is enabled, make sure the authenticated AWS IAM user has
  authenticated access to the folder as well. [See examples of authenticated folder access](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html#example-bucket-policies-folders).

## Add an access policy to an existing bucket

To use the Amazon S3 console to add an access policy that allows all authenticated AWS IAM users in the
corresponding AWS account to read and write to an existing S3 bucket, do the following.

<Info>Your organization might have stricter bucket policy requirements. Check with your AWS account
administrator if you are unsure.</Info>

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/).

2. Open the [Amazon S3 Console](https://console.aws.amazon.com/s3/home).

3. Browse to the existing bucket and open it.

4. Click the **Permissions** tab.

5. In the **Bucket policy** area, click **Edit**.

6. In the **Policy** text area, copy the following JSON-formatted policy.
   To change the following policy to restrict it to a specific user in the AWS account, change `root` to that
   specific username.

   In this policy, replace the following:

   * Replace `<my-account-id>` with your AWS account ID.
   * Replace `<my-bucket-name>` in two places with the name of your bucket.

   ```json  theme={null}
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "AllowAuthenticatedUsersInAccountReadWrite",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::<my-account-id>:root"
               },
               "Action": [
                   "s3:GetObject",
                   "s3:PutObject",
                   "s3:ListBucket",
                   "s3:DeleteObject"
               ],
               "Resource": [
                   "arn:aws:s3:::<my-bucket-name>",
                   "arn:aws:s3:::<my-bucket-name>/*"
               ],
               "Condition": {
                   "StringEquals": {
                       "aws:PrincipalType": "IAMUser"
                   }
               }
           }
       ]
   }
   ```

7. Click **Save changes**.

## Create a bucket with AWS CloudFormation

To use the AWS CloudFormation console to create an Amazon S3 bucket that allows all authenticated AWS IAM users
in the corresponding AWS account to read and write to the bucket, do the following.

<Info>Your organization might have stricter bucket policy requirements. Check with your AWS account
administrator if you are unsure.</Info>

1. Save the following YAML to a file on your local machine, for example `create-s3-bucket.yaml`. To change
   the following bucket policy to restrict it to a specific user in the AWS account, change `root` to that
   specific username.

   ```yaml  theme={null}
   AWSTemplateFormatVersion: '2010-09-09'
   Description: 'CloudFormation template to create an S3 bucket with specific permissions for account users.'

   Parameters:
     BucketName:
       Type: String
       Description: 'Name of the S3 bucket to create'

   Resources:
     MyS3Bucket:
       Type: 'AWS::S3::Bucket'
       Properties:
         BucketName: !Ref BucketName
         PublicAccessBlockConfiguration:
           BlockPublicAcls: true
           BlockPublicPolicy: false
           IgnorePublicAcls: true
           RestrictPublicBuckets: true

     BucketPolicy:
       Type: 'AWS::S3::BucketPolicy'
       Properties:
         Bucket: !Ref MyS3Bucket
         PolicyDocument:
           Version: '2012-10-17'
           Statement:
             - Sid: AllowAllAuthenticatedUsersInAccount
               Effect: Allow
               Principal:
                 AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
               Action:
                 - 's3:GetObject'
                 - 's3:PutObject'
                 - 's3:ListBucket'
                 - 's3:DeleteObject'
               Resource:
                 - !Sub 'arn:aws:s3:::${BucketName}'
                 - !Sub 'arn:aws:s3:::${BucketName}/*'

   Outputs:
     BucketName:
       Description: 'Name of the created S3 bucket'
       Value: !Ref MyS3Bucket
   ```

2. Sign in to the [AWS Management Console](https://console.aws.amazon.com/).

3. Open the [AWS CloudFormation Console](https://console.aws.amazon.com/cloudformation/home).

4. Click **Create stack > With new resources (standard)**.

5. On the **Create stack** page, with **Choose an existing template** already selected, select **Upload a template file**.

6. Click **Choose file**, and browse to and select the YAML file from your local machine.

7. Click **Next**.

8. Enter a unique **Stack name** and **BucketName**.

9. Click **Next** two times.

10. Click **Submit**.

11. Wait until the **Status** changes to **CREATE\_COMPLETE**.

12. After the bucket is created, you can delete the YAML file, if you want.

## Create a bucket with the AWS CLI

To use the AWS CLI to create an Amazon S3 bucket that allows all authenticated AWS IAM users in the
corresponding AWS account to read and write to the bucket, do the following.

<Info>Your organization might have stricter bucket policy requirements. Check with your AWS account
administrator if you are unsure.</Info>

1. [Install the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

2. [Set up the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html).

3. Copy the following script to a file on your local machine, for example a file named `create-s3-bucket.sh`.
   To change the following bucket policy to restrict it to a specific user in the AWS account, change `root` to that
   specific username.

   In this script, replace the following:

   * Replace `<my-account-id>` with your AWS account ID.
   * Replace `<my-unique-bucket-name>` with the name of your bucket.
   * Replace `<us-east-1>` with your AWS Region.

   ```bash  theme={null}
   #!/bin/bash

   # Set variables for the AWS account ID, Amazon S3 bucket name, and AWS Region.
   ACCOUNT_ID="<my-account-id>"
   BUCKET_NAME="<my-unique-bucket-name>"
   REGION="<us-east-1>"

   # Temporary filename for the bucket policy.
   # Do not change this variable.
   POLICY_FILE="bucket_policy.json"

   # Create the bucket.
   aws s3api create-bucket --bucket $BUCKET_NAME --region $REGION

   # Wait for the bucket to exist.
   echo "Waiting for bucket '$BUCKET_NAME' to be fully created..."
   aws s3api wait bucket-exists --bucket $BUCKET_NAME

   # Check if the wait command was successful.
   if [ $? -eq 0 ]; then
       echo "The bucket '$BUCKET_NAME' has been fully created."
   else
       echo "Error: Timed out waiting for bucket '$BUCKET_NAME' to be created."
       exit 1
   fi

   # Remove the "block public policy" bucket access setting.
   aws s3api put-public-access-block \
       --bucket $BUCKET_NAME \
       --public-access-block-configuration \
       '{"BlockPublicPolicy": false, "IgnorePublicAcls": false, "BlockPublicAcls": false, "RestrictPublicBuckets": false}'

   # Check if the operation was successful.
   if [ $? -eq 0 ]; then
       echo "The block public policy access setting was removed from '$BUCKET_NAME'."
   else
       echo "Error: Failed to remove the block public policy access setting from '$BUCKET_NAME'."
       exit 1
   fi

   # Create the bucket policy.
   cat << EOF > $POLICY_FILE
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "AllowAuthenticatedUsersInAccountReadWrite",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::$ACCOUNT_ID:root"
               },
               "Action": [
                   "s3:GetObject",
                   "s3:PutObject",
                   "s3:ListBucket",
                   "s3:DeleteObject"
               ],
               "Resource": [
                   "arn:aws:s3:::$BUCKET_NAME",
                   "arn:aws:s3:::$BUCKET_NAME/*"
               ],
               "Condition": {
                   "StringEquals": {
                       "aws:PrincipalType": "IAMUser"
                   }
               }
           }
       ]
   }
   EOF

   # Apply the bucket policy.
   aws s3api put-bucket-policy --bucket $BUCKET_NAME --policy file://$POLICY_FILE

   # Check if the policy application was successful.
   if [ $? -eq 0 ]; then
       echo "The bucket policy was applied to '$BUCKET_NAME'."
   else
       echo "Error: Failed to apply the bucket policy to '$BUCKET_NAME'."
       exit 1
   fi

   # Verify the applied policy.
   echo "Verifying the applied policy:"
   aws s3api get-bucket-policy --bucket $BUCKET_NAME --query Policy --output text

   # Remove the temporary bucket policy file.
   rm $POLICY_FILE
   ```

4. Run the script, for example:

   ```bash  theme={null}
   sh create-s3-bucket.sh
   ```

5. After the bucket is created, you can delete the script file, if you want.

## FIPS and ambient credentials

<Note>
  The following information applies to [Unstructured Business dedicated instance and in-VPC](/business/overview) deployments only.
</Note>

Unstructured Business accounts support the Federal Information Processing Standard (FIPS) for
Amazon S3. [Learn more about AWS support for FIPS](https://aws.amazon.com/compliance/fips/). Specifically,
when creating an S3 connector with the [Unstructured user interface (UI)](/ui/overview) or
[Unstructured API's workflow operations](/api-reference/workflow/overview), Unstructured Business dedicated instance and in-VPC deployments
support the use of `fips`-based
[S3 endpoints](https://docs.aws.amazon.com/general/latest/gr/s3.html) and, for authenticated access to
S3 FIPS buckets, S3 ambient credentials.

To use the Unstructured (UI) to set up an S3 source or destination connector
to use an S3 FIPS bucket and S3 ambient credentials, do the following:

1. Create an environment variable named `ALLOW_AMBIENT_CREDENTIALS_S3`, and set its value to `true`.
2. When creating the connector, for the S3 connector's **Bucket URI** field, specify the path to the S3 FIPS bucket, formatted as
   `https://<bucket-name>.<endpoint>`, for example
   `https://my-bucket-name.s3-fips.us-gov-east-1.amazonaws.com`.
   If the target files are in a folder, specify the path to the target folder in the S3 FIPS bucket instead,
   formatted as `https://<bucket-name>.<endpoint>/path/to/folder` (for example,
   `https://my-bucket-name.s3-fips.us-gov-east-1.amazonaws.com/my-folder/`).
3. For the **Authentication Method** field, select **Ambient Credentials**.
4. Check the box labelled **Use Ambient Credentials**.
5. Save and test the connector.

To use the Unstructured API's workflow operations to set up an S3 source or
destination connector to use an S3 FIPS bucket and S3 ambient credentials, do the following:

1. Create an environment variable named `ALLOW_AMBIENT_CREDENTIALS_S3`, and set its value to `true`.

2. When creating the connector,
   for the `config` parameter's `remote_url` field, specify the path to the S3 FIPS bucket, formatted as
   `https://<bucket-name>.<endpoint>`, for example
   `https://my-bucket-name.s3-fips.us-gov-east-1.amazonaws.com`.
   If the target files are in a folder, specify the path to the target folder in the S3 FIPS bucket instead,
   formatted as `https://<bucket-name>.<endpoint>/path/to/folder` (for example,
   `https://my-bucket-name.s3-fips.us-gov-east-1.amazonaws.com/my-folder/`).

3. For the `config` parameter, add an `ambient_credentials` field, and set its value to true. For example:

   <CodeGroup>
     ```python Python SDK theme={null}
     # ...
     config={
         "...": "...",
         "remote_url": "https://my-bucket-name.s3-fips.us-gov-east-1.amazonaws.com/my-folder/",
         "ambient_credentials": True
     }
     # ...
     ```

     ```bash curl theme={null}
     # ...
     {
         "...": "...", 
         "config": {
             "...": "...",
             "remote_url": "https://my-bucket-name.s3-fips.us-gov-east-1.amazonaws.com/my-folder/",
             "ambient_credentials": "true"
         }
     }
     # ...
     ```
   </CodeGroup>

4. Run your code to create the connector.

## Create the destination connector

To create an S3 destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="s3",
                  config={
                      # For AWS Access Key ID with AWS Secret Access Key authentication:
                      "key": "<key>",
                      "secret": "<secret>",

                      # For AWS STS token authentication (use instead of or in addition to above):
                      # "token": "<token>",

                      "remote_url": "<remote_url>",
                      "endpoint_url": "<endpoint-url>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "s3",
      "config": {
          # For AWS access key ID with AWS secret access key authentication:
          "key": "<key>", 
          "secret": "<secret>", 

          # For AWS STS token authentication:
          "token": "<token>", 
          "key": "<key>", 
          "secret": "<secret>", 

          "remote_url": "<remote_url>",
          "endpoint_url": "<endpoint-url>",
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.

* For AWS access key ID with AWS secret access key authentication, or for AWS STS token authentication:

  * `<key>` - The AWS access key ID for the authenticated AWS IAM user (*required*).
  * `<secret>` - The AWS secret access key corresponding to the preceding AWS access key ID (*required*).

* For AWS STS token authentication:

  * `<token>` - The AWS STS session token for temporary access (*required*).

* `<endpoint-url>` - A custom URL, if connecting to a non-AWS S3 bucket.

* `<remote-url>` (*required*) - The S3 URI to the bucket or folder, formatted as `s3://my-bucket/` (if the files are in the bucket’s root) or `s3://my-bucket/my-folder/`.

* For `recursive` (source connector only), set to `true` to access subfolders within the bucket. The default is `false` if not otherwise specified.

---

# Source: https://docs.unstructured.io/ui/sources/salesforce.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/salesforce.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/salesforce.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Salesforce

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Salesforce.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/5a-nh4t78V8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A [Salesforce account](https://developer.salesforce.com/signup).

* Your Salesforce username. To get this username, do the following:

  1. Log in to your Salesforce account.
  2. In the top navigation bar, click the **Quick Settings** (gear) icon, and then click **Open Advanced Setup**.
  3. In the **Home** tab, under **Administration**, expand **Users**, and then click **Users**.
  4. Note the **User Name** value (*not* the **Name** value) for your user.

* The names of the Salesforce categories (objects) that you want to access, specified as a comma-separated list.
  Available categories include `Account`, `Campaign`, `Case`, `EmailMessage`, and `Lead`.

* A Salesforce connected app in your Salesforce account.

  If you do not already have a Salesforce connected app, to create one, start by creating or getting the certificate (`.crt`) and private key (`.pem`) files
  that you will associate with the connected app.

  If you do not have the certificate and private key files, you can use the `openssl` utility on your local machine to create your own
  private key and self-signed certificate, if your organization allows, by running the following commands, one command at a time:

  ```bash  theme={null}
  openssl genrsa -out MyPrivateKey.pem -traditional
  openssl req -new -key MyPrivateKey.pem -out MyCertificateSigningRequest.csr
  openssl x509 -req -in MyCertificateSigningRequest.csr -signkey MyPrivateKey.pem -out MyCertificate.crt -days 365
  ```

  Of course, you can change these preceding example filenames as needed. Be sure to store these generated files in a secure location.

  To create a Salesforce connected app, do the following:

  1. Log in to your Salesforce account.

  2. In the top navigation bar, click the **Quick Settings** (gear) icon, and then click **Open Advanced Setup**.

  3. In the **Home** tab, under **Platform Tools**, expand **Apps**, and then click **App Manager**.

  4. Click **New Connected App**.

  5. With **Create a Connected App** selected, click **Continue**.

  6. At a minimum, fill in the following, and then click **Save**:

     * **Connected App Name**
     * **API Name** (can be the same as **Connected App Name**, but do not use spaces or punctuation)
     * **Contact Email**
     * Under **API (Enable OAuth Settings)**, check **Enable OAuth Settings**.
     * For **Callback URL**, entering `https://localhost` is okay if you won't be using this connected app for other special authentication scenarios.
     * Check **Use digital signatures**, click **Choose File**, and browse to and select your certificate (`.crt`) file.
     * For **Selected OAuth Scopes**, move the following entries from the **Available OAuth Scopes** list to the **Selected OAuth Scopes** list:

       * **Manage user data via APIs (api)**
       * **Perform requests on your behalf at any time (refresh\_token, offline\_access)**
     * Uncheck **Require Proof Key for Code Exchange (PKCE) Extension for Supported Authorization Flows**.
     * Leave **Require Secret for Web Server Flow** checked.
     * Leave **Require Secret for Refresh Token Flow** checked.
     * Check **Enable Authorization Code and Credentials Flow**.

  7. On the connected app's details page, click **Manage**, click **Edit Policies**, set the following under **OAuth Policies**, and then click **Save**:

     * Set **Permitted Users** to **All users may self-authorize**.
     * Set **IP Relaxation** to **Relax IP restrictions**.
     * Set **Refresh Token Policy** to **Refresh token is valid until revoked**.

* The OAuth consumer key (client ID) for the Salesforce connected app.

  To get the Salesforce connected app's consumer key, do the following:

  1. Log in to your Salesforce account.
  2. In the top navigation bar, click the **Quick Settings** (gear) icon, and then click **Open Advanced Setup**.
  3. In the **Home** tab, under **Platform Tools**, expand **Apps**, and then click **App Manager**.
  4. In the list of apps, click the arrow next to the target connected app, and click **View**.
  5. Click **Manage Consumer Details**.
  6. Complete the on-screen security verification.
  7. Note the **Consumer Key** value.

* You must use your Salesforce account to do a one-time approval of the Salesforce connected app by using its consumer key and callback URL. To do this, while you are logged in to your
  Salesforce account, browse to the following URL, replacing `<client-id>` with the consumer key value. This URL assumes that the callback URL
  is `https://localhost`:

  ```
  https://login.salesforce.com/services/oauth2/authorize?response_type=code&client_id=<client-id>&redirect_uri=https%3A%2F%2Flocalhost
  ```

* To ensure maximum compatibility across Unstructured service offerings, you should give the contents of the private key (`.pem`) file to Unstructured as
  a string that contains the contents of the file (*not* the private key file itself).\
  To print this string suitable for copying, you can run one of the following commands from your Terminal or Command Prompt.
  In this command, replace `<path-to-private-key-file>` with the path to the private key file.

  * For macOS or Linux:

    ```bash  theme={null}
    cat <path-to-private-key-file>
    ```

  * For Windows:

    ```text  theme={null}
    Get-Content <path-to-private-key-file>
    ```

To create a Salesforce source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="salesforce",
                  config={
                      "username": "<user-name>",
                      "consumer_key": "<consumer-key>",
                      "private_key": "<private-key>",
                      "categories": [
                          "<category>",
                          "<category>"
                      ]
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "salesforce",
      "config": {
          "username": "<user-name>",
          "consumer_key": "<consumer-key>",
          "private_key": "<private-key>",
          "categories": [
              "<category>",
              "<category>"
          ]
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<user-name>` - The Salesforce username that has access to the required Salesforce categories.
* `<consumer-key>` - The consumer key (client ID) for the Salesforce connected app.
* `<private-key>` - The contents of the private key (PEM) associated with the consumer key for the Salesforce connected app, expressed as a single-line string.
* For `categories`, set one or more `<category>` values (such as `Account`, `Campaign`, `Case`, `EmailMessage`, and `Lead`) to process only those categories.
  The default is to include these catagories if not otherwise specified: `Account`, `Campaign`, `Case`, `EmailMessage`, and `Lead`.

---

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/sdk-jsts.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# JavaScript/TypeScript SDK

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

The [Unstructured JavaScript/TypeScript SDK](https://github.com/Unstructured-IO/unstructured-js-client) client allows you to send one file at a time for processing by the Unstructured Partition Endpoint.

To use the JavaScript/TypeScript SDK, you'll first need to set an environment variable named `UNSTRUCTURED_API_KEY`,
representing your Unstructured API key. To get your API key, do the following:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

## Installation

Before using the SDK to interact with Unstructured, install the library:

<CodeGroup>
  ```bash JavaScript/TypeScript theme={null}
  npm install unstructured-client
  ```
</CodeGroup>

<Note>
  The SDK uses semantic versioning and major bumps could bring breaking changes. It is advised to
  pin your installed version.
</Note>

## Basics

Let's start with a simple example in which you send a PDF document to the Unstructured Partition Endpoint to be partitioned by Unstructured.

<Warning>
  The JavaScript/TypeScript SDK has the following breaking changes in v0.11.0:

  * Imports under the `dist` path have moved up a level
  * Enums are now used for parameters with a set of options
    * This includes `chunkingStrategy`, `outputFormat`, and `strategy`
  * All parameters to `partition` have moved to a `partitionParameters` object
</Warning>

<CodeGroup>
  ```typescript TypeScript theme={null}
  import { UnstructuredClient } from "unstructured-client";
  import { PartitionResponse } from "unstructured-client/sdk/models/operations";
  import { Strategy } from "unstructured-client/sdk/models/shared";
  import * as fs from "fs";

  const key = process.env.UNSTRUCTURED_API_KEY;
  const url = process.env.UNSTRUCTURED_API_URL;

  const client = new UnstructuredClient({
      serverURL: url,
      security: {
          apiKeyAuth: key,
      },
  });

  const filename = "PATH_TO_INPUT_FILE";
  const data = fs.readFileSync(filename);

  client.general.partition({
      partitionParameters: {
          files: {
              content: data,
              fileName: filename,
          },
          strategy: Strategy.HiRes,
          splitPdfPage: true,
          splitPdfAllowFailed: true,
          splitPdfConcurrencyLevel: 15
          languages: ['eng']
      }
  }).then((res: PartitionResponse) => {
      if (res.statusCode == 200) {
          // Print the processed data's first element only.
          console.log(res.elements?.[0])

          // Write the processed data to a local file.
          const jsonElements = JSON.stringify(res, null, 2)

          fs.writeFileSync("PATH_TO_OUTPUT_FILE", jsonElements)
      }
  }).catch((e) => {
      if (e.statusCode) {
        console.log(e.statusCode);
        console.log(e.body);
      } else {
        console.log(e);
      }
  });
  ```

  ```typescript TypeScript (SDK <=v0.10.6) theme={null}
  import { UnstructuredClient } from "unstructured-client";
  import { PartitionResponse } from "unstructured-client/dist/sdk/models/operations";
  import * as fs from "fs";

  const key = process.env.UNSTRUCTURED_API_KEY;
  const url = process.env.UNSTRUCTURED_API_URL;

  const client = new UnstructuredClient({
      serverURL: url,
      security: {
          apiKeyAuth: key,
      },
  });

  const filename = "PATH_TO_FILE";
  const data = fs.readFileSync(filename);

  client.general.partition({
      files: {
          content: data,
          fileName: filename,
      },
      strategy: 'hi_res',
      languages: ['eng']
  }).then((res: PartitionResponse) => {
      if (res.statusCode == 200) {
          console.log(res.elements?.[0]);
      }
  }).catch((e) => {
    if (e.statusCode) {
      console.log(e.statusCode);
      console.log(e.body);
    } else {
      console.log(e);
    }
  });
  ```
</CodeGroup>

For a code example that works with an entire directory of files instead of just a single PDF, see the [Processing multiple files](#processing-multiple-files) section.

## Page splitting

In order to speed up processing of large PDF files, the `splitPdfPage`[\*](#parameter-names) parameter is `true` by default. This
causes the PDF to be split into small batches of pages before sending requests to the API. The client
awaits all parallel requests and combines the responses into a single response object. This is specific to PDF files and other
filetypes are ignored.

The number of parallel requests is controlled by `splitPdfConcurrencyLevel`[\*](#parameter-names).
The default is 8 and the max is set to 15 to avoid high resource usage and costs.

If at least one request is successful, the responses are combined into a single response object. An
error is returned only if all requests failed or there was an error during splitting.

<Note>
  This feature may lead to unexpected results when chunking because the server does not see the entire
  document context at once. If you'd like to chunk across the whole document and still get the speedup from
  parallel processing, you can:

  * Partition the PDF with `splitPdfPage` set to `true`, without any chunking parameters.
  * Store the returned elements in `results.json`.
  * Partition this JSON file with the desired chunking parameters.
</Note>

```typescript TypeScript theme={null}
client.general.partition({
    partitionParameters: {
        files: {
            content: data,
            fileName: filename,
        },
        strategy: Strategy.HiRes,
        // Set to `false` to disable PDF splitting
        splitPdfPage: true,
        // Continue PDF splitting even if some earlier split operations fail.
        splitPdfAllowFailed: true,
        // Modify splitPdfConcurrencyLevel to set the number of parallel requests
        splitPdfConcurrencyLevel: 10,
    }
})
```

## Customizing the client

### Retries

You can also change the defaults for retries through the `retryConfig`[\*](#parameter-names)
when initializing the client. If a request to the API fails, the client will retry the
request with an exponential backoff strategy up to a maximum interval of one minute. The
function keeps retrying until the total elapsed time exceeds `maxElapsedTime`[\*](#parameter-names),
which defaults to one hour:

```typescript TypeScript theme={null}
const key = process.env.UNSTRUCTURED_API_KEY;
const url = process.env.UNSTRUCTURED_API_URL;    

const client = new UnstructuredClient({
    security: { apiKeyAuth: key },
    serverURL: url,
    retryConfig: {
        strategy: "backoff",
        retryConnectionErrors: true,
        backoff: {
            initialInterval: 500,
            maxInterval: 60000,
            exponent: 1.5,
            maxElapsedTime: 900000, // 15min*60sec*1000ms = 15 minutes
        },
    };
});
```

## Processing multiple files

The code example in the [Basics](#basics) section processes a single PDF file. But what if you want to process
multiple files inside a directory with a mixture of subdirectories and files with different file types?

The following example takes an input directory path to read files from and an output directory path to
write the processed data to, processing one file at a time.

```typescript TypeScript theme={null}
import { UnstructuredClient } from "unstructured-client";
import * as fs from "fs";
import * as path from "path";
import { Strategy } from "unstructured-client/sdk/models/shared/index.js";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";

// Send all files in the source path to Unstructured for processing.
// Send the processed data to the destination path.
function processFiles(
    client: UnstructuredClient,
    sourcePath: string,
    destinationPath: string
): void {

    // If an output directory does not exist for the corresponding input
    // directory, then create it.
    if (!fs.existsSync(destinationPath)) {
        fs.mkdirSync(destinationPath, { recursive: true });
    }

    // Get all folders and files at the current level of the input directory.
    const items = fs.readdirSync(sourcePath);

    // For each folder and file in the input directory...
    for (const item of items) {
        const inputPath = path.join(sourcePath, item);
        const outputPath = path.join(destinationPath, item)

        // If it's a folder, call this function recursively.
        if (fs.statSync(inputPath).isDirectory()) {
            processFiles(client, inputPath, outputPath);
        } else {
            // If it's a file, send it to Unstructured for processing.
            const data = fs.readFileSync(inputPath);

            client.general.partition({
                partitionParameters: {
                    files: {
                        content: data,
                        fileName: inputPath
                    },
                    strategy: Strategy.HiRes,
                    splitPdfPage: true,
                    splitPdfConcurrencyLevel: 15,
                    splitPdfAllowFailed: true
                }
            }).then((res: PartitionResponse) => {
                // If successfully processed, write the processed data to
                // the destination directory.
                if (res.statusCode == 200) {
                    const jsonElements = JSON.stringify(res, null, 2)
                    fs.writeFileSync(outputPath + ".json", jsonElements)
                }
            }).catch((e) => {
                if (e.statusCode) {
                    console.log(e.statusCode);
                    console.log(e.body);
                } else {
                    console.log(e);
                }
            });
        }
    }
}

const client = new UnstructuredClient({
    security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
    serverURL: process.env.UNSTRUCTURED_API_URL
});

processFiles(
    client,
    process.env.LOCAL_FILE_INPUT_DIR,
    process.env.LOCAL_FILE_OUTPUT_DIR
);
```

## Parameters & examples

The parameter names used in this document are for the JavaScript/TypeScript SDK, which follows the `camelCase`
convention. The Python SDK follows the `snake_case` convention. Other than this difference in naming convention,
the names used in the SDKs are the same across all methods.

* Refer to the [API parameters](/api-reference/legacy-api/partition/api-parameters) page for the full list of available parameters.
* Refer to the [Examples](/api-reference/legacy-api/partition/examples) page for some inspiration on using the parameters.

---

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/sdk-python.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Python SDK

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

The [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) client allows you to send one file at a time for processing by
the [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview).

To use the Python SDK, you'll first need to set an environment variable named `UNSTRUCTURED_API_KEY`,
representing your Unstructured API key. To get your API key, do the following:

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. Get your Unstructured API key:<br />

   a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

## Installation

Before using the SDK to interact with Unstructured, install the library:

```bash Python theme={null}
pip install unstructured-client
```

<Note>
  The SDK uses semantic versioning and major bumps could bring breaking changes. It is advised to
  pin your installed version. See the [migration guide](#migration-guide), later on this page, for breaking change announcements.
</Note>

## Basics

Let's start with a simple example in which you send a PDF document to the Unstructured Partition Endpoint to be partitioned by Unstructured.

```python Python theme={null}
import os, json

import unstructured_client
from unstructured_client.models import operations, shared

client = unstructured_client.UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
)

filename = "PATH_TO_INPUT_FILE"

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=shared.Files(
            content=open(filename, "rb"),
            file_name=filename,
        ),
        strategy=shared.Strategy.VLM,
        vlm_model="gpt-4o",
        vlm_model_provider="openai",
        languages=['eng'],
        split_pdf_page=True,            # If True, splits the PDF file into smaller chunks of pages.
        split_pdf_allow_failed=True,    # If True, the partitioning continues even if some pages fail.
        split_pdf_concurrency_level=15  # Set the number of concurrent request to the maximum value: 15.
    ),
)

try:
    res = client.general.partition(
        request=req
    )
    element_dicts = [element for element in res.elements]
    
    # Print the processed data's first element only.
    print(element_dicts[0])

    # Write the processed data to a local file.
    json_elements = json.dumps(element_dicts, indent=2)

    with open("PATH_TO_OUTPUT_FILE", "w") as file:
        file.write(json_elements)
except Exception as e:
    print(e)
```

## Async partitioning

The Python SDK also has a `partition_async`. This call is equivalent to `partition` except that it can be used in a non blocking context. For instance, `asyncio.gather` can be used to concurrently process multiple files inside of a directory hierarchy, as demonstrated here:

<CodeGroup>
  ```python Python theme={null}
  import asyncio
  import os
  import json
  import unstructured_client
  from unstructured_client.models import shared

  client = unstructured_client.UnstructuredClient(
      api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
  )

  async def call_api(filename, input_dir, output_dir):
      req = {
          "partition_parameters": {
              "files": {
                  "content": open(filename, "rb"),
                  "file_name": os.path.basename(filename),
              },
              "strategy": shared.Strategy.VLM,
              "vlm_model": "gpt-4o",
              "vlm_model_provider": "openai",
              "languages": ['eng'],
              "split_pdf_page", True,            # If True, splits the PDF file into smaller chunks of pages.
              "split_pdf_allow_failed": True,    # If True, the partitioning continues even if some pages fail.
              "split_pdf_concurrency_level": 15  # Set the number of concurrent request to the maximum value: 15.
          }
      }

      try:
          res = await client.general.partition_async(
              request=req
          )
          element_dicts = [element for element in res.elements]
          json_elements = json.dumps(element_dicts, indent=2)

          # Create the output directory structure.
          relative_path = os.path.relpath(os.path.dirname(filename), input_dir)
          output_subdir = os.path.join(output_dir, relative_path)
          os.makedirs(output_subdir, exist_ok=True)

          # Write the output file.
          output_filename = os.path.join(output_subdir, os.path.basename(filename) + ".json")
          with open(output_filename, "w") as file:
              file.write(json_elements)

      except Exception as e:
          print(f"Error processing {filename}: {e}")

  async def process_files(input_directory, output_directory):
      tasks = []

      for root, _, files in os.walk(input_directory):
          for file in files:
              if not file.endswith('.json'):
                  full_path = os.path.join(root, file)
                  tasks.append(call_api(full_path, input_directory, output_directory))

      await asyncio.gather(*tasks)

  if __name__ == "__main__":
      asyncio.run(process_files(
          input_directory=os.getenv("LOCAL_FILE_INPUT_DIR"), 
          output_directory=os.getenv("LOCAL_FILE_OUTPUT_DIR")
      ))
  ```
</CodeGroup>

## Page splitting

In order to speed up processing of large PDF files, the `split_pdf_page`[\*](#parameter-names) parameter is `True` by default. This
causes the PDF to be split into small batches of pages before sending requests to the API. The client
awaits all parallel requests and combines the responses into a single response object. This is specific to PDF files and other
filetypes are ignored.

The number of parallel requests is controlled by `split_pdf_concurrency_level`[\*](#parameter-names).
The default is 8 and the max is set to 15 to avoid high resource usage and costs.

If at least one request is successful, the responses are combined into a single response object. An
error is returned only if all requests failed or there was an error during splitting.

<Note>
  This feature may lead to unexpected results when chunking because the server does not see the entire
  document context at once. If you'd like to chunk across the whole document and still get the speedup from
  parallel processing, you can:

  * Partition the PDF with `split_pdf_page` set to `True`, without any chunking parameters.
  * Store the returned elements in `results.json`.
  * Partition this JSON file with the desired chunking parameters.
</Note>

```python Python theme={null}
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=shared.Files(
            content=file.read(),
            file_name=filename,
        ),
        strategy=shared.Strategy.VLM,
        vlm_model="gpt-4o",
        vlm_model_provider="openai",
        split_pdf_page=True,            # If True, splits the PDF file into smaller chunks of pages.
        split_pdf_allow_failed=True,    # If True, the partitioning continues even if some pages fail.
        split_pdf_concurrency_level=15  # Set the number of concurrent request to the maximum value: 15.
    )
)
res = client.general.partition(
    request=req
)
```

## Customizing the client

### Retries

You can also change the defaults for retries through the `retry_config`[\*](#parameter-names)
parameter when initializing the client. If a request to the API fails, the client will retry the
request with an exponential backoff strategy up to a maximum interval of one minute. The
function keeps retrying until the total elapsed time exceeds `max_elapsed_time`[\*](#parameter-names),
which defaults to one hour. However, you can override these defaults, for example as follows:

```python Python theme={null}
# ...
from unstructured_client.utils import RetryConfig, BackoffStrategy

client = UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    retry_config=RetryConfig(
        strategy="backoff",
        retry_connection_errors=True,
        backoff=BackoffStrategy(
            initial_interval=500,     # 0.5 seconds: 500 milliseconds
            max_interval=60000,       # 1 minute: 60 seconds * 1000 milliseconds
            exponent=1.5,
            max_elapsed_time=900000,  # 15 minutes: 15 * 60 seconds * 1000 milliseconds
        ),
    )
)
```

Available `RetryConfig` settings include:

| Setting                            | Description                                                                       |
| ---------------------------------- | --------------------------------------------------------------------------------- |
| `strategy`                         | The strategy to use for retries. The only supported value is `backoff`.           |
| `retry_connection_errors`          | True to retry on connection errors.                                               |
| `BackoffStrategy.initial_interval` | After the first error, wait the specified number of milliseconds before retrying. |
| `BackoffStrategy.max_interval`     | The maximum wait time, specified in milliseconds between retries.                 |
| `BackoffStrategy.exponent`         | After each retry, increase the wait time exponentially by the specified factor.   |
| `BackoffStrategy.max_elapsed_time` | Stop retrying after the specified number of milliseconds.                         |

### Disabling SSL validation

If you disable SSL validation, requests will accept any TLS certificate
presented by the server and ignore hostname mismatches and/or expired certificates,
which will make your application vulnerable to man-in-the-middle (MitM) attacks.
Only set this to `False` for testing.

```python Python theme={null}
http_client = requests.Session()
http_client.verify = False

client = UnstructuredClient(
    client=http_client,
    ...
)
```

## Handling the response

The partition response defaults to a dict format that can be converted to Unstructured elements with
the `elements_from_dicts` utility function as seen below. Otherwise, the API response can be sent directly
to your vector store or another destination.

```python Python theme={null}
from unstructured.staging.base import elements_from_dicts

# ...

if res.elements is not None:
    elements = elements_from_dicts(response.elements)
```

## Parameters & examples

The parameter names used in this document are for the Python SDK, which follow snake\_case convention. The JavaScript/TypeScript SDK follows camelCase
convention. Other than this difference in naming convention,
the names used in the SDKs are the same across all methods.

* Refer to the [API parameters](/api-reference/legacy-api/partition/api-parameters) page for the full list of available parameters.
* Refer to the [Examples](/api-reference/legacy-api/partition/examples) page for some inspiration on using the parameters.

## Migration guide

There are breaking changes beginning with Python SDK version 0.26.0. If you encounter any errors when upgrading, please find the solution below.

**If you see the error: `AttributeError: 'PartitionParameters' object has no attribute 'partition_parameters'`**

Before 0.26.0, the SDK accepted a `PartitionParameters` object as input to the `sdk.general.partition` function. Beginning with 0.26.0, this object must be wrapped in a `PartitionRequest` object. The old behavior was deprecated in 0.23.0 and removed in 0.26.0.

```python  theme={null}
# Instead of:
from unstructured_client.models import shared

req = shared.PartitionParameters(
    files=files,
)

resp = s.general.partition(
    request=req
)

# Switch to:
from unstructured_client.models import shared, operations

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
    )
)

resp = s.general.partition(
    request=req
)
```

**If you see the error: `TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given`**

Beginning with 0.26.0, the `PartitionRequest` constructor no longer allows for positional arguments. You must specify `partition_parameters` by name.

```python  theme={null}
# Instead of:
req = operations.PartitionRequest(
    shared.PartitionParameters(
        files=files,
    )
)

# Switch to:
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
    )
)
```

**If you see the error: `TypeError: General.partition() takes 1 positional argument but 2 were given`**

Beginning with 0.26.0, the `partition` function no longer allows for positional arguments. You must specify `request` by name.

```python  theme={null}
# Instead of:
resp = s.general.partition(req)

# Switch to:
resp = s.general.partition(
    request=req
)
```

---

# Source: https://docs.unstructured.io/open-source/how-to/set-ocr-agent.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Set the OCR agent

## Task

You want to specify the type of agent that you use when performing optical character recognition (OCR) on files, such as images and PDFs.

<Note>This task works only with the Unstructured open source library.</Note>

## Approach

Set the environment variable named `OCR_AGENT` to one of the following supported values:

* `unstructured.partition.utils.ocr_models.tesseract_ocr.OCRAgentTesseract` to use Tesseract OCR. This is the default if not otherwise specified.
* `unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle` to use Paddle OCR.
* `unstructured.partition.utils.ocr_models.google_vision_ocr.OCRAgentGoogleVision` to use Google Cloud Vision OCR.

Also, be sure to install the corresponding OCR agent and its dependencies, if you have not already done so:

* For Tesseract OCR, [see the dependency list](https://github.com/Unstructured-IO/unstructured/blob/main/requirements/extra-pdf-image.in).
* For Paddle OCR, [see the dependency list](https://github.com/Unstructured-IO/unstructured/blob/main/requirements/extra-paddleocr.in).
* For Google Cloud Vision OCR, [see the dependency list](https://github.com/Unstructured-IO/unstructured/blob/main/requirements/extra-pdf-image.in).

## Example code

This example uses a PNG file with an embedded combination of English and Korean text. This example uses Tesseract OCR.

Language codes will differ depending on the OCR agent you use:

* For Tesseract OCR, [see the language codes list](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/common/lang.py).
* For Paddle OCR, [see the language codes list](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/lang.py) and [language names list](https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/multi_languages_en.md#language_abbreviations).
* For Google Cloud Vision OCR, [see the language codes list](https://cloud.google.com/vision/docs/languages).

```python Python theme={null}
import json

from unstructured.partition.image import partition_image

# Source: https://github.com/Unstructured-IO/unstructured-ingest/blob/main/example-docs/img/english-and-korean.png
# Path to the local file to process, relative to this .py file.
filename = "local-ingest-png/english-and-korean.png"

elements = partition_image(
  filename=filename,
  strategy="ocr_only",
  languages=["eng", "kor"] # Language codes differ by the OCR agent used.
)

# Convert the list of returned elements into a list of dictionaries for printing or saving.
element_dicts = [element.to_dict() for element in elements]

# Print the list.
print(json.dumps(element_dicts, indent=2))

# Or, save the list locally:
#
# file = "local-ingest-output/english-and-korean.json"
#
# with open(file, "w") as file:
#     json.dump(element_dicts, file, indent=2)
```

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/sftp.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/sftp.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# SFTP

Batch process all your records to store structured outputs in SFTP.

The requirements are as follows.

* The SFTP server hostname, port, username, and password.

  SFTP servers are offered by several vendors. For example, the following video shows how to create and set up an SFTP server by using AWS Transfer Family:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/3_HHSnoFsoM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The directory path to start accessing data from, specified as `sftp://<path>/<to>/<directory>`.

See the [SFTP documentation](https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol).

The SFTP connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[sftp]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `SFTP_HOST` - The SFTP hostname, represented by `--host` (CLI) or `host` (Python).
* `SFTP_PORT` - The SFTP port number, represented by `--port` (CLI) or `port` (Python).
* `SFTP_REMOTE_URL` - The directory path to start accessing data from, represented by `--remote-url` (CLI) or `remote_url` (Python).
* `SFTP_USERNAME` - The SFTP username, represented by `--username` (CLI) or `username` (Python).
* `SFTP_PASSWORD` - The SFTP password, represented by `--password` (CLI) or `password` (Python).

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --strategy hi_res \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
    sftp \
      --host $SFTP_HOST \
      --port $SFTP_PORT \
      --username $SFTP_USERNAME \
      --password $SFTP_PASSWORD \
      --remote-url $SFTP_REMOTE_URL
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.fsspec.sftp import (
      SftpConnectionConfig,
      SftpAccessConfig,
      SftpUploaderConfig
  )

  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )

  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=SftpConnectionConfig(
              access_config=SftpAccessConfig(password=os.getenv("SFTP_PASSWORD")),
              host=os.getenv("SFTP_HOST"),
              port=os.getenv("SFTP_PORT"),
              username=os.getenv("SFTP_USERNAME")
          ),
          uploader_config=SftpUploaderConfig(remote_url=os.getenv("SFTP_REMOTE_URL"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/support/shared-responsibility.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Shared Responsibility Model

The Unstructured Shared Responsibility Model defines the division of ownership and duties between
Unstructured, customers, and the cloud service providers (CSPs) that Unstructured and customers rely on.

The Model lists and clarifies who is ultimately responsible for specific aspects of
physical and cloud infrastructure, security, and application services, ensuring each party understands
not only its own obligations but the obligations of the other participating parties. This Model is
regularly updated to reflect the latest additions and changes to Unstructured's and the
CSPs' product and service offerings.

Before you contact Unstructured Support, you are encouraged to review the latest version of the Model
to check who owns resolution of the issue you are experiencing. If you contact
Unstructured Support about an issue for which the Model states that the customer is ultimately responsible,
Support could return the unresolved issue back to you and take no further action.

To see the latest version of the Model, go to the [Unstructured Trust Portal](https://trust.unstructured.io/),
scroll down to the "Topics and common questions" section, and expand the entry labelled
"Shared Responsibility Model."

---

# Source: https://docs.unstructured.io/examplecode/tools/sharepoint-events.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# SharePoint event triggers

You can use SharePoint events, such as adding new files to—or updating existing files within—SharePoint document libraries, to automatically run Unstructured ETL+ workflows
that rely on those document libraries as sources. This enables a no-touch approach to having Unstructured automatically process new and updated files in SharePoint document libraries as they are added or updated.

This example shows how to automate this process by adding a custom [Power Automate](https://www.microsoft.com/power-platform/products/power-automate) flow to your Microsoft 365 or Office 365 account. This flow runs
whenever a new or updated file is detected in the specified SharePoint document library. This flow then calls the [Unstructured API's workflow operations](/api-reference/workflow/overview) to automatically run the
specified corresponding Unstructured ETL+ workflow within your Unstructured account.

<Note>
  This example uses a custom Power Automate flow that you create and maintain.
  Any issues with file detection, timing, or flow invocation could be related to your custom flow,
  rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom
  flow's run history first for any informational and error messages.
</Note>

## Requirements

To use this example, you will need the following:

* A Microsoft 365 or Office 365 Business or enterprise plan that includes SharePoint Online, OneDrive, and Power Automate.
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).

* A Power Automate Premium license. [Learn how](https://learn.microsoft.com/power-platform/admin/power-automate-licensing/buy-licenses).

* An Unstructured account, and an Unstructured API key for your account, as follows:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

* The Unstructured API's workflow operations URL for your account, as follows:

  1. In the Unstructured UI, click **API Keys** on the sidebar.<br />
  2. Note the value of the **Unstructured API's workflow operations** field.

* A SharePoint source connector in your Unstructured account. [Learn how](/ui/sources/sharepoint).

* Some available [destination connector](/ui/destinations/overview) in your Unstructured account.

* A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).

## Step 1: Create an app registration

In this step, you create an app registration in Microsoft Entra ID to access the Azure key vault that you will create in the next step.

1. Sign in to your [Azure portal](https://portal.azure.com).
2. In the **Search resources, services, and docs** box, enter **App registrations**, and then in the list of **Services**, click **App registrations**.
3. Click **+ New registration**.
4. Enter some **Name** for the app registration.
5. For **Supported account types**, select **Accounts in this organizational directory only**.
6. For **Redirect URI**, select **Web**, and then enter `https://localhost`.
7. Click **Register**.
8. In the sidebar, click **Overview**.
9. Expand **Essentials**, and note the values of the **Application (client) ID** and **Directory (tenant) ID** fields.
10. Next to **Client credentials**, click the **Add a certificate or secret** link.
11. On the **Client secrets** tab, click **+ New client secret**.
12. (Optional) Enter some **Description** for the client secret.
13. (Optional) Enter some non-default **Expires** value.
14. Click **Add**.
15. Note the value of the **Value** field. This is the app registration's client secret value. (Do not use the **Secret ID** value.)

## Step 2: Create an Azure key vault

In this step, you store your Unstructured API key in an Azure key vault for your Power Automate flow to access later.

1. With your Azure portal still open from the previous step, in the **Search resources, services, and docs** box, enter **Key vaults**, and then in the list of **Services**, click **Key vaults**.

2. Click **Create**.

3. On the **Basics** tab, set the following key vault settings:

   | Setting            | Suggested value           | Description                                                                                                           |
   | ------------------ | ------------------------- | --------------------------------------------------------------------------------------------------------------------- |
   | **Subscription**   | Your subscription         | The Azure subscription within which to create your new key vault.                                                     |
   | **Resource Group** | **Create new**            | After you click **Create new**, enter some name for the new resource group within which to create your new key vault. |
   | **Key vault name** | Some globally unique name | Some name that identifies your new key vault. Valid characters are `A`-`Z`, `a`-`z`, `0`-`9`, and `-`.                |
   | **Region**         | Your preferred region     | Select a region that's near you or near other services that your key vault can access.                                |
   | **Pricing tier**   | **Standard**              | Choose the pricing tier for your key vault.                                                                           |

4. Click **Review + create**.

5. Click **Create**.

6. After the deployment is complete, click **Go to resource**.

7. In the sidebar, click **Access control (IAM)**.

8. Click **+ Add > Add role assignment**.

9. On the **Role** tab, in the **Job function roles** list, select **Key Vault Administrator**.

10. Click **Next**.

11. On the **Members** tab, with **Assign access to** set to **User, group, or service principal**, click **+ Select members**.

12. Search for and select the name of the app registration that you created earlier in Step 1.

13. Search also for and select your user account.

14. Click **Select**.

15. Click **Review + assign** twice.

16. In the sidebar, expand **Objects**, and then click **Secrets**.

17. Click **+ Generate/Import**.

18. Enter some **Name** for the secret.

19. For **Secret value**, enter your Unstructured API key value.

20. Click **Create**.

## Step 3: Create a Power Automate flow

1. Sign in to [Power Automate](https://make.powerautomate.com).

2. On the sidebar, click **Create**.

3. In the **Start from blank** area, click **Automated cloud flow**.

4. Enter some **Flow name** for your new flow.

5. For **Search all triggers**, enter `When a file is created or modified (properties only)`, and then click
   **When a file is created or modified (properties only) (SharePoint)**.

6. Click **Create**.

7. In the flow designer, click the **When a file is created or modified (properties only)** node.

8. On the **Parameters** tab, set the following node properties:

   * For **Site Address**, enter the **Site URL** value of the SharePoint source connector for your workflow.
   * For **Library Name**, enter the **Documents Library** value of the SharePoint source connector for your workflow.
   * (Optional) If you specified a **Path** value for the SharePoint source connector for your workflow, select **Folder** in the **Advanced parameters** list and then select the corresponding folder name.

9. Below the **When a file is created or modified (properties only)** node, click the plus (**Insert a new action...**) button.

10. In the **Search for an action or connector** box, enter `Get secret`, and then under **Azure Key Vault**, select **Get secret**.

11. Set the following node properties:

    * Enter some **Connection name** for the connection.
    * For **Authentication type**, select **Service principal authentication**.
    * For **Client ID**, enter the **Application (client) ID** value of the app registration that you created earlier in Step 1.
    * For **Client secret**, enter the client secret value of the app registration that you created earlier in Step 1.
    * For **Tenant ID**, enter the **Directory (tenant) ID** value of the app registration that you created earlier in Step 1.
    * For **Key vault name**, enter the name of the key vault that you created earlier in Step 2.

12. Click **Create new**.

13. In the **Name of the secret** list, select the name of the secret that you created earlier in Step 2.

14. Below the **Get secret** node, click the plus (**Insert a new action...**) button.

15. In the **Search for an action or connector** box, enter `HTTP`, and then under **HTTP**, select **HTTP**.

16. On the **Parameters** tab, set the following node properties:

    * For **URI**, enter `<unstructured-api-url>/workflows/<workflow-id>/run`, and replace the following placeholders:

      * Replace `<unstructured-api-url>` with your Unstructured API's workflow operations value.
      * Replace `<workflow-id>` with the ID of your Unstructured workflow.

      The **URI** should now look similar to the following:

      ```text  theme={null}
      https://platform.unstructuredapp.io/api/v1/workflows/11111111-1111-1111-1111-111111111111/run
      ```

    * For **Method**, select **POST**.

    * For **Headers**, enter the following:

      * `accept`, set to `application/json`
      * `unstructured-api-key`, set to your key vault value. To do this, type `/` and select **Insert dynamic content**.
        Under **Get secret**, select **body/value**.

17. In the top navigation bar, click the disk (**Save**) button.

18. Click **Back** to go to the flow's home page.

19. In the top navigation bar, if the power button shows **Turn on**, click it to change it to **Turn off** (which means that the flow is on and active).

## Step 4: Trigger the flow and view results

1. On a separate tab in your web browser, go to the SharePoint document library that corresponds to both your Unstructured workflow's
   SharePoint source connector and the Power Automate flow's **When a file is created or modified (properties only)** node.
2. Add or update a file in the document library.
3. On a separate tab in your web browser, in the Unstructured user interface for your account, click **Jobs** on the sidebar.
4. In the list of jobs, click the newly running job for your workflow. It might take a minute or so for the new job to appear in the list.
5. After the job status shows **Finished**, go to your destination location to see the results.

## Step 5 (Optional): Turn off the flow

To stop the Power Automate flow from running automatically, go back to the web browser tab from earlier in Step 3. Then,
with the flow's home page still open, in the top navigation bar, click the power (**Turn off**) button
to change it to **Turn on** (which means that the flow is off and inactive).

---

# Source: https://docs.unstructured.io/ui/sources/sharepoint.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/sharepoint.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/sharepoint.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# SharePoint

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from SharePoint.

The requirements are as follows.

<Note>
  If you are setting up the SharePoint connector for the first time, you can skip past this note.

  Previous versions of the SharePoint connector relied on SharePoint app principals for authentication. Current versions of the
  SharePoint connector no longer support these SharePoint app principals. Microsoft deprecated support for Share Point app principals on November 27, 2023.
  SharePoint app principals will no longer work for SharePoint tenants that were created on or after November 1, 2024, and they will stop working
  for all SharePoint tenants as of April 2, 2026. [Learn more](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/retirement-announcement-for-azure-acs).

  Current versions of the SharePoint connector now rely on Microsoft Entra ID app registrations for authentication.

  To migrate from SharePoint app princpals to Entra ID app regisrations, replace the following settings in your existing SharePoint connector,
  as listed in the requirements following this note:

  * Replace the deprecated SharePoint app principal's application client ID value with your replacement Entra ID app registration's **Application (client) ID** value.
  * Replace the deprecated SharePoint app principal's client secret value with your replacement Entra ID app registration's **Client secret** value.
  * Add your replacement Entra ID app registration's **Directory (tenant) ID** value, token authority URL value, and the correct set of Microsoft Graph access permissions for SharePoint Online.

  If you need migration help, get assistance from our [Slack community](https://short.unstructured.io/pzw05l7), or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
</Note>

* A SharePoint Online plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes SharePoint Online.
  [Learn more](https://www.microsoft.com/en-us/microsoft-365/SharePoint/compare-SharePoint-plans).
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).

* A OneDrive for business plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes OneDrive.
  (Even if you only plan to use SharePoint Online, you still need a plan that includes OneDrive, because the SharePoint connector is built on OneDrive technology.)
  [Learn more](https://www.microsoft.com/microsoft-365/onedrive/compare-onedrive-plans).
  [Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
  [Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).
  OneDrive personal accounts, and Microsoft 365 Free, Basic, Personal, and Family plans are not supported.

* The SharePoint Online and OneDrive plans must share the same Microsoft Entra ID tenant.
  [Learn more](https://learn.microsoft.com/microsoft-365/enterprise/subscriptions-licenses-accounts-and-tenants-for-microsoft-cloud-offerings?view=o365-worldwide).

* The SharePoint Online site base URL. This URL must meet the following requirements:

  1. Starts with `https://`.
  2. Followed by any number of uppercase or lowercase letters, digits, or hyphens (`-`). This portion of the URL must contain at least
     one dot segment after the first domain label; for example, `.com`, `.sharepoint.com`, or `.co.uk`.
  3. Followed optionally by `/sites/` or `/teams/` and ends with any name that can include any number of
     uppercase or lowercase letters, digits, hyphens (`-`), or underscores (`_`); for example, `/sites/my-site` or `/teams/my_team`.

  Some examples of valid site URLs include:

  * `https://contoso.com`
  * `https://my-company.sharepoint.com`
  * `https://example.co.uk`
  * `https://contoso.sharepoint.com/sites/engineering`
  * `https://teams.example.com/teams/marketing`

  Additionally, note the following about collection-level URLs and all site URLs:

  * Site collection-level URLs typically have a format with a pattern similar to `https://<tenant>.sharepoint.com/sites/<site-collection-name>`.
  * Team collection-level URLs typically have a format with a pattern similar to `https://<tenant>.sharepoint.com/teams/<team-collection-name>`.
  * Root site collection-level URLs typically have a format with a pattern similar to `https://<tenant>.sharepoint.com`.
  * To process all sites within a SharePoint tenant, use a site URL with a pattern similar to `https://<tenant>-admin.sharepoint.com`.

  [Learn more](https://learn.microsoft.com/microsoft-365/community/query-string-url-tricks-sharepoint-m365).

* The display name of the SharePoint Online library to use. The default is `Documents`.

* The path to the SharePoint Online library to use. By default, the root of the target library is used.
  To start from a path other than the root, enter the path that you want to use, beginning from the root. For example, to use
  the **my-folder > my-subfolder** path in the target library, you would specify `my-folder/my-subfolder`.

  The following video shows how to get the site URL and a path within the site:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/E3fRwJU-KTc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* Two types of authentication are supported: client credentials and a username and password. Both authentication types require a
  Microsoft Entra ID app registration. You will need to provide
  the **Application (client) ID**, **Directory (tenant) ID**, and **Client secret** for the Entra ID app registration, and the
  app registration must have the correct set of Microsoft Graph access permissions. These permissions include:

  * `Sites.ReadWrite.All` (if both reading and writing are needed)
  * `User.Read.All`
    [Learn more](https://learn.microsoft.com/answers/questions/2116616/service-principal-access-to-sharepoint-online).

  1. [Create an Entra ID app registration](https://learn.microsoft.com/entra/identity-platform/quickstart-register-app?pivots=portal).
  2. [Add Graph access permissions to an app registration](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#add-permissions-to-an-application).
  3. [Grant consent for the added Graph permissions](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#grant-consent-for-the-added-permissions-for-the-enterprise-application).

  The following video shows how to create an Entra ID app registration:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/aBAY-LKLPSo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  The following video shows how to add the correct set of Graph access permissions to the Entra ID app registration:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/X7fnRYyxy0Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* The token authority URL for your Microsoft Entra ID app registration. This is typically `https://login.microsoftonline.com`

* For username and password authentication, you must also provide the User Principal Name (UPN) and its password for the OneDrive account in the Microsoft Entra ID tenant. This UPN is typically the OneDrive account user's email address. To find a UPN:

  1. Depending on your plan, sign in to your Microsoft 365 admin center (typically [https://admin.microsoft.com](https://admin.microsoft.com)) using your administrator credentials,
     or sign in to your Office 365 portal (typically [https://portal.office.com](https://portal.office.com)) using your credentials.
  2. In the **Users** section, click **Active users**.
  3. Locate the user account in the list of active users.
  4. The UPN is displayed in the **Username** column.

  The following video shows how to get a UPN:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/H0yYfhfyCE0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

To create a SharePoint source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="sharepoint",
                  config={
                      "site": "<site>",
                      "library": "<library>",
                      "path": "<path>",
                      "recursive": <True|False>,
                      "client_id": "<client_id>",
                      "tenant": "<tenant>",
                      "authority_url": "<authority_url>",
                      "client_cred": "<client_cred>",
                      "user_pname": "<user_pname>", # For username and password authentication.
                      "password": "<password>" # For username and password authentication.
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "sharepoint",
      "config": {
          "site": "<site>",
          "library": "<library>",
          "path": "<path>",
          "recursive": <true|false>,
          "client_id": "<client-id>",
          "tenant": "<tenant>",
          "authority_url": "<authority-url>",
          "client_cred": "<client-cred>",
          "user_pname": "<user-pname>", # For username and password authentication.
          "password": "<password>" # For username and password authentication.
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<site>` (*required*) - The base URL of the SharePoint site to connect to.
* `<library>` - The display name of the SharePoint library to use. The default is `Documents`.
* `<path>` - The path to use within the library. The default is the root of the target library. To use a different path, specify the correct path format as described previously in this article.
* For `recursive`, set to `true` to recursively process data from subfolders within the target path. The default is `false` if not otherwise specified.
* `<client-id>` (*required*) - The client ID provided by SharePoint for the app registration.
* `<tenant>` (*required*) - The **Directory (tenant) ID** for the Microsoft Entra ID app registration with the correct set of Microsoft Graph access permissions.
* `<authority-url>` - The authentication token provider URL for the Entra ID app registration. The default is [https://login.microsoftonline.com](https://login.microsoftonline.com).
* `<client-cred>` (*required*) - The **Client secret** for the Entra ID app registration.
* `<user-pname>` (*required* for username and password authentication) - For username and password authentication, the UPN for the OneDrive account in the Entra ID tenant.
* `<password>` (*required* for username and password authentication) - For username and password authentication, the password for the target UPN.

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/singlestore.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/singlestore.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# SingleStore

Batch process all your records to store structured outputs in a SingleStore account.

The requirements are as follows.

* A SingleStore deployment, database, and table. [Learn how](https://www.singlestore.com/blog/how-to-get-started-with-singlestore/).
* The hostname for the SingleStore deployment.
* The port for the host.
* The username for the deployment.
* The password for the user.
* The name of the database in the deployment.
* The name of the table in the database.

To get the values for the hostname, port, username, and password:

1. In your SingleStore account's dashboard sidebar, click **Deployments**.
2. From the drop-down list at the top of the **Deployments** page, select your deployment.
3. On the **Overview** tab, in the **Compute** area, in the **Connect** drop-down list for your deployment, select
   **Your App**.
4. If a **Create User** dialog box appears, note the **User** name and **Password** values.
5. In the **Connect to Workspace** pane's **Your App** tab, note the string in the following format:

```
<user-name>:<password>@<host>:<port>
```

* `<user-name>` is the username.
* `<password>` is the user's password.
* `<host>` is the workspace's hostname.
* `<post>` is the host's port.

To get the values for the database and table names:

1. In your SingleStore dashboard's sidebar, click **Deployments**.
2. From the drop-down list at the top of the **Deployments** page, select your deployment.
3. On the **Databases** tab, note **Name** of your database.
4. Click the database and, on the **Tables** tab, note the name of your table.

The SingleStore connector dependencies:

```bash CLI, Python theme={null}
pip install "unstructured-ingest[singlestore]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

These environment variables:

* `SINGLESTORE_HOST` - The hostname for the SingleStore deployment, represented by `--host` (CLI) or `host` (Python).
* `SINGLESTORE_PORT` - The port for the host, represented by `--port` (CLI) or `port` (Python).
* `SINGLESTORE_USER` - The username for the deployment, represented by `--user` (CLI) or `user` (Python).
* `SINGLESTORE_PASSWORD` - The password for the user, represented by `--password` (CLI) or `password` (Python).
* `SINGLESTORE_DB` - The name of the database in the deployment, represented by `--database` (CLI) or `database` (Python).
* `SINGLESTORE_TABLE` - The name of the table in the database, represented by `--table-name` (CLI) or `table_name` (Python).

The table's schema must match the schema of the documents that Unstructured produces.

During insertion, JSON objects are flattened, and underscores are placed between nested object names.
For example, the following JSON object matches a column in the table named `data_source_date_created`:

```json  theme={null}
{
    "...": "...",
    "data_source": {
        "date_created": "1719963233.949"
    },
    "...": "..."
}
```

The `metadata` object itself is also flattened in a similar way. However, for nested objects in `metadata`,
the column in the table does not start with `metadata_`. For example, the following JSON object matches a
column in the table named `last_modified`:

```json  theme={null}
{
    "...": "...",
    "metadata": {
        "...": "...",
        "last_modified": "2022-12-16T17:04:16-05:00",
        "...": "..."
    },
    "...": "..."
}
```

Unstructured cannot provide a table schema that is guaranteed to work in all
circumstances. This is because these schemas will vary based on your source files' types; how you
want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

You can adapt the following table schema example for your own needs:

```sql  theme={null}
CREATE TABLE elements (
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    record_id TEXT,
    element_id TEXT,
    text TEXT,
    embeddings Vector(384),
    parent_id TEXT,
    page_number TEXT,
    is_continuation BOOLEAN,
    orig_elements TEXT,
    partitioner_type TEXT
);
```

See also:

* [CREATE TABLE](https://docs.singlestore.com/cloud/reference/sql-reference/data-definition-language-ddl/create-table/)
  in the SingleStore documentation
* [Unstructured document elements and metadata](/api-reference/legacy-api/partition/document-elements)

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding are optional.

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --partition-by-api \
      --strategy hi_res \
      --chunking-strategy by_title \
      --embedding-provider huggingface \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    singlestore \
      --host $SINGLESTORE_HOST \
      --port $SINGLESTORE_PORT \
      --user $SINGLESTORE_USER \
      --password $SINGLESTORE_PASSWORD \
      --database $SINGLESTORE_DB \
      --table-name $SINGLESTORE_TABLE
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig
  from unstructured_ingest.processes.connectors.singlestore import (
      SingleStoreConnectionConfig,
      SingleStoreAccessConfig,
      SingleStoreUploadStagerConfig,
      SingleStoreUploaderConfig
  )

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              strategy="hi_res",
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=SingleStoreConnectionConfig(
              host=os.getenv("SINGLESTORE_HOST"),
              port=os.getenv("SINGLESTORE_PORT"),
              user=os.getenv("SINGLESTORE_USER"),
              database=os.getenv("SINGLESTORE_DB"),
              access_config=SingleStoreAccessConfig(
                  password=os.getenv("SINGLESTORE_PASSWORD")
              )
          ),
          stager_config=SingleStoreUploadStagerConfig(drop_empty_cols=True),
          uploader_config=SingleStoreUploaderConfig(table_name=os.getenv("SINGLESTORE_TABLE"))
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/ui/sources/slack.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/slack.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/slack.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Slack

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Slack.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/0F1MXMgk3kY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A Slack app. Create a Slack app by following [Step 1: Creating an app](https://api.slack.com/quickstart#creating).

* The app must have the following OAuth scopes:

  * To read messages from public channels, add the scopes `channels:join` and `channels:history`.
  * To read messages from private channels, add the scope `groups:history`.

  Add these scopes to the app by following [Step 2: Requesting scopes](https://api.slack.com/quickstart#scopes).

* The app must be installed and authorized for the target Slack workspace. Install and authorize the app by following [Step 3: Installing and authorizing the app](https://api.slack.com/quickstart#installing).

* The app's access token. Get this token by following [Step 3: Installing and authorizing the app](https://api.slack.com/quickstart#installing).

* Add the app to the target channels in the Slack workspace. To do this from the channel, open the channel's details page, click the **Integrations** tab, click **Add apps**, and follow the on-screen directions to install the app.

* The channel ID for each target channel. To get this ID, open the channel's details page, and look for the **Channel ID** field on the **About** tab.

* The starting and ending date and time range for the channels to be processed. Supported formats include:

  * `YYYY-MM-DD`
  * `YYYY-MM-DDTHH:MM:SS`
  * `YYYY-MM-DDTHH:MM:SSZ`
  * `YYYY-MM-DD+HH:MM:SS`
  * `YYYY-MM-DD-HH:MM:SS`

<Info>
  Slack has established API rate limits that could restrict the number and frequency of messages that can be read. To
  explore possible workarounds for these cases, see your Slack representative.
</Info>

To create a Slack source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="slack",
                  config={
                      "channels": [
                          "<channel>",
                          "<channel>"
                      ],
                      "start_date": "<start-date>",
                      "end_date": "<end-date>",
                      "token": "<token>"
                  }
              )
          )
      )

      print(response.source_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "slack",
      "config": {
          "channels": [
              "<channel>",
              "<channel>"
          ],
          "start-date": "<start-date>",
          "end-date": "<end-date>",
          "token": "<token>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<channel>` (*required*): The ID of the target Slack channel from which to read messages.
* `<start-date>`: The date and time, in ISO 8601 format (such as `YYYY-MM-DDTHH:MM:SSZ`), from which to begin reading messages. Messages created before this timestamp are ignored. By default, all messages are read unless an end date is specified.
* `<end-date>`: The date and time, in ISO 8601 format (such as `YYYY-MM-DDTHH:MM:SSZ`), at which to stop reading messages. Messages created after this timestamp are ignored. By default, all messages are read unless a start date is specified.
* `<token>` (*required*): The access token for the Slack app that has permission to read messages from the target Slack channels.

---

# Source: https://docs.unstructured.io/examplecode/tools/snowflake-streamlit.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Streamlit in Snowflake

[Streamlit](https://streamlit.io/) is an open-source Python framework for data scientists and AI/ML engineers to
deliver dynamic data apps with only a few lines of code.

[Streamlit in Snowflake](https://www.snowflake.com/en/product/features/streamlit-in-snowflake/) enables data scientists
and Python developers to combine Streamlit's component-rich, open-source Python library with the scale, performance and
security of the Snowflake platform. Streamlit Python scripts can define user interface (UI) components such as
filters, graphs, sliders, and more to interact with your data.

In this example, you use Snowsight in your Snowflake account to create a simple Streamlit app that uses
[Snowflake Cortex Search for RAG](https://docs.snowflake.com/user-guide/snowflake-cortex/cortex-search/cortex-search-overview)
to ask natural-language questions about an existing table in your Snowflake account. This table
contains data that was generated by Unstructured. Answers are returned in natural-language,
chatbot-style format.

## Prerequisites

* A table in Snowflake that contains data that was generated by Unstructured. The
  target Snowflake table must have a column named `EMBEDDINGS` that will contains vector embeddings for the text in the table's `TEXT` column.
  The following Streamlit example app assumes that the `EMBEDDINGS` column contains 1,024 vector embeddings and has a data type of `VECTOR(FLOAT, 1024)`.

  To create this table, you can [create a custom Unstructured workflow](/ui/workflows#create-a-custom-workflow)
  that uses any supported [source connector](/ui/sources/overview) along with the
  [Snowflake destination connector](/ui/destinations/snowflake). Then
  [run](/ui/workflows#edit%2C-delete%2C-or-run-a-workflow) the workflow to
  generate the data and then insert that generated data into the target Snowflake table.

  After the data is inserted into the target Snowflake table, you can run the following Snowflake SQL statement to
  generate the 1,024 vector embeddings for the text in the table's `TEXT` column and then insert those generated vector
  embeddings into the table's `EMBEDDINGS` column. The model specified here for generating the vector embeddings is the
  same one that is used by the Streamlit example app:

  ```sql  theme={null}
  UPDATE ELEMENTS
  SET EMBEDDINGS = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
      'snowflake-arctic-embed-l-v2.0', 
      TEXT
  );
  ```

  To learn how to run Snowflake SQL statements, see for example
  [Querying data using worksheets](https://docs.snowflake.com/user-guide/ui-snowsight-query).

* You must have the appropriate privileges to create and use a Streamlit app in your Snowflake account. These
  privileges include ones for the target table's parent database and schema as well as the Snowflake warehouse that
  runs the Streamlit app. For details, see
  [Getting started with Streamlit in Snowflake](https://docs.snowflake.com/developer-guide/streamlit/getting-started).

## Create and run the example app

<Steps>
  <Step title="Create the Streamlit app">
    1. In Snowsight for your Snowflake account, on the sidebar, click **Projects > Streamlit**.
    2. Click **+ Streamlit App**.
    3. For **App title**, enter a name for your app, such as `Unstructured Demo Streamlit App`.
    4. For **App location**, chose the target database and schema to store the app in.
    5. For **App warehouse**, choose the warehouse that you want to use to run your app and execute its queries.
    6. Click **Create**.
  </Step>

  <Step title="Add code to the Streamlit app">
    In this step, you add Python code to the Streamlit app that you created in the previous step.

    This step explains each part of the code as you add it. If you want to skip past these explanations, add the
    code in the [complete code example](#complete-code-example) all at once, and then skip ahead to
    the next step, "Run the Streamlit app."

    1. Import Python dependencies that get the current connection to the Snowflake database and schema and get Streamlit functions and features.

       ```python  theme={null}
       from snowflake.snowpark.context import get_active_session
       import streamlit as st
       ```

    2. Get the current connection to the Snowflake database and schema.

       ```python  theme={null}
       session = get_active_session() 
       ```

    3. Display the title of the app in the Streamlit UI, and get the user's search query from the Streamlit UI.

       ```python  theme={null}
       st.title("Snowflake Cortex Search for RAG with Data from Unstructured")

       query = st.text_input("Enter your search query:")
       ```

    4. Get the user's search query and display a progress indicator in the UI.

       ```python  theme={null}
       if query:
           with st.spinner("Embedding and retrieving..."):
       ```

    5. Use the user's search query to get the top result from the `ELEMENTS` table.
       The `ELEMENTS` table contains the data that was generated by Unstructured. The code uses the
       `SNOWFLAKE.CORTEX.EMBED_TEXT_1024` function to generate vector embeddings for the user's search query and the `VECTOR_COSINE_SIMILARITY`
       function to get the similarity between the vector embeddings for the user's search query and the vector embeddings for the `TEXT` column
       for each rown in the `ELEMENTS` table. The code then orders the results by similarity and limits the results to the row with the greatest similarity
       between the search query and the target text.

       ```python  theme={null}
               top_result_df = session.sql(f"""
                   WITH query_embedding AS (
                       SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
                           'snowflake-arctic-embed-l-v2.0', '{query}'
                       ) AS EMBED
                   )
                   SELECT 
                       e.TEXT,
                       VECTOR_COSINE_SIMILARITY(e.EMBEDDINGS, q.EMBED) AS similarity
                   FROM ELEMENTS e, query_embedding q
                   ORDER BY similarity DESC
                   LIMIT 1
               """).to_pandas()
       ```

    6. Get the `TEXT` column from the top result and use it as context for the user's search query.

       ```python  theme={null}
               context = top_result_df["TEXT"][0]
       ```

    7. Use the user's search query and the context from the top result to get a response from Snowflake Cortex Search for RAG.
       The code uses the `SNOWFLAKE.CORTEX.COMPLETE` function to generate a response to the user's search query based on the context from the top result.

       ```python  theme={null}
               completion_df = session.sql(f"""
                   SELECT SNOWFLAKE.CORTEX.COMPLETE(
                       'snowflake-arctic',
                   CONCAT('Context: ', $$ {context} $$, ' \\n\\nQuestion: {query}\\nAnswer:')
               ) AS RESPONSE
               """).to_pandas()
       ```

    8. Display the generated response in the Streamlit UI.

       ```python  theme={null}
               st.write("Answer:")
               st.write(completion_df["RESPONSE"][0])
       ```
  </Step>

  <Step title="Run the Streamlit app">
    1. In the upper right corner, click **Run**.
    2. For **Enter your search query**, enter some natural-language question about the `TEXT` column in the table.
    3. Press **Enter**.

    Snowflake Cortex Search for RAG returns its answer to your question in natural-language, chatbot-style format.
  </Step>
</Steps>

## Complete code example

The full code example for the Streamlit app is as follows:

```python  theme={null}
from snowflake.snowpark.context import get_active_session
import streamlit as st

session = get_active_session()

st.title("Snowflake Cortex Search for RAG with Data from Unstructured")

query = st.text_input("Enter your search query:")

if query:
    with st.spinner("Embedding and retrieving..."):

        top_result_df = session.sql(f"""
            WITH query_embedding AS (
                SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
                    'snowflake-arctic-embed-l-v2.0', '{query}'
                ) AS EMBED
            )
            SELECT 
                e.TEXT,
                VECTOR_COSINE_SIMILARITY(e.EMBEDDINGS, q.EMBED) AS similarity
            FROM ELEMENTS e, query_embedding q
            ORDER BY similarity DESC
            LIMIT 1
        """).to_pandas()

        context = top_result_df["TEXT"][0]

        completion_df = session.sql(f"""
            SELECT SNOWFLAKE.CORTEX.COMPLETE(
                'snowflake-arctic',
                CONCAT('Context: ', $$ {context} $$, ' \\n\\nQuestion: {query}\\nAnswer:')
            ) AS RESPONSE
        """).to_pandas()

        st.write("Answer:")
        st.write(completion_df["RESPONSE"][0])
```

## Additional resources

* [Streamlit in Snowflake documentation](https://docs.snowflake.com/developer-guide/streamlit/about-streamlit)
* [Create and deploy Streamlit apps using Snowsight](https://docs.snowflake.com/developer-guide/streamlit/create-streamlit-ui)
* [Snowflake Solutions Developer Center for Streamlit](https://www.snowflake.com/en/developers/solutions-center/?tags=technology%2Fstreamlit)
* [Streamlit documentation](https://docs.streamlit.io/)

---

# Source: https://docs.unstructured.io/ui/sources/snowflake.md

# Source: https://docs.unstructured.io/ui/destinations/snowflake.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/snowflake.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/snowflake.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/snowflake.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/snowflake.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Snowflake

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Snowflake.

The requirements are as follows.

* A Snowflake [account](https://signup.snowflake.com/) and its account identifier.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/LgCjLyClg3o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  To get the identifier for the current Snowflake account:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click your username, and then click **Account > View account details**.
  3. On the **Account** tab, note the value of the **Account Identifier** field.

  Alternatively, the following Snowflake query returns the current account's identifier:

  ```text  theme={null}
  SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() AS "Account Identifier"
  ```

* A Snowflake user, which can be a service user (recommended) or a human user.

  To create a service user entry and get their login name (not username):

  <iframe width="560" height="315" src="https://www.youtube.com/embed/n755MKunb0k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.

  2. In Snowsight, on the navigation menu, click **Projects > Worksheets**.

  3. Click the **+** button to create a SQL worksheet.

  4. In the worksheet, enter the following Snowflake query to create a service user, replacing the following placeholders:

     * Replace `<service-user-name>` with some name for the service user.
     * Replace `<default-role-name>` with the name of any default role for the service user to use.

     ```sql  theme={null}
     CREATE USER <service-user-name>
     DEFAULT_ROLE = "<default-role-name>"
     TYPE = SERVICE
     ```

  5. Click the arrow icon to run the worksheet, which creates the service user.

  6. To get their login name, on the navigation menu, click **Admin > Users & Roles**.

  7. On the **Users** tab, in the list of available users, click the name of the target user.

  8. In the **About** tile, note the **Login Name** for the user.

  To create a human user entry and get their login name (not username):

  <iframe width="560" height="315" src="https://www.youtube.com/embed/CECTR2QE__w" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click **Admin > Users & roles**.
  3. Click the **Users** tab.
  4. Click **+ User**.
  5. Follow the on-screen guidance to specify the user's settings.
  6. Click **Create User**.
  7. To get their login name, on the navigation menu, click **Admin > Users & Roles**.
  8. On the **Users** tab, in the list of available users, click the name of the target user.
  9. In the **About** tile, note the **Login Name** for the user.

* A programmatic access token (PAT) for the Snowflake user.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/8u41UJ8lN3w" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  To create a programmatic access token (PAT) for a user:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.

  2. In Snowsight, on the navigation menu, click **Admin > Users & Roles**.

  3. On the **Users** tab, in the list of available users, click the name of the target user.

  4. In the **Programmatic access tokens** tile, click the **Generate new token** button.

  5. Follow the on-screen guidance to specify the PAT's settings.

     <Warning>
       You must set an expiration date for the PAT. This expiration date can be as soon as one day after the PAT is created or up to one year or even later.
       Once this PAT expires, the connector will stop working.
       To make sure that your connector continues to work, before your current PAT expires, you must follow this procedure again to generate a new PAT and
       update your connector's settings with your new PAT's value.

       Unstructured does not notify you when a PAT is about to expire or has already expired.
       You are responsible for tracking your PATs' expiration dates and taking corrective action before they expire.
     </Warning>

  6. Click **Generate**.

  7. Copy the generated PAT's value to a secure location, as you will not be able to access it again. If you lose this PAT's value,
     you will need to repeat this procedure to generate a new, replacement one.

  The PAT will not work unless the Snowflake account also has a valid
  [network rule](https://docs.snowflake.com/en/user-guide/network-rules) along with a valid
  [network policy](https://docs.snowflake.com/en/user-guide/network-policies) attached to that rule.
  The network rule must also be activated on the Snowflake account to begin taking effect.

  To create a valid network rule:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.

  2. In Snowsight, on the navigation menu, click **Admin > Security > Network Rules**.

  3. Click **+ Network Rule**.

  4. Enter some name for the network rule.

  5. For **Type**, select **IPv4**.

  6. For **Mode**, select **Ingress**.

  7. For **Identifiers**, next to the magnifying glass icon, enter `0.0.0.0/0`, and then press **Enter**.

     <Note>
       The `0.0.0.0/0` value allows all IP addresses to access the Snowflake account.
       You can specify a more specific IP address range if you prefer. However, this more specific IP address range
       will apply to all users, including the user for which you created the PAT.
     </Note>

  8. Click **Create Network Rule**.

  To create a valid network policy, attaching the preceding network rule to this policy at the same time:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click **Admin > Security > Network Policies**.
  3. Click **+ Network Policy**.
  4. Enter some name for the network policy.
  5. Make sure **Allowed** is selected.
  6. In the **Select rule** drop-down list, select the precedingnetwork rule to attach to this network policy.
  7. Click **Create Network Policy**.

  To activate the network rule in the account:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click **Admin > Security > Network Policies**.
  3. Click the name of the precedingnetwork policy to activate.
  4. In the policy's side panel, click the ellipsis (three dots) icon, and then click **Activate On Account**.
  5. Click **Activate policy**.

* (No longer recommended, as passwords are being deprecated by Snowflake—use PATs instead) The Snowflake [user's login name (not username) and the user's password](https://docs.snowflake.com/user-guide/admin-user-management#creating-users) in the account.
  This user must be a human user. Passwords are not supported for service users.

* The name of the Snowflake [role](https://docs.snowflake.com/sql-reference/sql/create-role) that the user belongs to and that also has sufficient access to the Snowflake database, schema, table, and host.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/cxe2Ed_-qkM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * To create a database in Snowflake, the role needs to be granted `CREATE DATABASE` privilege at the current account level; and `USAGE` privilege on the warehouse that is used to create the database.
  * To create a schema in a database in Snowflake, the role needs to be granted `USAGE` privilege on the database and the warehouse that is used to create the schema; and `CREATE SCHEMA` on the database.
  * To create a table in a schema in Snowflake, the role needs to be granted `USAGE` privilege on the database and schema and the warehouse that is used to create the table; and `CREATE TABLE` on the schema.
  * To write to a table in Snowflake, the role needs to be granted `USAGE` privilege on the database and schema and the warehouse that is used to write to the table; and `INSERT` on the table.
  * To read from a table in Snowflake, the role needs to be granted `USAGE` privilege on the database and schema and the warehouse that is used to write to the table; and `SELECT` on the table.

  To view a list of available roles in the current Snowflake account:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click **Admin > Users & Roles**.
  3. Click the **Roles** tab.

  Alternatively, the following Snowflake query returns a list of available roles in the current account:

  ```text  theme={null}
  SHOW ROLES;
  ```

  [Grant privileges to a role](https://docs.snowflake.com/sql-reference/sql/grant-privilege). [Learn more](https://docs.snowflake.com/user-guide/security-access-control-privileges).

* The Snowflake warehouse's [hostname and its port number](https://docs.snowflake.com/sql-reference/functions/system_allowlist) in the account.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/ItLW0N1uEOI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  To view a list of available warehouses in the current Snowflake account:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click **Admin > Warehouses**. This view does not provide access to the warehouses' hostnames or port numbers. To get this information, you must run a Snowflake query.

  The following Snowflake query returns a list of available warehouse types, hostnames, and port numbers in the current account. Look for the row with a `type` of `SNOWFLAKE_DEPLOYMENT`:

  ```text  theme={null}
  SELECT t.VALUE:type::VARCHAR as type,
         t.VALUE:host::VARCHAR as host,
         t.VALUE:port as port
  FROM TABLE(FLATTEN(input => PARSE_JSON(SYSTEM$ALLOWLIST()))) AS t;
  ```

* The name of the Snowflake [database](https://docs.snowflake.com/sql-reference/sql/create-database) in the account.

  <iframe width="560" height="315" src="https://www.youtube.com/embed/f49tin_Pehs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  To view a list of available databases in the current Snowflake account:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click **Data > Databases**.

  Alternatively, the following Snowflake query returns a list of available databases in the current account:

  ```text  theme={null}
  SHOW DATABASES;
  ```

* The name of the [schema](https://docs.snowflake.com/sql-reference/sql/create-schema) in the database.

  To view a list of available schemas for a database in the current Snowflake account:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click **Data > Databases**.
  3. Expand the name of the target database.

  Alternatively, the following Snowflake query returns a list of available schemas in the current account:

  ```text  theme={null}
  SHOW SCHEMAS;
  ```

  The following Snowflake query returns a list of available schemas for the database named `<database_name>` in the current account:

  ```text  theme={null}
  SHOW SCHEMAS IN DATABASE <database_name>;
  ```

* The name of the [table](https://docs.snowflake.com/sql-reference/sql/create-table) in the schema.

  To view a list of available tables for a schema in a database in the current Snowflake account:

  1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
  2. In Snowsight, on the navigation menu, click **Data > Databases**.
  3. Expand the name of the database that contains the target schema.
  4. Expand the name of the target schema.
  5. Expand **Tables**.

  Alternatively, the following Snowflake query returns a list of available tables for the schema named `<schema_name>` in the datbase named
  `<database_name>` in the current account:

  ```text  theme={null}
  SHOW TABLES IN SCHEMA <database_name>.<schema_name>;
  ```

  Snowflake requires the target table to have a defined schema before Unstructured can write to the table. The minimum viable
  schema for Unstructured contains only the columns `ID`, `ELEMENT_ID`, and `RECORD_ID`. The columns `TEXT` and `TYPE` are optional, but highly recommended.
  If you are generating embeddings, then you must also include the column `EMBEDDINGS`.

  In the following `CREATE TABLE` statement, replace the following placeholders with the appropriate values:

  * `<database_name>`: The name of the target database in the Snowflake account.
  * `<schema_name>`: The name of the target schema in the database.
  * `<number-of-dimensions>`: The number of dimensions for any embeddings that you plan to use. This value must match the number of dimensions for any embeddings that are
    specified in your related Unstructured workflows or pipelines. If you plan to use Snowflake vector embedding generation or Snowflake vector search,
    this value must match the number of dimensions that you plan to have Snowflake generate or search against.

  ```sql SQL theme={null}
  CREATE TABLE <database_name>.<schema_name>.ELEMENTS (
      ID VARCHAR(36) PRIMARY KEY NOT NULL DEFAULT UUID_STRING(),
      ELEMENT_ID VARCHAR,
      RECORD_ID VARCHAR,
      TEXT VARCHAR,
      TYPE VARCHAR,
      EMBEDDINGS VECTOR(FLOAT, <number-of-dimensions>)
  );
  ```

  For objects in the `metadata` field that Unstructured produces and that you want to store in a Snowflake table, you must create columns in your table's schema that
  follow Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
  child objects:

  ```json  theme={null}
  "metadata": {
    "is_extracted": "true",
    "coordinates": {
      "points": [
        [
          134.20055555555555,
          241.36027777777795
        ],
        [
          134.20055555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          420.0269444444447
        ],
        [
          529.7005555555555,
          241.36027777777795
        ]
      ],
      "system": "PixelSpace",
      "layout_width": 1654,
      "layout_height": 2339
    },
    "filetype": "application/pdf",
    "languages": [
      "eng"
    ],
    "page_number": 1,
    "image_mime_type": "image/jpeg",
    "filename": "realestate.pdf",
    "data_source": {
      "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
      "record_locator": {
        "protocol": "file",
        "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
      }
    }
  }
  ```

  You could create corresponding fields in your table's schema by using the following field names and data types:

  ```sql  theme={null}
  -- The ID, RECORD_ID, and ELEMENT_ID columns are required.
  -- TEXT and TYPE are not required but highly recommended.
  -- EMBEDDINGS is required if embeddings are being generated.
  -- All other "metadata" columns are optional.
  CREATE TABLE <database_name>.<schema_name>.ELEMENTS (
      ID VARCHAR(36) PRIMARY KEY NOT NULL DEFAULT UUID_STRING(),
      RECORD_ID VARCHAR,
      ELEMENT_ID VARCHAR,
      TEXT VARCHAR,
      TYPE VARCHAR,
      EMBEDDINGS VECTOR(FLOAT, <number-of-dimensions>),
      IS_EXTRACTED VARCHAR,
      POINTS VARCHAR,
      SYSTEM VARCHAR,
      LAYOUT_WIDTH INTEGER,
      LAYOUT_HEIGHT INTEGER,
      FILETYPE VARCHAR,
      LANGUAGES ARRAY,
      PAGE_NUMBER VARCHAR,
      IMAGE_MIME_TYPE VARCHAR,
      FILENAME VARCHAR,
      URL VARCHAR,
      RECORD_LOCATOR VARCHAR
  );
  ```

  Unstructured cannot provide a schema that is guaranteed to work in all
  circumstances. This is because these schemas will vary based on your source files' types; how you
  want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

* The name of the column in the table that uniquely identifies each record (for example, `RECORD_ID`).

To create a Snowflake destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="snowflake",
                  config={
                      "account": "<account>",
                      "user": "<user>",
                      "host": "<host>",
                      "port": <port>,
                      "database": "<database>",
                      "schema": "<schema>",
                      "role": "<role>",
                      "password": "<programmatic-access-token>",
                      "record_id_key": "<record-id-key>",
                      "table_name": "<table_name>",
                      "batch_size": <batch-size>
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "snowflake",
      "config": {
          "account": "<account>",
          "user": "<user>",
          "host": "<host>",
          "port": <port>,
          "database": "<database>",
          "schema": "<schema>",
          "role": "<role>",
          "password": "<programmatic-access-token>",
          "record_id_key": "<record-id-key>",
          "table_name": "<table_name>",
          "batch_size": <batch-size>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.

* `<account>` (*required*): The target Snowflake account's identifier.

* `<role>` (*required*): The name of the Snowflake role that the user belongs to. This role must have the appropriate access to the target Snowflake warehouse, database, schema, and table.

* `<user>` (*required*): The target Snowflake user's login name (not their username).

* `<programmatic-access-token>` (*required*): The user's programmatic access token (PAT).

  <Note>
    Specifying a password is no longer recommended, as passwords are being deprecated by Snowflake. Use a PAT instead.
  </Note>

* `<host>` (*required*): The hostname of the target Snowflake warehouse.

* `<port>` (*required*): The warehouse's port number. The default is `443` if not otherwise specified.

* `<database>` (*required*): The name of the target Snowflake database.

* `<schema>` (*required*): The name of the target Snowflake schema within the database.

* `<table_name>`: The name of the target Snowflake table within the database's schema. For the destination connector, the default is `elements` if not otherwise specified.

* `<columns>` (source connector only): A comma-separated list of columns to fetch from the table. By default, all columns are fetched unless otherwise specified.

* `<id-column>` (*required*, source connector only): The name of the column that uniquely identifies each record in the table.

* `<record-id-key>` (destination connector only): The name of the column that uniquely identifies each record in the table. The default is `record_id` if not otherwise specified.

* `<batch-size>` (*required*): The maximum number of rows to fetch for each batch. The default is `50` if not otherwise specified.

## Learn more

* <Icon icon="blog" />  [Powering Enterprise RAG: Unstructured's New Snowflake Integration](https://unstructured.io/blog/powering-enterprise-rag-unstructured-s-new-snowflake-integration)

---

# Source: https://docs.unstructured.io/open-source/how-to/speed-up-large-files-batches.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/speed-up-large-files-batches.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Speed up processing of large files and batches

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

When you use Unstructured, here are some techniques that you can try to help speed up the processing of large files and large batches of files.

Choose your partitioning strategy wisely. For example, if you have simple PDFs that don't have images and tables, you might be able to use the `fast` strategy. Try the `fast` strategy on a few of your documents before you try using the `hi_res` strategy. [Learn more](/api-reference/legacy-api/partition/partitioning).

To speed up PDF file processing, the [Unstructured SDK for Python](/api-reference/legacy-api/partition/sdk-python) and the [Unstructured SDK for JavaScript/TypeScript](/api-reference/legacy-api/partition/sdk-jsts) provide the following parameters to help speed up processing a large PDF file:

* `split_pdf_page` (Python) or `splitPdfPage` (JavaScript/TypeScript), when set to true (the default), splits the PDF file on the client side before sending it as batches to Unstructured for processing. The number of pages in each batch is determined internally. Batches can contain between 2 and 20 pages.
* `split_pdf_concurrency_level` (Python) or `splitPdfConcurrencyLevel` (JavaScript/TypeScript) is an integer that specifies the number of parallel requests. The default is 5. The maximum is 15. This behavior is ignored unless `split_pdf_page` (Python) or `splitPdfPage` (JavaScript/TypeScript) is also set to true.
* `split_pdf_allow_failed` (Python) or splitPdfAllowFailed\` (JavaScript/TypeScript), when set to true, allows partitioning to continue even if some pages fail.
* `split_pdf_page_range` (Python only) is a list of two integers that specify the beginning and ending page numbers of the PDF file to be sent. A `ValueError` is raised if the specified range is not valid. This behavior is ignored unless `split_pdf_page` is also set to true.

[Learn more](/api-reference/legacy-api/partition/sdk-python#page-splitting).

---

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/sqlite.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/sqlite.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# SQLite

Batch process all your records to store structured outputs in a SQLite schema.

Insert query is currently limited to append.

The requirements are as follows.

* A SQLite instance. [Download and install SQLite](https://www.sqlitetutorial.net/download-install-sqlite/).
* A SQLite database. [Create a database](https://www.sqlite.org/quickstart.html).
* The path to the database's `.db` file.
* A table in the database. [Create a table](https://www.sqlite.org/cli.html).

  The table's schema must match the schema of the documents that Unstructured produces.
  Unstructured cannot provide a schema that is guaranteed to work in all
  circumstances. This is because these schemas will vary based on your source files' types; how you
  want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

  You can adapt the following table schema example for your own needs:

  ```sql SQLite theme={null}
  CREATE TABLE elements (
      id TEXT PRIMARY KEY,
      record_id TEXT,
      element_id TEXT,
      text TEXT,
      embeddings TEXT,
      parent_id TEXT,
      page_number INTEGER,
      is_continuation INTEGER,
      orig_elements TEXT,
      partitioner_type TEXT
  );
  ```

  See also:

  * [CREATE TABLE](https://www.sqlite.org/lang_createtable.html) for SQLite
  * [Unstructured document elements and metadata](/api-reference/legacy-api/partition/document-elements)

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `SQLITE_DB_PATH` - The path to the database's `.db` file, represented by `--database` (CLI) or `database` (Python).

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Specify which fields to output in the processed data. This can help prevent
  # database record insert issues, where a particular field in the processed data
  # does not match a column in the database table on insert.
  metadata_includes="id,element_id,text,embeddings,type,system,layout_width,\
  layout_height,points,url,version,date_created,date_modified,date_processed,\
  permissions_data,record_locator,category_depth,parent_id,attached_filename,\
  filetype,last_modified,file_directory,filename,languages,page_number,links,\
  page_name,link_urls,link_texts,sent_from,sent_to,subject,section,\
  header_footer_type,emphasized_text_contents,emphasized_text_tags,\
  text_as_html,regex_metadata,detection_class_prob"

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --output-dir $LOCAL_FILE_OUTPUT_DIR \
      --num-processes 2 \
      --verbose \
      --strategy hi_res \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --metadata-include "$metadata_includes" \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    sqlite \
      --database-path $SQLITE_DB_PATH
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.sql.sqlite import (
      SQLiteConnectionConfig,
      SQLiteAccessConfig,
      SQLiteUploaderConfig,
      SQLiteUploadStagerConfig
  )
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalDownloaderConfig,
      LocalConnectionConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding are optional.

  if __name__ == "__main__":
      # Specify which fields to output in the processed data. This can help prevent
      # database record insert issues, where a particular field in the processed data
      # does not match a column in the database table on insert.
      metadata_includes = [
          "id", "element_id", "text", "embeddings", "type", "system", "layout_width",
          "layout_height", "points", "url", "version", "date_created", "date_modified",
          "date_processed", "permissions_data", "record_locator", "category_depth",
          "parent_id", "attached_filename", "filetype", "last_modified", "file_directory",
          "filename", "languages", "page_number", "links", "page_name", "link_urls",
          "link_texts", "sent_from", "sent_to", "subject", "section", "header_footer_type",
          "emphasized_text_contents", "emphasized_text_tags", "text_as_html", "regex_metadata",
          "detection_class_prob"
      ]

      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              metadata_include=metadata_includes,
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=SQLiteConnectionConfig(
              access_config=SQLiteAccessConfig(),
              database_path=os.getenv("SQLITE_DB_PATH")
          ),
          stager_config=SQLiteUploadStagerConfig(),
          uploader_config=SQLiteUploaderConfig()
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/open-source/core-functionality/staging.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Staging

<Warning>
  Staging functions in the Unstructured open source library are being deprecated in favor of [destination connectors](/open-source/ingestion/destination-connectors/overview) in the [Unstructured Ingest CLI and Unstructured Ingest Python library](/open-source/ingestion/overview).
</Warning>

Staging functions in the `unstructured` package help prepare your data for ingestion into downstream systems. A staging function accepts a list of document elements as input and return an appropriately formatted dictionary as output. In the example below, we get our narrative text samples prepared for ingestion into LabelStudio using `the stage_for_label_studio` function. We can take this data and directly upload it into LabelStudio to quickly get started with an NLP labeling task.

```python  theme={null}
import json
from unstructured.staging.label_studio import stage_for_label_studio

output = stage_for_label_studio(narrative_text)
print(json.dumps(output[:2], indent=4))

```

## `convert_to_csv`

Converts outputs to the initial structured data (ISD) format as a CSV string.

Examples:

```python  theme={null}
from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.base import convert_to_csv

elements = [Title(text="Title"), NarrativeText(text="Narrative")]
isd_csv = convert_to_csv(elements)

```

For more information about the `convert_to_csv` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/base.py).

## `convert_to_dataframe`

Converts a list of document `Element` objects to a `pandas` dataframe. The dataframe will have a `text` column with the text from the element and a `type` column indicating the element type, such as `NarrativeText` or `Title`.

Examples:

```python  theme={null}
from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.base import convert_to_dataframe

elements = [Title(text="Title"), NarrativeText(text="Narrative")]
df = convert_to_dataframe(elements)

```

For more information about the `convert_to_dataframe` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/base.py).

## `convert_to_dict`

Converts a list of `Element` objects to a dictionary. This is the default format for representing documents in `unstructured`.

Examples:

```python  theme={null}
from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.base import convert_to_dict

elements = [Title(text="Title"), NarrativeText(text="Narrative")]
isd = convert_to_dict(elements)

```

For more information about the `convert_to_dict` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/base.py).

## `dict_to_elements`

Converts a dictionary of the format produced by `convert_to_dict` back to a list of `Element` objects.

Examples:

```python  theme={null}
from unstructured.staging.base import dict_to_elements

isd = [
  {"text": "My Title", "type": "Title"},
  {"text": "My Narrative", "type": "NarrativeText"}
]

# elements will look like:
# [ Title(text="My Title"), NarrativeText(text="My Narrative")]
elements = dict_to_elements(isd)

```

For more information about the `dict_to_elements` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/base.py).

## `stage_csv_for_prodigy`

Formats outputs in CSV format for use with [Prodigy](https://prodi.gy/docs/api-loaders). After running `stage_csv_for_prodigy`, you can write the results to a CSV file that is ready to be used with Prodigy.

Examples:

```python  theme={null}
from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.prodigy import stage_csv_for_prodigy

elements = [Title(text="Title"), NarrativeText(text="Narrative")]
metadata = [{"type": "title"}, {"source": "news"}]
prodigy_csv_data = stage_csv_for_prodigy(elements, metadata)

# The resulting CSV file is ready to be used with Prodigy
with open("prodigy.csv", "w") as csv_file:
    csv_file.write(prodigy_csv_data)

```

For more information about the `stage_csv_for_prodigy` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/prodigy.py).

## `stage_for_argilla`

Convert a list of `Text` elements to an [Argilla Dataset](https://docs.argilla.io/en/latest/reference/python/python_client.html#python-ref-datasets). The type of Argilla dataset to be generated can be specified with `argilla_task` parameter. Valid values for `argilla_task` are `"text_classification"`, `"token_classification"`, and `"text2text"`. If `"token_classification"` is selected and `tokens` is not included in the optional kwargs, the `nltk` word tokenizer is used by default.

Examples:

```python  theme={null}
import json

from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.argilla import stage_for_argilla

elements = [Title(text="Title"), NarrativeText(text="Narrative")]
metadata = [{"type": "title"}, {"type": "text"}]

argilla_dataset = stage_for_argilla(elements, "text_classification", metadata=metadata)

```

For more information about the `stage_for_argilla` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/argilla.py).

## `stage_for_baseplate`-

The `stage_for_baseplate` staging function prepares a list of `Element` objects for ingestion into [Baseplate](https://docs.baseplate.ai/introduction), an LLM backend with a spreadsheet interface. After running the `stage_for_baseplate` function, you can use the [Baseplate API](https://docs.baseplate.ai/api-reference/documents/upsert-data-rows) to upload the documents to Baseplate. The following example code shows how to use the `stage_for_baseplate` function.

```python  theme={null}
from unstructured.documents.elements import ElementMetadata, NarrativeText, Title
from unstructured.staging.baseplate import stage_for_baseplate

metadata = ElementMetadata(filename="fox.epub")

elements = [
  Title("A Wonderful Story About A Fox", metadata=metadata),
  NarrativeText(
    "A fox ran into the chicken coop and the chickens flew off!",
    metadata=metadata,
  ),
]

rows = stage_for_baseplate(elements)

```

The output will look like:

```json  theme={null}
{
      "rows": [
          {
              "data": {
                  "element_id": "ad270eefd1cc68d15f4d3e51666d4dc8",
                  "text": "A Wonderful Story About A Fox",
                  "type": "Title",
              },
              "metadata": {"filename": "fox.epub"},
          },
          {
              "data": {
                  "element_id": "8275769fdd1804f9f2b55ad3c9b0ef1b",
                  "text": "A fox ran into the chicken coop and the chickens flew off!",
                  "type": "NarrativeText",
              },
              "metadata": {"filename": "fox.epub"},
          },
      ],
  }

```

For more information about the `stage_for_baseplate` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/baseplate.py).

## `stage_for_datasaur`

Formats a list of `Text` elements as input to token based tasks in Datasaur.

Example:

```python  theme={null}
from unstructured.documents.elements import Text
from unstructured.staging.datasaur import stage_for_datasaur

elements  = [Text("Text1"),Text("Text2")]
datasaur_data = stage_for_datasaur(elements)

```

The output is a list of dictionaries, each one with two keys: “text” with the content of the element and “entities” with an empty list.

You can also specify entities in the `stage_for_datasaur` function. Entities you specify in the input will be included in the entities key in the output. The list of entities is a list of dictionaries and must have all of the keys in the example below. The list of entities must be the same length as the list of elements. Use an empty list for any elements that do not have any entities.

Example:

```python  theme={null}
from unstructured.documents.elements import Text
from unstructured.staging.datasaur import stage_for_datasaur

elements  = [Text("Hi my name is Matt.")]
entities = [[{"text": "Matt", "type": "PER", "start_idx": 11, "end_idx": 15}]]
datasaur_data = stage_for_datasaur(elements, entities)

```

For more information about the `stage_for_datasaur` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/datasaur.py).

## `stage_for_label_box`

Formats outputs for use with [LabelBox](https://docs.labelbox.com/docs/overview). LabelBox accepts cloud-hosted data and does not support importing text directly. The `stage_for_label_box` does the following:

* Stages the data files in the `output_directory` specified in function arguments to be uploaded to a cloud storage service.

* Returns a config of type `List[Dict[str, Any]]` that can be written to a `json` file and imported into LabelBox.

**Note:** `stage_for_label_box` does not upload the data to remote storage such as S3. Users can upload the data to S3 using `aws s3 sync ${output_directory} ${url_prefix}` after running the `stage_for_label_box` staging function.

Examples:

The following example demonstrates generating a `config.json` file that can be used with LabelBox and uploading the staged data files to an S3 bucket.

```python  theme={null}
import os
import json

from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.label_box import stage_for_label_box

# The S3 Bucket name where data files should be uploaded.
S3_BUCKET_NAME = "labelbox-staging-bucket"

# The S3 key prefix (I.e. directory) where data files should be stored.
S3_BUCKET_KEY_PREFIX = "data/"

# The URL prefix where the data files will be accessed.
S3_URL_PREFIX = f"https://{S3_BUCKET_NAME}.s3.amazonaws.com/{S3_BUCKET_KEY_PREFIX}"

# The local output directory where the data files will be staged for uploading to a Cloud Storage service.
LOCAL_OUTPUT_DIRECTORY = "/tmp/labelbox-staging"

elements = [Title(text="Title"), NarrativeText(text="Narrative")]

labelbox_config = stage_for_label_box(
    elements,
    output_directory=LOCAL_OUTPUT_DIRECTORY,
    url_prefix=S3_URL_PREFIX,
    external_ids=["id1", "id2"],
    attachments=[[{"type": "RAW_TEXT", "value": "Title description"}], [{"type": "RAW_TEXT", "value": "Narrative Description"}]],
    create_directory=True,
)

# The resulting JSON config file is ready to be used with LabelBox.
with open("config.json", "w+") as labelbox_config_file:
    json.dump(labelbox_config, labelbox_config_file, indent=4)


# Upload staged data files to S3 from local output directory.
def upload_staged_files():
    from s3fs import S3FileSystem
    fs = S3FileSystem()
    for filename in os.listdir(LOCAL_OUTPUT_DIRECTORY):
        filepath = os.path.join(LOCAL_OUTPUT_DIRECTORY, filename)
        upload_key = os.path.join(S3_BUCKET_KEY_PREFIX, filename)
        fs.put_file(lpath=filepath, rpath=os.path.join(S3_BUCKET_NAME, upload_key))

upload_staged_files()

```

For more information about the `stage_for_label_box` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/label_box.py).

## `stage_for_label_studio`

Formats outputs for upload to LabelStudio. After running `stage_for_label_studio`, you can write the results to a JSON folder that is ready to be included in a new LabelStudio project.

Examples:

```python  theme={null}
import json

from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.label_studio import stage_for_label_studio

elements = [Title(text="Title"), NarrativeText(text="Narrative")]
label_studio_data = stage_for_label_studio(elements, text_field="my_text", id_field="my_id")

# The resulting JSON file is ready to be uploaded to LabelStudio
with open("label_studio.json", "w") as f:
    json.dump(label_studio_data, f, indent=4)

```

You can also include pre-annotations and predictions as part of your LabelStudio upload.

The `annotations` kwarg is a list of lists. If `annotations` is specified, there must be a list of annotations for each element in the `elements` list. If an element does not have any annotations, use an empty list. The following shows an example of how to upload annotations for the “Text Classification” task in LabelStudio:

```python  theme={null}
import json

from unstructured.documents.elements import NarrativeText
from unstructured.staging.label_studio import (
    stage_for_label_studio,
    LabelStudioAnnotation,
    LabelStudioResult,
)


elements = [NarrativeText(text="Narrative")]
annotations = [[
  LabelStudioAnnotation(
      result=[
          LabelStudioResult(
              type="choices",
              value={"choices": ["Positive"]},
              from_name="sentiment",
              to_name="text",
          )
      ]
  )
]]
label_studio_data = stage_for_label_studio(
    elements,
    annotations=annotations,
    text_field="my_text",
    id_field="my_id"
)

# The resulting JSON file is ready to be uploaded to LabelStudio
# with annotations included
with open("label_studio.json", "w") as f:
    json.dump(label_studio_data, f, indent=4)

```

Similar to annotations, the `predictions` kwarg is also a list of lists. A `prediction` is an annotation with the addition of a `score` value. If `predictions` is specified, there must be a list of predictions for each element in the `elements` list. If an element does not have any predictions, use an empty list. The following shows an example of how to upload predictions for the “Text Classification” task in LabelStudio:

```python  theme={null}
import json

from unstructured.documents.elements import NarrativeText
from unstructured.staging.label_studio import (
    stage_for_label_studio,
    LabelStudioPrediction,
    LabelStudioResult,
)


elements = [NarrativeText(text="Narrative")]
predictions = [[
  LabelStudioPrediction(
      result=[
          LabelStudioResult(
              type="choices",
              value={"choices": ["Positive"]},
              from_name="sentiment",
              to_name="text",
          )
      ],
      score=0.68
  )
]]
label_studio_data = stage_for_label_studio(
    elements,
    predictions=predictions,
    text_field="my_text",
    id_field="my_id"
)

# The resulting JSON file is ready to be uploaded to LabelStudio
# with annotations included
with open("label_studio.json", "w") as f:
    json.dump(label_studio_data, f, indent=4)

```

The following shows an example of how to upload annotations for the “Named Entity Recognition” task in LabelStudio:

```python  theme={null}
import json

from unstructured.documents.elements import NarrativeText
from unstructured.staging.label_studio import (
    stage_for_label_studio,
    LabelStudioAnnotation,
    LabelStudioResult,
)


elements = [NarrativeText(text="Narrative")]
annotations = [[
  LabelStudioAnnotation(
      result=[
          LabelStudioResult(
              type="labels",
              value={"start": 0, "end": 9, "text": "Narrative", "labels": ["MISC"]},
              from_name="label",
              to_name="text",
          )
      ]
  )
]]
label_studio_data = stage_for_label_studio(
    elements,
    annotations=annotations,
    text_field="my_text",
    id_field="my_id"
)

# The resulting JSON file is ready to be uploaded to LabelStudio
# with annotations included
with open("label_studio.json", "w") as f:
    json.dump(label_studio_data, f, indent=4)

```

See the [LabelStudio docs](https://labelstud.io/tags/labels.html) for a full list of options for labels and annotations.

For more information about the `stage_for_label_studio` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/label_studio.py).

## `stage_for_prodigy`

Formats outputs in JSON format for use with [Prodigy](https://prodi.gy/docs/api-loaders). After running `stage_for_prodigy`, you can write the results to a JSON file that is ready to be used with Prodigy.

Examples:

```python  theme={null}
import json

from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.prodigy import stage_for_prodigy

elements = [Title(text="Title"), NarrativeText(text="Narrative")]
metadata = [{"type": "title"}, {"type": "text"}]
prodigy_data = stage_for_prodigy(elements, metadata)

# The resulting JSON file is ready to be used with Prodigy
with open("prodigy.json", "w") as f:
    json.dump(prodigy_data, f, indent=4)

```

**Note**: Prodigy recommends `.jsonl` format for feeding data to API loaders. After running `stage_for_prodigy`, you can use the `save_as_jsonl` utility function to save the formatted data to a `.jsonl` file that is ready to be used with Prodigy.

```python  theme={null}
from unstructured.documents.elements import Title, NarrativeText
from unstructured.staging.prodigy import stage_for_prodigy
from unstructured.utils import save_as_jsonl

elements = [Title(text="Title"), NarrativeText(text="Narrative")]
metadata = [{"type": "title"}, {"type": "text"}]
prodigy_data = stage_for_prodigy(elements, metadata)

# The resulting jsonl file is ready to be used with Prodigy.
save_as_jsonl(prodigy_data, "prodigy.jsonl")

```

For more information about the `stage_for_prodigy` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/prodigy.py).

## `stage_for_transformers`

Prepares `Text` elements for processing in `transformers` pipelines by splitting the elements into chunks that fit into the model’s attention window.

Examples:

```python  theme={null}
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

from unstructured.documents.elements import NarrativeText
from unstructured.staging.huggingface import stage_for_transformers

model_name = "hf-internal-testing/tiny-bert-for-token-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

nlp = pipeline("ner", model=model, tokenizer=tokenizer)

text = """From frost advisories this morning to a strong cold front expected later this week, the chance of fall showing up is real.

There's a refreshing crispness to the air, and it looks to get only more pronounced as the week goes on.

Frost advisories were in place this morning across portions of the Appalachians and coastal Maine as temperatures dropped into the 30s.

Temperatures this morning were in the 40s as far south as the Florida Panhandle.

And Maine even had a few reports of their first snow of the season Sunday. More cities could see their first snow later this week.

Yes, hello fall!

As temperatures moderate during the next few days, much of the east will stay right around seasonal norms, but the next blast of cold air will be strong and come with the potential for hazardous conditions.

"A more active fall weather pattern is expected to evolve by the end of this week and continuing into the weekend as a couple of cold fronts move across the central and eastern states," the Weather Prediction Center said.

The potent cold front will come in from Canada with a punch of chilly air, heavy rain and strong wind.

The Weather Prediction Center has a slight risk of excessive rainfall for much of the Northeast and New England on Thursday, including places like New York City, Buffalo and Burlington, so we will have to look out for flash flooding in these areas.

"More impactful weather continues to look likely with confidence growing that our region will experience the first real fall-like system with gusty to strong winds and a period of moderate to heavy rain along and ahead of a cold front passage," the National Weather Service office in Burlington wrote.

The potential for very heavy rain could accompany the front, bringing up to two inches of rain for much of the area, and isolated locations could see even more.

"Ensembles [forecast models] show median rainfall totals by Wednesday night around a half inch, with a potential for some spots to see around one inch, our first substantial rainfall in at least a couple of weeks," the weather service office in Grand Rapids noted, adding, "It may also get cold enough for some snow to mix in Thursday night to Friday morning, especially in the higher terrain north of Grand Rapids toward Cadillac."

There is also a chance for very strong winds to accompany the system.

The weather service is forecasting winds of 30-40 mph ahead of the cold front, which could cause some tree limbs to fall and sporadic power outages.

Behind the front, temperatures will fall.

"East Coast, with highs about 5-15 degrees below average to close out the workweek and going into next weekend, with highs only in the 40s and 50s from the Great Lakes to the Northeast on most days," the Weather Prediction Center explained.

By the weekend, a second cold front will drop down from Canada and bring a reinforcing shot of chilly air across the eastern half of the country."""

elements = stage_for_transformers([NarrativeText(text=text)], tokenizer)

```

The following optional keyword arguments can be specified in `stage_for_transformers`:

<Note>
  * `buffer`: Indicates the number of tokens to leave as a buffer for the attention window. This is to account for special tokens like `[CLS]` that can appear at the beginning or end of an input sequence.

  * `max_input_size`: The size of the attention window for the model. If not specified, the default is the `model_max_length` attribute on the tokenizer object.

  * `split_function`: The function used to split the text into chunks to consider for adding to the attention window. Splits on spaces be default.

  * `chunk_separator`: The string used to concat adjacent chunks when reconstructing the text. Uses spaces by default.

  If you need to operate on text directly instead of `unstructured` `Text` objects, use the `chunk_by_attention_window` helper function. Simply modify the example above to include the following:

  ```python  theme={null}
  from unstructured.staging.huggingface import chunk_by_attention_window

  chunks = chunk_by_attention_window(text, tokenizer)

  results = [nlp(chunk) for chunk in chunks]

  ```
</Note>

For more information about the `stage_for_transformers` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/huggingface.py).

## `stage_for_weaviate`

The `stage_for_weaviate` staging function prepares a list of `Element` objects for ingestion into the [Weaviate](https://weaviate.io/) vector database. You can create a schema in Weaviate for the unstructured outputs using the following workflow:

```python  theme={null}
from unstructured.staging.weaviate import create_unstructured_weaviate_class

import weaviate

# Change `class_name` if you want the class for unstructured documents in Weaviate
# to have a different name
unstructured_class = create_unstructured_weaviate_class(class_name="UnstructuredDocument")
schema = {"classes": [unstructured_class]}

client = weaviate.Client("http://localhost:8080")
client.schema.create(schema)

```

Once the schema is created, you can batch upload documents to Weaviate using the following workflow. See the [Weaviate documentation](https://weaviate.io/developers/weaviate) for more details on options for uploading data and querying data once it has been uploaded.

```python  theme={null}
from unstructured.partition.pdf import partition_pdf
from unstructured.staging.weaviate import stage_for_weaviate

import weaviate
from weaviate.util import generate_uuid5


filename = "example-docs/pdf/layout-parser-paper-fast.pdf"
elements = partition_pdf(filename=filename, strategy="fast")
data_objects = stage_for_weaviate(elements)

client = weaviate.Client("http://localhost:8080")

with client.batch(batch_size=10) as batch:
    for data_object in tqdm.tqdm(data_objects):
        batch.add_data_object(
            data_object,
            unstructured_class_name,
            uuid=generate_uuid5(data_object),
        )

```

For more information about the `stage_for_weaviate` function, you can check the [source code here](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/staging/weaviate.py).

---

# Source: https://docs.unstructured.io/support/status.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Check service status

The [Unstructured Status](https://unstructuredio.trust.pagerduty.com/posts/dashboard) dashboard provides information about the
status of the Unstructured service.

<Note>
  This dashboard provides information only about the publicly available Unstructured service. If you are running a
  self-contained distribution of Unstructured, contact your organization's network administrator instead.
</Note>

From this dashboard, you can:

* Check the status of the publicly available Unstructured user interface (UI). This is labeled in the dashboard as **Platform Frontend**.
* Check the status of the publicly available Unstructured API. This is labeled in the dashboard as **Platform API**.
* Subscribe to public service status updates delivered to an email address for your organization, to a specific channel in your organization's Slack workspace, or
  to a notification URL provided by your organization. To do this, click the **Subscribe to Updates** button and then follow the on-screen prompts.

You can also [browse the history](https://unstructuredio.trust.pagerduty.com/posts/history) of previous public service status updates.

---

# Source: https://docs.unstructured.io/api-reference/general/summary.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Summary

> Description


## OpenAPI

````yaml https://api.unstructuredapp.io/general/openapi.json post /general/v0/general
openapi: 3.1.0
info:
  title: Unstructured Partition API
  version: 1.2.31
servers:
  - url: https://api.unstructuredapp.io
    description: Serverless SaaS API
    x-speakeasy-server-id: saas-api
  - url: http://localhost:8000
    description: Development server
    x-speakeasy-server-id: development
security:
  - ApiKeyAuth: []
  - {}
tags:
  - name: general
paths:
  /general/v0/general:
    post:
      tags:
        - general
      summary: Summary
      description: Description
      operationId: partition
      parameters:
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/partition_parameters'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                items:
                  $ref: '#/components/schemas/Element'
                description: A list of element dictionaries extracted from the file
                title: Elements
                type: array
                x-speakeasy-name-override: elements
            text/csv:
              schema:
                items:
                  $ref: '#/components/schemas/CSV-Element'
                description: A string containing elements in csv format
                title: CSV-Element
                type: string
                x-speakeasy-name-override: csv_elements
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
        5XX:
          description: Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ServerError'
components:
  schemas:
    partition_parameters:
      properties:
        files:
          type: string
          format: binary
          description: The file to extract
          examples:
            - summary: File to be partitioned
              externalValue: >-
                https://github.com/Unstructured-IO/unstructured/blob/98d3541909f64290b5efb65a226fc3ee8a7cc5ee/example-docs/layout-parser-paper.pdf
        coordinates:
          type: boolean
          title: Coordinates
          description: >-
            If `True`, return coordinates for each element extracted via OCR.
            Default: `False`
          default: false
        content_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Content type
          description: >-
            A hint about the content type to use (such as text/markdown), when
            there are problems processing a specific file. This value is a MIME
            type in the format type/subtype.
        encoding:
          anyOf:
            - type: string
            - type: 'null'
          title: Encoding
          description: 'The encoding method used to decode the text input. Default: utf-8'
        extract_image_block_types:
          items:
            type: string
          type: array
          title: Image block types to extract
          description: >-
            The types of elements to extract, for use in extracting image blocks
            as base64 encoded data stored in metadata fields.
          default: []
        gz_uncompressed_content_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Uncompressed Content Type
          description: If file is gzipped, use this content type after unzipping.
        hi_res_model_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Hi Res Model Name
          description: The name of the inference model used when strategy is hi_res
        include_page_breaks:
          type: boolean
          title: Include Page Breaks
          description: >-
            If true, the output will include page breaks if the filetype
            supports it. Default: false
          default: false
        languages:
          items:
            type: string
          type: array
          title: OCR Languages
          description: >-
            The languages present in the document, for use in partitioning
            and/or OCR. See the Tesseract documentation for a full list of
            languages.
          default: []
        ocr_languages:
          items:
            type: string
          type: array
          title: OCR Languages
          description: >-
            Deprecated! The languages present in the document, for use in
            partitioning and/or OCR
          default: []
        output_format:
          type: string
          enum:
            - application/json
            - text/csv
          title: Output Format
          description: >-
            The format of the response. Supported formats are application/json
            and text/csv. Default: application/json.
          default: application/json
          x-speakeasy-unknown-values: allow
        pdf_infer_table_structure:
          type: boolean
          title: Pdf Infer Table Structure
          description: >-
            Deprecated! Use skip_infer_table_types to opt out of table
            extraction for any file type. If False and strategy=hi_res, no Table
            Elements will be extracted from pdf files regardless of
            skip_infer_table_types contents.
          default: true
        skip_infer_table_types:
          items:
            type: string
          type: array
          title: Skip Infer Table Types
          description: >-
            The document types that you want to skip table extraction with.
            Default: []
          default: []
        starting_page_number:
          anyOf:
            - type: integer
            - type: 'null'
          title: PDF Starting Page Number
          description: >-
            When PDF is split into pages before sending it into the API,
            providing this information will allow the page number to be assigned
            correctly. Introduced in 1.0.27.
        strategy:
          type: string
          enum:
            - fast
            - hi_res
            - auto
            - ocr_only
            - od_only
            - vlm
          title: Strategy
          description: >-
            The strategy to use for partitioning PDF/image. Options are fast,
            hi_res, auto. Default: hi_res
          default: hi_res
          examples:
            - auto
            - hi_res
            - vlm
          x-speakeasy-unknown-values: allow
        vlm_model_provider:
          type: string
          enum:
            - openai
            - anthropic
            - bedrock
            - anthropic_bedrock
            - vertexai
            - google
            - azure_openai
          title: VLM Model Provider
          description: The VLM Model provider to use.
          examples:
            - openai
            - anthropic
            - bedrock
          x-speakeasy-unknown-values: allow
        vlm_model:
          type: string
          title: VLM Model
          description: The VLM Model to use.
          examples:
            - gpt-4o
          x-speakeasy-unknown-values: allow
        table_ocr_agent:
          anyOf:
            - type: string
            - type: 'null'
          title: Table OCR Agent
          description: The OCR agent to use for table ocr inference.
          default: >-
            unstructured.partition.utils.ocr_models.tesseract_ocr.OCRAgentTesseract
        unique_element_ids:
          type: boolean
          title: Unique Element IDs
          description: >-
            When `True`, assign UUIDs to element IDs, which guarantees their
            uniqueness (useful when using them as primary keys in database).
            Otherwise a SHA-256 of element text is used. Default: `False`
          default: false
        xml_keep_tags:
          type: boolean
          title: Xml Keep Tags
          description: >-
            If `True`, will retain the XML tags in the output. Otherwise it will
            simply extract the text from within the tags. Only applies to XML
            documents.
          default: false
        chunking_strategy:
          anyOf:
            - type: string
            - type: 'null'
          title: Chunking Strategy
          description: >-
            Use one of the supported strategies to chunk the returned elements
            after partitioning. When 'chunking_strategy' is not specified, no
            chunking is performed and any other chunking parameters provided are
            ignored. Supported strategies: 'basic', 'by_page', 'by_similarity',
            or 'by_title'
          examples:
            - by_title
            - basic
          x-speakeasy-unknown-values: allow
        combine_under_n_chars:
          anyOf:
            - type: integer
            - type: 'null'
          title: Combine Under N Chars
          description: >-
            If chunking strategy is set, combine elements until a section
            reaches a length of n chars. Default: 500
        include_orig_elements:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Include original elements in chunks
          description: >-
            When a chunking strategy is specified, each returned chunk will
            include the elements consolidated to form that chunk as
            `.metadata.orig_elements`. Default: true.
        tracking_enabled:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Tracking enabled
          description: Tracking enabled
        max_characters:
          anyOf:
            - type: integer
            - type: 'null'
          title: Max Characters
          description: >-
            If chunking strategy is set, cut off new sections after reaching a
            length of n chars (hard max). Default: 500
        multipage_sections:
          type: boolean
          title: Multipage Sections
          description: >-
            If chunking strategy is set, determines if sections can span
            multiple sections. Default: true
          default: true
        new_after_n_chars:
          anyOf:
            - type: integer
            - type: 'null'
          title: New after n chars
          description: >-
            If chunking strategy is set, cut off new sections after reaching a
            length of n chars (soft max). Default: 1500
        overlap:
          type: integer
          title: Overlap
          description: >-
            Specifies the length of a string ('tail') to be drawn from each
            chunk and prefixed to the next chunk as a context-preserving
            mechanism. By default, this only applies to split-chunks where an
            oversized element is divided into multiple chunks by text-splitting.
            Default: 0
          default: 0
        overlap_all:
          type: boolean
          title: Overlap all
          description: >-
            When `True`, apply overlap between 'normal' chunks formed from whole
            elements and not subject to text-splitting. Use this with caution as
            it entails a certain level of 'pollution' of otherwise clean
            semantic chunk boundaries. Default: False
          default: false
        similarity_threshold:
          anyOf:
            - type: number
            - type: 'null'
          title: similarity-threshold
          description: >-
            A value between 0.0 and 1.0 describing the minimum similarity two
            elements must have to be included in the same chunk. Note that
            similar elements may be separated to meet chunk-size criteria; this
            value can only guarantees that two elements with similarity below
            the threshold will appear in separate chunks.
        do_not_break_similarity_on_footer_header:
          type: boolean
          title: do-not-break-similarity-on-footer-header
          description: >-
            When `True`, footer, header, and page number are always considered
            similar to the text before them for chunk by similarity method. This
            allows chunk by similarity to connect contents across page better.
          default: false
        include_slide_notes:
          type: boolean
          title: include_slide_notes
          description: >-
            When `True`, slide notes from .ppt and .pptx files will be included
            in the response. Default: `True`
          default: true
        pdfminer_line_overlap:
          anyOf:
            - type: number
            - type: 'null'
          title: PDFMiner Line Overlap
          description: >-
            If two characters have more overlap than this they are considered to
            be on the same line. The overlap is specified relative to the
            minimum height of both characters.
        pdfminer_char_margin:
          anyOf:
            - type: number
            - type: 'null'
          title: PDFMiner Char Margin
          description: >-
            If two characters are closer together than this margin they are
            considered part of the same line. The margin is specified relative
            to the width of the character.
        pdfminer_line_margin:
          anyOf:
            - type: number
            - type: 'null'
          title: PDFMiner Line Margin
          description: >-
            If two lines are close together they are considered to be part of
            the same paragraph. The margin is specified relative to the height
            of a line.
        pdfminer_word_margin:
          anyOf:
            - type: number
            - type: 'null'
          title: PDFMiner Word Margin
          description: >-
            If two characters on the same line are further apart than this
            margin then they are considered to be two separate words, and an
            intermediate space will be added for readability. The margin is
            specified relative to the width of the character.
          default: 0.185
      type: object
      required:
        - files
      title: Partition Parameters
    Element:
      title: Element
      type: object
      additionalProperties: true
      example:
        type: Title
        element_id: 6aa0ff22f91bbe7e26e8e25ca8052acd
        text: >-
          LayoutParser: A Unified Toolkit for Deep Learning Based Document Image
          Analysis
        metadata:
          languages:
            - eng
          page_number: 1
          filename: layout-parser-paper.pdf
          filetype: application/pdf
    CSV-Element:
      title: CSV-Element
      type: string
      example: >-
        type,element_id,text,languages,page_number,filename,filetype,parent_id,links
            Title,b7f58c2fd9c15949a55a62eb84e39575,LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis,['eng'],1,layout-parser-paper.pdf,application/pdf,,
    HTTPValidationError:
      type: object
      properties:
        detail:
          oneOf:
            - type: array
              items:
                $ref: '#/components/schemas/ValidationError'
            - type: string
      example:
        detail:
          - type: int_parsing
            loc:
              - body
              - combine_under_n_chars
            msg: >-
              Input should be a valid integer, unable to parse string as an
              integer
            input: forty
    ServerError:
      type: object
      properties:
        detail:
          type: string
      example:
        detail: An error occurred
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      name: unstructured-api-key
      in: header
      x-speakeasy-example: YOUR_API_KEY

````

---

# Source: https://docs.unstructured.io/ui/supported-file-types.md

# Source: https://docs.unstructured.io/open-source/introduction/supported-file-types.md

# Source: https://docs.unstructured.io/open-source/ingestion/supported-file-types.md

# Source: https://docs.unstructured.io/api-reference/supported-file-types.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Supported file types

The Unstructured API supports processing of the following file types:

By file extension:

| File extension |
| -------------- |
| `.abw`         |
| `.bmp`         |
| `.csv`         |
| `.cwk`         |
| `.dbf`         |
| `.dif`\*       |
| `.doc`         |
| `.docx`        |
| `.dot`         |
| `.dotm`        |
| `.eml`         |
| `.epub`        |
| `.et`          |
| `.eth`         |
| `.fods`        |
| `.heic`        |
| `.htm`         |
| `.html`        |
| `.hwp`         |
| `.jpeg`        |
| `.jpg`         |
| `.md`          |
| `.mcw`         |
| `.msg`         |
| `.mw`          |
| `.odt`         |
| `.org`         |
| `.p7s`         |
| `.pbd`         |
| `.pdf`         |
| `.png`         |
| `.pot`         |
| `.ppt`         |
| `.pptm`        |
| `.pptx`        |
| `.prn`         |
| `.rst`         |
| `.rtf`         |
| `.sdp`         |
| `.sxg`         |
| `.tiff`        |
| `.txt`         |
| `.tsv`         |
| `.xls`         |
| `.xlsx`        |
| `.xml`         |
| `.zabw`        |

By file type:

| Category          | File types                                                |
| ----------------- | --------------------------------------------------------- |
| Apple             | `.cwk`, `.mcw`                                            |
| CSV               | `.csv`                                                    |
| Data Interchange  | `.dif`\*                                                  |
| dBase             | `.dbf`                                                    |
| E-mail            | `.eml`, `.msg`, `.p7s`                                    |
| EPUB              | `.epub`                                                   |
| HTML              | `.htm`, `.html`                                           |
| Image             | `.bmp`, `.heic`, `.jpeg`, `.jpg`, `.png`, `.prn`, `.tiff` |
| Markdown          | `.md`                                                     |
| OpenOffice        | `.odt`                                                    |
| Org Mode          | `.org`                                                    |
| Other             | `.eth`, `.pbd`, `.sdp`                                    |
| PDF               | `.pdf`                                                    |
| Plain text        | `.txt`                                                    |
| PowerPoint        | `.pot`, `.ppt`, `.pptm`, `.pptx`                          |
| reStructured Text | `.rst`                                                    |
| Rich Text         | `.rtf`                                                    |
| Spreadsheet       | `.et`, `.fods`, `.mw`, `.xls`, `.xlsx`                    |
| StarOffice        | `.sxg`                                                    |
| TSV               | `.tsv`                                                    |
| Word processing   | `.abw`, `.doc`, `.docx`, `.dot`, `.dotm`, `.hwp`, `.zabw` |
| XML               | `.xml`                                                    |

`*` For `.dif`, `\n` characters in `.dif` files are supported, but `\r\n` characters will raise the error
`UnsupportedFileFormatError: Partitioning is not supported for the FileType.UNK file type`.

---

# Source: https://docs.unstructured.io/ui/enriching/table-descriptions.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Table descriptions

<iframe width="560" height="315" src="https://www.youtube.com/embed/3UIW5PDck74" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

After partitioning, you can have Unstructured generate text-based summaries of detected tables.

This summarization is done by using models offered through various model providers.

Here is an example of the output of a detected table using GPT-4o. Note specifically the `text` field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.

```json  theme={null}
{
    "type": "Table",
    "element_id": "5713c0e90194ac7f0f2c60dd614bd24d",
    "text": "The table consists of 6 rows and 7 columns. The columns represent 
        inhibitor concentration (g), bc (V/dec), ba (V/dec), Ecorr (V), icorr 
        (A/cm\u00b2), polarization resistance (\u03a9), and corrosion rate 
        (mm/year). As the inhibitor concentration increases, the corrosion 
        rate generally decreases, indicating the effectiveness of the 
        inhibitor. Notably, the polarization resistance increases with higher 
        inhibitor concentrations, peaking at 6 grams before slightly 
        decreasing. This suggests that the inhibitor is most effective at 
        6 grams, significantly reducing the corrosion rate and increasing 
        polarization resistance. The data provides valuable insights into the 
        optimal concentration of the inhibitor for corrosion prevention.",
    "metadata": {
        "text_as_html": "<table>...<full results omitted for brevity>...</table>",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...//Z",
        "image_mime_type": "image/jpeg",
        "filename": "7f239e1d4ef3556cc867a4bd321bbc41.pdf",
        "data_source": {}
    }
}
```

<Note>
  The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
  documents or PDF pages that are partitioned by using the Fast or VLM strategy.
</Note>

Here are two examples of the descriptions for detected tables. These descriptions are generated with GPT-4o by OpenAI:

<img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-1.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=35d5ed3a55a463deb9450e6841e15a89" alt="Description of a table with information about endoscopic datasets" data-og-width="2978" width="2978" data-og-height="562" height="562" data-path="img/enriching/Table-Description-1.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-1.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=e1c7018844513570205cd2b4cb8e110e 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-1.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=51ffc6fa2fae9fa04cc91535e1511226 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-1.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=60a0bb92bd07061da28e7b5fbdcedaf7 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-1.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=563fc53f4ab297c7cd1739d72e13f20c 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-1.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=ecb939c4695dff39a073c2398bc3ecc1 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-1.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=80882b18948d17aa503d01834ac9049d 2500w" />

<img src="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-2.png?fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=69356d0e8ff919facc848645ca04e2df" alt="Description of a table with information about potentiodynamic polarization of stainless steel" data-og-width="3108" width="3108" data-og-height="544" height="544" data-path="img/enriching/Table-Description-2.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-2.png?w=280&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=f5359c7b6d9540046f92e328d242a984 280w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-2.png?w=560&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=a864dce3563c09ac1fb5bc82595e416b 560w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-2.png?w=840&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=bbc57d51163dbd6f14b4137c71f82e6a 840w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-2.png?w=1100&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=e521ed67f0cf40341c70791e4ef00e23 1100w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-2.png?w=1650&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=b112071b4a3942dda5c42331ae958725 1650w, https://mintcdn.com/unstructured-53/ognmPfo7rw6i-YTz/img/enriching/Table-Description-2.png?w=2500&fit=max&auto=format&n=ognmPfo7rw6i-YTz&q=85&s=1f2d75d2dc56aadf43301ed82dd0ad3a 2500w" />

The generated table's summary will overwrite any text that Unstructured had previously extracted from that table into the `text` field.
The table's original content is available in the `image_base64` field.

<Note>
  The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
  documents or PDF pages that are partitioned by using the Fast or VLM strategy.
</Note>

For workflows that use [chunking](/ui/chunking), note the following changes:

* If a `Table` element must be chunked, the `Table` element is replaced by a set of related `TableChunk` elements.
* Each of these `TableChunk` elements will contain a summary description only for its own element, as part of the element's `text` field.
* These `TableChunk` elements will not contain an `image_base64` field.

Any embeddings that are produced after these summaries are generated will be based on the new `text` field's contents.

## Generate table descriptions

To generate table descriptions, in an **Enrichment** node in a workflow, select **Table**, and then choose one of the available provider (and model) combinations that are shown.

Make sure after you choose the provider and model, that **Table Description** is also displayed. If **Table Description** and **Table to HTML** are both
displayed, be sure to select **Table Description**.

<Note>
  You can change a workflow's table description settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings.

  For workflows that use [chunking](/ui/chunking), the **Chunker** node should be placed after all **Enrichment** nodes. Placing the
  **Chunker** node before a table descriptions **Enrichment** node could cause incomplete or no table descriptions to be generated.
</Note>

<Warning>
  The following models are no longer available as of the following dates:

  * Amazon Bedrock Claude Sonnet 3.5: October 22, 2025
  * Anthropic Claude Sonnet 3.5: October 22, 2025

  Unstructured recommends the following actions:

  * For new workflows, do not use any of these models.
  * For any workflow that uses any of these models, update that workflow as soon as possible to use a different model.

  Workflows that attempt to use any of these models on or after its associated date will return errors.
</Warning>

<Warning>
  Unstructured can potentially generate table summary descriptions only for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and a table summary description node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No table summary description node is needed (or allowed).

  Even with these configurations, Unstructured actually generates table summary descriptions only for files that contain tables and are also eligible
  for processing with the following partitioning strategies:

  * **High Res**, when the workflow's **Partitioner** node is set to use **Auto** or **High Res**.
  * **VLM** or **High Res**, when the workflow's **Partitioner** node is set to use **VLM**.

  Unstructured never generates table summary descriptions for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Fast** partitioning strategy.
  * With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain tables.
</Warning>

## Learn more

* <Icon icon="video" />  [How to Extract Data from Complex Tables](https://unstructured.io/events/how-to-extract-data-from-complex-tables)

---

# Source: https://docs.unstructured.io/examplecode/codesamples/apioss/table-extraction-from-pdf.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Table extraction from PDF

> This section describes two methods for extracting tables from PDF files.

<Note>
  This sample code utilizes the [Unstructured Open Source](/open-source/introduction/overview "Open Source") library and also provides an alternative method the utilizing the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview).
</Note>

## Method 1: Using partition\_pdf

To extract the tables from PDF files using the [partition\_pdf](/open-source/core-functionality/partitioning#partition-pdf), set the `skip_infer_table_types` parameter to `False` and `strategy` parameter to `hi_res`.

**Usage**

```python  theme={null}
from unstructured.partition.pdf import partition_pdf

fname = "example-docs/pdf/layout-parser-paper.pdf"

elements = partition_pdf(filename=fname,
                         skip_infer_table_types=False,
                         strategy='hi_res',
           )

tables = [el for el in elements if el.category == "Table"]

print(tables[0].text)
print(tables[0].metadata.text_as_html)

```

## Method 2: Using Auto Partition or Unstructured API

By default, table extraction from all file types is enabled. To extract tables from PDFs and images using [Auto Partition](/open-source/core-functionality/partitioning#partition) or [Unstructured API parameters](/api-reference/legacy-api/partition/api-parameters) simply set `strategy` parameter to `hi_res`.

**Usage: Auto Partition**

```python  theme={null}
from unstructured.partition.auto import partition

filename = "example-docs/pdf/layout-parser-paper.pdf"

elements = partition(filename=filename,
                     strategy='hi_res',
           )

tables = [el for el in elements if el.category == "Table"]

print(tables[0].text)
print(tables[0].metadata.text_as_html)

```

**Usage: API Parameters**

```bash  theme={null}
curl -X 'POST' \
    'https://api.unstructured.io' \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -F 'files=@sample-docs/layout-parser-paper-with-table.jpg' \
    -F 'strategy=hi_res' \
    | jq -C . | less -R

```

---

# Source: https://docs.unstructured.io/examplecode/codesamples/oss/table-source-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Delta Table source connector

<Note>
  This sample code utilizes the [Unstructured Open Source](/open-source/introduction/overview "Open Source") Library.
</Note>

## Objectives

1. Extract text and metadata from a PDF file using the Unstructured.io Python SDK.

2. Process and store this data in a Databricks Delta Table.

3. Retrieve data from the Delta Table using the Unstructured.io Delta Table Connector.

## Prerequisites

* Unstructured Python SDK

* Databricks account and workspace

* AWS S3 for Delta Table storage

## Processing and Storing into Databricks Delta Table

3. Initialize PySpark

```python  theme={null}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('sparkdf').getOrCreate()

```

4. Convert JSON output into Dataframe

```python  theme={null}
import pyspark

dataframe = spark.createDataFrame(res.elements)

```

5. Store DataFrame as Delta Table

```
dataframe.write.mode("overwrite").format("delta").saveAsTable("delta_table")

```

## Conclusion

This documentation covers the essential steps for converting unstructured PDF data into structured data and storing it in a Databricks Delta Table. It also outlines how to extract this data for further use.

---

# Source: https://docs.unstructured.io/ui/enriching/table-to-html.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Tables to HTML

<iframe width="560" height="315" src="https://www.youtube.com/embed/lT2ixyunrvA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

After partitioning, you can have Unstructured generate representations of each detected table in HTML markup format.

This table-to-HTML output is generated by using agent AI or a vision language model (VLM).
The agentic AI option typically provides more accurate table-to-HTML output than the VLM option.

Here is an example of the HTML markup output of a detected table using GPT-4o. Note specifically the `text_as_html` field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.

````json  theme={null}
{
    "type": "Table",
    "element_id": "31aa654088742f1388d46ea9c8878272",
    "text": "Inhibitor Polarization Corrosion be (V/dec) ba (V/dec) Ecorr (V) icorr 
        (AJcm?) concentration (g) resistance (Q) rate (mmj/year) 0.0335 0.0409 
        \u20140.9393 0.0003 24.0910 2.8163 1.9460 0.0596 .8276 0.0002 121.440 
        1.5054 0.0163 0.2369 .8825 0.0001 42121 0.9476 s NO 03233 0.0540 
        \u20140.8027 5.39E-05 373.180 0.4318 0.1240 0.0556 .5896 5.46E-05 
        305.650 0.3772 = 5 0.0382 0.0086 .5356 1.24E-05 246.080 0.0919",
    "metadata": {
        "text_as_html": "```html\n
            <table>\n
                <tr>\n<th>Inhibitor concentration (g)</th>\n
                    <th>bc (V/dec)</th>\n<th>ba (V/dec)</th>\n<th>Ecorr (V)</th>\n
                    <th>icorr (A/cm\u00b2)</th>\n<th>Polarization resistance (\u03a9)</th>\n
                    <th>Corrosion rate (mm/year)</th>\n
                </tr>\n  
                <tr>\n
                    <td>0</td>\n<td>0.0335</td>\n<td>0.0409</td>\n<td>\u22120.9393</td>\n
                    <td>0.0003</td>\n<td>24.0910</td>\n<td>2.8163</td>\n  
                </tr>\n
                <tr>\n   
                    <td>2</td>\n<td>1.9460</td>\n<td>0.0596</td>\n<td>\u22120.8276</td>\n<td>0.0002</td>\n<td>121.440</td>\n<td>1.5054</td>\n  
                </tr>\n
                <tr>\n
                    <td>4</td>\n<td>0.0163</td>\n<td>0.2369</td>\n<td>\u22120.8825</td>\n<td>0.0001</td>\n<td>42.121</td>\n<td>0.9476</td>\n  
                </tr>\n  
                <tr>\n
                    <td>6</td>\n<td>0.3233</td>\n<td>0.0540</td>\n<td>\u22120.8027</td>\n<td>5.39E-05</td>\n<td>373.180</td>\n<td>0.4318</td>\n  
                </tr>\n  
                <tr>\n
                    <td>8</td>\n<td>0.1240</td>\n<td>0.0556</td>\n<td>\u22120.5896</td>\n<td>5.46E-05</td>\n<td>305.650</td>\n<td>0.3772</td>\n  
                </tr>\n  
                <tr>\n
                    <td>10</td>\n<td>0.0382</td>\n<td>0.0086</td>\n<td>\u22120.5356</td>\n<td>1.24E-05</td>\n<td>246.080</td>\n<td>0.0919</td>\n
                </tr>\n
            </table>\n```",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 1,
        "image_base64": "/9j...<full results omitted for brevity>...//Z",
        "image_mime_type": "image/jpeg",
        "filename": "embedded-images-tables.pdf",
        "data_source": {}
    }
}
````

<Note>
  The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
  documents or PDF pages that are partitioned by using the Fast or VLM strategy.
</Note>

For workflows that use [chunking](/ui/chunking), note the following changes:

* If a `Table` element must be chunked, the `Table` element is replaced by a set of related `TableChunk` elements.
* Each of these `TableChunk` elements will contain HTML table output for only its own element.
* None of the these `TableChunk` elements will contain an `image_base64` field.

## Generate table-to-HTML output

To generate table-to-HTML output, for an **Enrichment** node in a workflow, do the following:

1. For **Input Type**, click **Table**.
2. To use agentic AI to generate the table-to-HTML output, for **Provider**, click **Agentic Table Parsing**.

   To use a VLM to generate the table-to-HTML output instead, do the following:

   a. For **Provider**, click **Anthropic** or **OpenAI**.<br />
   b. For **Model**, click one of the available models that are shown.<br />
   c. For **Task**, click **Table to HTML**.<br />

<Note>
  You can change a workflow's table description settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings.

  For workflows that use [chunking](/ui/chunking), the **Chunker** node should be placed after all **Enrichment** nodes. Placing the
  **Chunker** node before a table-to-HTML output **Enrichment** node could cause incomplete or no table-to-HTML output to be generated.
</Note>

<Warning>
  The following models are no longer available as of the following dates:

  * Amazon Bedrock Claude Sonnet 3.5: October 22, 2025
  * Anthropic Claude Sonnet 3.5: October 22, 2025

  Unstructured recommends the following actions:

  * For new workflows, do not use any of these models.
  * For any workflow that uses any of these models, update that workflow as soon as possible to use a different model.

  Workflows that attempt to use any of these models on or after its associated date will return errors.
</Warning>

<Warning>
  Unstructured can potentially generate table-to-HTML output only for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and a table-to-HTML output node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No table-to-HTML output node is needed (or allowed).

  Even with these configurations, Unstructured actually generates table-to-HTML output only for files that contain tables and are also eligible
  for processing with the following partitioning strategies:

  * **High Res**, when the workflow's **Partitioner** node is set to use **Auto** or **High Res**.
  * **VLM** or **High Res**, when the workflow's **Partitioner** node is set to use **VLM**.

  Unstructured never generates table-to-HTML output for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Fast** partitioning strategy.
  * With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain tables.
</Warning>

## Learn more

* <Icon icon="video" />  [How to Extract Data from Complex Tables](https://unstructured.io/events/how-to-extract-data-from-complex-tables)
* <Icon icon="blog" />  [The Case for HTML as the Canonical Representation in Document AI](https://unstructured.io/blog/the-case-for-html-as-the-canonical-representation-in-document-ai)
* <Icon icon="blog" />  [Preserving Table Structure for Better Retrieval](https://unstructured.io/blog/preserving-table-structure-for-better-retrieval)

---

# Source: https://docs.unstructured.io/ui/sources/teradata-sql.md

# Source: https://docs.unstructured.io/ui/destinations/teradata-sql.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/teradata-sql.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/teradata-sql.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/teradata-sql.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/teradata-sql.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Teradata

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Teradata.

The requirements are as follows.

* A Teradata Vantage system that can be accessed by its host name or IP address.

  For example, a Teradata Vantage system in Teradata ClearScape Analytics Experience includes:

  * A Teradata ClearScape Analytics Experience account.
  * An environment in the account.
  * A Teradata Vantage database in the environment.
  * The name and password for a Teradata user who has the appropriate access to the database.

  [Learn how to create these in Teradata ClearScape Analytics Experience](https://developers.teradata.com/quickstarts/get-access-to-vantage/clearscape-analytics-experience/getting-started-with-csae/).

* The system's corresponding host name or IP address.

  For example, you can get these values from Teradata ClearScape Analytics Experience as follows:

  1. Sign in to your Teradata ClearScape Analytics Experience account.<br />
  2. On the sidebar, under **Environments**, click the name of the database's corresponding environment.<br />
  3. Under **Connection details for Vantage database**, use the **Host** value.<br />

* The name of the target database in the system. To get a list of available databases in the system, you can run a Teradata SQL query such as the following:

  ```sql  theme={null}
  SELECT DatabaseName 
  FROM DBC.DatabasesV 
  ORDER BY DatabaseName;
  ```

* The name of the target table in the database. To get a list of available tables in a database, you can run a Teradata SQL query such as the following, replacing `<database-name>` with the name of the target database:

  ```sql  theme={null}
  SELECT TableName
  FROM DBC.TablesV
  WHERE DatabaseName = '<database-name>' AND TableKind = 'T'
  ORDER BY TableName;
  ```

  When Unstructured writes rows to a table, the table's columns must have a schema that is compatible with Unstructured.
  Unstructured cannot provide a schema that is guaranteed to work for everyone in all circumstances.
  This is because these schemas will vary based on
  your source files' types; how you want Unstructured to partition, chunk, and generate embeddings;
  any custom post-processing code that you run; and other factors.

  In any case, note the following about table schemas:

  * The following columns are always required by Unstructured: `record_id` and `element_id`.
  * The following columns are optional for Unstructured, but highly recommended: `text` and `type`.
  * The rest of the columns are optional and typically will be output by Unstructured as part of the `metadata` field.
  * If Unstructured is generating vector embeddings, the `embeddings` column is also required.

    <Warning>
      The destination connector outputs Unstructured-generated [embeddings](/ui/embedding) that are not directly compatible
      with Teradata Enterprise Vector Store. To use embeddings with Teradata Enterprise Vector Store, Unstructured
      recommends that you choose from among the following options:

      * Define a column in your target table named `embeddings` that is of type `VARCHAR(64000)`, to store the
        Unstructured-generated embeddings. After Unstructured adds its embeddings to your `embeddings` column,
        choose from among Teradata's options to convert the `embeddings` column's `VARCHAR` values to the
        Teradata [VECTOR Data Type](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Vector-Store-User-Guide/Vector-Store-Components-and-Features/VECTOR-Data-Type)
        yourself.
      * Omit any columns named `embedding`, `message`, or `num_tokens` from your target table. Then choose from
        among Teradata's options (such as [AI\_TextEmbeddings](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Vector-Store-User-Guide/Vector-Store-Components-and-Features/In-Database-Analytic-Functions-for-Vector-Store))
        to have Teradata generate the embeddings for you, instead of having Unstructured generate them.
    </Warning>

  Here is an example table schema that is compatible with Unstructured. It includes all of the required and recommended columns, as
  well as a few additional columns that are typically output by Unstructured as part of the `metadata` field. Be sure to replace
  `<database-name>` with the name of the target database and `<table-name>` with the name of the target table (by Unstructured convention,
  the table name is typically `elements`, but this is not a requirement).

  ```sql  theme={null}
  CREATE SET TABLE "<database-name>"."<table-name>" (
    "id" VARCHAR(64) NOT NULL,
    PRIMARY KEY ("id"),
    "record_id" VARCHAR(64),
    "element_id" VARCHAR(64),
    "text" VARCHAR(32000) CHARACTER SET UNICODE,
    "type" VARCHAR(50),
    "embeddings" VARCHAR(64000), -- Add this column only if Unstructured is generating vector embeddings.
    "last_modified" VARCHAR(50),
    "languages" VARCHAR(200),
    "file_directory" VARCHAR(500),
    "filename" VARCHAR(255),
    "filetype" VARCHAR(50),
    "record_locator" VARCHAR(1000),
    "date_created" VARCHAR(50),
    "date_modified" VARCHAR(50),
    "date_processed" VARCHAR(50),
    "permissions_data" VARCHAR(1000),
    "filesize_bytes" INTEGER,
    "parent_id" VARCHAR(64)
  )
  ```

* For the source connector, the name of the primary key column in the table (for example, a column named `id`, typically defined as `"id" VARCHAR(64) NOT NULL, PRIMARY KEY ("id")`).

* For the source connector, the names of any specific columns to fetch from the table. By default, all columns are fetched unless otherwise specified.

* For the destination connector, the name of the column in the table that uniquely identifies each record for Unstructured to perform any necessary record updates. By default convention, Unstructured expects this field to be named `record_id`.

* The name of the Teradata user who has the appropriate access to the target database.

  For example, you can get this from Teradata ClearScape Analytics Experience as follows:

  1. Sign in to your Teradata ClearScape Analytics account.<br />
  2. On the sidebar, under **Environments**, click the name of the database's corresponding environment.<br />
  3. Under **Connection details for Vantage database**, use the **Username** value.<br />

  The Teradata SQL command to get a list of available users is as follows:

  ```sql  theme={null}
  SELECT UserName 
  FROM DBC.UsersV 
  ORDER BY UserName;
  ```

* The password for the user, which was set up when the user was created.

  If the user has forgotten their password, the Teradata SQL command to change a user's password is as follows, replacing `<user-name>` with the name of the user and `<new-password>` with the new password:

  ```sql  theme={null}
  MODIFY USER <user-name> SET PASSWORD = '<new-password>';
  ```

  To change a user's password, you must be an administrator (such as the `DBC` user or another user with `DROP USER` privileges).

To create a Teradata destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="teradata",
                  config={
                      "host": "<host>",
                      "database": "<database>",
                      "table_name": "<table-name>",
                      "batch_size": <batch-size>,
                      "record_id_key": "<record-id-key>",
                      "username": "<username>",
                      "password": "<password>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "teradata",
      "config": {
          "host": "<host>",
          "database": "<database>",
          "table_name": "<table-name>",
          "batch_size": <batch-size>,
          "record_id_key": "<record-id-key>",
          "username": "<username>",
          "password": "<password>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<host>` (*required*): The hostname or IP address associated with the target Teradata Vantage database.
* `<database>`: The name of the target database. By default, the default database name is used if not otherwise specified. To get the name of the default database, you can run the Teradata SQL command `SELECT DATABASE;`.
* `<table-name>` (*required*): The name of the target table in the database.
* `<batch-size>`: The maximum number of rows per batch. The default is `100` if not otherwise specified.
* `<id-column>` (*required*, source connector only): The name of the primary key column that Teradata uses to uniquely identify each record in the table.
* `<record-id-key>` (destination connector only): The name of the column that Unstructured uses to uniquely identify each record in the table for record update purposes. The default is `record_id` if not otherwise specified.
* `<column-name>` (source connector only): The name of a column to fetch from the table. By default, all columns are fetched unless otherwise specified.
* `<username>` (*required*): The name of the user who has the appropriate access to the database.
* `<password>` (*required*): The password for the user.

---

# Source: https://docs.unstructured.io/open-source/how-to/text-as-html.md

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/text-as-html.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Extract tables as HTML

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

## Task

You want to get, save, or show the contents of elements that are represented as HTML, such as tables that are embedded in a PDF document.

## Approach

Extract the contents of an element's `text_as_html` JSON object, which is nested inside of its parent `metadata` object.

## To run this example

You will need a document that is one of the document types that can output the `text_as_html` JSON object. For the list of applicable document types, see the entries in the table at the beginning of [Partitioning](/open-source/core-functionality/partitioning) where "Table Support" is "Yes."

This example uses a PDF file with an embedded table.

## Code

For the [Unstructured Python SDK](/api-reference/legacy-api/partition/sdk-python), you'll need:

These environment variables:

* `UNSTRUCTURED_API_KEY` - Your Unstructured API key value.
* `UNSTRUCTURED_API_URL` - Your Unstructured API URL.

```python Python SDK theme={null}
from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared
from unstructured.staging.base import elements_from_dicts, elements_to_json

import os, webbrowser

if __name__ == "__main__":
    client = UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    # Source: https://github.com/Unstructured-IO/unstructured-ingest/blob/main/example-docs/pdf/embedded-images-tables.pdf

    # Where to get the local file, relative to this .py file.
    local_input_filepath = "local-ingest-input-pdf/embedded-images-tables.pdf"

    # Where to store the retrieved HTML (and the processed JSON), relative to this .py file.
    local_output_filepath = "local-ingest-output"

    with open(local_input_filepath, "rb") as f:
        files = shared.Files(
            content=f.read(),
            file_name=local_input_filepath
        )

    request = operations.PartitionRequest(
        shared.PartitionParameters(
            files=files,
            strategy=shared.Strategy.VLM,
            vlm_model="gpt-4o",
            vlm_model_provider="openai",
            split_pdf_page=True,
            split_pdf_allow_failed=True,
            split_pdf_concurrency_level=15
        )
    )

    try:
        result = await client.general.partition_async(
            request=request
        )

        # Provide some minimal CSS for better table readability.
        table_css = "<head><style>table, th, td { border: 1px solid; }</style></head>"

        for element in result.elements:
            if "text_as_html" in element["metadata"]:
                # Surround the element's HTML with basic <html> and <body> tags, and add the minimal CSS.
                html_string = f"<!DOCTYPE html><html>{table_css}<body>{element["metadata"]["text_as_html"]}</body></html>"

                # Save the element's HTML to a local file.
                save_path = f"{local_output_path}/{element["element_id"]}.html"
                file = open(save_path, 'w')
                file.write(html_string)
                file.close()

                # View the locally saved file in the local default web browser.
                webbrowser.open_new(f"file:///{os.getcwd()}/{save_path}")

        # Also get the elements for inspection and validation.
        dict_elements = elements_from_dicts(
            element_dicts=result.elements
        )

        # Save the elements as JSON.
        elements_to_json(
            elements=dict_elements,
            indent=2,
            filename=f"{local_output_filepath}/embedded-images-tables.json"
        )
    except Exception as e:
        print(e)
```

## See also

* [Extract images and tables from documents](/api-reference/legacy-api/partition/extract-image-block-types)
* [Table Extraction from PDF](/examplecode/codesamples/apioss/table-extraction-from-pdf)

---

# Source: https://docs.unstructured.io/api-reference/legacy-api/partition/transform-schemas.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Transform a JSON file into a different schema

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

## Task

You want to convert a JSON file that Unstructured produces into a separate JSON file that uses
a different JSON schema than the one that Unstructured uses.

## Approach

Use a Python package such as [json-converter](https://pypi.org/project/json-converter/) in your Python code project
to transform your source JSON file into a target JSON file that conforms to your own schema.

<Info>The `json-converter` package is not owned or supported by Unstructured. For questions and
requests, see the [Issues](https://github.com/ebi-ait/json-converter/issues) tab of the
`json-converter` repository in GitHub.</Info>

## Code

<Steps>
  <Step title="Install dependencies">
    In your local Python code project, install the [json-converter](https://pypi.org/project/json-converter/)
    package.

    ```bash  theme={null}
    pip install json-converter
    ```
  </Step>

  <Step title="Identify the JSON file to transform">
    1. Find the local source JSON file that you want to transform.
    2. Note the JSON field names and structures that you want to transform. For example, the JSON file might
       look like the following (the ellipses indicate content omitted for brevity):

       ```json  theme={null}
       [
         {
           "type": "...",
           "element_id": "...",
           "text": "...",
           "metadata": {
             "filetype": "...",
             "languages": [
               "eng"
             ],
             "page_number": 1,
             "filename": "..."
           }
         },
         {
           "type": "...",
           "element_id": "...",
           "text": "...",
           "metadata": {
             "filetype": "...",
             "languages": [
               "eng"
             ],
             "page_number": 1,
             "filename": "..."
           }
         },
         {
           "...": "..."
         }
       ]
       ```
  </Step>

  <Step title="Create the JSON field mappings file">
    1. Decide what you want the JSON schema in the transformed file to look like. For example, the
       transformed JSON file might look like the following (the ellipses indicate content omitted for brevity):

       ```json  theme={null}
       [
         {
           "content_type": "...",
           "content_id": "...",
           "content": "...",
           "content_properties": {
             "page": 1
           }
         },
         {
           "content_type": "...",
           "content_id": "...",
           "content": "...",
           "content_properties": {
             "page": 1
           }
         },
         {
           "...": "..."
         }
       ]
       ```

    2. Create the JSON field mappings file, for example:

       ```json  theme={null}
       {
         "content_type": ["type"],
         "content_id": ["element_id"],
         "content": ["text"]
         "content_properties.page": ["metadata.page_number"]
       }
       ```

       This file declares the following mappings:

       * The `type` field is renamed to `content_type`.
       * The `element_id` field is renamed to `content_id`.
       * The `text` field is renamed to `content`.
       * The `page_number` field nested inside `metadata` is renamed to `page` and is nested inside `content_properties`.
       * All of the other fields (`filetype`, `languages`, and `filename`) are dropped.

       For more information about the format of this JSON field mappings file, see the
       [Project Description](https://pypi.org/project/json-converter) in the `json-converter` page on PyPI or the
       [README](https://github.com/ebi-ait/json-converter) in the `json-converter` repository in GitHub.
  </Step>

  <Step title="Add and run the transform code">
    1. Set the following local environment variables:

       * Set `LOCAL_FILE_INPUT_PATH` to the local path to the source JSON file.
       * Set `LOCAL_FILE_OUTPUT_PATH` to the local path to the target JSON file.
       * Set `LOCAL_FIELD_MAPPINGS_PATH` to the local path to the JSON field mappings file.

    2. Add the following Python code file to your project:

       ```python  theme={null}
       import os, json
       from json_converter.json_mapper import JsonMapper

       # Converts one JSON file with one schema to another 
       # JSON file with a different schema. 
       # Provide the path to the input (source) and output 
       # (target) JSON files and the path to the JSON schema 
       # field mappings file.
       def convert_json_with_schemas(
           input_file_path: str,
           output_file_path: str,
           field_mappings_path: str
       ) -> None:

           output_data = []

           try:
               with open(field_mappings_path, 'r') as f:
                   element_mappings = json.load(f)
           except FileNotFoundError:
               print(f"Error: Input JSON schema field mappings file '{input_file_path}' not found.")
           except IOError as e:
               print(f"I/O error occurred: {e}")
           except Exception as e:
               print(f"An unexpected error occurred: {e}")

           try:
               with open(input_file_path, 'r') as f:
                   input_data = json.load(f)
                   
                   for item in input_data:
                       converted_data = JsonMapper(item).map(element_mappings)
                       output_data.append(converted_data)
           except FileNotFoundError:
               print(f"Error: Input JSON file '{input_file_path}' not found.")
           except IOError as e:
               print(f"I/O error occurred: {e}")
           except Exception as e:
               print(f"An unexpected error occurred: {e}")

           try:
               with open(output_file_path, 'w') as f:
                   json.dump(output_data, f, indent=2)
           
               print(f"Transformation complete. Output written to '{output_file_path}'.")
           except IOError as e:
               print(f"I/O error occurred: {e}")
           except Exception as e:
               print(f"An unexpected error occurred: {e}")

       if __name__ == "__main__":
           convert_json_with_schemas(
               input_file_path=os.getenv("LOCAL_FILE_INPUT_PATH"),
               output_file_path=os.getenv("LOCAL_FIELD_MAPPINGS_PATH"),
               field_mappings_path=os.getenv("LOCAL_FILE_OUTPUT_PATH")
           )
       ```

    3. Run the Python code file.

    4. Check the path specified by `LOCAL_FILE_OUTPUT_PATH` for the transformed JSON file.
  </Step>
</Steps>

## Troubleshooting

### Error when trying to import Mapping from collections

**Issue**: When you run your Python code file, the following error message appears: "ImportError: cannot import name 'Mapping' from 'collections'".

**Cause**: When you use the `json-converter` package with newer versions of Python such as 3.11 and later,
Python tries to use an outdated import in this `json-converter` package.

**Solution**: Update the `json-converter` package's source code to use a different import, as follows:

1. In your Python project, find the `json-converter` package's source location, by running the `pip show` command:

   ```bash  theme={null}
   pip show json-converter
   ```

   Note the path in the **Location** field.

2. Use your code editor to the open the path to the `json-converter` package's source code.

3. In the source code, open the file named `json_mapper.py`.

4. Change the following line of code...

   ```python  theme={null}
   from collections import Mapping
   ```

   ...to the following line of code, by adding `.abc`:

   ```python  theme={null}
   from collections.abc import Mapping
   ```

5. Save this source code file.

6. Run your Python code file again.

---

# Source: https://docs.unstructured.io/api-reference/destinations/update-destination-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Update destination connector

> Update the configuration of an existing destination connector.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json put /api/v1/destinations/{destination_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/destinations/{destination_id}:
    put:
      tags:
        - destinations
      summary: Update destination connector
      description: Update the configuration of an existing destination connector.
      operationId: update_destination
      parameters:
        - name: destination_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Destination Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/UpdateDestinationConnector'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DestinationConnectorInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    UpdateDestinationConnector:
      properties:
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/AstraDBConnectorConfigInput'
            - $ref: '#/components/schemas/AzureAISearchConnectorConfigInput'
            - $ref: '#/components/schemas/CouchbaseDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfigInput'
            - $ref: >-
                #/components/schemas/DatabricksVDTDestinationConnectorConfigInput
            - $ref: '#/components/schemas/DeltaTableConnectorConfigInput'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfigInput'
            - $ref: '#/components/schemas/GCSDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/KafkaCloudDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/MilvusDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/MongoDBConnectorConfigInput'
            - $ref: '#/components/schemas/Neo4jDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/OneDriveDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/OpenSearchConnectorConfigInput'
            - $ref: '#/components/schemas/PineconeDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/PostgresDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/RedisDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/QdrantCloudDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/S3DestinationConnectorConfigInput'
            - $ref: '#/components/schemas/SnowflakeDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/TeradataDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/WeaviateDestinationConnectorConfigInput'
            - $ref: '#/components/schemas/IBMWatsonxS3DestinationConnectorConfigInput'
            - additionalProperties: true
              type: object
          title: Config
      type: object
      required:
        - config
      title: UpdateDestinationConnector
    DestinationConnectorInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/DestinationConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureDestinationConnectorConfig'
            - $ref: '#/components/schemas/AstraDBConnectorConfig'
            - $ref: '#/components/schemas/AzureAISearchConnectorConfig'
            - $ref: '#/components/schemas/CouchbaseDestinationConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVDTDestinationConnectorConfig'
            - $ref: '#/components/schemas/DeltaTableConnectorConfig'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfig'
            - $ref: '#/components/schemas/GCSDestinationConnectorConfig'
            - $ref: '#/components/schemas/KafkaCloudDestinationConnectorConfig'
            - $ref: '#/components/schemas/MilvusDestinationConnectorConfig'
            - $ref: '#/components/schemas/MongoDBConnectorConfig'
            - $ref: '#/components/schemas/Neo4jDestinationConnectorConfig'
            - $ref: '#/components/schemas/OneDriveDestinationConnectorConfig'
            - $ref: '#/components/schemas/OpenSearchConnectorConfig'
            - $ref: '#/components/schemas/PineconeDestinationConnectorConfig'
            - $ref: '#/components/schemas/PostgresDestinationConnectorConfig'
            - $ref: '#/components/schemas/RedisDestinationConnectorConfig'
            - $ref: '#/components/schemas/QdrantCloudDestinationConnectorConfig'
            - $ref: '#/components/schemas/S3DestinationConnectorConfig'
            - $ref: '#/components/schemas/SnowflakeDestinationConnectorConfig'
            - $ref: '#/components/schemas/TeradataDestinationConnectorConfig'
            - $ref: '#/components/schemas/WeaviateDestinationConnectorConfig'
            - $ref: '#/components/schemas/IBMWatsonxS3DestinationConnectorConfig'
            - additionalProperties: true
              type: object
          title: Config
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
      type: object
      required:
        - id
        - name
        - type
        - config
        - created_at
      title: DestinationConnectorInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    AzureDestinationConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^(az:\/\/|abfs:\/\/)
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
      type: object
      required:
        - remote_url
      title: AzureDestinationConnectorConfigInput
    AstraDBConnectorConfigInput:
      properties:
        collection_name:
          type: string
          pattern: ^\w+$
          title: Collection Name
        keyspace:
          anyOf:
            - type: string
            - type: 'null'
          title: Keyspace
          default: default_keyspace
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 20
        binary_encode_vectors:
          type: boolean
          title: Binary Encode Vectors
          default: true
        api_endpoint:
          type: string
          title: Api Endpoint
        token:
          type: string
          title: Token
        flatten_metadata:
          type: boolean
          title: Flatten Metadata
          default: false
      additionalProperties: true
      type: object
      required:
        - collection_name
        - api_endpoint
        - token
      title: AstraDBConnectorConfigInput
    AzureAISearchConnectorConfigInput:
      properties:
        endpoint:
          type: string
          pattern: ^https:\/\/[a-z0-9-]+
          title: Endpoint
        index:
          type: string
          title: Index
        key:
          type: string
          title: Key
      type: object
      required:
        - endpoint
        - index
        - key
      title: AzureAISearchConnectorConfigInput
    CouchbaseDestinationConnectorConfigInput:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
      title: CouchbaseDestinationConnectorConfigInput
    DatabricksVolumesConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfigInput
    DatabricksVDTDestinationConnectorConfigInput:
      properties:
        server_hostname:
          type: string
          title: Server Hostname
        http_path:
          type: string
          title: Http Path
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        client_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Id
        client_secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Secret
        catalog:
          type: string
          title: Catalog
        database:
          type: string
          title: Database
          default: default
        table_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Table Name
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          anyOf:
            - type: string
            - type: 'null'
          title: Volume Path
      type: object
      required:
        - server_hostname
        - http_path
        - catalog
        - volume
      title: DatabricksVDTDestinationConnectorConfigInput
    DeltaTableConnectorConfigInput:
      properties:
        aws_access_key_id:
          type: string
          title: Aws Access Key Id
        aws_secret_access_key:
          type: string
          title: Aws Secret Access Key
        aws_region:
          type: string
          title: Aws Region
        table_uri:
          type: string
          pattern: ^s3:\/\/
          title: Table Uri
      type: object
      required:
        - aws_access_key_id
        - aws_secret_access_key
        - aws_region
        - table_uri
      title: DeltaTableConnectorConfigInput
    ElasticsearchConnectorConfigInput:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfigInput
    GCSDestinationConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^(gs|gcs):\/\/
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
      type: object
      required:
        - remote_url
        - service_account_key
      title: GCSDestinationConnectorConfigInput
    KafkaCloudDestinationConnectorConfigInput:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          minimum: 1
          title: Port
          default: 9092
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
      type: object
      required:
        - bootstrap_servers
        - topic
        - kafka_api_key
        - secret
      title: KafkaCloudDestinationConnectorConfigInput
    MilvusDestinationConnectorConfigInput:
      properties:
        uri:
          type: string
          pattern: ^https?:\/\/
          title: Uri
        user:
          anyOf:
            - type: string
            - type: 'null'
          title: User
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        db_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Db Name
        collection_name:
          type: string
          title: Collection Name
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - uri
        - collection_name
        - record_id_key
      title: MilvusDestinationConnectorConfigInput
    MongoDBConnectorConfigInput:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          pattern: >-
            ^mongodb(?:\+srv)?:\/\/(?:([^:@/\s]+)(?::([^@/\s]*))?@)?([^/\s:,]+)(?::(\d+))?(?:,([^/\s:,]+)(?::(\d+))?)*(?:\/([^/\s?]*))?(?:\?([^/\s]*))?$
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfigInput
    Neo4jDestinationConnectorConfigInput:
      properties:
        uri:
          type: string
          title: Uri
        database:
          type: string
          title: Database
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
      type: object
      required:
        - uri
        - database
        - username
        - password
      title: Neo4jDestinationConnectorConfigInput
    OneDriveDestinationConnectorConfigInput:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        remote_url:
          type: string
          pattern: ^onedrive:\/\/
          title: Remote Url
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - remote_url
      title: OneDriveDestinationConnectorConfigInput
    OpenSearchConnectorConfigInput:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfigInput
      description: Input model for creating/updating OpenSearch connectors via API.
    PineconeDestinationConnectorConfigInput:
      properties:
        index_name:
          type: string
          title: Index Name
        api_key:
          type: string
          title: Api Key
        namespace:
          type: string
          title: Namespace
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
      type: object
      required:
        - index_name
        - api_key
        - namespace
      title: PineconeDestinationConnectorConfigInput
    PostgresDestinationConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
      title: PostgresDestinationConnectorConfigInput
    RedisDestinationConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 6379
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        uri:
          anyOf:
            - type: string
            - type: 'null'
          title: Uri
        database:
          type: integer
          minimum: 0
          title: Database
          default: 0
        ssl:
          type: boolean
          title: Ssl
          default: true
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
      type: object
      required:
        - host
      title: RedisDestinationConnectorConfigInput
    QdrantCloudDestinationConnectorConfigInput:
      properties:
        url:
          type: string
          title: Url
        api_key:
          type: string
          title: Api Key
        collection_name:
          type: string
          title: Collection Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
      type: object
      required:
        - url
        - api_key
        - collection_name
      title: QdrantCloudDestinationConnectorConfigInput
    S3DestinationConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^s3:\/\/
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
          default: false
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
      type: object
      required:
        - remote_url
      title: S3DestinationConnectorConfigInput
    SnowflakeDestinationConnectorConfigInput:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
      title: SnowflakeDestinationConnectorConfigInput
    TeradataDestinationConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - host
        - user
        - password
      title: TeradataDestinationConnectorConfigInput
    WeaviateDestinationConnectorConfigInput:
      properties:
        cluster_url:
          type: string
          title: Cluster Url
        api_key:
          type: string
          title: Api Key
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
      type: object
      required:
        - cluster_url
        - api_key
      title: WeaviateDestinationConnectorConfigInput
    IBMWatsonxS3DestinationConnectorConfigInput:
      properties:
        iam_api_key:
          type: string
          title: Iam Api Key
        access_key_id:
          type: string
          title: Access Key Id
        secret_access_key:
          type: string
          title: Secret Access Key
        iceberg_endpoint:
          type: string
          title: Iceberg Endpoint
        object_storage_endpoint:
          type: string
          title: Object Storage Endpoint
        object_storage_region:
          type: string
          title: Object Storage Region
        catalog:
          type: string
          title: Catalog
        max_retries_connection:
          type: integer
          maximum: 100
          minimum: 2
          title: Max Retries Connection
          description: Maximum number of retries in case of a connection error
          default: 10
        namespace:
          type: string
          title: Namespace
        table:
          type: string
          title: Table
        max_retries:
          type: integer
          maximum: 500
          minimum: 2
          title: Max Retries
          description: Maximum number of retries to upload data
          default: 50
        record_id_key:
          type: string
          title: Record Id Key
          description: Searchable key to find entries for the same record on previous runs
          default: record_id
      type: object
      required:
        - iam_api_key
        - access_key_id
        - secret_access_key
        - iceberg_endpoint
        - object_storage_endpoint
        - object_storage_region
        - catalog
        - namespace
        - table
      title: IBMWatsonxS3DestinationConnectorConfigInput
    DestinationConnectorType:
      type: string
      enum:
        - azure
        - astradb
        - azure_ai_search
        - couchbase
        - databricks_volumes
        - databricks_volume_delta_tables
        - delta_table
        - elasticsearch
        - gcs
        - kafka-cloud
        - milvus
        - mongodb
        - motherduck
        - neo4j
        - onedrive
        - opensearch
        - pinecone
        - postgres
        - redis
        - qdrant-cloud
        - s3
        - snowflake
        - teradata
        - weaviate-cloud
        - ibm_watsonx_s3
      title: DestinationConnectorType
    AzureDestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
      type: object
      required:
        - remote_url
      title: AzureDestinationConnectorConfig
    AstraDBConnectorConfig:
      properties:
        collection_name:
          type: string
          title: Collection Name
        keyspace:
          anyOf:
            - type: string
            - type: 'null'
          title: Keyspace
        batch_size:
          type: integer
          title: Batch Size
        binary_encode_vectors:
          type: boolean
          title: Binary Encode Vectors
          default: true
        api_endpoint:
          type: string
          title: Api Endpoint
        token:
          type: string
          title: Token
      additionalProperties: true
      type: object
      required:
        - collection_name
        - batch_size
        - api_endpoint
        - token
      title: AstraDBConnectorConfig
    AzureAISearchConnectorConfig:
      properties:
        endpoint:
          type: string
          title: Endpoint
        index:
          type: string
          title: Index
        key:
          type: string
          title: Key
      type: object
      required:
        - endpoint
        - index
        - key
      title: AzureAISearchConnectorConfig
    CouchbaseDestinationConnectorConfig:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
      title: CouchbaseDestinationConnectorConfig
    DatabricksVolumesConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfig
    DatabricksVDTDestinationConnectorConfig:
      properties:
        server_hostname:
          type: string
          title: Server Hostname
        http_path:
          type: string
          title: Http Path
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        client_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Id
        client_secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Client Secret
        catalog:
          type: string
          title: Catalog
        database:
          type: string
          title: Database
          default: default
        table_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Table Name
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          anyOf:
            - type: string
            - type: 'null'
          title: Volume Path
      type: object
      required:
        - server_hostname
        - http_path
        - catalog
        - volume
      title: DatabricksVDTDestinationConnectorConfig
    DeltaTableConnectorConfig:
      properties:
        aws_access_key_id:
          type: string
          title: Aws Access Key Id
        aws_secret_access_key:
          type: string
          title: Aws Secret Access Key
        aws_region:
          type: string
          title: Aws Region
        table_uri:
          type: string
          title: Table Uri
      type: object
      required:
        - aws_access_key_id
        - aws_secret_access_key
        - aws_region
        - table_uri
      title: DeltaTableConnectorConfig
    ElasticsearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfig
    GCSDestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
      type: object
      required:
        - remote_url
        - service_account_key
      title: GCSDestinationConnectorConfig
    KafkaCloudDestinationConnectorConfig:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          title: Port
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - bootstrap_servers
        - port
        - topic
        - kafka_api_key
        - secret
        - batch_size
      title: KafkaCloudDestinationConnectorConfig
    MilvusDestinationConnectorConfig:
      properties:
        uri:
          type: string
          title: Uri
        user:
          anyOf:
            - type: string
            - type: 'null'
          title: User
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        db_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Db Name
        collection_name:
          type: string
          title: Collection Name
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - uri
        - collection_name
        - record_id_key
      title: MilvusDestinationConnectorConfig
    MongoDBConnectorConfig:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfig
    Neo4jDestinationConnectorConfig:
      properties:
        uri:
          type: string
          title: Uri
        database:
          type: string
          title: Database
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - uri
        - database
        - username
        - password
        - batch_size
      title: Neo4jDestinationConnectorConfig
    OneDriveDestinationConnectorConfig:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        remote_url:
          type: string
          title: Remote Url
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - remote_url
      title: OneDriveDestinationConnectorConfig
    OpenSearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfig
      description: >-
        OpenSearch connector configuration.


        OpenSearch is a fork of Elasticsearch with similar functionality.

        Supports two authentication methods:

        1. Basic auth: username + password

        2. AWS IAM auth: aws_access_key_id + aws_secret_access_key (+ optional
        aws_session_token)


        For AWS OpenSearch Service or OpenSearch Serverless, region and service
        type

        are auto-detected from the host URL.
    PineconeDestinationConnectorConfig:
      properties:
        index_name:
          type: string
          title: Index Name
        api_key:
          type: string
          title: Api Key
        namespace:
          type: string
          title: Namespace
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - index_name
        - api_key
        - namespace
        - batch_size
      title: PineconeDestinationConnectorConfig
    PostgresDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
      title: PostgresDestinationConnectorConfig
    RedisDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        port:
          type: integer
          title: Port
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        uri:
          anyOf:
            - type: string
            - type: 'null'
          title: Uri
        database:
          type: integer
          title: Database
        ssl:
          type: boolean
          title: Ssl
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - host
        - port
        - database
        - ssl
        - batch_size
      title: RedisDestinationConnectorConfig
    QdrantCloudDestinationConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        api_key:
          type: string
          title: Api Key
        collection_name:
          type: string
          title: Collection Name
        batch_size:
          type: integer
          title: Batch Size
      type: object
      required:
        - url
        - api_key
        - collection_name
        - batch_size
      title: QdrantCloudDestinationConnectorConfig
    S3DestinationConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
      type: object
      required:
        - remote_url
        - anonymous
      title: S3DestinationConnectorConfig
    SnowflakeDestinationConnectorConfig:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
      title: SnowflakeDestinationConnectorConfig
    TeradataDestinationConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        table_name:
          type: string
          title: Table Name
          default: elements
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 50
        record_id_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Record Id Key
          default: record_id
      type: object
      required:
        - host
        - user
        - password
      title: TeradataDestinationConnectorConfig
    WeaviateDestinationConnectorConfig:
      properties:
        cluster_url:
          type: string
          title: Cluster Url
        api_key:
          type: string
          title: Api Key
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
      type: object
      required:
        - cluster_url
        - api_key
      title: WeaviateDestinationConnectorConfig
    IBMWatsonxS3DestinationConnectorConfig:
      properties:
        iam_api_key:
          type: string
          title: Iam Api Key
        access_key_id:
          type: string
          title: Access Key Id
        secret_access_key:
          type: string
          title: Secret Access Key
        iceberg_endpoint:
          type: string
          title: Iceberg Endpoint
        object_storage_endpoint:
          type: string
          title: Object Storage Endpoint
        object_storage_region:
          type: string
          title: Object Storage Region
        catalog:
          type: string
          title: Catalog
        max_retries_connection:
          type: integer
          title: Max Retries Connection
        namespace:
          type: string
          title: Namespace
        table:
          type: string
          title: Table
        max_retries:
          type: integer
          title: Max Retries
        record_id_key:
          type: string
          title: Record Id Key
      type: object
      required:
        - iam_api_key
        - access_key_id
        - secret_access_key
        - iceberg_endpoint
        - object_storage_endpoint
        - object_storage_region
        - catalog
        - max_retries_connection
        - namespace
        - table
        - max_retries
        - record_id_key
      title: IBMWatsonxS3DestinationConnectorConfig
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````

---

# Source: https://docs.unstructured.io/api-reference/sources/update-source-connector.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Update source connector

> Update the configuration of an existing source connector.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json put /api/v1/sources/{source_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/sources/{source_id}:
    put:
      tags:
        - sources
      summary: Update source connector
      description: Update the configuration of an existing source connector.
      operationId: update_source
      parameters:
        - name: source_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Source Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/UpdateSourceConnector'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SourceConnectorInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    UpdateSourceConnector:
      properties:
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureSourceConnectorConfigInput'
            - $ref: '#/components/schemas/BoxSourceConnectorConfigInput'
            - $ref: '#/components/schemas/ConfluenceSourceConnectorConfigInput'
            - $ref: '#/components/schemas/CouchbaseSourceConnectorConfigInput'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfigInput'
            - $ref: '#/components/schemas/DropboxSourceConnectorConfigInput'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfigInput'
            - $ref: '#/components/schemas/GCSSourceConnectorConfigInput'
            - $ref: '#/components/schemas/GoogleDriveSourceConnectorConfigInput'
            - $ref: '#/components/schemas/KafkaCloudSourceConnectorConfigInput'
            - $ref: '#/components/schemas/MongoDBConnectorConfigInput'
            - $ref: '#/components/schemas/OneDriveSourceConnectorConfigInput'
            - $ref: '#/components/schemas/OpenSearchConnectorConfigInput'
            - $ref: '#/components/schemas/OutlookSourceConnectorConfigInput'
            - $ref: '#/components/schemas/PostgresSourceConnectorConfigInput'
            - $ref: '#/components/schemas/S3SourceConnectorConfigInput'
            - $ref: '#/components/schemas/SalesforceSourceConnectorConfigInput'
            - $ref: '#/components/schemas/SharePointSourceConnectorConfigInput'
            - $ref: '#/components/schemas/SnowflakeSourceConnectorConfigInput'
            - $ref: '#/components/schemas/TeradataSourceConnectorConfigInput'
            - $ref: '#/components/schemas/JiraSourceConnectorConfigInput'
            - $ref: '#/components/schemas/ZendeskSourceConnectorConfigInput'
            - additionalProperties: true
              type: object
          title: Config
      type: object
      required:
        - config
      title: UpdateSourceConnector
    SourceConnectorInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        type:
          anyOf:
            - $ref: '#/components/schemas/SourceConnectorType'
            - type: string
          title: Type
        config:
          anyOf:
            - $ref: '#/components/schemas/AzureSourceConnectorConfig'
            - $ref: '#/components/schemas/BoxSourceConnectorConfig'
            - $ref: '#/components/schemas/ConfluenceSourceConnectorConfig'
            - $ref: '#/components/schemas/CouchbaseSourceConnectorConfig'
            - $ref: '#/components/schemas/DatabricksVolumesConnectorConfig'
            - $ref: '#/components/schemas/DropboxSourceConnectorConfig'
            - $ref: '#/components/schemas/ElasticsearchConnectorConfig'
            - $ref: '#/components/schemas/GCSSourceConnectorConfig'
            - $ref: '#/components/schemas/GoogleDriveSourceConnectorConfig'
            - $ref: '#/components/schemas/KafkaCloudSourceConnectorConfig'
            - $ref: '#/components/schemas/MongoDBConnectorConfig'
            - $ref: '#/components/schemas/OneDriveSourceConnectorConfig'
            - $ref: '#/components/schemas/OpenSearchConnectorConfig'
            - $ref: '#/components/schemas/OutlookSourceConnectorConfig'
            - $ref: '#/components/schemas/PostgresSourceConnectorConfig'
            - $ref: '#/components/schemas/S3SourceConnectorConfig'
            - $ref: '#/components/schemas/SalesforceSourceConnectorConfig'
            - $ref: '#/components/schemas/SharePointSourceConnectorConfig'
            - $ref: '#/components/schemas/SnowflakeSourceConnectorConfig'
            - $ref: '#/components/schemas/TeradataSourceConnectorConfig'
            - $ref: '#/components/schemas/JiraSourceConnectorConfig'
            - $ref: '#/components/schemas/ZendeskSourceConnectorConfig'
            - additionalProperties: true
              type: object
          title: Config
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
      type: object
      required:
        - id
        - name
        - type
        - config
        - created_at
      title: SourceConnectorInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    AzureSourceConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^(az:\/\/|abfs:\/\/)
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - remote_url
      title: AzureSourceConnectorConfigInput
    BoxSourceConnectorConfigInput:
      properties:
        box_app_config:
          type: string
          title: Box App Config
        remote_url:
          type: string
          pattern: ^box:\/\/
          title: Remote Url
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - box_app_config
        - remote_url
      title: BoxSourceConnectorConfigInput
    ConfluenceSourceConnectorConfigInput:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        api_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Token
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          type: boolean
          title: Cloud
          default: false
        extract_images:
          type: boolean
          title: Extract Images
          default: false
        extract_files:
          type: boolean
          title: Extract Files
          default: false
        max_num_of_spaces:
          type: integer
          minimum: 1
          title: Max Num Of Spaces
          default: 500
        max_num_of_docs_from_each_space:
          type: integer
          minimum: 1
          title: Max Num Of Docs From Each Space
          default: 150
        spaces:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Spaces
      type: object
      required:
        - url
        - username
      title: ConfluenceSourceConnectorConfigInput
    CouchbaseSourceConnectorConfigInput:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        collection_id:
          type: string
          title: Collection Id
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
        - collection_id
      title: CouchbaseSourceConnectorConfigInput
    DatabricksVolumesConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfigInput
    DropboxSourceConnectorConfigInput:
      properties:
        token:
          type: string
          title: Token
        remote_url:
          type: string
          pattern: ^dropbox:\/\/
          title: Remote Url
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - token
        - remote_url
      title: DropboxSourceConnectorConfigInput
    ElasticsearchConnectorConfigInput:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfigInput
    GCSSourceConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^(gs|gcs):\/\/
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - remote_url
        - service_account_key
      title: GCSSourceConnectorConfigInput
    GoogleDriveSourceConnectorConfigInput:
      properties:
        drive_id:
          type: string
          title: Drive Id
        service_account_key:
          anyOf:
            - $ref: '#/components/schemas/SecretReference'
            - type: string
          title: Service Account Key
        extensions:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Extensions
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - drive_id
        - service_account_key
      title: GoogleDriveSourceConnectorConfigInput
    KafkaCloudSourceConnectorConfigInput:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          minimum: 1
          title: Port
          default: 9092
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        num_messages_to_consume:
          type: integer
          minimum: 1
          title: Num Messages To Consume
          default: 100
      type: object
      required:
        - bootstrap_servers
        - topic
        - kafka_api_key
        - secret
      title: KafkaCloudSourceConnectorConfigInput
    MongoDBConnectorConfigInput:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          pattern: >-
            ^mongodb(?:\+srv)?:\/\/(?:([^:@/\s]+)(?::([^@/\s]*))?@)?([^/\s:,]+)(?::(\d+))?(?:,([^/\s:,]+)(?::(\d+))?)*(?:\/([^/\s?]*))?(?:\?([^/\s]*))?$
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfigInput
    OneDriveSourceConnectorConfigInput:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
          default: false
        path:
          type: string
          title: Path
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - path
      title: OneDriveSourceConnectorConfigInput
    OpenSearchConnectorConfigInput:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfigInput
      description: Input model for creating/updating OpenSearch connectors via API.
    OutlookSourceConnectorConfigInput:
      properties:
        authority_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Authority Url
        tenant:
          anyOf:
            - type: string
            - type: 'null'
          title: Tenant
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        outlook_folders:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Outlook Folders
        recursive:
          type: boolean
          title: Recursive
          default: false
        user_email:
          type: string
          title: User Email
      type: object
      required:
        - client_id
        - client_cred
        - user_email
      title: OutlookSourceConnectorConfigInput
    PostgresSourceConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
        id_column:
          type: string
          title: Id Column
          default: id
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
      title: PostgresSourceConnectorConfigInput
    S3SourceConnectorConfigInput:
      properties:
        remote_url:
          type: string
          pattern: ^s3:\/\/
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
          default: false
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
        recursive:
          type: boolean
          title: Recursive
          default: true
      type: object
      required:
        - remote_url
      title: S3SourceConnectorConfigInput
    SalesforceSourceConnectorConfigInput:
      properties:
        username:
          type: string
          title: Username
        consumer_key:
          type: string
          title: Consumer Key
        private_key:
          type: string
          title: Private Key
        categories:
          items:
            type: string
          type: array
          title: Categories
      type: object
      required:
        - username
        - consumer_key
        - private_key
        - categories
      title: SalesforceSourceConnectorConfigInput
    SharePointSourceConnectorConfigInput:
      properties:
        site:
          type: string
          pattern: >-
            ^https:\/\/[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+(\/(sites|teams)\/[a-zA-Z0-9-_]+)?$
          title: Site
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
          default: https://login.microsoftonline.com
        user_pname:
          type: string
          title: User Pname
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
          default: false
        path:
          anyOf:
            - type: string
            - type: 'null'
          title: Path
      type: object
      required:
        - site
        - tenant
        - user_pname
        - client_id
        - client_cred
      title: SharePointSourceConnectorConfigInput
    SnowflakeSourceConnectorConfigInput:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
        - table_name
        - id_column
      title: SnowflakeSourceConnectorConfigInput
    TeradataSourceConnectorConfigInput:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        id_column:
          type: string
          title: Id Column
          default: id
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - user
        - password
        - table_name
      title: TeradataSourceConnectorConfigInput
    JiraSourceConnectorConfigInput:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Cloud
          default: false
        projects:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Projects
        boards:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Boards
        issues:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Issues
        status_filters:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Status Filters
        download_attachments:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Download Attachments
          default: false
      type: object
      required:
        - url
        - username
      title: JiraSourceConnectorConfigInput
    ZendeskSourceConnectorConfigInput:
      properties:
        subdomain:
          type: string
          title: Subdomain
        email:
          type: string
          title: Email
        api_token:
          type: string
          title: Api Token
        item_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Item Type
          default: tickets
        batch_size:
          anyOf:
            - type: integer
            - type: 'null'
          title: Batch Size
          default: 2
      type: object
      required:
        - subdomain
        - email
        - api_token
      title: ZendeskSourceConnectorConfigInput
    SourceConnectorType:
      type: string
      enum:
        - azure
        - box
        - confluence
        - couchbase
        - databricks_volumes
        - dropbox
        - elasticsearch
        - gcs
        - google_drive
        - kafka-cloud
        - mongodb
        - onedrive
        - opensearch
        - outlook
        - postgres
        - s3
        - salesforce
        - sharepoint
        - slack
        - snowflake
        - teradata
        - jira
        - zendesk
      title: SourceConnectorType
    AzureSourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        account_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Name
        account_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Account Key
        connection_string:
          anyOf:
            - type: string
            - type: 'null'
          title: Connection String
        sas_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Sas Token
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - recursive
      title: AzureSourceConnectorConfig
    BoxSourceConnectorConfig:
      properties:
        box_app_config:
          type: string
          title: Box App Config
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - box_app_config
        - recursive
      title: BoxSourceConnectorConfig
    ConfluenceSourceConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        api_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Token
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          type: boolean
          title: Cloud
        extract_images:
          type: boolean
          title: Extract Images
          default: false
        extract_files:
          type: boolean
          title: Extract Files
          default: false
        max_num_of_spaces:
          type: integer
          title: Max Num Of Spaces
        max_num_of_docs_from_each_space:
          type: integer
          title: Max Num Of Docs From Each Space
        spaces:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Spaces
      type: object
      required:
        - url
        - username
        - cloud
        - max_num_of_spaces
        - max_num_of_docs_from_each_space
        - spaces
      title: ConfluenceSourceConnectorConfig
    CouchbaseSourceConnectorConfig:
      properties:
        bucket:
          type: string
          title: Bucket
        connection_string:
          type: string
          title: Connection String
        scope:
          anyOf:
            - type: string
            - type: 'null'
          title: Scope
        collection:
          anyOf:
            - type: string
            - type: 'null'
          title: Collection
        batch_size:
          type: integer
          title: Batch Size
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        collection_id:
          type: string
          title: Collection Id
      type: object
      required:
        - bucket
        - connection_string
        - batch_size
        - username
        - password
        - collection_id
      title: CouchbaseSourceConnectorConfig
    DatabricksVolumesConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        catalog:
          type: string
          title: Catalog
        schema:
          type: string
          title: Schema
          default: default
        volume:
          type: string
          title: Volume
        volume_path:
          type: string
          title: Volume Path
        client_secret:
          type: string
          title: Client Secret
        client_id:
          type: string
          title: Client Id
      type: object
      required:
        - host
        - catalog
        - volume
        - volume_path
        - client_secret
        - client_id
      title: DatabricksVolumesConnectorConfig
    DropboxSourceConnectorConfig:
      properties:
        token:
          type: string
          title: Token
        remote_url:
          type: string
          title: Remote Url
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - token
        - remote_url
        - recursive
      title: DropboxSourceConnectorConfig
    ElasticsearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          title: Hosts
        index_name:
          type: string
          title: Index Name
        es_api_key:
          type: string
          title: Es Api Key
      type: object
      required:
        - hosts
        - index_name
        - es_api_key
      title: ElasticsearchConnectorConfig
    GCSSourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        service_account_key:
          type: string
          title: Service Account Key
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - service_account_key
        - recursive
      title: GCSSourceConnectorConfig
    GoogleDriveSourceConnectorConfig:
      properties:
        drive_id:
          type: string
          title: Drive Id
        service_account_key:
          anyOf:
            - $ref: '#/components/schemas/SecretReference'
            - type: string
          title: Service Account Key
        extensions:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Extensions
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - drive_id
        - service_account_key
        - recursive
      title: GoogleDriveSourceConnectorConfig
    KafkaCloudSourceConnectorConfig:
      properties:
        bootstrap_servers:
          type: string
          title: Bootstrap Servers
        port:
          type: integer
          title: Port
        group_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Group Id
        topic:
          type: string
          title: Topic
        kafka_api_key:
          type: string
          title: Kafka Api Key
        secret:
          type: string
          title: Secret
        num_messages_to_consume:
          type: integer
          title: Num Messages To Consume
      type: object
      required:
        - bootstrap_servers
        - port
        - topic
        - kafka_api_key
        - secret
        - num_messages_to_consume
      title: KafkaCloudSourceConnectorConfig
    MongoDBConnectorConfig:
      properties:
        database:
          type: string
          title: Database
        collection:
          type: string
          title: Collection
        uri:
          type: string
          title: Uri
      type: object
      required:
        - database
        - collection
        - uri
      title: MongoDBConnectorConfig
    OneDriveSourceConnectorConfig:
      properties:
        client_id:
          type: string
          title: Client Id
        user_pname:
          type: string
          title: User Pname
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
        path:
          type: string
          title: Path
      type: object
      required:
        - client_id
        - user_pname
        - tenant
        - authority_url
        - client_cred
        - recursive
        - path
      title: OneDriveSourceConnectorConfig
    OpenSearchConnectorConfig:
      properties:
        hosts:
          items:
            type: string
          type: array
          minItems: 1
          title: Hosts
          description: List of OpenSearch hosts to connect to
        index_name:
          type: string
          title: Index Name
          description: Name of the OpenSearch index to read from or write to
        username:
          anyOf:
            - type: string
            - type: 'null'
          title: Username
          description: Username for basic authentication
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
          description: Password for basic authentication
        aws_access_key_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Access Key Id
          description: >-
            AWS access key ID for IAM authentication. When provided with
            aws_secret_access_key, IAM authentication is used instead of basic
            auth. Region and service type are auto-detected from the host URL.
        aws_secret_access_key:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Secret Access Key
          description: >-
            AWS secret access key for IAM authentication. Required when
            aws_access_key_id is provided.
        aws_session_token:
          anyOf:
            - type: string
            - type: 'null'
          title: Aws Session Token
          description: >-
            AWS session token for temporary credentials (optional). Only used
            when aws_access_key_id and aws_secret_access_key are provided.
        use_ssl:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Use Ssl
          description: Whether to use SSL for the connection
          default: true
      type: object
      required:
        - hosts
        - index_name
      title: OpenSearchConnectorConfig
      description: >-
        OpenSearch connector configuration.


        OpenSearch is a fork of Elasticsearch with similar functionality.

        Supports two authentication methods:

        1. Basic auth: username + password

        2. AWS IAM auth: aws_access_key_id + aws_secret_access_key (+ optional
        aws_session_token)


        For AWS OpenSearch Service or OpenSearch Serverless, region and service
        type

        are auto-detected from the host URL.
    OutlookSourceConnectorConfig:
      properties:
        authority_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Authority Url
        tenant:
          anyOf:
            - type: string
            - type: 'null'
          title: Tenant
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        outlook_folders:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Outlook Folders
        recursive:
          type: boolean
          title: Recursive
        user_email:
          type: string
          title: User Email
      type: object
      required:
        - client_id
        - client_cred
        - recursive
        - user_email
      title: OutlookSourceConnectorConfig
    PostgresSourceConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        database:
          type: string
          title: Database
        port:
          type: integer
          title: Port
        username:
          type: string
          title: Username
        password:
          type: string
          title: Password
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          title: Batch Size
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - database
        - port
        - username
        - password
        - table_name
        - batch_size
        - id_column
      title: PostgresSourceConnectorConfig
    S3SourceConnectorConfig:
      properties:
        remote_url:
          type: string
          title: Remote Url
        anonymous:
          type: boolean
          title: Anonymous
        key:
          anyOf:
            - type: string
            - type: 'null'
          title: Key
        secret:
          anyOf:
            - type: string
            - type: 'null'
          title: Secret
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        endpoint_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Endpoint Url
        recursive:
          type: boolean
          title: Recursive
      type: object
      required:
        - remote_url
        - anonymous
        - recursive
      title: S3SourceConnectorConfig
    SalesforceSourceConnectorConfig:
      properties:
        username:
          type: string
          title: Username
        consumer_key:
          type: string
          title: Consumer Key
        private_key:
          type: string
          title: Private Key
        categories:
          items:
            type: string
          type: array
          title: Categories
      type: object
      required:
        - username
        - consumer_key
        - private_key
        - categories
      title: SalesforceSourceConnectorConfig
    SharePointSourceConnectorConfig:
      properties:
        site:
          type: string
          title: Site
        tenant:
          type: string
          title: Tenant
        authority_url:
          type: string
          title: Authority Url
          default: https://login.microsoftonline.com
        user_pname:
          type: string
          title: User Pname
        client_id:
          type: string
          title: Client Id
        client_cred:
          type: string
          title: Client Cred
        recursive:
          type: boolean
          title: Recursive
        path:
          anyOf:
            - type: string
            - type: 'null'
          title: Path
      type: object
      required:
        - site
        - tenant
        - user_pname
        - client_id
        - client_cred
        - recursive
      title: SharePointSourceConnectorConfig
    SnowflakeSourceConnectorConfig:
      properties:
        account:
          type: string
          title: Account
        role:
          type: string
          title: Role
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        host:
          type: string
          title: Host
        port:
          type: integer
          minimum: 1
          title: Port
          default: 443
        database:
          type: string
          title: Database
        schema:
          type: string
          title: Schema
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        id_column:
          type: string
          title: Id Column
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - account
        - role
        - user
        - password
        - host
        - database
        - schema
        - table_name
        - id_column
      title: SnowflakeSourceConnectorConfig
    TeradataSourceConnectorConfig:
      properties:
        host:
          type: string
          title: Host
        user:
          type: string
          title: User
        password:
          type: string
          title: Password
        database:
          anyOf:
            - type: string
            - type: 'null'
          title: Database
        dbs_port:
          type: integer
          minimum: 1
          title: Dbs Port
          default: 1025
        id_column:
          type: string
          title: Id Column
          default: id
        table_name:
          type: string
          title: Table Name
        batch_size:
          type: integer
          minimum: 1
          title: Batch Size
          default: 100
        fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Fields
      type: object
      required:
        - host
        - user
        - password
        - table_name
      title: TeradataSourceConnectorConfig
    JiraSourceConnectorConfig:
      properties:
        url:
          type: string
          title: Url
        username:
          type: string
          title: Username
        password:
          anyOf:
            - type: string
            - type: 'null'
          title: Password
        token:
          anyOf:
            - type: string
            - type: 'null'
          title: Token
        cloud:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Cloud
          default: false
        projects:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Projects
        boards:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Boards
        issues:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Issues
        status_filters:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Status Filters
        download_attachments:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Download Attachments
          default: false
      type: object
      required:
        - url
        - username
      title: JiraSourceConnectorConfig
    ZendeskSourceConnectorConfig:
      properties:
        subdomain:
          type: string
          title: Subdomain
        email:
          type: string
          title: Email
        api_token:
          type: string
          title: Api Token
        item_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Item Type
          default: tickets
        batch_size:
          anyOf:
            - type: integer
            - type: 'null'
          title: Batch Size
          default: 2
      type: object
      required:
        - subdomain
        - email
        - api_token
      title: ZendeskSourceConnectorConfig
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    SecretReference:
      properties:
        id:
          type: string
          title: Id
        type:
          $ref: '#/components/schemas/EncryptionType'
          default: rsa
      type: object
      required:
        - id
      title: SecretReference
    EncryptionType:
      type: string
      enum:
        - rsa
        - rsa_aes
      title: EncryptionType

````

---

# Source: https://docs.unstructured.io/api-reference/workflows/update-workflow.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Update Workflow

> Update an existing workflow's name, connectors, schedule, or workflow type.


## OpenAPI

````yaml https://platform.unstructuredapp.io/openapi.json put /api/v1/workflows/{workflow_id}
openapi: 3.1.0
info:
  title: Platform API
  version: 3.1.0
servers:
  - url: https://platform.unstructuredapp.io/
    description: Unstructured Platform API
    x-speakeasy-server-id: platform-api
security: []
paths:
  /api/v1/workflows/{workflow_id}:
    put:
      tags:
        - workflows
      summary: Update Workflow
      description: >-
        Update an existing workflow's name, connectors, schedule, or workflow
        type.
      operationId: update_workflow
      parameters:
        - name: workflow_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
            title: Workflow Id
        - name: unstructured-api-key
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: Unstructured-Api-Key
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/UpdateWorkflow'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/WorkflowInformation'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    UpdateWorkflow:
      properties:
        name:
          anyOf:
            - type: string
            - type: 'null'
          title: Name
        source_id:
          anyOf:
            - type: string
              format: uuid
            - type: 'null'
          title: Source Id
        destination_id:
          anyOf:
            - type: string
              format: uuid
            - type: 'null'
          title: Destination Id
        workflow_type:
          anyOf:
            - $ref: '#/components/schemas/WorkflowType'
            - type: 'null'
        workflow_nodes:
          anyOf:
            - items:
                $ref: '#/components/schemas/WorkflowNode'
              type: array
            - type: 'null'
          title: Workflow Nodes
        template_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Template Id
        schedule:
          anyOf:
            - type: string
              enum:
                - every 15 minutes
                - every hour
                - every 2 hours
                - every 4 hours
                - every 6 hours
                - every 8 hours
                - every 10 hours
                - every 12 hours
                - daily
                - weekly
                - monthly
            - type: 'null'
          title: Schedule
        reprocess_all:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Reprocess All
      type: object
      title: UpdateWorkflow
    WorkflowInformation:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        name:
          type: string
          title: Name
        sources:
          items:
            type: string
            format: uuid
          type: array
          title: Sources
        destinations:
          items:
            type: string
            format: uuid
          type: array
          title: Destinations
        workflow_type:
          anyOf:
            - $ref: '#/components/schemas/WorkflowType'
            - type: 'null'
        workflow_nodes:
          items:
            $ref: '#/components/schemas/WorkflowNode'
          type: array
          title: Workflow Nodes
        schedule:
          anyOf:
            - $ref: '#/components/schemas/WorkflowSchedule'
            - type: 'null'
        status:
          $ref: '#/components/schemas/WorkflowState'
        created_at:
          type: string
          format: date-time
          title: Created At
        updated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Updated At
        reprocess_all:
          type: boolean
          title: Reprocess All
          default: false
      type: object
      required:
        - id
        - name
        - sources
        - destinations
        - workflow_nodes
        - status
        - created_at
      title: WorkflowInformation
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    WorkflowType:
      type: string
      enum:
        - basic
        - advanced
        - platinum
        - custom
      title: WorkflowType
    WorkflowNode:
      properties:
        id:
          anyOf:
            - type: string
              format: uuid
            - type: 'null'
          title: Id
        name:
          type: string
          title: Name
        type:
          type: string
          title: Type
        subtype:
          type: string
          title: Subtype
        settings:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Settings
      type: object
      required:
        - name
        - type
        - subtype
      title: WorkflowNode
    WorkflowSchedule:
      properties:
        crontab_entries:
          items:
            $ref: '#/components/schemas/CronTabEntry'
          type: array
          title: Crontab Entries
      type: object
      required:
        - crontab_entries
      title: WorkflowSchedule
      examples:
        - crontab_entries:
            - cron_expression: 0 0 * * *
    WorkflowState:
      type: string
      enum:
        - active
        - inactive
      title: WorkflowState
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    CronTabEntry:
      properties:
        cron_expression:
          type: string
          title: Cron Expression
      type: object
      required:
        - cron_expression
      title: CronTabEntry

````

---

# Source: https://docs.unstructured.io/ui/account/usage.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Account usage

## View usage details for an Unstructured account

To view usage details for an Unstructured **Let's Go** or **Pay-As-You-Go** account, do the following:

1. If you are not already signed in, sign in to your Unstructured account.
2. On the sidebar, click your user icon, and then click **Account Settings**.
3. In the top navigation bar, click the **Usage History** tab. Your account's usage details are displayed.

To view usage details for an Unstructured **Business** account, do the following:

1. If you are not already signed in, sign in to Unstructured.
2. In the top navigation bar, in the list of available accounts, select the name of the [organizational account](/ui/account/organizations) that you want to view usage details for.
3. In the top navigation bar, click the **Usage History** tab. The account's usage details are displayed.

## View usage details for a workspace within an organizational account

<Note>
  The following information applies only to Unstructured **Business** accounts.

  Unstructured **Let's Go** and **Pay-As-You-Go** accounts do not support multiple workspaces. See the
  [previous section](#view-usage-details-for-an-unstructured-account) to view all usage details for
  the single personal workspace within a **Let's Go** or **Pay-As-You-Go** account.
</Note>

To view usage details for a specific [workspace](/ui/account/workspaces) within an [organizational account](/ui/account/organizations), do the following:

1. If you are not already signed in, sign in to your Unstructured account.

2. In the top navigation bar, in the list of available accounts, select the name of the organizational account that contains the workspace you want to view usage details for.

3. In the top navigation bar, in the list of available workspaces for the selected account, select the name of the workspace you want to view usage details for.

   <Note>
     If a list of available workspaces is not visible in the top navigation bar, click the gear (**Settings**) icon on the sidebar, click **View All Workspaces**, and then click the arrow next to the workspace you want to view usage details for.
   </Note>

4. In the upper navigation bar, click the **Usage** tab.

## How is usage calculated for billing purposes?

Unstructured calculates a page as follows:

* For these file types, a page is a page, slide, or image: `.pdf`, `.pptx`, and `.tiff`.
* For `.docx` files that have page metadata, Unstructured calculates the number of pages based on that metadata.
* For all other file types, Unstructured calculates the number of pages as the file's size divided by 100 KB.
* For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.

The **Billing** tab shows your account's current billing rate per page.

To complete billing-related tasks in your account, see [Account billing](/ui/account/billing).

---

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/vectara.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Vectara

<NewDocument />

Batch process all your records to store structured outputs in Vectara.

The requirements are as follows.

* A [Vectara account](https://console.vectara.com/signup).
* The [customer ID](https://docs.vectara.com/docs/console-ui/vectara-console-overview#view-the-customer-id) for the account.
* The name and key for the target [corpus](https://docs.vectara.com/docs/console-ui/creating-a-corpus) in the account.
* The [OAuth authentication URL, client ID, and client secret](https://docs.vectara.com/docs/console-ui/app-clients) for accessing the target corpus.

The Vectara connector dependencies.

```bash  theme={null}
pip install "unstructured-ingest[vectara]"
```

You might also need to install additional dependencies, depending on your needs. [Learn more](/open-source/ingestion/ingest-dependencies).

The following environment variables:

* `VECTARA_CUSTOMER_ID` - The customer ID for the target Vectara account, represented by `--customer-id` (CLI) or `customer_id` (Python).
* `VECTARA_CORPUS_NAME` - The name of the target corpus in the account, represented by `--corpus-name` (CLI) or `corpus_name` (Python).
* `VECTARA_CORPUS_KEY` - The name of the corpus's key, represented by `--corpus-key` (CLI) or `corpus_key` (Python).
* `VECTARA_OAUTH_TOKEN_URL` - The OAuth token URL for getting and refreshing OAuth access tokens in the account, represented by `--token-url` (CLI) or `token_url` (Python).
* `VECTARA_OAUTH_CLIENT_ID` - A valid OAuth client ID in the account, represented by `--oauth-client-id` (CLI) or `oauth_client_id` (Python).
* `VECTARA_OAUTH_CLIENT_SECRET` - The OAuth client secret for the client ID, represented by `--oauth-client-secret` (CLI) or `oauth_client_secret` (Python).

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported.

This example uses the local source connector.

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

<CodeGroup>
  ```bash CLI theme={null}
  #!/usr/bin/env bash

  # Chunking and embedding is optional.

  unstructured-ingest \
    local \
      --input-path $LOCAL_FILE_INPUT_DIR \
      --chunking-strategy by_title \
      --embedding-provider huggingface \
      --partition-by-api \
      --api-key $UNSTRUCTURED_API_KEY \
      --partition-endpoint $UNSTRUCTURED_API_URL \
      --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
    vectara \
      --customer-id $VECTARA_CUSTOMER_ID \
      --corpus-name $VECTARA_CORPUS_NAME \
      --corpus-key $VECTARA_CORPUS_KEY \
      --oauth-client-id $VECTARA_OAUTH_CLIENT_ID \
      --oauth-secret $VECTARA_OAUTH_CLIENT_SECRET \
      --token-url $VECTARA_OAUTH_TOKEN_URL
  ```

  ```python Python Ingest theme={null}
  import os

  from unstructured_ingest.pipeline.pipeline import Pipeline
  from unstructured_ingest.interfaces import ProcessorConfig

  from unstructured_ingest.processes.connectors.vectara import (
      VectaraAccessConfig,
      VectaraConnectionConfig,
      VectaraUploadStagerConfig,
      VectaraUploaderConfig
  )
  from unstructured_ingest.processes.connectors.local import (
      LocalIndexerConfig,
      LocalConnectionConfig,
      LocalDownloaderConfig
  )
  from unstructured_ingest.processes.partitioner import PartitionerConfig
  from unstructured_ingest.processes.chunker import ChunkerConfig
  from unstructured_ingest.processes.embedder import EmbedderConfig

  # Chunking and embedding is optional.

  if __name__ == "__main__":
      Pipeline.from_configs(
          context=ProcessorConfig(),
          indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
          downloader_config=LocalDownloaderConfig(),
          source_connection_config=LocalConnectionConfig(),
          partitioner_config=PartitionerConfig(
              partition_by_api=True,
              api_key=os.getenv("UNSTRUCTURED_API_KEY"),
              partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
              additional_partition_args={
                  "split_pdf_page": True,
                  "split_pdf_allow_failed": True,
                  "split_pdf_concurrency_level": 15
              }
          ),
          chunker_config=ChunkerConfig(chunking_strategy="by_title"),
          embedder_config=EmbedderConfig(embedding_provider="huggingface"),
          destination_connection_config=VectaraConnectionConfig(
              access_config=VectaraAccessConfig(
                  oauth_client_id=os.getenv("VECTARA_OAUTH_CLIENT_ID"),
                  oauth_secret=os.getenv("VECTARA_OAUTH_CLIENT_SECRET")
              ),
              customer_id=os.getenv("VECTARA_CUSTOMER_ID"),
              corpus_name=os.getenv("VECTARA_CORPUS_NAME"),
              corpus_key=os.getenv("VECTARA_CORPUS_KEY"),
              token_url=os.getenv("VECTARA_OAUTH_TOKEN_URL")
          ),
          stager_config=VectaraUploadStagerConfig(),
          uploader_config=VectaraUploaderConfig()
      ).run()
  ```
</CodeGroup>

For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the `--partition-by-api` option (CLI) or `partition_by_api` (Python) parameter to specify where files are processed:

* To do local file processing, omit `--partition-by-api` (CLI) or `partition_by_api` (Python), or explicitly specify `partition_by_api=False` (Python).

  Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`

* To send files to the legacy [Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview) for processing, specify `--partition-by-api` (CLI) or `partition_by_api=True` (Python).

  Unstructured also requires an Unstructured API key and API URL, by adding the following:

  * `--api-key $UNSTRUCTURED_API_KEY` (CLI) or `api_key=os.getenv("UNSTRUCTURED_API_KEY")` (Python)
  * `--partition-endpoint $UNSTRUCTURED_API_URL` (CLI) or `partition_endpoint=os.getenv("UNSTRUCTURED_API_URL")` (Python)
  * The environment variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`, representing your API key and API URL, respectively.

  <Note>
    You must specify the API URL only if you are not using the default API URL for Unstructured Ingest, which applies to **Let's Go**, **Pay-As-You-Go**, and **Business SaaS** accounts.

    The default API URL for Unstructured Ingest is `https://api.unstructuredapp.io/general/v0/general`, which is the API URL for the legacy[Unstructured Partition Endpoint](/api-reference/legacy-api/partition/overview). However, you should always use the URL that was provided to you when your Unstructured account was created. If you do not have this URL, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

    If you do not have an API key, [get one now](/api-reference/legacy-api/partition/overview).

    If you are using a **Business** account, the process
    for generating Unstructured API keys, and the Unstructured API URL that you use, are different.
    For instructions, see your Unstructured account administrator, or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

---

# Source: https://docs.unstructured.io/examplecode/codesamples/oss/vector-database.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Vector database ingestion

<Note> This sample code utilizes the [Unstructured Open Source](/open-source/introduction/overview "Open Source") Library. </Note>

In this guide, we demonstrate how to leverage Unstructured.IO, ChromaDB, and LangChain to summarize topics from the front page of CNN Lite. Utilizing the modern LLM stack, including Unstructured, Chroma, and LangChain, this workflow is streamlined to less than two dozen lines of code.

## Gather Links with Unstructured

First, we gather links from the CNN Lite homepage using the partition\_html function from Unstructured. When Unstructured partitions HTML pages, links are included in the metadata for each element, making link collection straightforward.

```python  theme={null}
from unstructured.partition.html import partition_html

cnn_lite_url = "https://lite.cnn.com/"
elements = partition_html(url=cnn_lite_url)
links = []

for element in elements:
    if element.metadata.link_urls:
        relative_link = element.metadata.link_urls[0][1:]
        if relative_link.startswith("2024"):
            links.append(f"{cnn_lite_url}{relative_link}")

```

## Ingest Individual Articles with UnstructuredURLLoader

With the links in hand, we preprocess individual news articles using UnstructuredURLLoader. This loader fetches content from the web and then uses the unstructured partition function to extract content and metadata. Here we preprocess HTML files, but it also works with other response types like application/pdf. The result is a list of LangChain Document objects.

```python  theme={null}
from langchain.document_loaders import UnstructuredURLLoader

loaders = UnstructuredURLLoader(urls=links, show_progress_bar=True)
docs = loaders.load()

```

## Load Documents into ChromaDB

The next step is to load the preprocessed documents into ChromaDB. This process involves vectorizing the documents using OpenAI embeddings and loading them into Chroma’s vector store. Once in Chroma, similarity search can be performed to retrieve documents related to specific topics.

```python  theme={null}
from langchain.vectorstores.chroma import Chroma
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(docs, embeddings)
query_docs = vectorstore.similarity_search("Update on the coup in Niger.", k=1)

```

## Summarize the Documents

After retrieving relevant documents from Chroma, we summarize them using LangChain. The load\_summarization\_chain function allows for easy summarization, simply requiring the selection of an LLM and summarization chain.

```python  theme={null}
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(query_docs)

```

## Jupyter Notebook

To delve deeper into this example, you can access the full Jupyter Notebook here: [News of the Day Notebook](https://github.com/Unstructured-IO/unstructured/blob/main/examples/chroma-news-of-the-day/news-of-the-day.ipynb)

---

# Source: https://docs.unstructured.io/examplecode/tools/vectorshift.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# VectorShift

[VectorShift](https://vectorshift.ai/) is an integrated framework of no-code, low-code, and out of the box generative AI solutions
to build AI search engines, assistants, chatbots, and automations.

VectorShift's platform allows you to design, prototype, build, deploy,
and manage generative AI workflows and automation across two interfaces: no-code and code SDK.
This hands-on demonstration uses the no-code interface to walk you through creating a VectorShift pipeline project. This project
enables you to use GPT-4o-mini to chat in real time with a PDF document that is processed by Unstructured and has its processed data stored in a
[Pinecone](https://www.pinecone.io/) vector database.

## Prerequisites

<iframe width="560" height="315" src="https://www.youtube.com/embed/Ydj74uBnJyA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A Pinecone account. [Get an account](https://app.pinecone.io/).

* A Pinecone API key in your Pinecone account. To create an API key, do the following:

  1. [Sign in to your Pinecone account and open the API keys page](https://app.pinecone.io/organizations/-/projects/-/keys).
  2. Click **+ API key**.
  3. For **API key name**, enter some descriptive name for the API key.
  4. Click **Create key**.
  5. Copy the generated API key to some secure location. You will not be able to access this API key again after you close the dialog.

* A Pinecone serverless index in your Pinecone account.

  Creating a serverless index on your own is optional.
  An index is not required to exist in advance.

  When you set up the connector, at runtime, the index behavior is as follows:

  For the [Unstructured UI](/ui/overview) and [Unstructured API](/api-reference/overview):

  * Your workflow must contain an embedder node, and the embedder node must specify the embedding model that Unstructured will use to generate the embeedings.
  * If an existing index name is specified,
    and the number of dimensions that Unstructured generates does not match the number of dimensions that are specified in the existing index's embedding settings, the run will fail.
    You must change the number of dimensions in your workflow's embedder node or your existing index's embedding settings to match, and try the run again.
  * If an index name is not specified, Unstructured creates a new index in your Pinecone account. The
    new index's name will be `u<short-workflow-id>-<short-embedding-model-name>-<number-of-dimensions>`.

  For [Unstructured Ingest](/open-source/ingestion/overview):

  * If an existing index name is specified, and Unstructured generates embeddings,
    but the number of dimensions that are generated does not match the existing index's embedding settings, the run will fail.
    You must change your Unstructured embedding settings or your existing index's embedding settings to match, and try the run again.
  * If an index name is not specified, Unstructured creates a new index in your Pinecone account. The new index's name will be `unstructuredautocreated`.

  <Note>
    If you create a new index or use an existing one, Unstructured recommends that all records in the target index have a field
    named `record_id` with a string data type.
    Unstructured can use this field to do intelligent document overwrites. Without this field, duplicate documents
    might be written to the index or, in some cases, the operation could fail altogether.
  </Note>

  To create a serverless index on your own, do the following:

  1. [Sign in to your Pinecone account and open the Create a new index page](https://app.pinecone.io/organizations/-/projects/-/create-index/serverless).

  2. For **Enter index name**, enter some descriptive name for the index.

  3. For **Configuration**, select the check box labelled **Custom settings**, or click the tile labelled **Manual configuration**.

     <Warning>
       Do not click any of the other tiles, such as **text-embedding-3-large**. Clicking any of these other tiles will cause Pinecone to generate embeddings instead of
       having Unstructured generate them. If Pinecone generates embeddings instead of Unstructured, this could cause any related Unstructured workflows to fail.
     </Warning>

  4. For **Vector type**, select **Dense**.

  5. For **Dimension**, enter the number of dimensions for the embeddings that Unstructured will generate.

     <Warning>
       The number of dimensions that you enter here must match the number of dimensions for the embedding model that you use in any related Unstructured workflows or ingestion pipelines. If these numbers do not
       match in both places, this could cause any related Unstructured workflows or ingestion pipelines to fail.
     </Warning>

  6. For **Metric**, select **cosine**.

  7. Leave **Capacity mode** set to **Serverless**.

  8. You can leave **Cloud provider** and **Region** set to their default values, or you can select a cloud provider and region that is closest to you, if available.

  9. Click **Create index**.

* Within a Pinecone serverless index, custom [namespaces](https://docs.pinecone.io/guides/index-data/indexing-overview#namespaces) are supported but are not required.

Also:

* [Sign up for an OpenAI account](https://platform.openai.com/signup), and [get your OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).
* [Sign up for a VectorShift Starter account](https://app.vectorshift.ai/api/signup).
* If you do not already have an Unstructured account, sign up for one:

  1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
     After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
     </Note>

  2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

     <Note>
       For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
       or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
     </Note>

  3. Get your Unstructured API key:<br />

     a. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     <Note>
       For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
       for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
     </Note>

     b. Click **Generate API Key**.<br />
     c. Follow the on-screen instructions to finish generating the key.<br />
     d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

## Create and run the demonstration project

<Steps>
  <Step title="Get source data into Pinecone">
    Although you can use any [supported file type](/ui/supported-file-types) or data in any
    [supported source type](/ui/sources/overview) for the input into Pinecone, this demonstration uses [the text of the United States Constitution in PDF format](https://constitutioncenter.org/media/files/constitution.pdf).

    1. If you are not already signed in, sign in to your Unstructured account.
    2. [Create a source connector](/ui/sources/overview), if you do not already have one, to connect Unstructured to the source location where the PDF file is stored.
    3. [Create a Pinecone destination connector](/ui/destinations/pinecone), if you do not already have one, to connect Unstructured to your Pinecone serverless index.
    4. [Create a workflow](/ui/workflows#create-a-workflow) that references this source connector and destination connector.
    5. [Run the workflow](/ui/workflows#edit-delete-or-run-a-workflow).
  </Step>

  <Step title="Create the VectorShift project">
    1. Sign in to your VectorShift account dashboard.
    2. On the sidebar, click **Pipelines**.
    3. Click **New**.
    4. Click **Create Pipeline from Scratch**.

        <img src="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/CreateProject.png?fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=fb659e9b23b278951c243a75296bd866" alt="Create the VectorShift project" data-og-width="946" width="946" data-og-height="354" height="354" data-path="img/vectorshift/CreateProject.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/CreateProject.png?w=280&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=f5de588b3b993ab1db3bca88c5e9da7c 280w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/CreateProject.png?w=560&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=f62b046985d3672159cf78077559e4a5 560w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/CreateProject.png?w=840&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=a847307e776933e2b955addd88b18a03 840w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/CreateProject.png?w=1100&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=07bb89bb6e19f29ac168081fdb5a2fd5 1100w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/CreateProject.png?w=1650&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=adbaa9add891ad15b865c62a35f09a3e 1650w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/CreateProject.png?w=2500&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=aca75d812664a08def5d652fc691170b 2500w" />
  </Step>

  <Step title="Add the Input node">
    In this step, you add a node to the pipeline. This node takes user-supplied chat messages and sends them as input to Pinecone, and as input to a text-based LLM, for contextual searching.

    In the top pipeline node chooser bar, on the **General** tab, click **Input**.

        <img src="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/InputComponent.png?fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=517d6688a1461c1d6bf4c233f5b0d1e1" alt="Adding the Input node" data-og-width="479" width="479" data-og-height="400" height="400" data-path="img/vectorshift/InputComponent.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/InputComponent.png?w=280&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=27f536e5365baad3cf67c77e6fcc9562 280w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/InputComponent.png?w=560&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=efd0f6533f2f9fed2c7d9f2b723dd065 560w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/InputComponent.png?w=840&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=cb2a32bafac75da802328cc762b5df4a 840w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/InputComponent.png?w=1100&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=73b259761ebb388e9d3cdb6d671254bc 1100w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/InputComponent.png?w=1650&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=660b9c112f9aee882de268ad6cc84732 1650w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/InputComponent.png?w=2500&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=0aee6e829a86ecbf30e7e0aab0414d48 2500w" />
  </Step>

  <Step title="Add the Pinecone node">
    In this step, you add a node that connects to the Pinecone serverless index.

    1. In the top pipeline node chooser bar, on the **Integrations** tab, click **Pinecone**.
    2. In the **Pinecone** node, for **Embedding Model**, select **openai/text-embedding-3-large**.
    3. Click **Connected Account**.
    4. In the **Select Pinecone Account** dialog, click **Connect New**.
    5. Enter the **API Key** and **Region** for your Pinecone serverless index, and then click **Save**.
    6. For **Index**, selet the name of your Pinecone serverless index.
    7. Connect the **input\_1** output from the **Input** node to the **query** input in the **Pinecone** node.

       To make the connection, click and hold your mouse pointer inside of the circle next to **input\_1** in the **Input** node.
       While holding your mouse pointer, drag it over into the circle next to **query** in the **Pinecone** node. Then
       release your mouse pointer. A line appears between these two circles.

        <img src="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/PineconeComponent.png?fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=29e12998812c98ad681925e298c68ea5" alt="Adding the Pinecone node" data-og-width="835" width="835" data-og-height="753" height="753" data-path="img/vectorshift/PineconeComponent.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/PineconeComponent.png?w=280&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=73fdf7e1a79fffb37e12d0c81dd5014c 280w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/PineconeComponent.png?w=560&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=1b49b2748f33e6bf499be7180a8aeca5 560w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/PineconeComponent.png?w=840&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=920d604ae366f44876492ca408df1154 840w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/PineconeComponent.png?w=1100&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=e44889a39d92350fc8ba3024c32282da 1100w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/PineconeComponent.png?w=1650&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=5f545bb936ffbec8ca3719896e7196ff 1650w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/PineconeComponent.png?w=2500&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=0d7bc51ac27b4e49c844f18011d89057 2500w" />
  </Step>

  <Step title="Add the OpenAI LLM node">
    In this step, you add a node that builds a prompt and then sends it to a text-based LLM.

    1. In the top pipeline node chooser bar, on the **LLMs** tab, click **OpenAI**.

    2. In the **OpenAI LLM** node, for **System**, enter the following text:

       ```
       Answer the Question based on Context. Use Memory when relevant.
       ```

       <Tip>
         To answer the question, the preceding prompt uses the context along with general information that the text-based LLM is
         trained on. To use only the context to answer the question, you can change the prompt, for example to something like this:

         ```text  theme={null}
         Answer the Question based only on the Context. Do not use any other sources of
         information. If the context does not provide enough information to answer the 
         question, reply with 'I do not have enough context to answer the question.' 
         Use Memory when relevant.
         ```
       </Tip>

    3. For **Prompt**, enter the following text:

       ```
       Question: {{Question}}
       Context: {{Context}}
       Memory: {{Memory}}
       ```

    4. For **Model**, select **gpt-4o-mini**.

    5. Check the box titled **Use Personal API Key**.

    6. For **API Key**, enter your OpenAI API key.

    7. Connect the **input\_1** output from the **Input** node to the **Question** input in the **OpenAI LLM** node.

    8. Connect the **output** output from the **Pinecone** node to the **Context** input in the **OpenAI LLM** node.

        <img src="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OpenAILLMComponent.png?fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=e4eabf6917590c36dd1893e30523cca5" alt="Adding the OpenAI LLM node" data-og-width="1241" width="1241" data-og-height="704" height="704" data-path="img/vectorshift/OpenAILLMComponent.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OpenAILLMComponent.png?w=280&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=ed6ad8e59b2b29c534478aa5798a7a51 280w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OpenAILLMComponent.png?w=560&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=2a7384c6d59e6460cce38ae96b2cd85a 560w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OpenAILLMComponent.png?w=840&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=7592c05c094df06c17599c4bef0259b6 840w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OpenAILLMComponent.png?w=1100&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=4bb0d061c57655fac478f761b3bf2ac7 1100w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OpenAILLMComponent.png?w=1650&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=6a207d360d72060640da04d3a146a6c4 1650w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OpenAILLMComponent.png?w=2500&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=d7c91b24271647926e3cc44d208176e4 2500w" />
  </Step>

  <Step title="Add the Chat Memory node">
    In this step, you add a node that adds chat memory to the session.

    1. In the top pipeline node chooser bar, on the **Chat** tab, click **Chat Memory**.
    2. Connect the output from the **Chat Memory** node to the **Memory** input in the **OpenAI LLM** node.

        <img src="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatMemoryComponent.png?fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=6dc89762089c5a015a38d2c41e344539" alt="Adding the Chat Memory node" data-og-width="1238" width="1238" data-og-height="883" height="883" data-path="img/vectorshift/ChatMemoryComponent.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatMemoryComponent.png?w=280&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=75cfa425cf7e88800cc19e0408dc0e43 280w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatMemoryComponent.png?w=560&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=8c01ce60da10c9e067631896be37522d 560w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatMemoryComponent.png?w=840&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=ed2ff8b7204f8c435eb53df69d8b350f 840w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatMemoryComponent.png?w=1100&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=348ba098bd97cd9c8a0c19341bb6c5bd 1100w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatMemoryComponent.png?w=1650&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=15ad719bba59f9bfcd128cafc92b8e30 1650w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatMemoryComponent.png?w=2500&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=555bb263c3bffa3f1f75581a93e0a18f 2500w" />
  </Step>

  <Step title="Add the Output node">
    In this step, you add a node that displays the chat output.

    1. In the top pipeline node chooser bar, on the **General** tab, click **Output**.
    2. Connect the **response** output from the **OpenAI LLM** node to the input in the **Output** node.

        <img src="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OutputComponent.png?fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=c83ae5e3e849aef33dcbc47b1d148904" alt="Adding the Output node" data-og-width="1430" width="1430" data-og-height="883" height="883" data-path="img/vectorshift/OutputComponent.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OutputComponent.png?w=280&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=a0f10cf66b18ec6e184256ff568105d7 280w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OutputComponent.png?w=560&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=d30a962f0fcc513c709826de87b3a165 560w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OutputComponent.png?w=840&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=1f8155f1458d07ce956166d9f14b6649 840w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OutputComponent.png?w=1100&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=ea277d1c9f98ef72a0c0751029c7fe4b 1100w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OutputComponent.png?w=1650&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=bcf3af3b2129ac78caafeb086d9fce4e 1650w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/OutputComponent.png?w=2500&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=cee99c915b0cf35c7ce73063b262f233 2500w" />
  </Step>

  <Step title="Run the project">
    1. In the upper corner of the pipeline designer, click the play (**Run Pipeline**) button.

       <img src="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/RunPipeline.png?fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=df3c2eb545c6d0d05907f6c9f4341f6a" alt="Running the pipeline" data-og-width="332" width="332" data-og-height="92" height="92" data-path="img/vectorshift/RunPipeline.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/RunPipeline.png?w=280&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=825a204ebc7e0b3b87c1660679657907 280w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/RunPipeline.png?w=560&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=a06caa2911c4b8e7f4ee31543087f544 560w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/RunPipeline.png?w=840&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=59e4f00f7f63e4f751428ca8fb9d21f1 840w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/RunPipeline.png?w=1100&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=0b93ba2f2c277bb826fbad86621de043 1100w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/RunPipeline.png?w=1650&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=462935eab7e714d4776be08d3202b22a 1650w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/RunPipeline.png?w=2500&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=48a2b11d344281329bde5380f742bd37 2500w" />

    2. In the chat pane, on the **Chatbot** tab, enter a question into the **Message Assistant** box, for example, `What rights does the fifth amendment guarantee?` Then press the send button.

       <img src="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatbotResults.png?fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=e0fdae5780a58d1478b63ad748799fd5" alt="Chatbot results" data-og-width="384" width="384" data-og-height="873" height="873" data-path="img/vectorshift/ChatbotResults.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatbotResults.png?w=280&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=18e614cf26892a3a21da47f9e46252f7 280w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatbotResults.png?w=560&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=92a8e84ef90eb605097a1b22cf4effa4 560w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatbotResults.png?w=840&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=521f29e237dfece49b59d271dea3cdf2 840w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatbotResults.png?w=1100&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=078bbec0532b6069e2066865afc74f19 1100w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatbotResults.png?w=1650&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=823ee6c2b38e5ae72dd94e1ab537bb59 1650w, https://mintcdn.com/unstructured-53/0PpGBVwVpmOG7-W9/img/vectorshift/ChatbotResults.png?w=2500&fit=max&auto=format&n=0PpGBVwVpmOG7-W9&q=85&s=ed37bf605e8a88044c2a9bf378d2b937 2500w" />

    3. Wait until the answer appears.

    4. Ask as many additional questions as you want to.
  </Step>
</Steps>

## Learn more

See the [VectorShift documentation](https://docs.vectorshift.ai/).

---

# Source: https://docs.unstructured.io/ui/walkthrough.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Unstructured UI walkthrough

This walkthrough provides you with deep, hands-on experience with the [Unstructured user interface (UI)](/ui/overview). As you follow along, you will learn how to use many of Unstructured's
features for [partitioning](/ui/partitioning), [enriching](/ui/enriching/overview), [chunking](/ui/chunking), and [embedding](/ui/embedding). These features are optimized for turning
your source documents and data into information that is well-tuned for
[retrieval-augmented generation (RAG)](https://unstructured.io/blog/rag-whitepaper),
[agentic AI](https://unstructured.io/problems-we-solve#powering-agentic-ai),
and [model fine-tuning](https://www.geeksforgeeks.org/deep-learning/what-is-fine-tuning/).

This walkthrough uses two sample files to demonstrate how Unstructured identifies and processes content such as image, graphs, complex tables, non-English characters, and handwriting.
These files, which are available for you to download to your local machine, include:

* Wang, Z., Liu, X., & Zhang, M. (2022, November 23).
  *Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling*.
  arXiv.org. [https://arxiv.org/pdf/2211.12781](https://arxiv.org/pdf/2211.12781). This 12-page PDF file features English and non-English characters, images, graphs, and complex tables.
  Throughout this walkthrough, this file's title is shortened to "Chinese Characters" for brevity.
* United States Central Security Service. (2012, January 27). *National Cryptologic Museum Opens New Exhibit on Dr. John Nash*.
  United States National Security Agency. [https://courses.csail.mit.edu/6.857/2012/files/H03-Cryptosystem-proposed-by-Nash.pdf](https://courses.csail.mit.edu/6.857/2012/files/H03-Cryptosystem-proposed-by-Nash.pdf).
  This PDF file features English handwriting and scanned images of documents.
  Throughout this walkthrough, this file's title is shortened to "Nash letters" for brevity.

<Note>
  This walkthrough focuses on local files for ease-of-use demonstration purposes.

  This walkthrough does not cover how to use
  Unstructured to set up [connectors](/ui/connectors) to do large-scale batch processing of multiple files and semi-structured data that are stored in remote locations.
  To learn how to set up connectors and do large-scale batch processing later, see the [next steps](#next-steps) after you finish this walkthrough.
</Note>

If you are not able to complete any of the following steps, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

<Tip>
  *What are these green boxes?*

  As you move through this walkthrough, you will notice tips like this one. These tips are designed to help expand
  your knowledge about Unstructured as you go. Feel free to skip these tips for now if you are in a hurry. You can always return to them later to learn more.
</Tip>

## Step 1: Sign up and sign in to Unstructured

Let's get started!

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/ui/overview#how-am-i-billed%3F).
   </Note>

2. If you have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
     or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

## Step 2: Create a custom workflow

In this step, you create a custom [workflow](/ui/workflows) in your Unstructured account.

Workflows are defined sequences of processes that automate the flow of data from your source documents and data into Unstructured for processing.
Unstructured then sends its processed data over into your destination file storage locations, databases, and vector stores. Your
RAG apps, agents, and models can then use this processed data in those destinations to do things more quickly and accurately such as
[answering users' questions](https://learn.microsoft.com/en-us/azure/developer/ai/advanced-retrieval-augmented-generation),
[automating business processes](https://unstructured.io/problems-we-solve#business-process-automation),
and [expanding your organization's available body of knowledge](http://knowledgemanagement.ie/the-critical-role-of-knowledge-management-as-a-foundation-for-llms-and-ai/).

<Tip>
  *Which kinds of sources and destinations does Unstructured support?*

  Unstructured can connect to many types of sources and destinations including file storage services such as Amazon S3 and Google Cloud Storage;
  databases such as PostgreSQL; and vector storage and database services such as MongoDB Atlas and Pinecone.

  See the full list of [supported source and destination connectors](/ui/connectors).
</Tip>

<Tip>
  *Which kinds of files does Unstructured support?*

  Unstructured can process a wide variety of file types including PDFs, word processing documents, spreadsheets, slide decks, HTML, image files, emails, and more.

  See the full list of [supported file types](/ui/supported-file-types).
</Tip>

Let's get going!

1. After you are signed in to your Unstructured account, on the sidebar, click **Workflows**.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowsSidebar.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=a186dac681921c7549bd4c4a177ea2c0" alt="Workflows button on the sidebar" data-og-width="154" width="154" data-og-height="576" height="576" data-path="img/ui/walkthrough/WorkflowsSidebar.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowsSidebar.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=69d6baeed15d2daa2cf959435e5b5fdc 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowsSidebar.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=1d05d3180399e17848ca0bc12694c486 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowsSidebar.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=9174985fbda5b8fea2465d1165402d8b 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowsSidebar.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=a0ad18378a92b5b1d84993f8c057c410 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowsSidebar.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=24a7e78f3735c1790ed528ef30207b85 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowsSidebar.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=6c4638587fe11d7d21aa4e23fcc0c0b8 2500w" />

   <Tip>
     *What do the other buttons on the sidebar do?*

     * **Start** takes you to the UI home page. The home page features a simple way to process one local file at a time with limited default settings. [Learn how](/welcome#unstructured-ui-quickstart).
     * **Connectors** allows you to create and manage your [source](/ui/sources/overview) and
       [destination](/ui/destinations/overview) connectors.
     * **Jobs** allows you to see the results of your workflows that are run manually (once) and automatically (on a regular time schedule). [Learn more](/ui/jobs).
     * **API Keys** allows you to use code to create and manage connectors, workflows, and jobs programmatically instead of by using the UI. [Learn more](/ui/account/api-key-url).
     * Your user icon at the bottom of the sidebar allows you to manage your Unstructured account. You can also sign out of your account from here. [Learn more](/ui/account/overview).
   </Tip>

2. Click **New Workflow**.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/NewWorkflow.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=51f8e5ea86c39fa060e6f00b6ff25d86" alt="New Workflow button" data-og-width="302" width="302" data-og-height="92" height="92" data-path="img/ui/walkthrough/NewWorkflow.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/NewWorkflow.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=f066dc6c870e5aadd72614ad0c2c15e4 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/NewWorkflow.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=7faf4f5406e3842a8a720bb4809c2284 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/NewWorkflow.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=e3ff5a08b831d62a2c931eda06ea012b 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/NewWorkflow.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=4b775c07b8c3be215ea58513a1da7583 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/NewWorkflow.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=aabdde8b96d8dbbe45643dcb3ee8fac6 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/NewWorkflow.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=56668eb2c797ea5e8bf65ec774cd28c4 2500w" />

3. With **Build it Myself** already selected, click **Continue**.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/BuildItMyself.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=9c8163fcfd133b78ce3fea528716bbb7" alt="Build it Myself workflow option" data-og-width="1156" width="1156" data-og-height="840" height="840" data-path="img/ui/walkthrough/BuildItMyself.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/BuildItMyself.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=e22555d99cb3544a2eb6bf3324b311ef 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/BuildItMyself.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=8dd92dd65a4698374d6d9d3678177c2b 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/BuildItMyself.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=93c855ec54c5c4577f1c0ef7288d099e 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/BuildItMyself.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=3308af186a7cf6096fd76caaa8385ffc 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/BuildItMyself.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=3244be6c003226335a9fd78fd30eb79a 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/BuildItMyself.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=f086430c81d6cd32468aa7e8f7bc31bc 2500w" />

   <Tip>
     *What does **Build it For Me** do?*

     The **Build it For Me** option creates an [automatic workflow](/ui/workflows#create-an-automatic-workflow) with limited, sensible default settings to enable you to get good-quality results faster. However,
     this option requires that you first have an existing remote source and destination connector to add to the workflow. To speed things up here and keep things simple,
     this walkthrough only processes files from your local machine and skips the use of connectors. To learn how to use connectors later,
     see the [next steps](#next-steps) at the end of this walkthrough.
   </Tip>

4. The workflow designer appears.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowDesigner.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=60a471f3f104f39ecaabbd872cd0ae6a" alt="The workflow designer" data-og-width="2140" width="2140" data-og-height="1500" height="1500" data-path="img/ui/walkthrough/WorkflowDesigner.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowDesigner.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=a52d30b30531fd752cebb1eed27afbff 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowDesigner.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=f6961d1340f333891279dcc27f277575 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowDesigner.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=9677273e8c44e2e3449a1eb7f949ddb6 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowDesigner.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=9db3ab0c08037b003b70f3f1ef201107 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowDesigner.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=a884e119d27627938283fa4832210f27 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/WorkflowDesigner.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=83ab39ffda8b7a28e912d465a8d570a5 2500w" />

   <Tip>
     *What are all the parts of the workflow designer?*

     The middle portion of the workflow designer is the workflow *directed acyclic graph* (DAG), which contains a
     a collection of *nodes* connected by *edges* that go in only one direction. You can think of the DAG similar to a flowchart
     for a process. *Directed* means the arrows show the flow from one step to the next, and *acyclic* means you cannot
     follow the arrows backward to get back to the starting point.

     The workflow settings pane on the right includes the following tabs:

     * The settings on the **Details** tab allow you to change this workflow's name. You can also see when this workflow was created and which jobs were run for this workflow.
     * **Schedule** allows you to set up a schedule for this workflow to run automatically (on a regular time schedule).
     * **Settings** allows you to specify whether every time this workflow runs, that Unstructured's results will overwrite any previous results in the destination location.
       To turn on this behavior, check the **Overwrite existing results** box. To turn it off, uncheck the box. Note that this setting works only
       for blog storage destination connectors such as the ones for Amazon S3, Azure Blob Storage, and Google Cloud Storage.
     * **FAQ** contains additional information about how to use the workflow designer.

     If the workflow settings pane is not visible, click the **Settings** button near the bottom to show it.

     There are also buttons near the bottom to undo or redo recent edits to the workflow DAG, zoom in and out of the workflow
     designer, re-center the DAG within the designer, and add a new node to the DAG.
   </Tip>

## Step 3: Experiment with partitioning

In this step, you use your new workflow to [partition](/ui/partitioning) the sample PDF files that you downloaded earlier onto your local machine.
Partitioning is the process where Unstructured identifies and extracts content from your source documents and then
outputs this content as a series of contextually-rich [document elements and metadata](/ui/document-elements), which are
well-tuned for RAG, agentic AI, and model fine-tuning. This step
shows how well Unstructured's **VLM** partitioning strategy handles challenging content such as complex tables, multilanguage characters, and handwriting.

1. With the workflow designer active from the previous step, at the bottom of the **Source** node, click **Drop file to test**.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/DropFileToTest.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=52077c448ce7a41cbee4dda614be7316" alt="Drop file to test button" data-og-width="262" width="262" data-og-height="208" height="208" data-path="img/ui/walkthrough/DropFileToTest.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/DropFileToTest.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=5087d5653dcca2e8f6cb6cd001b24ac5 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/DropFileToTest.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=414f994854bc441b694e800f8e167695 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/DropFileToTest.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=9d49fce9a043f6ce9a9b03069e0c33a3 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/DropFileToTest.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=d4151c7ea621d15dff3328bef16ea5ba 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/DropFileToTest.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=679dae353372fc2d6be1219b571cf479 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/DropFileToTest.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=dfcf79a62465cb9a59961b2e2c7a64f6 2500w" />

2. Browse to and select the "Chinese Characters" PDF file that you downloaded earlier.

3. In the workflow designer, click the **Partitioner** node and then, in the node's settings pane's **Details** tab, select **VLM**.<br />

   <Tip>
     *When would I choose **Auto**, **Fast**, **High Res**, or **VLM**?*

     * **Auto** is recommended in most cases. It lets Unstructured figure out the best strategy to switch over to for each incoming file (and even for each page if the incoming file is a PDF), so you don't have to!
     * **Fast** is only for when you know for certain that none of your files have tables, images, or multilanguage, scanned, or handwritten content in them. It's optimized for partitioning text-only content and is the fastest of all the strategies. It can recognize the text for only a few languages other than English.
     * **High Res** is only for when you know for certain that at least one of your files has images or simple tables in them, and that none of your files also have scanned or handwritten content in them. It can recognize the text for more languages than **Fast** but not as many as **VLM**.
     * **VLM** is great for any file, but it is best when you know for certain that some of your files have a combination of tables (especially complex ones), images, and multilanguage, scanned, or handwritten content. It's the highest quality but slowest of all the strategies.
   </Tip>

4. Under **Select VLM Model**, under **Vertex AI**, select **Gemini 2.0 Flash**.<br />

   <img src="https://mintcdn.com/unstructured-53/pyo7Q_5IJVFy-j1Z/img/ui/walkthrough/VLMPartitioner.png?fit=max&auto=format&n=pyo7Q_5IJVFy-j1Z&q=85&s=066c0a3b8958a8b7461862749aa2a9ce" alt="Selecting the VLM for partitioning" data-og-width="3456" width="3456" data-og-height="1868" height="1868" data-path="img/ui/walkthrough/VLMPartitioner.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/pyo7Q_5IJVFy-j1Z/img/ui/walkthrough/VLMPartitioner.png?w=280&fit=max&auto=format&n=pyo7Q_5IJVFy-j1Z&q=85&s=74e86039dd90f753e5eb952a46458e17 280w, https://mintcdn.com/unstructured-53/pyo7Q_5IJVFy-j1Z/img/ui/walkthrough/VLMPartitioner.png?w=560&fit=max&auto=format&n=pyo7Q_5IJVFy-j1Z&q=85&s=6c0f49b0aad04b853c131c32e79698a9 560w, https://mintcdn.com/unstructured-53/pyo7Q_5IJVFy-j1Z/img/ui/walkthrough/VLMPartitioner.png?w=840&fit=max&auto=format&n=pyo7Q_5IJVFy-j1Z&q=85&s=a1a0b3de833895d0472e368e672c6936 840w, https://mintcdn.com/unstructured-53/pyo7Q_5IJVFy-j1Z/img/ui/walkthrough/VLMPartitioner.png?w=1100&fit=max&auto=format&n=pyo7Q_5IJVFy-j1Z&q=85&s=516e87c89e048e89088914fceba58679 1100w, https://mintcdn.com/unstructured-53/pyo7Q_5IJVFy-j1Z/img/ui/walkthrough/VLMPartitioner.png?w=1650&fit=max&auto=format&n=pyo7Q_5IJVFy-j1Z&q=85&s=6b7dcc4c6059564c59227b5e53f59702 1650w, https://mintcdn.com/unstructured-53/pyo7Q_5IJVFy-j1Z/img/ui/walkthrough/VLMPartitioner.png?w=2500&fit=max&auto=format&n=pyo7Q_5IJVFy-j1Z&q=85&s=c1f5faa79edb9edff6b68d9451b88642 2500w" />

   <Note>
     The list of available models shown is constantly being updated. Your list might also differ from the preceding screenshot, depending on your Unstructured
     account type. If **Gemini 2.0 Flash** is not available, choose another available model from the list.

     If you have an Unstructured **Business** account and want to add more models to this list, contact your
     Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
     [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

   <Tip>
     *When I choose **VLM**, when would I choose one of these models over another?*

     A *vision language model* (VLM) is designed to use sophisticated AI techniques and logic to combine advanced image and text understanding, resulting in more accurate and
     contextually-rich output.

     As VLMs are constantly being released and improved, Unstructured is always adding to and updating its list of supported VLMs.
     If you aren't getting consistent results with one VLM for a particular set of files, switching over to another one might improve your results, depending on that VLM's capabilities and the sample data that is was trained on.
   </Tip>

5. Click **Test**.<br />

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestLocalFile.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=1879649f0470091648cf9d2dd0b80174" alt="Begin testing the local file" data-og-width="258" width="258" data-og-height="268" height="268" data-path="img/ui/walkthrough/TestLocalFile.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestLocalFile.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=e27b6de5df2a6728ab9bd543fcae81b5 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestLocalFile.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=43a44e067df1f5e7b4835ef076de5c3a 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestLocalFile.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=5987140ec1836711f8fbc6bde50052dc 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestLocalFile.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=700342b5cd086c3f2d02bdcd8cd0e570 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestLocalFile.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=24501ef8d4e504d6b640810fde797c2d 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestLocalFile.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=a5c34808fd95f208b50367f8eb8e4e3e 2500w" />

6. The PDF file appears in a pane on the left side of the screen, and Unstructured's output appears in a **Test output** pane on the right side of the screen.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestOutputResults.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=8321e62ca7454823a235c02d0a4be25d" alt="Showing the test output results" data-og-width="2140" width="2140" data-og-height="1496" height="1496" data-path="img/ui/walkthrough/TestOutputResults.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestOutputResults.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=7cdad9879545b79c2511150a93c9b858 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestOutputResults.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=de44e7a06a286bbbeb4efb8366b2fdd5 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestOutputResults.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=15ced22b19ab13415c5b31aadb855866 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestOutputResults.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=62d765731852f1c2528e1b3857029f68 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestOutputResults.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=c1dc86896e4790c782b490b9b1eeb49b 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/TestOutputResults.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=de533417b248e3b5ea80998e44ce206a 2500w" />

   <Tip>
     *What am I looking at in the output here?*

     * Unstructured outputs its results in industry-standard [JSON](https://www.json.org) format, which is ideal for RAG, agentic AI, and model fine-tuning.
     * Each object in the JSON is called a [document element](/ui/document-elements) and contains a `text` representation of the content that Unstructured detected for the particular portion of the document that was analyzed.
     * The `type` is the kind of document element that Unstructured categorizes it as, such as whether it is a title (`Title`), a table (`Table`), an image (`Image`), a series of well-formulated sentences
       (`NarrativeText`), some kind of free text (`UncategorizedText`), a part of a list (`ListItem`), and so on. [Learn more](/ui/document-elements#element-type).
     * The `element_id` is a unique identifier that Unstructured generates to refer to each document element. [Learn more](/ui/document-elements#element-id).
     * `metadata` contains supporting details about each document element, such as the page number it occurred on, the file it occurred in, and so on. [Learn more](/ui/document-elements#metadata).
   </Tip>

   <Tip>
     *What else can I do here?*

     * You can scroll through the original file on the left or, where supported for a given file type, click the up and down arrows to page through the file one page at a time.
     * You can scroll through Unstructured's JSON output on the right, and you can click **Search JSON** to search for specific text in the JSON output. You will do this next.
     * **Download Full JSON** allows you to download the full output to your local machine as a JSON file.
     * **View JSON at this step** allows you to view the JSON output at each step in the workflow as it was further processed. There's only one step right now (the **Partitioner** step),
       but as you add more nodes to the workflow DAG, this can be a useful tool to see how the JSON output changes along the way.
     * The close (**X**) button returns you to the workflow designer.
   </Tip>

7. Notice the following in the JSON output, which you can get to by clicking **Search JSON** above the output:

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/SearchJSON.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=3028d7115aed7d90b490347befe89e83" alt="Searching the JSON output" data-og-width="2136" width="2136" data-og-height="1500" height="1500" data-path="img/ui/walkthrough/SearchJSON.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/SearchJSON.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=026ef7b3f3f07681b853729e821ea4f7 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/SearchJSON.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=9a6302b83929246d20320120df5e01b4 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/SearchJSON.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=cf7982f6cdc4444e33aa5e8377a66785 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/SearchJSON.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=c5cf497d058a9f06ca0239eb6c47814a 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/SearchJSON.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=10fee17bf5c5f753ecb22436619b5225 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/SearchJSON.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=760a35166e8946a8109b14543f3fbece 2500w" />

   * The Chinese characters on page 1. Search for the text `verbs. The characters`. Notice how the Chinese characters are output. We'll see accuracy improvements to this output later in Step 4 in the enrichments portion of this walkthrough.
   * The tables on pages 1, 6, 7, 8, 9, and 12. Search for the text `"Table"` (including the quotation marks) to see how the VLM interprets the various tables. We'll see changes to these elements' `text` and `metadata.text_as_html` contents later in Step 4 in the enrichments portion of this walkthrough.
   * The images on pages 3, 7, and 8. Search for the text `"Image"` (including the quotation marks) to see how the VLM interprets the various images. We'll see changes to these elements' `text` contents later in Step 4 in the enrichments portion of this walkthrough.

   <Note>
     If **Search JSON** is not clickable, this is probably because the JSON output is too large for the online viewer.
     Click **Download full JSON** and open the downloaded JSON file in an offline text editor (such as Visual Studio Code).
     [Learn more](/examplecode/tools/json-tools#visual-studio-code).
   </Note>

8. Now try looking at the "Nash letters" PDF file's output. To do this:

   a. Click the close (**X**) button above the output on the right side of the screen.<br />
   b. At the bottom of the **Source** node, click the existing PDF's file name.<br />
   c. Browse to and select the "Nash letters" file that you downloaded earlier to your local machine.<br />
   d. Click **Test**.<br />

9. Notice the following in the JSON output:

   * The handwriting on page 3. Search for the text `I have written RAND`. Notice how well the handwriting is recognized.
   * The mimeograph on page 18. Search for the text `The system which`. Notice how well the mimeographed content is recognized.

10. When you are done, be sure to click the close (**X**) button above the output on the right side of the screen, to return to
    the workflow designer for the next step.

## Step 4: Experiment with enriching

In this step, you add several [enrichments](/ui/enriching/overview) to your workflow, such as generating summary descriptions of detected images and tables,
HTML representations of detected tables, and detected entities (such as people and organizations) and the inferred relationships among these entities.

1. With the workflow designer active from the previous step, change the **Partitioner** node to use **High Res**.

2. Between the **Partitioner** and **Destination** nodes, click the add (**+**) icon, and then click **Enrich > Enrichment**.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEnrichment.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=41e46e3cf540af32a23febb00a45161b" alt="Adding an enrichment node" data-og-width="1380" width="1380" data-og-height="494" height="494" data-path="img/ui/walkthrough/AddEnrichment.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEnrichment.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=e714c2769f1e9c623412dafeefc3ed0e 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEnrichment.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=328fd1a7a31c7eca067f8ceaa7d3a499 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEnrichment.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=ce73ba90e1fd58baed66fca4d9f10a66 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEnrichment.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=1911ff51771cc824c4ca564d5a68916a 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEnrichment.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=39f4a71051c48a54c25c6fb19d6885f9 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEnrichment.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=b9ffa0cd362a84917e65009cba6bb688 2500w" />

3. In the node's settings pane's **Details** tab, click:

   * **Image** under **Input Type**.
   * Any available choice for **Provider** (for example, **Anthropic**).
   * Any available choice for **Model** (for example, **Claude Sonnet 4.5** if you chose **Anthropic** for **Provider**).
   * If not already selected, **Image Description** under **Task**.

   <Note>
     The lists of available models for these enrichments are constantly being updated. Your lists might also differ from these instructions, depending on your Unstructured
     account type. If the suggested model for an enrichment is not available, choose another available model from the list.

     If you have an Unstructured **Business** account and want to add more models to these lists, contact your
     Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
     [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

   <Tip>
     The image description enrichment generates a summary description of each detected image. This can help you to more quickly and easily understand
     what each image is all about without having to stop to manually visualize and interpret the image's content yourself. This also provides additional helpful context about the image for your RAG apps, agents, and models. [Learn more](/ui/enriching/image-descriptions).
   </Tip>

4. Repeat this process to add four more nodes between the **Partitioner** and **Destination** nodes. To do this, click
   the add (**+**) icon, and then click **Enrich > Enrichment**, as follows:

   a. Add a table description enrichment (click the add (**+**) icon to the right of the preceding node, and then click \*\*Enrich > Enrichment).

   In the node's settings pane's **Details** tab, click:

   * **Table** under **Input Type**.
   * Any available choice for **Provider** (for example, **Anthropic**).
   * Any available choice for **Model** (for example, **Claude Sonnet 4.5** if you chose **Anthropic** for **Provider**).
   * If not already selected, **Table Description** under **Task**.

   <Tip>
     The table description enrichment generates a summary description of each detected table. This can help you to more quickly and easily understand
     what each table is all about without having to stop to manually read through the table's content yourself. This also provides additional helpful context about the table for your RAG apps, agents, and models. [Learn more](/ui/enriching/table-descriptions).
   </Tip>

   b. Add a table to HTML enrichment (click the add (**+**) icon to the right of the preceding node, and then click \*\*Enrich > Enrichment).

   In the node's settings pane's **Details** tab, click:

   * **Table** under **Input Type**.
   * Any available choice for **Provider** (for example, **Agentic Table Parsing** to use agentic AI, or **Anthropic** to use a VLM).
   * If you did not choose **Agentic Table Parsing** for **Provider**, click the following:

     * Any available choice for **Model** (for example, **Claude Sonnet 4.5** if you chose **Anthropic** for **Provider**).
     * **Table to HTML** under **Task**.

   <Tip>
     The table to HTML enrichment generates an HTML representation of each detected table. This can help you to more quickly and accurately recreate the table's content elsewhere later as needed. This also provides additional context about the table's structure for your RAG apps, agents, and models. [Learn more](/ui/enriching/table-to-html).
   </Tip>

   c. Add a named entity recognition enrichment (click the add (**+**) icon to the right of the preceding node, and then click \*\*Enrich > Enrichment).

   In the node's settings pane's **Details** tab, click:

   * **Text** under **Input Type**.
   * Any available choice for **Provider** (for example, **Anthropic**).
   * Any available choice for **Model** (for example, **Claude Sonnet 4.5** if you chose **Anthropic** for **Provider**).

   <Tip>
     The named entity recognition (NER) enrichment generates a list of detected entities (such as people and organizations) and the inferred relationships among these entities. This provides additional context about these entities' types and their relationships for your graph databases, RAG apps, agents, and models. [Learn more](/ui/enriching/ner).
   </Tip>

   d. Add a generative OCR enrichment (click the add (**+**) icon to the right of the preceding node, and then click \*\*Enrich > Enrichment).

   In the node's settings pane's **Details** tab, click:

   * **Image** under **Input Type**.
   * Any available choice for **Provider** (for example, **Anthropic**).
   * Any available choice for **Model** (for example, **Claude Sonnet 4.5** if you chose **Anthropic** for **Provider**).
   * **Generative OCR** under **Task**.

   <Tip>
     The generative OCR enrichment improves, as needed, the accuracy of text blocks that Unstructured initially processed during its partitioning phase. [Learn more](/ui/enriching/generative-ocr).
   </Tip>

   <Warning>
     Generative OCR does not process any text blocks by default. You must also explicitly specify which document element
     types containing text that you want generative OCR to process. To do this:

     1. Click the **Partitioner** node.
     2. In the node's settings pane, scroll down to and then click a blank area inside of the **Extract Image Block Types** list.
     3. Select each [document element types](/ui/document-elements#element-type) that you want generative OCR to process. For this
        walkthrough, select only **NarrativeText**.

     Generative OCR does not process the text of any `Image` or `Table` elements if they have already been processed by
     [image description](#image-description-task) or [table description](#table-description-task) enrichments, respectively. Do
     not remove the **Image** or **Table** document elements types from this **Extract Image Block Types** list, or else
     the image description and table description enrichments in your workflow might produce unexepcted results or might not work at all.
   </Warning>

   <Note>
     The **Generative OCR** enrichment appears under the **Input Type** of **Image**, even though this is not an image-related enrichment.
     This is a known issue and will be addressed in a future release.
   </Note>

   The workflow designer should now look similar to this:

   <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/walkthrough/EnrichedWorkflow.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=10a8f84b1810538346101f9d02eabdfd" alt="The workflow with enrichments added" data-og-width="2194" width="2194" data-og-height="174" height="174" data-path="img/ui/walkthrough/EnrichedWorkflow.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/walkthrough/EnrichedWorkflow.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=960e4fc78ae25b1402fe6c9a7206fac2 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/walkthrough/EnrichedWorkflow.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=ecca8f8a3148bc0186e8df66582527de 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/walkthrough/EnrichedWorkflow.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=a7a2ca1abb45fc3fe2a8fd86f357d062 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/walkthrough/EnrichedWorkflow.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=4810319cda8b482da0c1f8d9b63c7bce 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/walkthrough/EnrichedWorkflow.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=ab4deabfb5303161ce64bb5c046c7287 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/walkthrough/EnrichedWorkflow.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=87d7a4120ddd14d0ea0755a7419a70fa 2500w" />

5. Change the **Source** node to use the "Chinese Characters" PDF file, and then click **Test**.

6. In the **Test output** pane, make sure that **Enrichment (6 of 6)** is showing. If not, click the right arrow (**>**) until **Enrichment (6 of 6)** appears, which will show the output from the last node in the workflow.

   <img src="https://mintcdn.com/unstructured-53/Tq3ntZXiZPvizKeV/img/ui/walkthrough/GoToEnrichmentNode.png?fit=max&auto=format&n=Tq3ntZXiZPvizKeV&q=85&s=d16a686d8feabe07322be15c00bffa18" alt="The final Enrichment node's output" data-og-width="658" width="658" data-og-height="124" height="124" data-path="img/ui/walkthrough/GoToEnrichmentNode.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Tq3ntZXiZPvizKeV/img/ui/walkthrough/GoToEnrichmentNode.png?w=280&fit=max&auto=format&n=Tq3ntZXiZPvizKeV&q=85&s=afa8e57035924fd46e613b986b5eded4 280w, https://mintcdn.com/unstructured-53/Tq3ntZXiZPvizKeV/img/ui/walkthrough/GoToEnrichmentNode.png?w=560&fit=max&auto=format&n=Tq3ntZXiZPvizKeV&q=85&s=30a51bbd558b55e6aad2988d60bec9d9 560w, https://mintcdn.com/unstructured-53/Tq3ntZXiZPvizKeV/img/ui/walkthrough/GoToEnrichmentNode.png?w=840&fit=max&auto=format&n=Tq3ntZXiZPvizKeV&q=85&s=be5b93ccaca2455fd9625a603a3cb3c0 840w, https://mintcdn.com/unstructured-53/Tq3ntZXiZPvizKeV/img/ui/walkthrough/GoToEnrichmentNode.png?w=1100&fit=max&auto=format&n=Tq3ntZXiZPvizKeV&q=85&s=244d8637c75904b6f67dedcdc1cabc2b 1100w, https://mintcdn.com/unstructured-53/Tq3ntZXiZPvizKeV/img/ui/walkthrough/GoToEnrichmentNode.png?w=1650&fit=max&auto=format&n=Tq3ntZXiZPvizKeV&q=85&s=cfde0fe257912da8767ea044f61b0559 1650w, https://mintcdn.com/unstructured-53/Tq3ntZXiZPvizKeV/img/ui/walkthrough/GoToEnrichmentNode.png?w=2500&fit=max&auto=format&n=Tq3ntZXiZPvizKeV&q=85&s=a341bac7ef0bfad7be1e436438314411 2500w" />

7. Some interesting portions of the output include the following:

   * The Chinese characters on page 1. Search again for the text `verbs. The characters`. Notice how the accuracy of the Chinese character output is improved.
   * The images on pages 3, 7, and 8. Search again for the text `"Image"` (including the quotation marks). Notice the summary description for each image.
   * The tables on pages 1, 6, 7, 8, 9, and 12. Search again for the text `"Table"` (including the quotation marks). Notice the summary description for each of these tables.
     Also notice the `text_as_html` field for each of these tables.
   * The identified entities and inferred relationships among them. Search for the text `Zhijun Wang`. Of the eight instances of this name, notice
     the author's identification as a `PERSON` three times, the author's `published` relationship twice, and the author's `affiliated_with` relationship twice.

8. When you are done, be sure to click the close (**X**) button above the output on the right side of the screen, to return to
   the workflow designer for the next step.

## Step 5: Experiment with chunking

In this step, you apply [chunking](/ui/chunking) to your workflow. Chunking is the process where Unstructured rearranges
the resulting document elements' `text` content into manageable "chunks" to stay within the limits of an AI model and to improve retrieval precision.

<Tip>
  *What kind of chunking strategy should I use, and how big should my chunks be?*

  Unfortunately, there is no one-size-fits-all answer to this question. However, there are some general considerations and guidelines that can help you to determine
  the best chunking strategy and chunk size for your specific use case. Be sure of course to also consult the documentation for your target AI model and downstream application toolsets.

  Is your content primarily organized by title, by page, by interrelated subject matter, or none of these? This can help you determine whether a
  by-title, by-page, by-similarity, or basic (by-character) chunking strategy is best. (You'll experiment with each of these strategies here later.)

  If your chunks are too small, they might lose necessary context, leading to the model providing inaccurate, irrelevant, or hallucinated results.
  On the other hand, if your chunks are too large, the model can struggle with the sheer volume of information, leading to information overload, diluted meaning, and potentially higher processing costs.
  You should aim to find a balance between chunks that are big enough to contain meaningful information, while small enough to enable performant applications and low latency responses.

  For example, smaller chunks of 128 or 256 tokens might be sufficient for capturing more granular semantic information, while larger chunks of 512 or 1024 tokens might be better for retaining more context.
  It's important here to note that *tokens* and *characters* are not the same thing! In terms of
  characters, for English text, a common approximation is 1 token being equal to about 3 or 4 characters or three-quarters of a word. Many AI model providers publish their own token-to-character calculators online that you can use for estimation purposes.

  You should experiement with a variety of chunk sizes, taking into account the kinds of content, the length and complexity of user queries and agent tasks,
  the intended end use, and of course the limits of the models you are using. Try different chunking strategies and sizes with your models and evaluate the results for yourself.
</Tip>

1. With the workflow designer active from the previous step, just before the **Destination** node, click the add (**+**) icon, and then click **Enrich > Chunker**.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddChunker.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=47f6b2cd36cb6f0df036711ea9426605" alt="Adding a chunker node" data-og-width="672" width="672" data-og-height="416" height="416" data-path="img/ui/walkthrough/AddChunker.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddChunker.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=340e10547b7fd52fe7ff476936cb1925 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddChunker.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=8cd2d6acac2fba70f6242d9a43fce0a7 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddChunker.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=93c30fcf997272f420e985d0cce36037 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddChunker.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=a89c6dec6dcac999a063881d310a5f1a 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddChunker.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=9ca9516c8e63d5ebe2a447527634730f 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddChunker.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=adb5677c9934e5ad66cc1c667e203db8 2500w" />

2. In the node's settings pane's **Details** tab, select **Chunk by Character**.

3. Under **Chunk by Character**, specify the following settings:

   * Check the box labelled **Include Original Elements**.
   * Set **Max Characters** to **500**.
   * Set **New After N Characters** to **400**.
   * Set **Overlap** to **50**.
   * Leave **Contextual Chunking** turned off and **Overlap All** unchecked.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/ChunkByCharacter.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=ecb8f8abe3a0b8289ac807a9092eb66f" alt="Setting up the Chunk by Character strategy" data-og-width="1624" width="1624" data-og-height="1500" height="1500" data-path="img/ui/walkthrough/ChunkByCharacter.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/ChunkByCharacter.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=cd40d4a76a99458a886b8f5a328d4e9c 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/ChunkByCharacter.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=7fdb66c932e566bc4e4bafd8ccafe569 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/ChunkByCharacter.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=6990da22a2a522c9d25a95f10cbe4f2a 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/ChunkByCharacter.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=b8777a302ae1cdd0e808216dc77dd123 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/ChunkByCharacter.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=ecc05fd02d05d0f2eeb6e83a70cc95b3 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/ChunkByCharacter.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=10fef9fc9571df92512aab2e71f0b249 2500w" />

   <Tip>
     *What do each of these chunking settings do?*

     * **Contextual Chunking** prepends chunk-specific explanatory context to each chunk, which has been shown to yield significant improvements in downstream retrieval accuracy. [Learn more](/ui/chunking#contextual-chunking).
     * **Include Original Elements** outputs into each chunk's `metadata` field's `orig_elements` value the elements that were used to form that particular chunk. [Learn more](/ui/chunking#include-original-elements-setting).
     * **Max Characters** is the "hard" or maximum number of characters that any one chunk can contain. Unstructured cannot exceed this number when forming chunks. [Learn more](/ui/chunking#max-characters-setting).
     * **New After N Characters**: is the "soft" or approximate number of characters that any one chunk can contain. Unstructured can exceed this number if needed when forming chunks (but still cannot exceed the **Max Characters** setting). [Learn more](/ui/chunking#new-after-n-characters-setting).
     * **Overlap**, when applied (see **Overlap All**), prepends to the current chunk the specified number of characters from the previous chunk, which can help provide additional context about this chunk relative to the previous chunk. [Learn more](/ui/chunking#overlap-setting)
     * **Overlap All** applies the **Overlap** setting (if greater than zero) to all chunks. Otherwise, unchecking this box means that the **Overlap** setting (if greater than zero)is applied only in edge cases where "normal" chunks cannot be formed by combining whole elements. Check this box with caution as it can introduce noise into otherwise clean semantic units. [Learn more](/ui/chunking#overlap-all-setting).
   </Tip>

4. With the "Chinese Characters" PDF file still selected in the **Source** node, click **Test**.

5. In the **Test output** pane, make sure that **Chunker (7 of 7)** is showing. If not, click the right arrow (**>**) until **Chunker (7 of 7)** appears, which will show the output from the last node in the workflow.

6. To explore the chunker's results, search for the text `"CompositeElement"` (including the quotation marks).

   <Tip>
     *In the chunked output, where did all of the document elements I saw before, such as `Title`, `Image`, and `Table`, go?*

     During chunking, the document elements that were generated during partitioning are now chunked. Because some of these document elements can be split into multiple chunks
     or combined with other chunks, these chunked document elements are now of type `CompositeElement` and `TableChunk`.

     You can have Unstructured also output the original document elements that
     these chunks were derived from by putting them into each chunk's `metadata`. To have Unstructured do this, use the **Include Original Elements** setting, as described in the preceding tip.
   </Tip>

7. Optionally, you can try running this workflow again with the **Chunk by Title** strategy, as follows:

   a. Click the close (**X**) button above the output on the right side of the screen.<br />
   b. In the workflow designer, click the **Chunker** node and then, in the node's settings pane's **Details** tab, select **Chunk by Title**.<br />
   c. Under **Chunk by Title**, specify the following settings:

   * Check the box labelled **Include Original Elements**.
   * Set **Max Characters** to **500**.
   * Set **New After N Characters** to **400**.
   * Set **Overlap** to **50**.
   * Leave **Contextual Chunking** turned off, leave **Combine Text Under N Characters** blank, and leave **Multipage Sections** and **Overlap All** unchecked.

   <Tip>
     *What do each of the chunking settings here that were not already described in the preceding tip do?*

     * **Combine Text Under N Characters** combines elements from a section into a chunk until a section reaches a length of this many characters. [Learn more](/ui/chunking#combine-text-under-n-characters-setting).
     * **Multipage Sections** when checked, allows sections to span multiple pages. [Learn more](/ui/chunking#multipage-sections-setting).
   </Tip>

   d. Click **Test**.<br />
   e. In the **Test output** pane, make sure that **Chunker (7 of 7)** is showing. If not, click the right arrow (**>**) until **Chunker (7 of 7)** appears, which will show the output from the last node in the workflow.<br />
   f. To explore the chunker's results, search for the text `"CompositeElement"` (including the quotation marks). Notice that the lengths of some of the chunks that immediately
   precede titles might be shortened due to the presence of the title impacting the chunk's size.

8. Optionally, you can try running this workflow again with the **Chunk by Page** strategy, as follows:

   a. Click the close (**X**) button above the output on the right side of the screen.<br />
   b. In the workflow designer, click the **Chunker** node and then, in the node's settings pane's **Details** tab, select **Chunk by Page**.<br />
   c. Under **Chunk by Page**, specify the following settings:

   * Check the box labelled **Include Original Elements**.
   * Set **Max Characters** to **500**.
   * Set **New After N Characters** to **400**.
   * Set **Overlap** to **50**.
   * Leave **Contextual Chunking** turned off, and leave **Overlap All** unchecked.

   d. Click **Test**.<br />
   e. In the **Test output** pane, make sure that **Chunker (7 of 7)** is showing. If not, click the right arrow (**>**) until **Chunker (7 of 7)** appears, which will show the output from the last node in the workflow.<br />
   f. To explore the chunker's results, search for the text `"CompositeElement"` (including the quotation marks). Notice that the lengths of some of the chunks that immediately
   precede page breaks might be shortened due to the presence of the page break impacting the chunk's size.<br />

9. Optionally, you can try running this workflow again with the **Chunk by Similarity** strategy, as follows:

   a. Click the close (**X**) button above the output on the right side of the screen.<br />
   b. In the workflow designer, click the **Chunker** node and then, in the node's settings pane's **Details** tab, select **Chunk by Similarity**.<br />
   c. Under **Chunk by Similarity**, specify the following settings:

   * Check the box labelled **Include Original Elements**.
   * Set **Max Characters** to **500**.
   * Set **Similarity Threshold** to **0.99**.
   * Leave **Contextual Chunking** turned off.

   <Tip>
     *What does **Similarity Threshold** mean?*

     * The **Similarity Threshold** is a number between 0 and 1 exclusive (**0.01** to **0.99** inclusive).
     * **0.01** means that any two segments of text that are being compared to each other and are considered least identical in semantic meaning to each other are more likely to be combined into the same chunk together, when such combining must occur.
     * **0.99** means that any two segments of text that are being compared to each other and are considered almost identical in semantic meaning to each other are more likely to be combined into the same chunk together, when such combining must occur.
     * Numbers toward **0.01** bias toward least-identical semantic matches, while numbers toward **0.99** bias toward near-identical semantic matches.
   </Tip>

   d. Click **Test**.<br />
   e. In the **Test output** pane, make sure that **Chunker (7 of 7)** is showing. If not, click the right arrow (**>**) until **Chunker (7 of 7)** appears, which will show the output from the last node in the workflow.<br />
   f. To explore the chunker's results, search for the text `"CompositeElement"` (including the quotation marks). Notice that the lengths of many of the chunks fall well short of the **Max Characters** limit. This is because a similarity threshold
   of **0.99** means that only sentences or text segments with a near-perfect semantic match will be grouped together into the same chunk. This is an extremely high threshold, resulting in very short, highly specific chunks of text. <br />
   g. If you change **Similarity Threshold** to **0.01** and run the workflow again, searching for the text `"CompositeElement"` (including the quotation marks), many of the chunks will now come closer to the **Max Characters** limit. This is because a similarity threshold
   of **0.01** provides an extreme tolerance of differences between pieces of text, grouping almost anything together.<br />

10. When you are done, be sure to click the close (**X**) button above the output on the right side of the screen, to return to
    the workflow designer for the next step.

## Step 6 (Optional): Experiment with embedding

In this step, you generate [embeddings](/ui/embedding) for your workflow. Embeddings are vectors of numbers that represent various aspects of the text that is extracted by Unstructured.
These vectors are stored or "embedded" next to the text itself in a vector store or vector database. Chatbots, agents, and other AI solutions can use
these vector embeddings to more efficiently and effectively find, analyze, and use the associated text. These vector embeddings are generated by an
embedding model that is provided by an embedding provider. For the best embedding model to apply to your use case, see the documentation for your target downstream application toolsets.

1. With the workflow designer active from the previous step, just before the **Destination** node, click the add (**+**) icon, and then click **Transform > Embedder**.

   <img src="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEmbedder.png?fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=4d049be4f18543fef20362134cf84f41" alt="Adding an embedder node" data-og-width="516" width="516" data-og-height="426" height="426" data-path="img/ui/walkthrough/AddEmbedder.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEmbedder.png?w=280&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=57d5ffa906c9d6e5b469c6dcd9f76aa2 280w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEmbedder.png?w=560&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=018e968a2dff5cf9fb8caadbfce0445c 560w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEmbedder.png?w=840&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=d79683579a735185714eeac6a2fb91a7 840w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEmbedder.png?w=1100&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=dde31a1658a3d67781fc971ae7888507 1100w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEmbedder.png?w=1650&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=a8f321eaf4412b988ee0c4e7861d3eef 1650w, https://mintcdn.com/unstructured-53/Ab8t-j4Gj_ozXcQ7/img/ui/walkthrough/AddEmbedder.png?w=2500&fit=max&auto=format&n=Ab8t-j4Gj_ozXcQ7&q=85&s=7c8e53991551435bace6b6c8839e4bd9 2500w" />

2. In the node's settings pane's **Details** tab, under **Select Embedding Model**, for **Azure OpenAI**, select **Text Embedding 3 Small \[dim 1536]**.

   <Note>
     The list of available embedding models is constantly being updated. Your list might also be different, depending on your Unstructured
     account type. If **Azure OpenAI** and **Text Embedding 3 Small \[dim 1536]** is not available, choose another available model from the list.

     If you have an Unstructured **Business** account and want to add more models to this list, contact your
     Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
     [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

3. With the "Chinese Characters" PDF file still selected in the **Source** node, click **Test**.

4. In the **Test output** pane, make sure that **Embedder (8 of 8)** is showing. If not, click the right arrow (**>**) until **Embedder (8 of 8)** appears, which will show the output from the last node in the workflow.

5. To explore the embeddings, search for the text `"embeddings"` (including the quotation marks).

   <Tip>
     *What do all of these numbers mean?*

     All by themselves, the numbers in the `embeddings` field of the output have no human-interpretable meaning on their own.
     However, when combined with the specific text that these numbers are associated with, and the embedding model's
     logic that was used to generate these numbers, the numbers in the `embeddings` field are extremely powerful when leveraged by
     downstream chatbots, agents, and other AI solutions.

     These numbers typically represent complex, abstract attributes about the text that are known only to the embedding model that generated these numbers. These
     attributes can be about the text's overall sentiment, intent, subject, semantic meaning, grammatical function, relationships between words, or any number of other things that the model is good at figuring out. This
     is why the embedding model you choose here must be the exact same embedding model that you use in any related chatbot, agent, or other
     AI solution that relies on these numbers. Otherwise, the numbers that are generated here will not have the same meaning
     downstream as well. Also, the number of *dimensions* (or the number of numbers in the `embeddings` field)
     you choose here must also be the exact same number of dimensions downstream as well.

     To repeat, the name and number of dimensions
     for the embedding model you choose here must be the exact same name and number of dimensions for the embedding model you use in your
     related downstream chatbots, agents, and other AI solutions that rely on this particular text and its associated embeddings that were generated here.
   </Tip>

6. When you are done, be sure to click the close (**X**) button above the output on the right side of the screen, to return to
   the workflow designer so that you can continue designing things later as you see fit.

## Next steps

Congratulations! You now have an Unstructured workflow that partitions, enriches, chunks, and embeds your source documents, producing
context-rich data that is ready for retrieval-augmented generation (RAG), agentic AI, and model fine-tuning.

Right now, your workflow only accepts one local file at a time for input. Your workflow also only sends Unstructured's processed data to your screen or to be saved locally as a JSON file.
You can modify your workflow to accept multiple files and data from—and send Unstructured's processed data to—one or more file storage
locations, databases, and vector stores. To learn how to do this, try one or more of the following quickstarts:

* [Remote quickstart](/ui/quickstart#remote-quickstart) - This quickstart show you how to begin processing files and semi-structured data from remote source locations at scale, instead of just one local file at a time.
* [Dropbox source connector quickstart](/ui/sources/dropbox-source-quickstart) - If you don't have any remote source locations available for Unstructured to connect to, this quickstart shows you how to set up a Dropbox account to store your documents in, and then connect Unstructured to your Dropbox account.
* [Pinecone destination connector quickstart](/ui/destinations/pinecone-destination-quickstart) - If you don't have any remote destination locations available for Unstructured to send its processed data to, this quickstart shows you how to set up a Pinecone account to have Unstructured store its processed data in, and then connect Unstructured to your Pinecone account.

Unstructured also offers an API and SDKs, which allow you to use code to work with Unstructured programmatically instead of only with the UI. For details, see:

* [Unstructured API quickstart](/api-reference/workflow/overview#quickstart) - This quickstart uses the Unstructured API's workflow operations to programmatically create a Dropbox source connector and a Pinecone destination connector in your Unstructured account. You then programmatically add these connectors to a workflow in your Unstructured account, run that workflow as a job, and then explore the job's results.
* [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk) - This article provides an overview of the Unstructured Python SDK and how to use it.
* [Unstructured API overview](/api-reference/overview) - This article provides an overview of the Unstructured API and how to use it.

If you are not able to complete any of the preceding quickstarts, contact Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

---

# Source: https://docs.unstructured.io/ui/destinations/weaviate.md

# Source: https://docs.unstructured.io/open-source/ingestion/destination-connectors/weaviate.md

# Source: https://docs.unstructured.io/api-reference/workflow/destinations/weaviate.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Weaviate

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a destination connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key.

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the destination connector, add it along with a
  [source connector](/api-reference/workflow/sources/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create destination connectors with the Unstructured user interface (UI).
  [Learn how](/ui/destinations/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a destination connector! Keep reading to learn how.
</Note>

Send processed data from Unstructured to Weaviate.

The requirements are as follows.

* For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview): only [Weaviate Cloud](https://weaviate.io/developers/wcs) clusters are supported.
* For [Unstructured Ingest](/open-source/ingestion/overview): Weaviate Cloud clusters,
  [Weaviate installed locally](https://weaviate.io/developers/weaviate/quickstart/local),
  and [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded) are supported.
* For Weaviate installed locally, you will need the name of the target collection on the local instance.
* For Embedded Weaviate, you will need the instance's connection URL and the name of the target collection on the instance.
* For Weaviate Cloud, you will need:

  <iframe width="560" height="315" src="https://www.youtube.com/embed/FvhJDUZM078" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

  * A Weaviate database instance. The following information assumes that you have a Weaviate Cloud (WCD) account with a Weaviate database cluster in that account.
    [Create a WCD account](https://weaviate.io/developers/wcs/quickstart#create-a-wcd-account). [Create a database cluster](https://weaviate.io/developers/wcs/quickstart#create-a-weaviate-cluster). For other database options, [learn more](https://weaviate.io/developers/weaviate/installation).
  * The URL and API key for the database cluster. [Get the URL and API key](https://weaviate.io/developers/wcs/quickstart#explore-the-details-panel).
  * The name of the target collection in the database. [Create a collection](https://weaviate.io/developers/wcs/tools/collections-tool).

    An existing collection is not required. At runtime, the collection behavior is as follows:

    For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview):

    * If an existing collection name is specified, and Unstructured generates embeddings,
      but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
      You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
    * If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. If Unstructured generates embeddings,
      the new collection's name will be `U<short-workflow-id>_<short-embedding-model-name>_<number-of-dimensions>`.
      If Unstructured does not generate embeddings, the new collection's name will be `U<short-workflow-id`.

    For [Unstructured Ingest](/open-source/ingestion/overview):

    * If an existing collection name is specified, and Unstructured generates embeddings,
      but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
      You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
    * If a collection name is not specified, Unstructured creates a new collection in your Weaviate cluster. The new collection's name will be `Unstructuredautocreated`.

    If Unstructured creates a new collection and generates embeddings, you will not see an `embeddings` property in tools such as the Weaviate Cloud
    **Collections** user interface. To view the generated embeddings, you can run a Weaviate GraphQL query such as the following. In this query, replace `<collection-name>` with
    the name of the new collection, and replace `<property-name>` with the name of each additional available property that
    you want to return results for, such as `text`, `type`, `element_id`, `record_id`, and so on. The embeddings will be
    returned in the `vector` property.

    ```text  theme={null}
    {
      Get {
        <collection-name> {
          _additional {
            vector
          }
          <property-name>
          <property-name>
        }
      }
    }
    ```

If [auto-schema](https://docs.weaviate.io/weaviate/config-refs/collections#auto-schema) is enabled in Weaviate (which it is by default),
Weaviate can infer missing properties and add them to the collection definition at run time. However, it is a Weaviate best practice to manually define as much
of the data schema in advance as possible, since manual definition gives you the most control.

The minimum viable schema for Unstructured includes only the `element_id` and `record_id` properties. The `text` and `type` properties should also be included, but they are technically optional.
If you are using Unstructured to generate embeddings, you must

The following code example shows how to use the [weaviate-client](https://pypi.org/project/weaviate-client/) Python package to create a
collection in a Weaviate Cloud database cluster with this minimum viable schema, and to specify that Unstructured will generate the embeddings for this collection.
To connect to a locally hosted Weaviate instance instead, call [weaviate.connect\_to\_local](https://docs.weaviate.io/weaviate/connections/connect-local).
To connect to Embedded Weaviate instead, call [weaviate.connect\_to\_embedded](https://docs.weaviate.io/weaviate/connections/connect-embedded).

```python  theme={null}
import os
import weaviate
from weaviate.classes.init import Auth
import weaviate.classes.config as wvc

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(api_key=os.getenv("WEAVIATE_API_KEY")),
)

collection = client.collections.create(
    name="MyCollection",
    properties=[
        wvc.Property(name="element_id", data_type=wvc.DataType.UUID),
        wvc.Property(name="record_id", data_type=wvc.DataType.TEXT),
        wvc.Property(name="text", data_type=wvc.DataType.TEXT),
        wvc.Property(name="type", data_type=wvc.DataType.TEXT),
    ],
    vectorizer_config=None, # Unstructured will generate the embeddings instead of Weaviate.
)

client.close()
```

For objects in the `metadata` field that Unstructured produces and that you want to store in a Weaviate collection, be sure to follow
Unstructured's `metadata` field naming convention. For example, if Unstructured produces a `metadata` field with the following
child objects:

```json  theme={null}
"metadata": {
  "is_extracted": "true",
  "coordinates": {
    "points": [
      [
        134.20055555555555,
        241.36027777777795
      ],
      [
        134.20055555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        420.0269444444447
      ],
      [
        529.7005555555555,
        241.36027777777795
      ]
    ],
    "system": "PixelSpace",
    "layout_width": 1654,
    "layout_height": 2339
  },
  "filetype": "application/pdf",
  "languages": [
    "eng"
  ],
  "page_number": 1,
  "image_mime_type": "image/jpeg",
  "filename": "realestate.pdf",
  "data_source": {
    "url": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf",
    "record_locator": {
      "protocol": "file",
      "remote_file_path": "file:///home/etl/node/downloads/00000000-0000-0000-0000-000000000001/7458635f-realestate.pdf"
    }
  },
  "entities": {
    "items": [
      {
        "entity": "HOME FOR FUTURE",
        "type": "ORGANIZATION"
      },
      {
        "entity": "221 Queen Street, Melbourne VIC 3000",
        "type": "LOCATION"
      }
    ],
    "relationships": [
      {
        "from": "HOME FOR FUTURE",
        "relationship": "based_in",
        "to": "221 Queen Street, Melbourne VIC 3000"
      }
    ]
  }
}
```

You could create corresponding properties in your collection's schema by using the following property names and data types:

```python  theme={null}
import os
import weaviate
from weaviate.classes.init import Auth
import weaviate.classes.config as wvc

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(api_key=os.getenv("WEAVIATE_API_KEY")),
)

collection = client.collections.create(
    name="MyCollection",
    properties=[
        wvc.Property(name="element_id", data_type=wvc.DataType.UUID),
        wvc.Property(name="record_id", data_type=wvc.DataType.TEXT),
        wvc.Property(name="text", data_type=wvc.DataType.TEXT),
        wvc.Property(name="type", data_type=wvc.DataType.TEXT),
        wvc.Property(
            name="metadata",
            data_type=wvc.DataType.OBJECT,
            nested_properties=[
                wvc.Property(name="is_extracted", data_type=wvc.DataType.TEXT),
                wvc.Property(
                    name="coordinates",
                    data_type=wvc.DataType.OBJECT,
                    nested_properties=[
                        wvc.Property(name="points", data_type=wvc.DataType.TEXT),
                        wvc.Property(name="system", data_type=wvc.DataType.TEXT),
                        wvc.Property(name="layout_width", data_type=wvc.DataType.NUMBER),
                        wvc.Property(name="layout_height", data_type=wvc.DataType.NUMBER),
                    ],
                ),
                wvc.Property(name="filetype", data_type=wvc.DataType.TEXT),
                wvc.Property(name="languages", data_type=wvc.DataType.TEXT_ARRAY),
                wvc.Property(name="page_number", data_type=wvc.DataType.TEXT),
                wvc.Property(name="image_mime_type", data_type=wvc.DataType.TEXT),
                wvc.Property(name="filename", data_type=wvc.DataType.TEXT),
                wvc.Property(
                    name="data_source",
                    data_type=wvc.DataType.OBJECT,
                    nested_properties=[
                        wvc.Property(name="url", data_type=wvc.DataType.TEXT),
                        wvc.Property(name="record_locator", data_type=wvc.DataType.TEXT),
                    ],
                ),
                wvc.Property(
                    name="entities", 
                    data_type=wvc.DataType.OBJECT,
                    nested_properties=[
                        wvc.Property(
                            name="items", 
                            data_type=wvc.DataType.OBJECT_ARRAY,
                            nested_properties=[
                                wvc.Property(name="entity", data_type=wvc.DataType.TEXT),
                                wvc.Property(name="type", data_type=wvc.DataType.TEXT),
                            ],
                        ),
                        wvc.Property(
                            name="relationships", 
                            data_type=wvc.DataType.OBJECT_ARRAY,
                            nested_properties=[
                                wvc.Property(name="to", data_type=wvc.DataType.TEXT),
                                wvc.Property(name="from", data_type=wvc.DataType.TEXT),
                                wvc.Property(name="relationship", data_type=wvc.DataType.TEXT),
                            ],
                        ),
                    ],
                ),
            ],
        ),
    ],
    vectorizer_config=None, # Unstructured will generate the embeddings instead of Weaviate.
)

client.close()
```

Unstructured cannot provide a schema that is guaranteed to work in all
circumstances. This is because these schemas will vary based on your source files' types; how you
want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.

See also:

* [Collection schema](https://weaviate.io/developers/weaviate/config-refs/schema)
* [Unstructured document elements and metadata](/api-reference/legacy-api/partition/document-elements)

To create a Weaviate destination connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateDestinationRequest
  from unstructured_client.models.shared import CreateDestinationConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.destinations.create_destination(
          request=CreateDestinationRequest(
              create_destination_connector=CreateDestinationConnector(
                  name="<name>",
                  type="weaviate-cloud",
                  config={
                      "cluster_url": "<host-url>",
                      "collection": "<class-name>",
                      "api_key": "<api-key>"
                  }
              )
          )
      )

      print(response.destination_connector_information)
  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/destinations" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "weaviate",
      "config": {
          "host_url": "<host-url>",
          "class_name": "<class-name>",
          "api_key": "<api-key>"
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<host-url>` (*required*) - The URL of the Weaviate database cluster.
* `<class-name>` - The name of the target collection within the cluster. If no value is provided, see the beginning of this article
  for the behavior at run time.
* `<api-key>` (*required*) - The API key provided by Weaviate to access the cluster.

---

# Source: https://docs.unstructured.io/welcome.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Welcome to Unstructured!

<Tip>To start using Unstructured right away, skip ahead to the <Icon icon="desktop" /> [UI quickstart](#unstructured-ui-quickstart) or <Icon icon="rectangle-terminal" /> [API quickstart](#unstructured-api-quickstart) now!</Tip>

## What is Unstructured?

<Frame caption="Unstructured turns your organization's unstructured documents into structured, AI-ready data at scale with high performance, accuracy, and relevance">
  <img src="https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/conceptual-flow.png?fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=eda1bc5280bfd7a7ee9ae07d6058e9ed" alt="Unstructured conceptual data flow" data-og-width="768" width="768" data-og-height="182" height="182" data-path="img/welcome/conceptual-flow.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/conceptual-flow.png?w=280&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=5434ec0887912432d07058a4494cc87e 280w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/conceptual-flow.png?w=560&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=3407069e35a8872f022fe2c3f110343e 560w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/conceptual-flow.png?w=840&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=8378838d3e06ca529ce742cc5d7db004 840w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/conceptual-flow.png?w=1100&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=90e34f0293f1a8301098c153788f2ae5 1100w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/conceptual-flow.png?w=1650&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=6d7f5b9585a2b09eeca0407d803c6a37 1650w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/conceptual-flow.png?w=2500&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=00fcaf9db899ad53b964c1e51d15823b 2500w" />
</Frame>

Unstructured provides a platform and tools to ingest and process your unstructured documents for:

* [Enhancing retrieval-augmented generation (RAG)](https://unstructured.io/blog/rag-whitepaper): RAG boosts AI accuracy and relevance by working with the data that's most important to you, providing results that are more current, focused, and meaningful to your queries and tasks.
* [Fueling agentic AI](https://unstructured.io/problems-we-solve#powering-agentic-ai): Agentic AI acts like virtual teammates that can plan, decide, and take action on their own to get things done on your behalf, freeing you up for bigger challenges.

We empower your organization to take full advantage of RAG and agentic AI opportunities through:

* [Rapid prototyping](https://unstructured.io/problems-we-solve#rapid-prototyping): Quickly iterating on AI experiences powered by enterprise data.
* [Replacing do-it-yourself data transformation pipelines](https://unstructured.io/problems-we-solve#replacing-diy-pipelines): Building your own data transformation pipelines is time-consuming and error-prone.
* [Unifying data silos](https://unstructured.io/problems-we-solve#unifying-data-silos): Eliminating data barriers and creating shared GenAI data layers.
* [Continuous data delivery](https://unstructured.io/problems-we-solve#continuous-data-delivery): Scheduling jobs, routing documents, and optimizing every task.
* [Business process automation](https://unstructured.io/problems-we-solve#business-process-automation): Report generation, automatic responses, decision-making, content analysis, and more!

This 60-second video describes more about what Unstructured does and its benefits (no sound):

<iframe width="560" height="315" src="https://www.youtube.com/embed/b2AcxJDXOLs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

This 40-second video demonstrates a simple use case that Unstructured helps solve (no sound):

<iframe width="560" height="315" src="https://www.youtube.com/embed/E-tupjji22U" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

This 60-second video shows why using Unstructured is preferable to building your own similar solution:

<iframe width="560" height="315" src="https://www.youtube.com/embed/P9HzldV72ho" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

You can use Unstructured through a user interface (UI), an API, or both. Read on to learn more.

<Frame caption="An example of the Unstructured user interface, showing a local file being partitioned">
  <img src="https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/user-interface.png?fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=ce845e77a23b31d82aa2d3eea405e07e" alt="Unstructured user interface example" data-og-width="847" width="847" data-og-height="454" height="454" data-path="img/welcome/user-interface.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/user-interface.png?w=280&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=29a32b0aebc8f086cd7174f54664e26f 280w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/user-interface.png?w=560&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=14c13dd9265fca636177c76a76999d70 560w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/user-interface.png?w=840&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=2be1c444b6eea963c18db2d16760ab4f 840w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/user-interface.png?w=1100&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=86907d2abfaf06254d2d3d225f24ac53 1100w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/user-interface.png?w=1650&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=6c85afdb0f0c44ff8c90672c5d5a77e0 1650w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/user-interface.png?w=2500&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=ad9a607a5deda7a3d7bc071372fe08cb 2500w" />
</Frame>

<Frame caption="An example of the Unstructured API, showing the code to process a local file on demand">
  <img src="https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/api.png?fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=33a4944ecb1c94b25e5a70f0c4723089" alt="Unstructured API example" data-og-width="547" width="547" data-og-height="635" height="635" data-path="img/welcome/api.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/api.png?w=280&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=04eca8e1293c949e2d1d70e837b1f423 280w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/api.png?w=560&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=b48a286058c01d0081b99bacdaee2d25 560w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/api.png?w=840&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=d11584bd2f75b3b446e16670421ac542 840w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/api.png?w=1100&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=dc2161e7726ac4b5ed81f39e6d37175b 1100w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/api.png?w=1650&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=990c1ff16459ef167c774021b16850e8 1650w, https://mintcdn.com/unstructured-53/HIy1tdX-YgPq1b1H/img/welcome/api.png?w=2500&fit=max&auto=format&n=HIy1tdX-YgPq1b1H&q=85&s=efe183400a78d8879edb5d09cbd94d47 2500w" />
</Frame>

***

## <Icon icon="desktop" />    Unstructured UI quickstart

This quickstart shows how, in just a few minutes, you can use the Unstructured user interface (UI) to quickly and easily see Unstructured's
best-in-class transformation results for a single file that is stored on your local computer.

<Tip>
  This quickstart focuses on a single, local file for ease-of-use demonstration purposes.

  To use Unstructured later to do
  large-scale batch processing of multiple files and semi-structured data that are stored in remote locations,
  [skip over](/ui/quickstart#remote-quickstart) to the remote quickstart after you finish this one.
</Tip>

If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

<Note>
  If you already have an Unstructured **Pay-As-You-Go** or **Business SaaS** account, you are already signed up for Unstructured.
  Sign in to your existing Unstructured **Pay-As-You-Go** or **Business SaaS** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

  If you already have an Unstructured **dedicated instance** or **in-VPC** deployment, your sign-in link will be unique to your deployment.
  If you're not sure what your unique sign-in link is, see your Unstructured account administrator, or email Unstructured Support at
  [support@unstructured.io](mailto:support@unstructured.io).
</Note>

Do the following:

1. After you are signed in, the **Start** page appears.

2. In the **Welcome** area, do one of the following:

   * Click one of the sample files, such as **realestate.pdf**, to have Unstructured parse and transform that sample file.
   * Click **Browse files**, or drag and drop a file onto **Drop file to test**, to have Unstructured parse and transform your own file.

     If you choose to use your own file, the file must be 10 MB or less in size. Also, the file must be one of the following supported file types:

     | File extension |
     | -------------- |
     | `.bmp`         |
     | `.csv`         |
     | `.doc`         |
     | `.docx`        |
     | `.eml`         |
     | `.epub`        |
     | `.heic`        |
     | `.html`        |
     | `.jpeg`        |
     | `.jpg`         |
     | `.md`          |
     | `.msg`         |
     | `.odt`         |
     | `.org`         |
     | `.p7s`         |
     | `.pdf`         |
     | `.png`         |
     | `.ppt`         |
     | `.pptx`        |
     | `.rst`         |
     | `.rtf`         |
     | `.tif`         |
     | `.tiff`        |
     | `.tsv`         |
     | `.txt`         |
     | `.xls`         |
     | `.xlsx`        |
     | `.xml`         |

   <img src="https://mintcdn.com/unstructured-53/rMc-icsfF1CGWovL/img/ui/single-file/welcome.png?fit=max&auto=format&n=rMc-icsfF1CGWovL&q=85&s=4afd67de1ca6df270150830c59dfecb8" alt="Welcome interface on the Start page" data-og-width="1346" width="1346" data-og-height="591" height="591" data-path="img/ui/single-file/welcome.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/rMc-icsfF1CGWovL/img/ui/single-file/welcome.png?w=280&fit=max&auto=format&n=rMc-icsfF1CGWovL&q=85&s=818915212bfeef72c2ad4a4cab6791b9 280w, https://mintcdn.com/unstructured-53/rMc-icsfF1CGWovL/img/ui/single-file/welcome.png?w=560&fit=max&auto=format&n=rMc-icsfF1CGWovL&q=85&s=90e90e00480392191b2e6fb1f60988d4 560w, https://mintcdn.com/unstructured-53/rMc-icsfF1CGWovL/img/ui/single-file/welcome.png?w=840&fit=max&auto=format&n=rMc-icsfF1CGWovL&q=85&s=6834886e1bcc3a152a96b09bf32a7a23 840w, https://mintcdn.com/unstructured-53/rMc-icsfF1CGWovL/img/ui/single-file/welcome.png?w=1100&fit=max&auto=format&n=rMc-icsfF1CGWovL&q=85&s=c31509604224a6f506926c46304b5a30 1100w, https://mintcdn.com/unstructured-53/rMc-icsfF1CGWovL/img/ui/single-file/welcome.png?w=1650&fit=max&auto=format&n=rMc-icsfF1CGWovL&q=85&s=99830aaf06da67c702e50df0916728de 1650w, https://mintcdn.com/unstructured-53/rMc-icsfF1CGWovL/img/ui/single-file/welcome.png?w=2500&fit=max&auto=format&n=rMc-icsfF1CGWovL&q=85&s=6b0bc597eeeec032355169fcdb0e2e21 2500w" />

3. After Unstructured has finished parsing and transforming the file (a process known as
   [partitioning](/ui/partitioning)), you will see the file's contents in the
   **Preview** pane in the center and Unstructured's results in the **Result** pane on the right.

   <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/results.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=57319fe777c4db089f8eb4fcd18006b9" alt="Unstructured's parse and transform results" data-og-width="3454" width="3454" data-og-height="1912" height="1912" data-path="img/ui/single-file/results.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/results.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=75626bd309bed12d5775f7861c5d267f 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/results.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=37be4068bfdb16674d06340d3e5c3072 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/results.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=97b2a4f5659532eb82dca29d1683bf3f 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/results.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=c6a51560737fa64966bf77ccc467eede 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/results.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=e202516fad57db627141f7d8349ffd08 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/results.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=7f083c7412c7da412ba130a35e1bb5dc 2500w" />

4. The **Result** pane shows a formatted view of Unstructured's results by default. This formatted view is designed for human
   readability. To see the underlying JSON view of the results, which is designed for RAG and agentic AI,
   click **JSON** at the top of the **Result** pane.
   [Learn about what's in the JSON view](/ui/document-elements).

   <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/json-view.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=0947a7c013ae1e4127f9bb1cb4dc050c" alt="Switching to the JSON view of the results" data-og-width="784" width="784" data-og-height="191" height="191" data-path="img/ui/single-file/json-view.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/json-view.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=c886e31ae32d65c395311975a48e3735 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/json-view.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=2c5dc92ff56b32da8484177b55a00fad 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/json-view.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=65d097403265d3ee3f9fc83031cba1b7 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/json-view.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=01e946242f70c299ad8c3ae5b9ea03f4 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/json-view.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=05aeb2509c8979a8575a3aad5aa84b7e 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/json-view.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=7ea6e1c05e51d09311248430512ef19f 2500w" />

5. Unstructured's initial results are based on its **High Res** [partitioning strategy](/ui/partitioning), which
   begins processing the file's contents and converting these contents into a series of Unstructured
   [document elements and metadata](/ui/document-elements). This partitioning strategy provides good results overall, depending on the complexity of the file's contents.
   This partitioning strategy also generates a bounding box for each detected object in the file. A *bounding box* is
   an imaginary rectangular box drawn around the object to show its location and extent within the file.

   After the High Res partitioning results are shown, Unstructured begins improving these initial results by
   using vision language models (VLMs) to apply a series of generative refinements known as *enrichments*. These
   enrichments include:

   * An [image description](ui/enriching/image-descriptions) enrichment, which uses a VLM to provide a text-based summary of the contents of each detected image.
   * A [generative OCR](/ui/enriching/generative-ocr) enrichment, which uses a VLM to improve the accuracy of each block of initially-processed text.
   * A [table to HTML](/ui/enriching/table-to-html) enrichment, which uses a VLM to provide an HTML-structured representation of each detected table.

   While these enrichments are being applied, a banner appears at the top of the **Result** pane.

   <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/generative-refinement.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=aaa55fe34d402659f4bb83d7c93e003e" alt="Updating the initial results with enrichments" data-og-width="779" width="779" data-og-height="138" height="138" data-path="img/ui/single-file/generative-refinement.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/generative-refinement.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=f462804d182e7d8152edae863dea917f 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/generative-refinement.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=924b8bd75faf8b79004287d0980967ab 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/generative-refinement.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=81de21e1de9ebbfa2e1f93dd272b0c7e 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/generative-refinement.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=dfd3dcf0a6fed54a5afa4ce185ace462 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/generative-refinement.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=bac362b5cd2ac8f2cba28f3fcc54e35d 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/generative-refinement.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=1302f2aadbf8059f1a47cb71bf6a301b 2500w" />

   To see these enrichments applied to the initial results, click **Update results** in the banner as soon as this button appears,
   which might take up to a minute or more.

   <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/apply-generative-refinement.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=c202060fb848a8d6b2f9e66d56f237e3" alt="Seeing the initial results updated with the enrichments" data-og-width="780" width="780" data-og-height="136" height="136" data-path="img/ui/single-file/apply-generative-refinement.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/apply-generative-refinement.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=17e2c361e58150cc835104992fcc1f94 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/apply-generative-refinement.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=807074d22044d67c874ed64905ff0bc2 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/apply-generative-refinement.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=b51f77a1feb7c274c413b341864acb39 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/apply-generative-refinement.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=85191b797d7497912da2c307082fdd4a 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/apply-generative-refinement.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=8a41a763664ec3b6838e2613770c6f7c 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/apply-generative-refinement.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=1f9015b9a4aa1bb4837736cc05f62aac 2500w" />

   <Warning>
     Each page that Unstructured processes by using this approach is counted as two pages for usage and billing purposes.

     This is because Unstructured processes each page once with its **High Res** partitioning strategy and then reprocesses each
     page with a VLM to improve the quality, accuracy, and relevance of the initial partitioning results.
     The final results of these two processing passes for each page count as two pages for usage and billing purposes.
     This two-pass process happens regardless of whether you click **Update results** in the banner.

     This two-page usage and billing behavior is a known issue and will be addressed in a future release.
   </Warning>

6. To synchronize the scrolling of the **Preview** pane's selected contents with the **Result** pane's **Formatted** results,
   rest your mouse pointer anywhere inside the contents of the **Preview** pane until a bounding box appears.
   Then click the bounding box. Unstructured automatically scrolls the **Result** pane's **Formatted**
   results to match the selected bounding box. (You cannot synchronize the scrolling of the **JSON** results.)

   <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/bounding-box.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=c5a484c1385880cbdc2bbde4a3a834e5" alt="Selecting a bounding box" data-og-width="2890" width="2890" data-og-height="688" height="688" data-path="img/ui/single-file/bounding-box.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/bounding-box.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=0c8d1f5e270559ad98a81810db22b3ca 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/bounding-box.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=28f092a737f0e95124d471b3f2e974f0 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/bounding-box.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=9de350d69271ca8fee42016b271e1bd3 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/bounding-box.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=709239c6e08a0a577244447b8303045c 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/bounding-box.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=8f2da6c9a7063343e7a0a55b4b012a0c 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/bounding-box.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=809f3a54800de57262a6884292a34b50 2500w" />

   To show all of the bounding boxes in the **Preview** pane at once, turn on the **Show all bounding boxes** toggle at the top of the **Preview** pane.
   You can now click any of the bounding boxes without first needing to rest your mouse pointer on them to show them.

   <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/show-all-bounding-boxes.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=128e06817ba5fa7ad8b2d953926450e4" alt="Showing all bounding boxes" data-og-width="1448" width="1448" data-og-height="856" height="856" data-path="img/ui/single-file/show-all-bounding-boxes.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/show-all-bounding-boxes.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=0e9f20c2eafb8eee57ef40cad0269f68 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/show-all-bounding-boxes.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=d54dc49ba20c167edc076b6b13f464f6 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/show-all-bounding-boxes.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=3eb61f726e0ae6665310c0d5d0d833fb 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/show-all-bounding-boxes.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=193dbb29ca31d0d99503de31c374fa81 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/show-all-bounding-boxes.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=94736d75a145fa1a624c8236b8e5194d 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/show-all-bounding-boxes.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=f022c60a98cb66cda9455236afa7245b 2500w" />

You can also do the following:

* To download the JSON view of the results as a local JSON file, click the download icon to the left of the **Formatted** and **JSON** buttons in the **Result** pane.
  (You cannot download the formatted view of the results.)

  <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/download.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=456cb0a98ef3a3ef0d86e22462f5a11f" alt="Downloading the results as a local JSON file" data-og-width="784" width="784" data-og-height="191" height="191" data-path="img/ui/single-file/download.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/download.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=fda4c4752406ca36683396c236561186 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/download.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=304350da7c151e2aef43bdc9383bc5b8 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/download.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=c8103fc50b1c7b72d92b02c59acb72a6 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/download.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=019df03cbd4db8c43becfa30d844c52e 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/download.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=da42a107483e4400797702b2d34c9af5 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/download.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=85093ecb522c74769c89d62de23ebf6f 2500w" />

* To have Unstructured partition a different file, click **Add new file** in the **Files** pane on the left, and then browse to and select the target file.

* To view the results for a file that was previously partitioned during this session, click the file's name in the **Recent files** list in the **Files** pane.

* To return to the **Start** page, click the **X** (close) button at the left on the title bar, next to **Transform**.

* To have Unstructured do more—such as
  [chunking](/ui/chunking), [embedding](/ui/embedding),
  applying additional kinds of [enrichments](/ui/enriching/overview), and
  processing larger files and semi-structured data in batches at scale—click
  **Edit in Workflow Editor** at the right on the title bar, and then [skip over to the walkthrough](/ui/walkthrough-2).

  <img src="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/workflow-editor.png?fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=26be748d279c116a252e5b6c6f67a4d9" alt="Switching to the workflow editor" data-og-width="732" width="732" data-og-height="204" height="204" data-path="img/ui/single-file/workflow-editor.png" data-optimize="true" data-opv="3" srcset="https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/workflow-editor.png?w=280&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=642196362851e6712f147852b25f4c2e 280w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/workflow-editor.png?w=560&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=ce9d1f294fcce62e765aa0eb779b7d6b 560w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/workflow-editor.png?w=840&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=630afa312e041fe9108b4b2619b81b61 840w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/workflow-editor.png?w=1100&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=0e0c3660d434f1cd0b4ec3105fc90220 1100w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/workflow-editor.png?w=1650&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=95c8df27fd2134ec386f5a7f89f4a308 1650w, https://mintcdn.com/unstructured-53/7V13mJi2iZBZVTP5/img/ui/single-file/workflow-editor.png?w=2500&fit=max&auto=format&n=7V13mJi2iZBZVTP5&q=85&s=a258ffbc479e29cdeb6ded440a5e4e33 2500w" />

What's next?

* <Icon icon="plus" />  [Learn how to add chunking, embeddings, and additional enrichments to your local file results](/ui/walkthrough-2).
* <Icon icon="database" />  [Learn how to do large-scale batch processing of multiple files and semi-structured data that are stored in remote locations instead](/ui/quickstart#remote-quickstart).
* <Icon icon="desktop" />  [Learn more about the Unstructured user interface](/ui/overview).

***

## <Icon icon="rectangle-terminal" />    Unstructured API quickstart

This quickstart shows how you can use the Unstructured API to quickly and easily see Unstructured's
transformation results for files that are stored locally.

<Note>
  This quickstart is designed to work *only by processing local files*.

  To process files (and data) in remote file and blob storage, databases, and vector stores, you must use
  workflow operations in the Unstructured API other than the ones that are used in this quickstart. To learn how, see for example the notebook
  [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb).
</Note>

1. If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
   After you sign up, you are automatically signed in to your Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

   <Note>
     If you already have an Unstructured **Pay-As-You-Go** or **Business SaaS** account, you are already signed up for Unstructured.
     Sign in to your existing Unstructured **Pay-As-You-Go** or **Business SaaS** account, at [https://platform.unstructured.io](https://platform.unstructured.io).

     If you already have an Unstructured **dedicated instance** or **in-VPC** deployment, your sign-in link will be unique to your deployment.
     If you're not sure what your unique sign-in link is, see your Unstructured account administrator, or email Unstructured Support at
     [support@unstructured.io](mailto:support@unstructured.io).
   </Note>

2. Get your Unstructured API key:

   a. After you sign in to your Unstructured account, click **API Keys** on the sidebar.<br />

   <Note>
     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).
   </Note>

   b. Click **Generate API Key**.<br />
   c. Follow the on-screen instructions to finish generating the key.<br />
   d. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

3. Now that you have your Unstructured API key, choose one of the following options to continue:

   * [Use a remote notebook](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_On_Demand_Jobs_Quickstart.ipynb) - This option uses a remotely hosted Google Colab notebook. There are no additional setup steps required.
   * [Use your local machine](/api-reference/quickstart) - This option requires you to install the Unstructured Python SDK on your local machine.

<Icon icon="person-chalkboard" />  [Learn more](/api-reference/overview).

***

## <Icon icon="money-check-dollar" />    Pricing

Unstructured offers different account types with different pricing plans:

* <Icon icon="person" />  **Let's Go** and **Pay-As-You-Go** - A single user, with a single workspace, hosted alongside other accounts on Unstructured's cloud infrastructure.
* <Icon icon="building" />  **Business** - Multiple users and workspaces, with three options:

  * <Icon icon="people-group" />  **Business SaaS** - Hosted alongside other accounts on Unstructured's cloud infrastructure.
  * <Icon icon="shield-halved" />  **Dedicated instance** - Hosted within a virtual private cloud (VPC) running inside Unstructured's cloud infrastructure. Dedicated instances are isolated from all other accounts, for additional security and control.
  * <Icon icon="shield" />  **In-VPC** - Hosted within your own VPC on your own cloud infrastructure.

  **Business** accounts also allow for robust customization of Unstructured's features for your unique needs.

For more details, see the [Unstructured Pricing](https://unstructured.io/pricing) page.

To upgrade your account from **Let's Go** or **Pay-As-You-Go** to **Business**,
email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

Some of these plans have billing details that are determined on a per-page basis.

Unstructured calculates a page as follows:

* For these file types, a page is a page, slide, or image: `.pdf`, `.pptx`, and `.tiff`.
* For `.docx` files that have page metadata, Unstructured calculates the number of pages based on that metadata.
* For all other file types, Unstructured calculates the number of pages as the file's size divided by 100 KB.
* For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.

***

## <Icon icon="question" />    Questions? Need help?

* For general questions about Unstructured products and pricing, email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).
* For technical support for Unstructured accounts, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).
* For technical support for the Unstructured open source library, use our [Slack community](https://short.unstructured.io/pzw05l7).

---

# Source: https://docs.unstructured.io/support/issues/workflow-job-in-progress.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Workflow already has a job scheduled or in progress

## Issue

When you try to run a workflow, you get a warning message that contains text similar to the following:

```text  theme={null}
Failed to schedule job for workflow ... Workflow already has a job scheduled or in progress. Retrying in 60 seconds.
```

## Causes

* Unstructured tried to start a new job run for the workflow, but the previous job run for that workflow is stuck or has otherwise not finished yet.
* The workflow's schedule is too frequent for how long the job takes to run. For example, the workflow might be scheduled to
  run every 5 minutes, but each job run for that workflow typically takes 10 minutes or longer to complete.

## Solutions

* Unstructured will wait and try the new job run again in 60 seconds.
* Check previous job runs for the workflow to see if any of them are stuck. If so, try to cancel them.
* Check previous job runs for the workflow to see if any of them seem to be running much longer than usual. If so,
  consider adjusting the workflow's schedule to run less frequently. Or, temporarily turn off the workflow's schedule and then run the workflow manually as needed.

## Additional resources

To ask questions or get additional help with this issue, see [requesting support](/support/request).

---

# Source: https://docs.unstructured.io/ui/workflows.md

# Source: https://docs.unstructured.io/api-reference/workflow/workflows.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Workflows

To use the [Unstructured API's workflow operations](/api-reference/workflow/overview) to manage workflows, do the following:

* To get a list of available workflows, use the `UnstructuredClient` object's `workflows.list_workflows` function (for the Python SDK) or
  the `GET` method to call the `/workflows` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#list-workflows).
* To get information about a workflow, use the `UnstructuredClient` object's `workflows.get_workflow` function (for the Python SDK) or
  the `GET` method to call the `/workflows/<workflow-id>` endpoint (for `curl` or Postman)use the `GET` method to call the `/workflows/<workflow-id>` endpoint. [Learn more](/api-reference/workflow/overview#get-a-workflow).
* To create a workflow, use the `UnstructuredClient` object's `workflows.create_workflow` function (for the Python SDK) or
  the `POST` method to call the `/workflows` endpoint (for `curl` or Postman). [Learn more](#create-a-workflow).
* To run a workflow manually, use the `UnstructuredClient` object's `workflows.run_workflow` function (for the Python SDK) or
  the `POST` method to call the `/workflows/<workflow-id>/run` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#run-a-workflow).
* To update a workflow, use the `UnstructuredClient` object's `workflows.update_workflow` function (for the Python SDK) or
  the `PUT` method to call the `/workflows/<workflow-id>` endpoint (for `curl` or Postman). [Learn more](#update-a-workflow).
* To delete a workflow, use the `UnstructuredClient` object's `workflows.delete_workflow` function (for the Python SDK) or
  the `DELETE` method to call the `/workflows/<workflow-id>` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#delete-a-workflow).
* To get a list of available workflow templates, use the `GET` method to call the `/templates` endpoint (for `curl` or Postman).
  [Learn more](#list-templates).
* To get information about a workflow template, use the `GET` method to call the `/templates/<template-id>` endpoint (for `curl` or Postman).
  [Learn more](#get-a-template).

The following examples assume that you have already met the [requirements](/api-reference/workflow/overview#requirements) and
understand the [basics](/api-reference/workflow/overview#basics) of working with the Unstructured API's workflow operations.

## Create a workflow

To create a workflow, use the `UnstructuredClient` object's `workflows.create_workflow` function (for the Python SDK) or
the `POST` method to call the `/workflows` endpoint (for `curl` or Postman).

<Note>
  The following instructions create a workflow that exists until it is explicitly deleted (also known as a *long-lived workflow*).
  To create a workflow that exists only for the duration of
  that workflow's associated job run, and that job run's temporary workflow takes one or more local files only as input, see [Run an on-demand job](/api-reference/workflow/overview#run-an-on-demand-job).
</Note>

In the `CreateWorkflow` object (for the Python SDK) or
the request body (for `curl` or Postman),
specify the settings for the workflow, as follows:

<AccordionGroup>
  <Accordion title="Python SDK (remote source and remote destination)">
    ```python  theme={null}
    import os

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.shared import (
        WorkflowNode,
        CreateWorkflow,
        WorkflowType,
        Schedule
    )
    from unstructured_client.models.operations import CreateWorkflowRequest


    client = UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    workflow_node = WorkflowNode(
        name="<node-name>",
        type="<node-type>",
        subtype="<node-subtype>",
        settings={
            "...": "..."
        }
    )

    another_workflow_node = WorkflowNode(
        name="<node-name>",
        type="<node-type>",
        subtype="<node-subtype>",
        settings={
            "...": "..."
        }
    )

    # And so on for any additional nodes.

    workflow = CreateWorkflow(
        name="<name>",
        source_id="<source-connector-id>",
        destination_id="<destination-connector-id>",
        workflow_type=WorkflowType.<workflow-type>,
        reprocess_all=True|False,
        template_id="<workflow-template-id>",
        workflow_nodes=[
            workflow_node,
            another_workflow_node
            # And so on for any additional nodes.
        ],
        schedule=Schedule("<schedule-timeframe>")
    )

    response = client.workflows.create_workflow(
        request=CreateWorkflowRequest(
            create_workflow=workflow
        )
    )

    info = response.workflow_information

    print(f"name:           {info.name}")
    print(f"id:             {info.id}")
    print(f"status:         {info.status}")
    print(f"type:           {info.workflow_type}")
    print("source(s):")

    for source in info.sources:
        print(f"            {source}")

    print("destination(s):")

    for destination in info.destinations:
        print(f"            {destination}")

    print("schedule(s):")

    for crontab_entry in info.schedule.crontab_entries:
        print(f"            {crontab_entry.cron_expression}")
    ```
  </Accordion>

  <Accordion title="Python SDK (local source and local destination)">
    To use a local source and a local destination, do not specify a `source_id` or `destination_id` value. Also, the `workflow_type` must be set to `CUSTOM`.

    <Note>
      A workflow with a local source has the following limitations:

      * The workflow cannot be set to run on a repeating schedule.
      * The workflow cannot be run from the Unstructured user interface (UI),
        even though the workflow is visible in the UI. However, you can
        run the workflow with the Unstructured Python SDK, or REST API clients such as `curl` or Postman. [Learn how](/api-reference/workflow/overview#run-a-workflow).
    </Note>

    ```python  theme={null}
    import os

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.shared import (
        WorkflowNode,
        CreateWorkflow,
        WorkflowType
    )
    from unstructured_client.models.operations import CreateWorkflowRequest

    workflow_node = WorkflowNode(
        name="<node-name>",
        type="<node-type>",
        subtype="<node-subtype>",
        settings={
            "...": "..."
        }
    )

    another_workflow_node = WorkflowNode(
        name="<node-name>",
        type="<node-type>",
        subtype="<node-subtype>",
        settings={
            "...": "..."
        }
    )

    # And so on for any additional nodes.

    workflow=CreateWorkflow(
        name="<name>",
        workflow_type=WorkflowType.CUSTOM,
        workflow_nodes=[
            workflow_node,
            another_workflow_node
            # And so on for any additional nodes.
        ]
    )

    with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
        response = client.workflows.create_workflow(
            request=CreateWorkflowRequest(
                create_workflow=workflow
            )
        )

        info = response.workflow_information

        print(f"name: {info.name}")
        print(f"id: {info.id}")
        print(f"status: {info.status}")
        print(f"type: {info.workflow_type}")
        print("source(s):")

        for source in info.sources:
            print(f"    {source}")

        print("destination(s):")

        for destination in info.destinations:
            print(f"    {destination}")

        print("schedule(s):")

        for crontab_entry in info.schedule.crontab_entries:
            print(f"    {crontab_entry.cron_expression}")
    ```
  </Accordion>

  <Accordion title="Python SDK (local source and remote destination)">
    To use a local source and a remote destination, specify a `destination_id` value, and do not specify a `source_id` value. Also, the `workflow_type` must be set to `CUSTOM`.

    <Note>
      A workflow with a local source has the following limitations:

      * The workflow cannot be set to run on a repeating schedule.
      * The workflow cannot be run from the Unstructured user interface (UI),
        even though the workflow is visible in the UI. However, you can
        run the workflow with the Unstructured Python SDK, or REST API clients such as `curl` or Postman. [Learn how](/api-reference/workflow/overview#run-a-workflow).
    </Note>

    ```python  theme={null}
    import os

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.shared import (
        WorkflowNode,
        CreateWorkflow,
        WorkflowType
    )
    from unstructured_client.models.operations import CreateWorkflowRequest

    workflow_node = WorkflowNode(
        name="<node-name>",
        type="<node-type>",
        subtype="<node-subtype>",
        settings={
            "...": "..."
        }
    )

    another_workflow_node = WorkflowNode(
        name="<node-name>",
        type="<node-type>",
        subtype="<node-subtype>",
        settings={
            "...": "..."
        }
    )

    # And so on for any additional nodes.

    workflow=CreateWorkflow(
        name="<name>",
        destination_id="<destination-connector-id>",
        workflow_type=WorkflowType.CUSTOM,
        workflow_nodes=[
            workflow_node,
            another_workflow_node
            # And so on for any additional nodes.
        ]
    )

    with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
        response = client.workflows.create_workflow(
            request=CreateWorkflowRequest(
                create_workflow=workflow
            )
        )

        info = response.workflow_information

        print(f"name: {info.name}")
        print(f"id: {info.id}")
        print(f"status: {info.status}")
        print(f"type: {info.workflow_type}")
        print("source(s):")

        for source in info.sources:
            print(f"    {source}")

        print("destination(s):")

        for destination in info.destinations:
            print(f"    {destination}")

        print("schedule(s):")

        for crontab_entry in info.schedule.crontab_entries:
            print(f"    {crontab_entry.cron_expression}")
    ```
  </Accordion>

  <Accordion title="Python SDK (async) (remote source and remote destination)">
    ```python  theme={null}
    import os
    import asyncio

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.shared import (
        WorkflowNode,
        CreateWorkflow,
        WorkflowType,
        Schedule
    )
    from unstructured_client.models.operations import CreateWorkflowRequest

    async def create_workflow():
        client = UnstructuredClient(
            api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
        )

        workflow_node = WorkflowNode(
            name="<node-name>",
            type="<node-type>",
            subtype="<node-subtype>",
            settings={
                "...": "..."
            }
        )

        another_workflow_node = WorkflowNode(
            name="<node-name>",
            type="<node-type>",
            subtype="<node-subtype>",
            settings={
                "...": "..."
            }
        )

        # And so on for any additional nodes.

        workflow = CreateWorkflow(
            name="<name>",
            source_id="<source-connector-id>",
            destination_id="<destination-connector-id>",
            workflow_type=WorkflowType.<workflow-type>,
            reprocess_all=True|False,
            template_id="<workflow-template-id>",
            workflow_nodes=[
                workflow_node,
                another_workflow_node
                # And so on for any additional nodes.
            ],
            schedule=Schedule("<schedule-timeframe>")
        )

        response = await client.workflows.create_workflow_async(
            request=CreateWorkflowRequest(
                create_workflow=workflow
            )
        )

        info = response.workflow_information

        print(f"name:           {info.name}")
        print(f"id:             {info.id}")
        print(f"status:         {info.status}")
        print(f"type:           {info.workflow_type}")
        print("source(s):")

        for source in info.sources:
            print(f"            {source}")

        print("destination(s):")

        for destination in info.destinations:
            print(f"            {destination}")

        print("schedule(s):")

        for crontab_entry in info.schedule.crontab_entries:
            print(f"            {crontab_entry.cron_expression}")

    asyncio.run(create_workflow())
    ```
  </Accordion>

  <Accordion title="Python SDK (async) (local source and local destination)">
    To use a local source and a local destination, do not specify a `source_id` or `destination_id` value. Also, the `workflow_type` must be set to `CUSTOM`.

    <Note>
      A workflow with a local source has the following limitations:

      * The workflow cannot be set to run on a repeating schedule.
      * The workflow cannot be run from the Unstructured user interface (UI),
        even though the workflow is visible in the UI. However, you can
        run the workflow with the Unstructured Python SDK, or REST API clients such as `curl` or Postman. [Learn how](/api-reference/workflow/overview#run-a-workflow).
    </Note>

    ```python  theme={null}
    import os
    import asyncio

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.shared import (
        WorkflowNode,
        CreateWorkflow,
        WorkflowType
    )
    from unstructured_client.models.operations import CreateWorkflowRequest

    async def create_workflow():
        workflow_node = WorkflowNode(
            name="<node-name>",
            type="<node-type>",
            subtype="<node-subtype>",
            settings={
                "...": "..."
            }
        )

        another_workflow_node = WorkflowNode(
            name="<node-name>",
            type="<node-type>",
            subtype="<node-subtype>",
            settings={
                "...": "..."
            }
        )

        # And so on for any additional nodes.

        workflow = CreateWorkflow(
            name="<name>",
            workflow_type=WorkflowType.CUSTOM,
            workflow_nodes=[
                workflow_node,
                another_workflow_node
                # And so on for any additional nodes.
            ]
        )

        with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
            response = await client.workflows.create_workflow_async(
                request=CreateWorkflowRequest(
                    create_workflow=workflow
                )
            )

            info = response.workflow_information

            print(f"name: {info.name}")
            print(f"id: {info.id}")
            print(f"status: {info.status}")
            print(f"type: {info.workflow_type}")
            print("source(s):")

            for source in info.sources:
                print(f"    {source}")

            print("destination(s):")

            for destination in info.destinations:
                print(f"    {destination}")

            print("schedule(s):")

            for crontab_entry in info.schedule.crontab_entries:
                print(f"    {crontab_entry.cron_expression}")

    asyncio.run(create_workflow())
    ```
  </Accordion>

  <Accordion title="Python SDK (async) (local source and remote destination)">
    To use a local source and a remote destination, specify a `destination_id` value, and do not specify a `source_id` value. Also, the `workflow_type` must be set to `CUSTOM`.

    <Note>
      A workflow with a local source has the following limitations:

      * The workflow cannot be set to run on a repeating schedule.
      * The workflow cannot be run from the Unstructured user interface (UI),
        even though the workflow is visible in the UI. However, you can
        run the workflow with the Unstructured Python SDK, or REST API clients such as `curl` or Postman. [Learn how](/api-reference/workflow/overview#run-a-workflow).
    </Note>

    ```python  theme={null}
    import os
    import asyncio

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.shared import (
        WorkflowNode,
        CreateWorkflow,
        WorkflowType
    )
    from unstructured_client.models.operations import CreateWorkflowRequest

    async def create_workflow():
        workflow_node = WorkflowNode(
            name="<node-name>",
            type="<node-type>",
            subtype="<node-subtype>",
            settings={
                "...": "..."
            }
        )

        another_workflow_node = WorkflowNode(
            name="<node-name>",
            type="<node-type>",
            subtype="<node-subtype>",
            settings={
                "...": "..."
            }
        )

        # And so on for any additional nodes.

        workflow = CreateWorkflow(
            name="<name>",
            destination_id="<destination-connector-id>",
            workflow_type=WorkflowType.CUSTOM,
            workflow_nodes=[
                workflow_node,
                another_workflow_node
                # And so on for any additional nodes.
            ]
        )

        with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
            response = await client.workflows.create_workflow_async(
                request=CreateWorkflowRequest(
                    create_workflow=workflow
                )
            )

            info = response.workflow_information

            print(f"name: {info.name}")
            print(f"id: {info.id}")
            print(f"status: {info.status}")
            print(f"type: {info.workflow_type}")
            print("source(s):")

            for source in info.sources:
                print(f"    {source}")

            print("destination(s):")

            for destination in info.destinations:
                print(f"    {destination}")

            print("schedule(s):")

            for crontab_entry in info.schedule.crontab_entries:
                print(f"    {crontab_entry.cron_expression}")

    asyncio.run(create_workflow())
    ```
  </Accordion>

  <Accordion title="curl (remote source and remote destination)">
    ```bash  theme={null}
    curl --request 'POST' --location \
    "$UNSTRUCTURED_API_URL/workflows" \
    --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
    --header 'accept: application/json' \
    --data \
    '{
        "name": "<name>",
        "source_id": "<source-connector-id>",
        "destination_id": "<destination-connector-id>",
        "workflow_type": "<workflow-type>",
        "reprocess_all": "true"|"false",
        "template_id": "<workflow-template-id>",
        "workflow_nodes": [
            {
                "name": "<node-name>",
                "type": "<node-type>",
                "subtype": "<node-subtype>",
                "settings": {
                    "...": "..."
                }
            },
            {
                "...": "..." 
            } 
        ],
        "schedule": "<schedule-timeframe>"
    }'
    ```
  </Accordion>

  <Accordion title="curl (local source and local destination)">
    To use a local source and a local destination, do not specify a `source_id` or `destination_id` value. Also, the `workflow_type` must be set to `custom`.

    <Note>
      A workflow with a local source has the following limitations:

      * The workflow cannot be set to run on a repeating schedule.
      * The workflow cannot be run from the Unstructured user interface (UI),
        even though the workflow is visible in the UI. However, you can
        run the workflow with the Unstructured Python SDK, or REST API clients such as `curl` (or Postman). [Learn how](/api-reference/workflow/overview#run-a-workflow).
    </Note>

    ```bash  theme={null}
    curl --request 'POST' --location \
    "$UNSTRUCTURED_API_URL/workflows" \
    --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
    --header 'accept: application/json' \
    --data \
    '{
        "name": "<name>",
        "workflow_type": "custom",
        "template_id": "<workflow-template-id>",
        "workflow_nodes": [
            {
                "name": "<node-name>",
                "type": "<node-type>",
                "subtype": "<node-subtype>",
                "settings": {
                    "...": "..."
                }
            },
            {
                "...": "..." 
            } 
        ]
    }'
    ```
  </Accordion>

  <Accordion title="curl (local source and remote destination)">
    To use a local source and a remote destination, specify a `destination_id` value, and do not specify a `source_id` value. Also, the `workflow_type` must be set to `custom`.

    <Note>
      A workflow with a local source has the following limitations:

      * The workflow cannot be set to run on a repeating schedule.
      * The workflow cannot be run from the Unstructured user interface (UI),
        even though the workflow is visible in the UI. However, you can
        run the workflow with the Unstructured Python SDK, or REST API clients such as `curl` (or Postman). [Learn how](/api-reference/workflow/overview#run-a-workflow).
    </Note>

    ```bash  theme={null}
    curl --request 'POST' --location \
    "$UNSTRUCTURED_API_URL/workflows" \
    --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
    --header 'accept: application/json' \
    --data \
    '{
        "name": "<name>",
        "destination_id": "<destination-connector-id>",
        "workflow_type": "custom",
        "template_id": "<workflow-template-id>",
        "workflow_nodes": [
            {
                "name": "<node-name>",
                "type": "<node-type>",
                "subtype": "<node-subtype>",
                "settings": {
                    "...": "..."
                }
            },
            {
                "...": "..." 
            } 
        ]
    }'
    ```
  </Accordion>

  <Accordion title="Postman (remote source and remote destination)">
    1. In the method drop-down list, select **POST**.

    2. In the address box, enter the following URL:

       ```text  theme={null}
       {{UNSTRUCTURED_API_URL}}/workflows
       ```

    3. On the **Headers** tab, enter the following headers:

       * **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
       * **Key**: `accept`, **Value**: `application/json`

    4. On the **Body** tab, select **raw** and **JSON**, and specify the settings for the workflow:

       ```json  theme={null}
       {
           "name": "<name>",
           "source_id": "<source-connector-id>",
           "destination_id": "<destination-connector-id>",
           "workflow_type": "<workflow-type>",
           "reprocess_all": "true"|"false",
           "template_id": "<workflow-template-id>",
           "workflow_nodes": [
               {
                   "name": "<node-name>",
                   "type": "<node-type>",
                   "subtype": "<node-subtype>",
                   "settings": {
                       "...": "..."
                   }
               },
               {
                   "...": "..." 
               } 
           ],
           "schedule": "<schedule-timeframe>"
       }
       ```

    5. Click **Send**.
  </Accordion>

  <Accordion title="Postman (local source and local destination)">
    To use a local source and a local destination do not specify a `source_id` or `destination_id` value. Also, the `workflow_type` must be set to `custom`.

    <Note>
      A workflow with a local source has the following limitations:

      * The workflow cannot be set to run on a repeating schedule.
      * The workflow cannot be run from the Unstructured user interface (UI),
        even though the workflow is visible in the UI. However, you can
        run the workflow with the Unstructured Python SDK, or REST API clients such as Postman (or `curl`). [Learn how](/api-reference/workflow/overview#run-a-workflow).
    </Note>

    1. In the method drop-down list, select **POST**.

    2. In the address box, enter the following URL:

       ```text  theme={null}
       {{UNSTRUCTURED_API_URL}}/workflows
       ```

    3. On the **Headers** tab, enter the following headers:

       * **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
       * **Key**: `accept`, **Value**: `application/json`

    4. On the **Body** tab, select **raw** and **JSON**, and specify the settings for the workflow:

       ```json  theme={null}
       {
           "name": "<name>",
           "workflow_type": "custom",
           "template_id": "<workflow-template-id>",
           "workflow_nodes": [
               {
                   "name": "<node-name>",
                   "type": "<node-type>",
                   "subtype": "<node-subtype>",
                   "settings": {
                       "...": "..."
                   }
               },
               {
                   "...": "..." 
               } 
           ]
       }
       ```

    5. Click **Send**.
  </Accordion>

  <Accordion title="Postman (local source and remote destination)">
    To use a local source and a remote destination, specify a `destination_id` value, and do not specify a `source_id` value. Also, the `workflow_type` must be set to `custom`.

    <Note>
      A workflow with a local source has the following limitations:

      * The workflow cannot be set to run on a repeating schedule.
      * The workflow cannot be run from the Unstructured user interface (UI),
        even though the workflow is visible in the UI. However, you can
        run the workflow with the Unstructured Python SDK, or REST API clients such as Postman (or `curl`). [Learn how](/api-reference/workflow/overview#run-a-workflow).
    </Note>

    1. In the method drop-down list, select **POST**.

    2. In the address box, enter the following URL:

       ```text  theme={null}
       {{UNSTRUCTURED_API_URL}}/workflows
       ```

    3. On the **Headers** tab, enter the following headers:

       * **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
       * **Key**: `accept`, **Value**: `application/json`

    4. On the **Body** tab, select **raw** and **JSON**, and specify the settings for the workflow:

       ```json  theme={null}
       {
           "name": "<name>",
           "workflow_type": "custom",
           "template_id": "<workflow-template-id>",
           "workflow_nodes": [
               {
                   "name": "<node-name>",
                   "destination_id": "<destination-connector-id>",
                   "type": "<node-type>",
                   "subtype": "<node-subtype>",
                   "settings": {
                       "...": "..."
                   }
               },
               {
                   "...": "..." 
               } 
           ]
       }
       ```

    5. Click **Send**.
  </Accordion>
</AccordionGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this workflow.

* `<source-connector-id>` (*required*) - The ID of the target source connector. To get the ID,
  use the `UnstructuredClient` object's `sources.list_sources` function (for the Python SDK) or
  the `GET` method to call the `/sources` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#list-source-connectors).

* `<destination-connector-id>` (*required*) - The ID of the target destination connector. To get the ID,
  use the `UnstructuredClient` object's `destinations.list_destinations` function (for the Python SDK) or
  the `GET` method to call the `/destinations` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#list-destination-connectors).

* `<workflow-type>` (*required*) - The workflow type. Available values include:

  * `CUSTOM` (for the Python SDK) and `custom` (for `curl` or Postman) - This workflow's nodes are manually specified in the `workflow_nodes` array. For instructions, see [Custom workflow DAG nodes](#custom-workflow-dag-nodes).
  * `TEMPLATE` (for the Python SDK) and `template` (for `curl` or Postman) - This workflow's nodes are specified by a workflow template. In this case, you would not add a `workflow_nodes` array. Instead, you must specify the workflow template's unique ID in the `template_id` field. For instructions, see [List workflow templates](#get-a-template).

  <Note>
    The previously-available workflow optimization types `ADVANCED`, `BASIC`, and `PLATINUM` (for the Python SDK) and
    `advanced`, `basic`, and `platinum` (for `curl` or Postman) are non-operational and planned to be fully removed in a future release.

    The ability to create an [automatic workflow](/ui/workflows#create-an-automatic-workflow) type is currently not available but is planned to be added in a future release.
  </Note>

* `reprocess_all`: *Optional*. If set to true (the default), the workflow will reprocess all documents in the source location on every workflow run.
  If set to `false`, the workflow will only process new documents that are added to the source location, or existing documents that are updated in the
  source location (as determined by checking whether the file's version has changed), since the last workflow run. This setting applies only to blob storage
  connectors such as the Amazon S3, Azure Blob Storage, and Google Cloud Storage source connectors.

* `<schedule-timeframe>` - The repeating automatic run schedule, specified as a predefined phrase. The available predefined phrases are:

  * `every 15 minutes` (for `curl` or Postman): Every 15 minutes (cron expression: `*/15 * * * *`).
  * `every hour`: At the first minute of every hour (cron expression: `0 * * * *`).
  * `every 2 hours`: At the first minute of every second hour (cron expression: `0 */2 * * *`).
  * `every 4 hours`: At the first minute of every fourth hour (cron expression: `0 */4 * * *`).
  * `every 6 hours`: At the first minute of every sixth hour (cron expression: `0 */6 * * *`).
  * `every 8 hours`: At the first minute of every eighth hour (cron expression: `0 */8 * * *`).
  * `every 10 hours`: At the first minute of every tenth hour (cron expression: `0 */10 * * *`).
  * `every 12 hours`: At the first minute of every twelfth hour (cron expression: `0 */12 * * *`).
  * `daily`: At the first minute of every day (cron expression: `0 0 * * *`).
  * `weekly`: At the first minute of every Sunday (cron expression: `0 0 * * 0`).
  * `monthly`: At the first minute of the first day of every month (cron expression: `0 0 1 * *`).

  If `schedule` is not specified, the workflow does not automatically run on a repeating schedule.

  Workflows with a local source cannot be set to run on a repeating schedule.

## Update a workflow

To update information about a workflow, use the `UnstructuredClient` object's `workflows.update_workflow` function (for the Python SDK) or
the `PUT` method to call the `/workflows/<workflow-id>` endpoint (for `curl` or Postman), replacing
`<workflow-id>` with the workflow's unique ID. To get this ID, see [List workflows](#list-workflows).

<Note>
  The following instructions can be used only to update a long-lived workflow (a workflow that exists until it is explicitly deleted).
  A workflow that exists only for the duration of a running [on-demand job](/api-reference/workflow/overview#run-an-on-demand-job)
  cannot be updated.
</Note>

In the request body, specify the settings for the workflow. For the specific settings to include, see
[Create a workflow](/api-reference/workflow/workflows#create-a-workflow).

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    import os

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.shared import (
        WorkflowNode,
        UpdateWorkflow,
        WorkflowType,
        Schedule,
    ),
    from unstructured_client.models.operations import UpdateWorkflowRequest


    client = UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    workflow_node = WorkflowNode(
        # Specify the settings for a workflow node here.
    )

    another_workflow_node = WorkflowNode(
        # Specify the settings for another workflow node here.
    )

    # And so on for any additional nodes.

    workflow = UpdateWorkflow(
        # Specify the settings for the workflow here.
    )

    response = client.workflows.update_workflow(
        request=UpdateWorkflowRequest(
            workflow_id="<workflow-id>",
            update_workflow=workflow
        )
    )

    info = response.workflow_information

    print(f"name:           {info.name}")
    print(f"id:             {info.id}")
    print(f"status:         {info.status}")
    print(f"type:           {info.workflow_type}")
    print("source(s):")

    for source in info.sources:
        print(f"            {source}")

    print("destination(s):")

    for destination in info.destinations:
        print(f"            {destination}")

    print("schedule(s):")

    for crontab_entry in info.schedule.crontab_entries:
        print(f"            {crontab_entry.cron_expression}")
    ```
  </Accordion>

  <Accordion title="Python SDK (async)">
    ```python  theme={null}
    import os
    import asyncio

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.shared import (
        WorkflowNode,
        UpdateWorkflow,
        WorkflowType,
        Schedule,
    ),
    from unstructured_client.models.operations import UpdateWorkflowRequest

    async def update_workflow():
        client = UnstructuredClient(
            api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
        )

        workflow_node = WorkflowNode(
            # Specify the settings for a workflow node here.
        )

        another_workflow_node = WorkflowNode(
            # Specify the settings for another workflow node here.
        )

        # And so on for any additional nodes.

        workflow = UpdateWorkflow(
            # Specify the settings for the workflow here.
        )

        response = await client.workflows.update_workflow_async(
            request=UpdateWorkflowRequest(
                workflow_id="<workflow-id>",
                update_workflow=workflow
            )
        )

        info = response.workflow_information

        print(f"name:           {info.name}")
        print(f"id:             {info.id}")
        print(f"status:         {info.status}")
        print(f"type:           {info.workflow_type}")
        print("source(s):")

        for source in info.sources:
            print(f"            {source}")

        print("destination(s):")

        for destination in info.destinations:
            print(f"            {destination}")

        print("schedule(s):")

        for crontab_entry in info.schedule.crontab_entries:
            print(f"            {crontab_entry.cron_expression}")

    asyncio.run(update_workflow())
    ```
  </Accordion>

  <Accordion title="curl">
    ```bash  theme={null}
    curl --request 'PUT' --location \
    "$UNSTRUCTURED_API_URL/workflows/<workflow-id>" \
    --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
    --header 'accept: application/json' \
    --data \
    '{
        # Specify the settings for the workflow here.
    }'
    ```
  </Accordion>

  <Accordion title="Postman">
    1. In the method drop-down list, select **PUT**.

    2. In the address box, enter the following URL:

       ```text  theme={null}
       {{UNSTRUCTURED_API_URL}}/workflows/<workflow-id>
       ```

    3. On the **Headers** tab, enter the following headers:

       * **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
       * **Key**: `accept`, **Value**: `application/json`

    4. On the **Body** tab, select **raw** and **JSON**, and specify the settings for the workflow.

    5. Click **Send**.
  </Accordion>
</AccordionGroup>

## Custom workflow DAG nodes

If `WorkflowType` is set to `CUSTOM` (for the Python SDK), or if `workflow_type` is set to `custom` (for `curl` or Postman), you must also specify the settings for the workflow's
directed acyclic graph (DAG) nodes. These nodes' settings are specified in the `workflow_nodes` array.

* A **Source** node is automatically created when you specify the `source_id` value outside of the\
  `workflow_nodes` array.

* A **Destination** node is automatically created when you specify the `destination_id` value outside of the
  `workflow_nodes` array.

* You can specify [Partitioner](#partitioner-node), [Enrichment](#enrichment-node),
  [Chunker](#chunker-node), and [Embedder](#embedder-node) nodes.

  <Warning>
    Unstructured can potentially generate image summary descriptions, table summary descriptions, table-to-HTML output, and generative OCR optimizations, only for workflows that are configured as follows:

    * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and an image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is added.
    * With a **Partitioner** node set to use the **VLM** partitioning strategy. No image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is needed (or allowed).

    Even with these configurations, Unstructured actually generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and are also eligible
    for processing with the following partitioning strategies:

    * **High Res**, when the workflow's **Partitioner** node is set to use **Auto** or **High Res**.
    * **VLM** or **High Res**, when the workflow's **Partitioner** node is set to use **VLM**.

    Unstructured never generates image summary descriptions, table summary descriptions, or table-to-HTML output for workflows that are configured as follows:

    * With a **Partitioner** node set to use the **Fast** partitioning strategy.
    * With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain images or tables.

    Unstructured never produces generative OCR optimizations for workflows with a **Partitioner** node set to use the **Fast** partitioning strategy.
  </Warning>

* The order of the nodes in the `workflow_nodes` array will be the same order that these nodes appear in the DAG,
  with the first node in the array added directly after the **Source** node. The **Destination** node
  follows the last node in the array.

* Be sure to specify nodes in the allowed order. The following DAG placements are all allowed:

```mermaid  theme={null}
flowchart LR
    Partitioner
```

```mermaid  theme={null}
flowchart LR
    Partitioner-->Chunker
```

```mermaid  theme={null}
flowchart LR
    Partitioner-->Chunker-->Embedder
```

```mermaid  theme={null}
flowchart LR
    Partitioner-->Enrichment-->Chunker
```

```mermaid  theme={null}
flowchart LR
    Partitioner-->Enrichment-->Chunker-->Embedder
```

<Note>
  For workflows that use **Chunker** and **Enrichment** nodes together, the **Chunker** node should be placed after all **Enrichment** nodes. Placing the
  **Chunker** node before any **Enrichment** nodes could cause incomplete or no enrichment results to be generated.
</Note>

### Partitioner node

A **Partitioner** node has a `type` of `partition`.

[Learn about the available partitioning strategies](/ui/partitioning).

#### Auto strategy

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    auto_partitioner_workflow_node = WorkflowNode(
        name="Partitioner",
        subtype="vlm",
        type="partition",
        settings={
            "provider": "<provider>",
            "provider_api_key": None,
            "model": "<model>",
            "output_format": "<output-format>",
            "format_html": <True|False>,
            "unique_element_ids": <True|False>,
            "is_dynamic": True,
            "allow_fast": True
       }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Partitioner",
        "type": "vlm",
        "subtype": "partition",
        "settings": {
            "provider": "<provider>",
            "provider_api_key": null,
            "model": "<model>",
            "output_format": "<output-format>",
            "format_html": <true|false>,
            "unique_element_ids": <true|false>,
            "is_dynamic": true,
            "allow_fast": true
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Fields for `settings` include:

* `provider`: *Optional*. If the Auto partitioning strategy needs to use the VLM partitioning strategy, then use the specified VLM provider. Allowed values include `anthropic`, `auto`, `bedrock`, `openai`, and `vertexai`. The default value is `anthropic`.

* `provider_api_key`: *Optional*. If specified, use a non-default API key for calls to the specified VLM provider as needed. The default is none, which means to rely on using Unstructured's internal default API key for the VLM provider.

* `model`: *Optional*. If the Auto partitioning strategy needs to use the VLM partitioning strategy, then use the specified VLM. The default value is `claude-sonnet-4-20250514`.

  <Warning>
    The following models are no longer available as of the following dates:

    * For `anthropic`, `claude-3-5-sonnet-20241022`: October 22, 2025
    * For `bedrock`, `us.anthropic.claude-3-5-sonnet-20241022-v2:0`: October 22, 2025

    Unstructured recommends the following actions:

    * For new workflows, do not use any of these models.
    * For any workflow that uses any of these models, update that workflow as soon as possible to use a different model.

    Workflows that attempt to use any of these models on or after its associated date will return errors.
  </Warning>

  <Note>
    The following list applies only to Unstructured **Let's Go** and **Pay-As-You-Go** accounts.

    For Unstructured **Business** accounts, to get your current list of available models, contact your
    Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
    [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

  * For `anthropic`, available values for `model` are:

    * `claude-3-7-sonnet-20250219`
    * `claude-sonnet-4-20250514`
    * `claude-sonnet-4-5-20250929`

  * For `bedrock`, available values for `model` are:

    * `us.amazon.nova-lite-v1:0`
    * `us.amazon.nova-pro-v1:0`
    * `us.anthropic.claude-3-opus-20240229-v1:0`
    * `us.anthropic.claude-3-haiku-20240307-v1:0`
    * `us.anthropic.claude-3-sonnet-20240229-v1:0`
    * `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
    * `us.anthropic.claude-sonnet-4-20250514-v1:0`

  * For `openai`, available values for `model` are:

    * `gpt-4o`
    * `gpt-5-mini-2025-08-07`

  * For `vertexai`, available values for `model` are:

    * `gemini-2.0-flash-001`

* `output_format`: *Output*. The format of the response. Allowed values include `text/html` and `application/json`. The default is `text/html`.

* `format_html`: *Optional*. If the Auto partitioning strategy needs to use the VLM partitioning strategy, true (the default) to apply Beautiful Soup's `prettify` method to the HTML that is generated by the VLM partitioner, which for example adds indentation for better readability.

* `unique_element_ids`: *Optional*. True (the default) to assign UUIDs to element IDs, which guarantees their uniqueness. This is useful for example when using them as primary keys in a database. False to assign a SHA-256 of the element's text as its element ID.

* `is_dynamic`: *Required*. True to enable dynamic routing of pages to Fast, High Res, or VLM as needed for better overall performance and cost savings.

* `allow_fast`: *Required*. True to allow routing of pages to Fast as needed for better overall performance and cost savings.

#### VLM strategy

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    vlm_partitioner_workflow_node = WorkflowNode(
        name="Partitioner",
        subtype="vlm",
        type="partition",
        settings={
            "provider": "<provider>",
            "provider_api_key": None,
            "model": "<model>",
            "output_format": "<output-format>",
            "format_html": <True|False>,
            "unique_element_ids": <True|False>,
            "is_dynamic": False,
            "allow_fast": True|False
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "vlm",
        "settings": {
            "provider": "<provider>",
            "provider_api_key": null,
            "model": "<model>",
            "output_format": "<output-format>",
            "format_html": <true|false>,
            "unique_element_ids": <true|false>,
            "is_dynamic": false,
            "allow_fast": <true|false>
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Fields for `settings` include:

* `provider`: *Optional*. Use the specified VLM provider. Allowed values include `anthropic`, `auto`, `bedrock`, `openai`, and `vertexai`. The default value is `anthropic`.

* `provider_api_key`: *Optional*. If specified, use a non-default API key for calls to the specified VLM provider as needed. The default is none, which means to rely on using Unstructured's internal default API key for the VLM provider.

* `model`: *Optional*. If the Auto partitioning strategy needs to use the VLM partitioning strategy, then use the specified VLM. The default value is `claude-sonnet-4-20250514`.

  <Warning>
    The following models are no longer available as of the following dates:

    * For `anthropic`, `claude-3-5-sonnet-20241022`: October 22, 2025
    * For `bedrock`, `us.anthropic.claude-3-5-sonnet-20241022-v2:0`: October 22, 2025

    Unstructured recommends the following actions:

    * For new workflows, do not use any of these models.
    * For any workflow that uses any of these models, update that workflow as soon as possible to use a different model.

    Workflows that attempt to use any of these models on or after its associated date will return errors.
  </Warning>

  <Note>
    The following list applies only to Unstructured **Let's Go** and **Pay-As-You-Go** accounts.

    For Unstructured **Business** accounts, to get your current list of available models, contact your
    Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
    [support@unstructured.io](mailto:support@unstructured.io).
  </Note>

  * For `anthropic`, available values for `model` are:

    * `claude-opus-4-5-20251101`
    * `claude-3-7-sonnet-20250219`
    * `claude-sonnet-4-20250514`
    * `claude-sonnet-4-5-20250929`

  * For `bedrock`, available values for `model` are:

    * `us.amazon.nova-lite-v1:0`
    * `us.amazon.nova-pro-v1:0`
    * `us.anthropic.claude-3-opus-20240229-v1:0`
    * `us.anthropic.claude-3-haiku-20240307-v1:0`
    * `us.anthropic.claude-3-sonnet-20240229-v1:0`
    * `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
    * `us.anthropic.claude-sonnet-4-20250514-v1:0`

  * For `openai`, available values for `model` are:

    * `gpt-4o`
    * `gpt-5-mini-2025-08-07`

  * For `vertexai`, available values for `model` are:

    * `gemini-2.0-flash-001`

* `output_format`: *Output*. The format of the response. Allowed values include `text/html` and `application/json`. The default is `text/html`.

* `format_html`: *Optional*. True (the default) to apply Beautiful Soup's `prettify` method to the HTML that is generated by the VLM partitioner, which for example adds indentation for better readability.

* `unique_element_ids`: *Optional*. True (the default) to assign UUIDs to element IDs, which guarantees their uniqueness. This is useful for example when using them as primary keys in a database. False to assign a SHA-256 of the element's text as its element ID.

* `is_dynamic`: *Required*. False to use the VLM strategy.

* `allow_fast`: *Optional*. True (the default) to allow routing of pages to Fast as needed for better overall performance and cost savings.

#### High Res strategy

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    high_res_paritioner_workflow_node = WorkflowNode(
        name="Partitioner",
        subtype="unstructured_api",
        type="partition",
        settings={
            "strategy": "hi_res",
            "include_page_breaks": <True|False>,
            "pdf_infer_table_structure": <True|False>,
            "exclude_elements": [
                "<element-name>",
                "<element-name>"
            ],
            "xml_keep_tags": <True|False>,
            "encoding": "<encoding>",
            "ocr_languages": [
                "<language>",
                "<language>"
            ],
            "extract_image_block_types": [
                "image",
                "table"
            ],
            "infer_table_structure": <True|False>,
            "coordinates": <True|False>
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "unstructured_api",
        "settings": {
            "strategy": "hi_res",
            "include_page_breaks": <true|false>,
            "pdf_infer_table_structure": <true|false>,
            "exclude_elements": [
                "<element-name>",
                "<element-name>"
            ],
            "xml_keep_tags": <true|false>,
            "encoding": "<encoding>",
            "ocr_languages": [
                "<language>",
                "<language>"
            ],
            "extract_image_block_types": [
                "image",
                "table"
            ],
            "infer_table_structure": <true|false>,
            "coordinates": <true|false>
        }
    }
    ```
  </Accordion>
</AccordionGroup>

* `strategy`: *Required*. The partitioning strategy to use. This field must be set to `hi_res`.

* `include_page_breaks`: *Optional*. True to include page breaks in the output if supported by the file type. The default is false.

* `pdf_infer_table_structure`: *Optional*. True for any `Table` elements extracted from a PDF to include an additional metadata field, `text_as_html`, where the value (string) is a just a transformation of the data into an HTML table. The default is false.

* `exclude_elements`: *Optional*. A list of any Unstructured element types to exclude from the output. The default is none. Available values include:

  * `Address`
  * `EmailAddress`
  * `FigureCaption`
  * `Footer`
  * `Formula`
  * `Header`
  * `Image`
  * `ListItem`
  * `NarrativeText`
  * `PageBreak`
  * `Table`
  * `Title`
  * `UncategorizedText`

* `xml_keep_tags`: *Optional*. True to retain any XML tags in the output. False (the default) to just extract the text from any XML tags instead.

* `encoding`: *Optional*. The encoding method used to decode the text input. The default is `utf-8`.

* `ocr_languages`: *Optional*. A list of languages present in the input, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages. The default is `[ 'eng' ]`. [See the language codes list](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/common/lang.py).

* `extract_image_block_types`: *Optional*. A list of the Unstructured element types for use in extracting image blocks as Base64 encoded data stored in `metadata` fields. Available values include:

  * `Abstract`
  * `BulletedText`
  * `Caption`
  * `CodeSnippet`
  * `CompositeElement`
  * `Figure`
  * `FigureCaption`
  * `Form`
  * `FormKeysValues`
  * `Formula`
  * `Header`
  * `Image`
  * `List`
  * `List-item`
  * `ListItem`
  * `NarrativeText`
  * `Paragraph`
  * `Picture`
  * `Table`
  * `Text`
  * `Threading`
  * `Title`
  * `UncategorizedText`

* `infer_table_structure`: *Optional*. True to have any table elements extracted from a PDF to include an additional `metadata` field named `text_as_html`, containing an HTML `<table>` transformation. The default is false.

* `coordinates`: *Optional*. True to have each element extracted from a PDF to include information about that element's position relative to its corresponding page. The default is false.

#### Fast strategy

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    fast_partitioner_workflow_node = WorkflowNode(
        name="Partitioner",
        subtype="unstructured_api",
        type="partition",
        settings={
            "strategy": "fast",
            "include_page_breaks": <True|False>,
            "pdf_infer_table_structure": <True|False>,
            "exclude_elements": [
                "<element-name>",
                "<element-name>"
            ],
            "xml_keep_tags": <True|False>,
            "encoding": "<encoding>",
            "ocr_languages": [
                "<language>",
                "<language>"
            ],
            "extract_image_block_types": [
                "image",
                "table"
            ],
            "infer_table_structure": <True|False>,
            "coordinates": <True|False>
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Partitioner",
        "type": "partition",
        "subtype": "unstructured_api",
        "settings": {
            "strategy": "fast",
            "include_page_breaks": <true|false>,
            "pdf_infer_table_structure": <true|false>,
            "exclude_elements": [
                "<element-name>",
                "<element-name>"
            ],
            "xml_keep_tags": <true|false>,
            "encoding": "<encoding>",
            "ocr_languages": [
                "<language-code>",
                "<language-code>"
            ],
            "extract_image_block_types": [
                "image",
                "table"
            ],
            "infer_table_structure": <true|false>,
            "coordinates": <true|false>
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Fields for `settings` include:

* `strategy`: *Required*. The partitioning strategy to use. This field must be set to `fast`.

* \`include\_page\_breaks: *Optional*. True to include page breaks in the output if supported by the file type. The default is false.

* `pdf_infer_table_structure`: *Optional*. Although this field is listed, it applies only to the `hi_res` strategy and will not work if set to true. The default is false.

* `exclude_elements`: *Optional*. A list of any Unstructured element types to exclude from the output. The default is none. Available values include:

  * `Address`
  * `EmailAddress`
  * `FigureCaption`
  * `Footer`
  * `Formula`
  * `Header`
  * `Image`
  * `ListItem`
  * `NarrativeText`
  * `PageBreak`
  * `Table`
  * `Title`
  * `UncategorizedText`

* `xml_keep_tags`: *Optional*. True to retain any XML tags in the output. False (the default) to just extract the text from any XML tags instead.

* `encoding`: *Optional*. The encoding method used to decode the text input. The default is `utf-8`.

* `ocr_languages`: *Optional*. A list of languages present in the input, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages. The default is `[ 'eng' ]`. [See the language codes list](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/common/lang.py).

* `extract_image_block_types`: *Optional*. A list of the Unstructured element types for use in extracting image blocks as Base64 encoded data stored in `metadata` fields. Available values include:

  * `Abstract`
  * `BulletedText`
  * `Caption`
  * `CodeSnippet`
  * `CompositeElement`
  * `Figure`
  * `FigureCaption`
  * `Form`
  * `FormKeysValues`
  * `Formula`
  * `Header`
  * `Image`
  * `List`
  * `List-item`
  * `ListItem`
  * `NarrativeText`
  * `Paragraph`
  * `Picture`
  * `Table`
  * `Text`
  * `Threading`
  * `Title`
  * `UncategorizedText`

* `infer_table_structure`: *Optional*. True to have any table elements extracted from a PDF to include an additional `metadata` field named `text_as_html`, containing an HTML `<table>` transformation. The default is false.

* `coordinates`: *Optional*. True to have each element extracted from a PDF to include information about that element's position relative to its corresponding page. The default is false.

### Enrichment node

An **Enrichment** node has a `type` of `prompter`.

[Learn about the available enrichments](/ui/enriching/overview).

<Warning>
  Unstructured can potentially generate image summary descriptions, table summary descriptions, table-to-HTML output, and generative OCR optimizations, only for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and an image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is needed (or allowed).

  Even with these configurations, Unstructured actually generates image summary descriptions, table summary descriptions, and table-to-HTML output only for files that contain images or tables and are also eligible
  for processing with the following partitioning strategies:

  * **High Res**, when the workflow's **Partitioner** node is set to use **Auto** or **High Res**.
  * **VLM** or **High Res**, when the workflow's **Partitioner** node is set to use **VLM**.

  Unstructured never generates image summary descriptions, table summary descriptions, or table-to-HTML output for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Fast** partitioning strategy.
  * With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain images or tables.

  Unstructured never produces generative OCR optimizations for workflows with a **Partitioner** node set to use the **Fast** partitioning strategy.
</Warning>

#### Image Description task

<Warning>
  Unstructured can potentially generate image summary descriptions only for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and an image summary description node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No image summary description node is needed (or allowed).

  Even with these configurations, Unstructured actually generates image summary descriptions only for files that contain images and are also eligible
  for processing with the following partitioning strategies:

  * **High Res**, when the workflow's **Partitioner** node is set to use **Auto** or **High Res**.
  * **VLM** or **High Res**, when the workflow's **Partitioner** node is set to use **VLM**.

  Unstructured never generates image summary descriptions for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Fast** partitioning strategy.
  * With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain images.
</Warning>

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    image_description_enrichment_workflow_node = WorkflowNode(
        name="Enrichment",
        subtype="<subtype>",
        type="prompter",
        settings={
            "provider_type": "<provider-type>",
            "model": "<model>"
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Enrichment",
        "type": "prompter",
        "subtype": "<subtype>",
        "settings": {
            "provider_type": "<provider-type>",
            "model": "<model>"
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

* `anthropic_image_description`, `anthropic`, and:

  * `claude-opus-4-5-20251101`
  * `claude-3-7-sonnet-20250219`
  * `claude-sonnet-4-20250514`
  * `claude-sonnet-4-5-20250929`

* `bedrock_image_description`, `bedrock`, and:

  * `us.amazon.nova-lite-v1:0`
  * `us.amazon.nova-pro-v1:0`
  * `us.anthropic.claude-3-haiku-20240307-v1:0`
  * `us.anthropic.claude-3-opus-20240229-v1:0`
  * `us.anthropic.claude-3-sonnet-20240229-v1:0`
  * `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
  * `us.anthropic.claude-sonnet-4-20250514-v1:0`
  * `us.anthropic.claude-sonnet-4-5-20250929-v1:0`

* `openai_image_description`, `openai`, and:

  * `gpt-4o`
  * `gpt-4o-mini`
  * `gpt-5-mini`

* `vertexai_image_description`, `vertexai`, and:

  * `gemini-2.0-flash-001`

<Note>
  The preceding list applies only to Unstructured **Let's Go** and **Pay-As-You-Go** accounts.

  For Unstructured **Business** accounts, to get your current list of available `<subtype>` values, contact your
  Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
  [support@unstructured.io](mailto:support@unstructured.io).
</Note>

#### Table Description task

<Warning>
  Unstructured can potentially generate table summary descriptions only for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and a table summary description node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No table summary description node is needed (or allowed).

  Even with these configurations, Unstructured actually generates table summary descriptions only for files that contain tables and are also eligible
  for processing with the following partitioning strategies:

  * **High Res**, when the workflow's **Partitioner** node is set to use **Auto** or **High Res**.
  * **VLM** or **High Res**, when the workflow's **Partitioner** node is set to use **VLM**.

  Unstructured never generates table summary descriptions for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Fast** partitioning strategy.
  * With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain tables.
</Warning>

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    table_description_enrichment_workflow_node = WorkflowNode(
        name="Enrichment",
        subtype="<subtype>",
        type="prompter",
        settings={
            "provider_type": "<provider-type>",
            "model": "<model>"
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Enrichment",
        "type": "prompter",
        "subtype": "<subtype>",
        "settings": {
            "provider_type": "<provider-type>",
            "model": "<model>"
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

* `anthropic_table_description`, `anthropic`, and:

  * `claude-opus-4-5-20251101`
  * `claude-3-7-sonnet-20250219`
  * `claude-sonnet-4-20250514`
  * `claude-sonnet-4-5-20250929`

* `bedrock_table_description`, `bedrock`, and:

  * `us.amazon.nova-lite-v1:0`
  * `us.amazon.nova-pro-v1:0`
  * `us.anthropic.claude-3-haiku-20240307-v1:0`
  * `us.anthropic.claude-3-opus-20240229-v1:0`
  * `us.anthropic.claude-3-sonnet-20240229-v1:0`
  * `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
  * `us.anthropic.claude-sonnet-4-20250514-v1:0`
  * `us.anthropic.claude-sonnet-4-5-20250929-v1:0`

* `openai_table_description`, `openai`, and:

  * `gpt-4o`
  * `gpt-4o-mini`
  * `gpt-5-mini`

* `vertexai_table_description`, `vertexai`, and:

  * `gemini-2.0-flash-001`

<Note>
  The preceding list applies only to Unstructured **Let's Go** and **Pay-As-You-Go** accounts.

  For Unstructured **Business** accounts, to get your current list of available `<subtype>` values, contact your
  Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
  [support@unstructured.io](mailto:support@unstructured.io).
</Note>

#### Table to HTML task

<Warning>
  Unstructured can potentially generate table-to-HTML output only for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and a table-to-HTML output node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No table-to-HTML output node is needed (or allowed).

  Even with these configurations, Unstructured actually generates table-to-HTML output only for files that contain tables and are also eligible
  for processing with the following partitioning strategies:

  * **High Res**, when the workflow's **Partitioner** node is set to use **Auto** or **High Res**.
  * **VLM** or **High Res**, when the workflow's **Partitioner** node is set to use **VLM**.

  Unstructured never generates table-to-HTML output for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Fast** partitioning strategy.
  * With a **Partitioner** node set to use the **Auto**, **High Res**, or **VLM** partitioning strategy, for all files that Unstructured encounters that do not contain tables.
</Warning>

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    table_to_html_enrichment_workflow_node = WorkflowNode(
        name="Enrichment",
        subtype="<subtype>",
        type="prompter",
        settings={
            "provider_type": "<provider-type>",
            "model": "<model>"
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Enrichment",
        "type": "prompter",
        "subtype": "<subtype>",
        "settings": {
            "provider_type": "<provider-type>",
            "model": "<model>"
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

* `twopass_table2html` for agentic AI table-to-HTML output. Do not specify a `<provider-type>` or `<model>` for this subtype.

* `anthropic_table2html` for VLM table-to-HTML output by using Anthropic, `anthropic`, and:

  * `claude-opus-4-5-20251101`
  * `claude-3-7-sonnet-20250219`
  * `claude-sonnet-4-20250514`
  * `claude-sonnet-4-5-20250929`

* `openai_table2html` for VLM table-to-HTML output by using OpenAI, `openai`, and:

  * `gpt-4o`
  * `gpt-4o-mini`
  * `gpt-5-mini`

<Note>
  The preceding `subtype` value applies only to Unstructured **Let's Go** and **Pay-As-You-Go** accounts.

  For Unstructured **Business** accounts, to get your current list of available `subtype` values, contact your
  Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
  [support@unstructured.io](mailto:support@unstructured.io).
</Note>

#### Named Entity Recognition (NER) task

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    ner_enrichment_workflow_node = WorkflowNode(
        name="Enrichment",
        subtype="<subtype>",
        type="prompter",
        settings={
            "provider_type": "<provider-type>",
            "model": "<model>",
            "prompt_interface_overrides": {
                "prompt": {
                    "user": "<user-prompt-override>"
                } 
            }             
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Enrichment",
        "type": "prompter",
        "subtype": "<subtype>",
        "settings": {
            "provider_type": "<provider-type>",
            "model": "<model>",
            "prompt_interface_overrides": {
                "prompt": {
                    "user": "<user-prompt-override>"
                } 
            }
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Fields for settings include:

* Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

  * `anthropic_ner`, `anthropic`, and:

    * `claude-opus-4-5-20251101`
    * `claude-3-7-sonnet-20250219`
    * `claude-sonnet-4-20250514`
    * `claude-sonnet-4-5-20250929`

  * `openai_ner`, `openai`, and:

    * `gpt-4o`
    * `gpt-4o-mini`
    * `gpt-5-mini`

* `prompt_interface_overrides.prompt.user`: *Optional*. Any alternative prompt to use with the underlying NER model. The default is none, which means to rely on using Unstructured's internal default prompt when calling the NER model.
  The internal default prompt is as follows, which you can override by providing an alternative prompt:

  ```text  theme={null}
  Extract named entities and their relationships from the following text.

  Provide the entities, their corresponding types and relationships as a structured JSON response.

  Entity types:
  - PERSON
  - ORGANIZATION
  - LOCATION
  - DATE
  - TIME
  - EVENT
  - MONEY
  - PERCENT
  - FACILITY
  - PRODUCT
  - ROLE
  - DOCUMENT
  - DATASET

  Relationship types:
  - PERSON - ORGANIZATION: works_for, affiliated_with, founded
  - PERSON - LOCATION: born_in, lives_in, traveled_to
  - ORGANIZATION - LOCATION: based_in, has_office_in
  - Entity - DATE: occurred_on, founded_on, died_on, published_in
  - PERSON - PERSON: married_to, parent_of, colleague_of
  - PRODUCT - ORGANIZATION: developed_by, owned_by
  - EVENT - LOCATION: held_in, occurred_in
  - Entity - ROLE: has_title, acts_as, has_role
  - DATASET - PERSON: mentions
  - DATASET - DOCUMENT: located_in
  - PERSON - DATASET: published
  - DOCUMENT - DOCUMENT: referenced_in, contains
  - DOCUMENT - DATE: dated
  - PERSON - DOCUMENT: published

  [START OF TEXT]
  {{text}}
  [END OF TEXT]


  Response format json schema: {
      "items": [
          { "entity": "Entity name", "type": "Entity type" },
          { "entity": "Entity name", "type": "Entity type" }
      ],
      "relationships": [
          {"from": "Entity name", "relationship": "Relationship type", "to": "Entity name"},
          {"from": "Entity name", "relationship": "Relationship type", "to": "Entity name"}
      ]
  }
  ```

  If you provide an alternative prompt, you must provide the entire alternative prompt in the preceding format. For best results, Unstructured strongly recommends that you limit your changes only to certain portions of the internal default prompt, specifically:

  * Adding, renaming, or deleting items in the list of predefined types (such as `PERSON`, `ORGANIZATION`, `LOCATION`, and so on).

  * Adding, renaming, or deleting items in the list of predefined relationships (such as `works_for`, `based_in`, `has_role`, and so on).

  * As needed, adding any clarifying instructions only between these two lines:

    ```text  theme={null}
    ...
    Provide the entities and their corresponding types as a structured JSON response.

    (Add any clarifying instructions here only.)

    [START OF TEXT]
    ...
    ```

  * Changing any other portions of the internal default prompt could produce unexpected results.

Allowed values for `<subtype>` include:

* `openai_ner`
* `anthropic_ner`

<Note>
  The preceding list applies only to Unstructured **Let's Go** and **Pay-As-You-Go** accounts.

  For Unstructured **Business** accounts, to get your current list of available `<subtype>` values, contact your
  Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
  [support@unstructured.io](mailto:support@unstructured.io).
</Note>

#### Generative OCR task

<Warning>
  Unstructured can produce generative OCR optimizations for workflows that are configured as follows:

  * With a **Partitioner** node set to use the **Auto** or **High Res** partitioning strategy, and a generative OCR optimizations node is added.
  * With a **Partitioner** node set to use the **VLM** partitioning strategy. No generative OCR optimization node is needed (or allowed).

  Unstructured never produces generative OCR optimizations for workflows with a **Partitioner** node set to use the **Fast** partitioning strategy.
</Warning>

<Warning>
  Generative OCR does not process any text blocks by default. You must also explicitly specify which document element
  types containing text that you want generative OCR to process.

  To do this, in a workflow's [High Res](#high-res-strategy) partitioner node,
  add each [document element type](/ui/document-elements#element-type) that you want generative OCR to process to the
  `extract_image_block_types` list within the partitioner node's `settings` field definition.

  Generative OCR does not process the text of any `Image` or `Table` elements if they have already been processed by
  [image description](#image-description-task) or [table description](#table-description-task) enrichments, respectively.
</Warning>

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    generative_ocr_enrichment_workflow_node = WorkflowNode(
        name="Enrichment",
        subtype="<subtype>",
        type="prompter",
        settings={
            "provider_type": "<provider-type>",
            "model": "<model>"
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Enrichment",
        "type": "prompter",
        "subtype": "<subtype>",
        "settings": {
            "provider_type": "<provider-type>",
            "model": "<model>"
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

* `anthropic_ocr`, `anthropic`, and:

  * `claude-opus-4-5-20251101`
  * `claude-3-7-sonnet-20250219`
  * `claude-sonnet-4-20250514`
  * `claude-sonnet-4-5-20250929`

* `bedrock_ocr`, `bedrock`, and:

  * `us.amazon.nova-lite-v1:0`
  * `us.amazon.nova-pro-v1:0`
  * `us.anthropic.claude-3-haiku-20240307-v1:0`
  * `us.anthropic.claude-3-opus-20240229-v1:0`
  * `us.anthropic.claude-3-sonnet-20240229-v1:0`
  * `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
  * `us.anthropic.claude-sonnet-4-20250514-v1:0`
  * `us.anthropic.claude-sonnet-4-5-20250929-v1:0`

* `openai_ocr`, `openai`, and:

  * `gpt-4o`
  * `gpt-4o-mini`
  * `gpt-5-mini`

<Note>
  The preceding list applies only to Unstructured **Let's Go** and **Pay-As-You-Go** accounts.

  For Unstructured **Business** accounts, to get your current list of available `<subtype>` values, contact your
  Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
  [support@unstructured.io](mailto:support@unstructured.io).
</Note>

### Chunker node

A **Chunker** node has a `type` of `chunk`.

[Learn about the available chunking strategies](/ui/chunking).

#### Chunk by Character strategy

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    chunk_by_character_chunker_workflow_node = WorkflowNode(
        name="Chunker",
        subtype="chunk_by_character",
        type="chunk",
        settings={
            "unstructured_api_url": None,
            "unstructured_api_key": None,
            "include_orig_elements": <True|False>,
            "new_after_n_chars": <new-after-n-chars>,
            "max_characters": <max-characters>,
            "overlap": <overlap>,
            "overlap_all": <True|False>,
            "contextual_chunking_strategy": "<contextual-chunking-strategy>"
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_character",
        "settings": {
            "unstructured_api_url": null,
            "unstructured_api_key": null,
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <new-after-n-chars>,
            "max_characters": <max-characters>,
            "overlap": <overlap>,
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "<contextual-chunking-strategy>"
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Fields for `settings` include:

* `unstructured_api_url`: *Optional*. If specified, use a non-default API URL for calls to the specified chunker as needed. The default is none, which means to rely on using Unstructured's internal default API URL for the chunker.
* `unstructured_api_key`: *Optional*. If specified, use a non-default API key for calls to the specified chunker as needed. The default is none, which means to rely on using Unstructured's internal default API key for the chunker.
* `include_orig_elements`: *Optional*. True to have the elements that are used to form a chunk appear in `.metadata.orig_elements` for that chunk. The default is false.
* `new_after_n_chars`: *Optional*. Closes new sections after reaching a length of this many characters. This is an approximate limit. The default is none.
* `max_characters`: *Optional*. The absolute maximum number of characters in a chunk. The default is none.
* `overlap`: *Optional*. Applies a prefix of this many trailing characters from the prior text-split chunk to second and later chunks formed from oversized elements by text-splitting. The default is none.
* `overlap_all`: *Optional*. True to apply overlap to "normal" chunks formed by combining whole elements. Use with caution as this can introduce noise into otherwise clean semantic units. The default is false.
* `contextual_chunking_strategy`: *Optional*. If specified, prepends chunk-specific explanatory context to each chunk. Allowed values include `v1`. The default is none.

#### Chunk by Title strategy

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    chunk_by_title_chunker_workflow_node = WorkflowNode(
        name="Chunker",
        subtype="chunk_by_title",
        type="chunk",
        settings={
            "unstructured_api_url": None,
            "unstructured_api_key": None,
            "multipage_sections": <True|False>,
            "combine_text_under_n_chars": <combine-text-under-n-chars>,
            "include_orig_elements": <True|False>,
            "new_after_n_chars": <new-after-n-chars>,
            "max_characters": <max-characters>,
            "overlap": <overlap>,
            "overlap_all": <True|False>,
            "contextual_chunking_strategy": "<contextual-chunking-strategy>"
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_title",
        "settings": {
            "unstructured_api_url": null,
            "unstructured_api_key": null,
            "multipage_sections": <true|false>,
            "combine_text_under_n_chars": <combine-text-under-n-chars>,
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <new-after-n-chars>,
            "max_characters": <max-characters>,
            "overlap": <overlap>,
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "<contextual-chunking-strategy>"
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Fields for `settings` include:

* `unstructured_api_url`: *Optional*. If specified, use a non-default API URL for calls to the specified chunker as needed. The default is none, which means to rely on using Unstructured's internal default API URL for the chunker.
* `unstructured_api_key`: *Optional*. If specified, use a non-default API key for calls to the specified chunker as needed. The default is none, which means to rely on using Unstructured's internal default API key for the chunker.- `multipage_sections`: *Optional*. ... The default is false.
* `combine_text_under_n_chars`: *Optional*. Combines elements from a section into a chunk until a section reaches a length of this many characters. The default is none.
* `include_orig_elements`: *Optional*. True to have the elements that are used to form a chunk appear in `.metadata.orig_elements` for that chunk. The default is false.
* `new_after_n_chars`: *Optional*. Closes new sections after reaching a length of this many characters. This is an approximate limit. The default is none.
* `max_characters`: *Optional*. The absolute maximum number of characters in a chunk. The default is none.
* `overlap`: *Optional*. Applies a prefix of this many trailing characters from the prior text-split chunk to second and later chunks formed from oversized elements by text-splitting. The default is none.
* `overlap_all`: *Optional*. True to apply overlap to "normal" chunks formed by combining whole elements. Use with caution as this can introduce noise into otherwise clean semantic units. The default is false.
* `contextual_chunking_strategy`: *Optional*. If specified, prepends chunk-specific explanatory context to each chunk. Allowed values include `v1`. The default is none.

#### Chunk by Page strategy

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    chunk_by_page_chunker_workflow_node = WorkflowNode(
        name="Chunker",
        subtype="chunk_by_page",
        type="chunk",
        settings={
            "unstructured_api_url": None,
            "unstructured_api_key": None,
            "include_orig_elements": <True|False>,
            "new_after_n_chars": <new-after-n-chars>,
            "max_characters": <max-characters>,
            "overlap": <overlap>,
            "overlap_all": <True|False>,
            "contextual_chunking_strategy": "<contextual-chunking-strategy>"
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_page",
        "settings": {
            "unstructured_api_url": null,
            "unstructured_api_key": null,
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <new-after-n-chars>,
            "max_characters": <max-characters>,
            "overlap": <overlap>,
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "<contextual-chunking-strategy>"
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Fields for `settings` include:

* `unstructured_api_url`: *Optional*. If specified, use a non-default API URL for calls to the specified chunker as needed. The default is none, which means to rely on using Unstructured's internal default API URL for the chunker.
* `unstructured_api_key`: *Optional*. If specified, use a non-default API key for calls to the specified chunker as needed. The default is none, which means to rely on using Unstructured's internal default API key for the chunker.- `include_orig_elements`: *Optional*. ... The default is false.
* `include_orig_elements`: *Optional*. True to have the elements that are used to form a chunk appear in `.metadata.orig_elements` for that chunk. The default is false.
* `new_after_n_chars`: *Optional*. Closes new sections after reaching a length of this many characters. This is an approximate limit. The default is none.
* `max_characters`: *Optional*. The absolute maximum number of characters in a chunk. The default is none.
* `overlap`: *Optional*. Applies a prefix of this many trailing characters from the prior text-split chunk to second and later chunks formed from oversized elements by text-splitting. The default is none.
* `overlap_all`: *Optional*. True to apply overlap to "normal" chunks formed by combining whole elements. Use with caution as this can introduce noise into otherwise clean semantic units. The default is false.
* `contextual_chunking_strategy`: *Optional*. If specified, prepends chunk-specific explanatory context to each chunk. Allowed values include `v1`. The default is none.

#### Chunk by Similarity strategy

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    chunk_by_similarity_chunker_workflow_node = WorkflowNode(
        name="Chunker",
        subtype="chunk_by_similarity",
        type="chunk",
        settings={
            "unstructured_api_url": None,
            "unstructured_api_key": None,
            "include_orig_elements": <True|False>,
            "new_after_n_chars": <True|False>,
            "max_characters": <max-characters>,
            "overlap": <overlap>,
            "overlap_all": <True|False>,
            "contextual_chunking_strategy": "<contextual-chunking-strategy>",
            "similarity_threshold": <similarity-threshold>
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Chunker",
        "type": "chunk",
        "subtype": "chunk_by_similarity",
        "settings": {
            "unstructured_api_url": null,
            "unstructured_api_key": null,
            "include_orig_elements": <true|false>,
            "new_after_n_chars": <true|false>,
            "max_characters": <max-characters>,
            "overlap": <overlap>,
            "overlap_all": <true|false>,
            "contextual_chunking_strategy": "<contextual-chunking-strategy>",
            "similarity_threshold": <similarity-threshold>
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Fields for `settings` include:

* `unstructured_api_url`: *Optional*. If specified, use a non-default API URL for calls to the specified chunker as needed. The default is none, which means to rely on using Unstructured's internal default API URL for the chunker.
* `unstructured_api_key`: *Optional*. If specified, use a non-default API key for calls to the specified chunker as needed. The default is none, which means to rely on using Unstructured's internal default API key for the chunker.
* `include_orig_elements`: *Optional*. True to have the elements that are used to form a chunk appear in `.metadata.orig_elements` for that chunk. The default is false.
* `new_after_n_chars`: *Optional*. Closes new sections after reaching a length of this many characters. This is an approximate limit. The default is none.
* `max_characters`: *Optional*. The absolute maximum number of characters in a chunk. The default is none.
* `overlap`: *Optional*. Applies a prefix of this many trailing characters from the prior text-split chunk to second and later chunks formed from oversized elements by text-splitting. The default is none.
* `overlap_all`: *Optional*. True to apply overlap to "normal" chunks formed by combining whole elements. Use with caution as this can introduce noise into otherwise clean semantic units. The default is false.
* `contextual_chunking_strategy`: *Optional*. If specified, prepends chunk-specific explanatory context to each chunk. Allowed values include `v1`. The default is none.
* `similarity_threshold`: *Optional*. The minimum similarity that text in consecutive elements must have to be included in the same chunk. This must be a value between `0.0` and `1.0`, exclusive (`0.01` to `0.99`). The default is none.

### Embedder node

An **Embedder** node has a `type` of `embed`.

<Warning>
  If you add an embedder node, you must set the workflow's chunker node's `max_characters` setting to a value at or below Unstructured's recommended
  maximum chunk size for your specified embedding model. [Learn more](/ui/embedding#chunk-sizing-and-embedding-models).
</Warning>

[Learn about the available embedding providers and models](/ui/embedding).

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    embedder_workflow_node = WorkflowNode(
        name="Embedder",
        subtype="<subtype>",
        type="embed",
        settings={
            "model_name": "<model-name>"             
        }
    )
    ```
  </Accordion>

  <Accordion title="curl, Postman">
    ```json  theme={null}
    {
        "name": "Embedder",
        "type": "embed",
        "subtype": "<subtype>",
        "settings": {
            "model_name": "<model-name>"
        }
    }
    ```
  </Accordion>
</AccordionGroup>

Allowed values for `subtype` and `model_name` include the following:

<Note>
  The following list applies only to Unstructured **Let's Go** and **Pay-As-You-Go** accounts.

  For Unstructured **Business** accounts, to get your current list of available models, contact your
  Unstructured account administrator or Unstructured sales representative, or email Unstructured Support at
  [support@unstructured.io](mailto:support@unstructured.io).
</Note>

* `"subtype": "azure_openai"`

  * `"model_name": "text-embedding-3-small"`
  * `"model_name": "text-embedding-3-large"`
  * `"model_name": "text-embedding-ada-002"`

* `"subtype": "bedrock"`

  * `"model_name": "amazon.titan-embed-text-v2:0"`
  * `"model_name": "amazon.titan-embed-text-v1"`
  * `"model_name": "amazon.titan-embed-image-v1"`
  * `"model_name": "cohere.embed-english-v3"`
  * `"model_name": "cohere.embed-multilingual-v3"`

* `"subtype": "ibm"`

  * `"model_name": "sentence-transformers/all-minilm-l6-v2"`
  * `"model_name": "ibm/granite-embedding-278m-multilingual"`
  * `"model_name": "intfloat/multilingual-e5-large"`
  * `"model_name": "ibm/slate-125m-english-rtrvr-v2"`
  * `"model_name": "ibm/slate-125m-multilingual-rtrvr-v2"`

* `"subtype": "togetherai"`

  * `"model_name": "togethercomputer/m2-bert-80M-32k-retrieval"`

* `"subtype": "voyageai"`

  * `"model_name": "voyage-3"`
  * `"model_name": "voyage-3-large"`
  * `"model_name": "voyage-3-lite"`
  * `"model_name": "voyage-code-3"`
  * `"model_name": "voyage-finance-2"`
  * `"model_name": "voyage-law-2"`
  * `"model_name": "voyage-code-2"`
  * `"model_name": "voyage-multimodal-3"`

## List templates

To list templates, use the `UnstructuredClient` object's `templates.list_templates` function (for the Python SDK) or the `GET` method to call the `/templates` endpoint (for `curl` or Postman).

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    import os
    import json

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.operations import ListTemplatesRequest

    client = UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    response = client.templates.list_templates(
        request=ListTemplatesRequest()
    )

    print(json.dumps(response.response_list_templates, indent=4))
    ```
  </Accordion>

  <Accordion title="Python SDK (async)">
    ```python  theme={null}
    import os
    import json
    import asyncio

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.operations import ListTemplatesRequest

    async def list_templates():
        client = UnstructuredClient(
            api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
        )

        response = client.templates.list_templates(
            request=ListTemplatesRequest()
        )

        print(json.dumps(response.response_list_templates, indent=4))

    asyncio.run(list_templates())
    ```
  </Accordion>

  <Accordion title="curl">
    ```bash  theme={null}
    curl --request 'GET' --location \
    "$UNSTRUCTURED_API_URL/templates" \
    --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
    --header 'accept: application/json'
    ```
  </Accordion>

  <Accordion title="Postman">
    1. In the method drop-down list, select **GET**.

    2. In the address box, enter the following URL:

       ```text  theme={null}
       {{UNSTRUCTURED_API_URL}}/templates
       ```

    3. On the **Headers** tab, enter the following headers:

       * **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
       * **Key**: `accept`, **Value**: `application/json`

    4. Click **Send**.
  </Accordion>
</AccordionGroup>

## Get a template

To get information about a template, use the `UnstructuredClient` object's `templates.get_template` function (for the Python SDK) or the `GET` method to call the
`/templates/<template-id>` endpoint (for `curl` or Postman), replacing `<template-id>` with the
template's unique ID. To get this ID, see [List templates](#list-templates).

<AccordionGroup>
  <Accordion title="Python SDK">
    ```python  theme={null}
    import os
    import json

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.operations import GetTemplateRequest

    client = UnstructuredClient(
        api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
    )

    response = client.templates.get_template(
        request=GetTemplateRequest(
            template_id="<template-id>"
        )
    )

    print(json.dumps(response.response_get_template, indent=4))
    ```
  </Accordion>

  <Accordion title="Python SDK (async)">
    ```python  theme={null}
    import os
    import json
    import asyncio

    from unstructured_client import UnstructuredClient
    from unstructured_client.models.operations import GetTemplateRequest

    async def get_template():
        client = UnstructuredClient(
            api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
        )

        response = client.templates.get_template(
            request=GetTemplateRequest(
                template_id="<template-id>"
            )
        )

        print(json.dumps(response.response_get_template, indent=4))

    asyncio.run(get_template())
    ```
  </Accordion>

  <Accordion title="curl">
    ```bash  theme={null}
    curl --request 'GET' --location \
    "$UNSTRUCTURED_API_URL/templates/<template-id>" \
    --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
    --header 'accept: application/json'
    ```
  </Accordion>

  <Accordion title="Postman">
    1. In the method drop-down list, select **GET**.

    2. In the address box, enter the following URL:

       ```text  theme={null}
       {{UNSTRUCTURED_API_URL}}/templates/<template-id>
       ```

    3. On the **Headers** tab, enter the following headers:

       * **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
       * **Key**: `accept`, **Value**: `application/json`

    4. Click **Send**.
  </Accordion>
</AccordionGroup>

---

# Source: https://docs.unstructured.io/ui/account/workspaces.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Workspaces

<Note>
  The following information applies only to Unstructured [organizational accounts](/ui/account/organizations), which are
  available only for Unstructured **Business** accounts. To upgrade from an Unstructured **Let's Go** or **Pay-As-You-Go** account
  to a **Business** account, contact your Unstructured sales representative, or
  email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).
</Note>

A *workspace* is a collection of resources such as connectors, workflows, jobs, and API keys.

Each workspace operates independently of all other workspaces.

## Workspace architecture

A workspace can belong to an [organizational account](/ui/account/organizations) or to a user's personal account.

Every organizational account begins with one default workspace. An organizational account can have multiple workspaces. (A user's personal account always has one and only one workspace.)

```mermaid  theme={null}
graph LR
    Org1[Organization 1]
    Org2[Organization 2]
    Org3[Organization 3]
    WS1A[Workspace 1A]
    WS2A[Workspace 2A]
    WS2B[Workspace 2B]
    WS3A[Workspace 3A]
    WS3B[Workspace 3B]
    WS3C[Workspace 3C]

    Org1 --> WS1A
    
    Org2 --> WS2A
    Org2 --> WS2B

    Org3 --> WS3A
    Org3 --> WS3B
    Org3 --> WS3C
```

Users can have access to multiple workspaces within an organizational account.

```mermaid  theme={null}
graph LR
    User1[User 1]
    User2_Org1[User 2]
    User2_Org2[User 2]
    User3_Org1[User 3] 
    User3_Org2[User 3]
    User3_Org3[User 3]

    Org1[Organization 1]
    Org2[Organization 2]
    Org3[Organization 3]

    WS1A[Workspace 1A]
    WS2A[Workspace 2A]
    WS2B[Workspace 2B]
    WS3A[Workspace 3A]
    WS3B[Workspace 3B]
    WS3C[Workspace 3C]
    
    Org1 --> WS1A
    Org1 -.-> User1
    Org1 -.-> User2_Org1
    Org1 -.-> User3_Org1
    User1 -.-> WS1A
    User2_Org1 -.-> WS1A
    User3_Org1 -.-> WS1A
    
    Org2 --> WS2A
    Org2 --> WS2B
    Org2 -.-> User2_Org2
    Org2 -.-> User3_Org2
    User2_Org2 -.-> WS2A
    User2_Org2 -.-> WS2B
    User3_Org2 -.-> WS2B
    
    Org3 --> WS3A
    Org3 --> WS3B
    Org3 --> WS3C
    Org3 -.-> User3_Org3
    User3_Org3 -.-> WS3A
    User3_Org3 -.-> WS3B
    User3_Org3 -.-> WS3C
```

Each user must be explicitly added to each workspace that they are allowed to access. (A user always has access to their personal workspace within their personal account.) This assumes that the user has first been added
to the workspace's parent organizational account.

To make programmatic API calls to a workspace instead of using the UI, an API key must first exist for that workspace. The caller then uses that API key to make API calls to that workspace.
Each API key is valid only for its related workspace.

## Create a workspace

To create a workspace, you must first have an available Unstructured [organizational account](/ui/account/organizations).

Currently, only Unstructured representatives can create organizational accounts. To request an organizational account to be created,
contact your Unstructured
sales representative, or email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

<Info>
  Your user must have the **Super Administrator** [role](/ui/account/roles) in the intended workspace's parent organizational account to create workspaces in that organizational account.

  (A user always has one and only one personal workspace within their personal account. You cannot create additional workspaces in personal accounts.)
</Info>

1. If you are not already signed in, sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that you want to add a workspace to.
3. In the sidebar, above your user icon, click the **Settings** (gear) icon.
4. Click **View All Workspaces**.
5. Click **New Workspace**.
6. In the **Create New Workspace** dialog, enter a name for the new workspace.
7. Click **Continue**.

## Access a workspace

1. If you are not already signed in, sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that contains the workspace you want to access.
3. In the workspace selector, select the name of the workspace that you want to access.

   If the workspace selector does not show the workspace you want to access, then
   your user has not been added to that workspace. To be added to the workspace, contact one of the workspace's administrators or its
   parent organizational account's administrators.

## Add a member to a workspace

<Info>
  Your user must have either the **Super Administrator** [role](/ui/account/roles) in the workspace's parent organizational account or the **Workspace Admin** role in the
  workspace to add members to that workspace.

  The user to be added must also have the **Developer** role in the workspace's parent organizational account.

  (A user always has access to their personal workspace within their personal account.)
</Info>

To add a user to a workspace as a member:

1. Make sure the account user that you want to add has already been added as a member of the workspace's parent organizational account.
   [Learn how](/ui/account/organizations#add-a-member-to-an-organization).
2. If you are not already signed in, sign in to your Unstructured account.
3. In the top navigation bar, in the organizational account selector, select the name of the organizational account that contains the workspace you want to add a member to.
4. In the workspace selector, select the name of the workspace that you want to add a member to.
5. In the sidebar, above your user icon, click the **Settings** (gear) icon.
6. Click **Manage Workspace**.
7. In the top navigation bar, click **Members**.
8. Click **Add New Member**.
9. In the **Add New Member** dialog, select one of the organizational account's members to add to this workspace as a member.
10. In the role selector, select the new team member's [role](/ui/account/roles) for this workspace.
11. Click **Continue**.

## Change a workspace role for a member

<Info>
  Your user must have either the **Super Administrator** role in the workspace's parent organizational account or the **Workspace Admin** role in the
  workspace to change the roles for that workspace's members.
</Info>

1. If you are not already signed in, sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that contains the workspace for the member you want to change roles for.
3. In the workspace selector, select the name of the workspace that contains the member you want to change roles for.
4. In the sidebar, above your user icon, click the **Settings** (gear) icon.
5. Click **Manage Workspace**.
6. In the top navigation bar, click **Members**.
7. Next to the member you want to change roles for, click the ellipses.
8. Click **Edit Permissions**.
9. Select the new role.
10. Click **Continue**.

## Remove a member from a workspace

<Warning>
  A member who is removed from a workspace will no longer have access to the workspace. The member will continue to be able to use their personal account and personal workspace.

  Any connectors, workflows, jobs, or other resources that were created by the member within the workspace will not be removed.

  The only way to reverse removing a member from a workspace is to add the member back to the workspace.
</Warning>

<Info>
  Your user must have either the **Super Administrator** [role](/ui/account/roles) in the workspace's parent organizational account or the **Workspace Admin** role in the
  workspace to remove members from that workspace.

  (A user cannot be removed from their personal workspace within their personal account.)
</Info>

1. If you are not already signed in, sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that contains the workspace you want to remove a member from.
3. In the workspace selector, select the name of the workspace that you want to remove a member from.
4. In the sidebar, above your user icon, click the **Settings** (gear) icon.
5. Click **Manage Workspace**.
6. In the top navigation bar, click **Members**.
7. In the list of members, click the ellipses next to the member that you want to remove.
8. Click **Remove Member**.
9. Confirm the removal by clicking **Continue**.

## Create an API key for a workspace

<Info>
  Your user must have either the **Super Administrator** [role](/ui/account/roles) in the workspace's parent organizational account or the **Workspace Admin** or **Developer** role in the
  workspace to create API keys for that workspace.
</Info>

To make programmatic API calls to a workspace instead of using the UI, an API key must first exist for that workspace, and then the caller uses that API key to make API calls to that workspace.

An API key can be used only with the workspace for which it was created.

1. If you are not already signed in, sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that contains the workspace you want to create an API key for.
3. In the workspace selector, select the name of the workspace that you want to create an API key for.
4. In the sidebar, above your user icon, click the **Settings** (gear) icon.
5. Click **Manage Workspace**.
6. In the top navigation bar, click **API Keys**.
7. Click **Generate New Key**.
8. Enter some name for the key, and click **Continue**.
9. To get the key's value, in the list of available keys, click the **Copy** icon next to the newly created key. The key's value is
   copies to your system's clipboard.

## Delete an API key for a workspace

An API key is valid only for the workspace for which it was created.

<Warning>
  Deleting an existing API key is a permanent operation. Deleted API keys cannot be recovered.

  You cannot use a deleted API key to call the Unstructured APIs.
</Warning>

<Info>
  Your user must have either the **Super Administrator** [role](/ui/account/roles) in the workspace's parent organizational account or the **Workspace Admin** or **Developer** role in the
  workspace to delete API keys from that workspace.
</Info>

1. If you are not already signed in, sign in to your Unstructured account.
2. In the top navigation bar, in the organizational account selector, select the name of the organizational account that contains the workspace you want to delete an API key for.
3. In the workspace selector, select the name of the workspace that you want to delete an API key for.
4. In the sidebar, above your user icon, click the **Settings** (gear) icon.
5. Click **Manage Workspace**.
6. In the top navigation bar, click **API Keys**.
7. In the list of available keys, click the ellipses next to the key you want to delete.
8. Click **Delete**.
9. Confirm the deletion by clicking **Continue**.

## Permanently delete a workspace

<Warning>
  Deleting a workspace is a permanent action and is not recoverable. Once a workspace is deleted, all connectors, workflows, jobs, and
  API keys that are associated with that workspace are also deleted and are not recoverable.

  The workspace's members are not deleted, but they will no longer have access to the deleted workspace's connectors, workflows, jobs, and API keys.

  Any code or scripts that reference the deleted workspace's connectors, workflows, jobs, or workspace API keys will fail.
</Warning>

<Info>
  A user's personal workspace cannot be deleted from their personal account.
</Info>

Only authorized Unstructured representatives can delete workspaces. To request a workspace to be permanently deleted, contact your Unstructured
sales representative, or email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io).

---

# Source: https://docs.unstructured.io/ui/sources/zendesk.md

# Source: https://docs.unstructured.io/open-source/ingestion/source-connectors/zendesk.md

# Source: https://docs.unstructured.io/api-reference/workflow/sources/zendesk.md

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Zendesk

<Note>
  If you're new to Unstructured, read this note first.

  Before you can create a source connector, you must first sign in to your Unstructured account:

  * If you do not already have an Unstructured account, [sign up for free](https://unstructured.io/?modal=try-for-free).
    After you sign up, you are automatically signed in to your new Unstructured **Let's Go** account, at [https://platform.unstructured.io](https://platform.unstructured.io).
    To sign up for a **Business** account instead, [contact Unstructured Sales](https://unstructured.io/?modal=contact-sales), or [learn more](/api-reference/overview#pricing).
  * If you already have an Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business SaaS** account and are not already signed in, sign in to your account at
    [https://platform.unstructured.io](https://platform.unstructured.io). For other types of **Business** accounts, see your Unstructured account administrator for sign-in instructions,
    or email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  After you sign in, the [Unstructured user interface](/ui/overview) (UI) appears, which you use to get your Unstructured API key, as follows:

  1. After you sign in to your Unstructured **Let's Go**, **Pay-As-You-Go**, or **Business** account, click **API Keys** on the sidebar.<br />

     For a **Business** account, before you click **API Keys**, make sure you have selected the organizational workspace you want to create an API key
     for. Each API key works with one and only one organizational workspace. [Learn more](/ui/account/workspaces#create-an-api-key-for-a-workspace).

  2. Click **Generate API Key**.<br />

  3. Follow the on-screen instructions to finish generating the key.<br />

  4. Click the **Copy** icon next to your new key to add the key to your system's clipboard. If you lose this key, simply return and click the **Copy** icon again.<br />

  After you create the source connector, add it along with a
  [destination connector](/api-reference/workflow/destinations/overview) to a [workflow](/api-reference/workflow/overview#workflows).
  Then run the worklow as a [job](/api-reference/workflow/overview#jobs). To learn how, try out the
  the notebook [Dropbox-To-Pinecone Connector API Quickstart for Unstructured](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb),
  or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

  You can also create source connectors with the Unstructured user interface (UI).
  [Learn how](/ui/sources/overview).

  If you need help, email Unstructured Support at [support@unstructured.io](mailto:support@unstructured.io).

  You are now ready to start creating a source connector! Keep reading to learn how.
</Note>

Ingest your files into Unstructured from Zendesk.

The requirements are as follows.

<iframe width="560" height="315" src="https://www.youtube.com/embed/f2yeXwSvZtE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen />

* A [Zendesk account](https://www.zendesk.com/register/).
* Your [Zendesk subdomain](https://support.zendesk.com/hc/en-us/articles/4409381383578-Where-can-I-find-my-Zendesk-subdomain), for example, the `<organization>` part of `<organization>.zendesk.com`.
* The login email address for your Zendesk account.
* An [API token](https://support.zendesk.com/hc/en-us/articles/4408889192858-Managing-access-to-the-Zendesk-API#topic_tcb_fk1_2yb) (not an OAuth token) for your login email address.

To create a Zendesk source connector, see the following examples.

<CodeGroup>
  ```python Python SDK theme={null}
  import os

  from unstructured_client import UnstructuredClient
  from unstructured_client.models.operations import CreateSourceRequest
  from unstructured_client.models.shared import CreateSourceConnector

  with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
      response = client.sources.create_source(
          request=CreateSourceRequest(
              create_source_connector=CreateSourceConnector(
                  name="<name>",
                  type="zendesk",
                  config={
                      "subdomain": "<subdomain>",
                      "email": "<email>",
                      "api_token": "<api-token>",
                      "item_type": "<item-type>",
                      "batch_size": <batch-size>
                  }
              )
          )
      )

      print(response.source_connector_information)

  ```

  ```bash curl theme={null}
  curl --request 'POST' --location \
  "$UNSTRUCTURED_API_URL/sources" \
  --header 'accept: application/json' \
  --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
  --header 'content-type: application/json' \
  --data \
  '{
      "name": "<name>",
      "type": "zendesk",
      "config": {
          "subdomain": "<subdomain>",
          "email": "<email>",
          "api-token": "<api-token>",
          "item-type": "<item-type>",
          "batch-size": <batch-size>
      }
  }'
  ```
</CodeGroup>

Replace the preceding placeholders as follows:

* `<name>` (*required*) - A unique name for this connector.
* `<subdomain>` (*required*): The subdomain of your Zendesk site, for example  the `<organization>` part of `<organization>.zendesk.com`.
* `<email>` (*required*): The email address of the user account that has access to the Zendesk subdomain.
* `<api-token>` (*required*): The API token (not an OAuth token) for the user account. For more information, see [Managing access to the Zendesk API](https://support.zendesk.com/hc/en-us/articles/226022787-Generating-a-new-API-token).
* `<item-type>`: The type of item to parse. Allowed values include `tickets` and `articles`. If no value is provided, the default is `tickets`.
* `<batch-size>`: The maximum number of items to structure per batch. The default is `2`.

## Learn more

* <Icon icon="blog" />  [Unlock Customer Support Knowledge: Unstructured Integrates with Zendesk](https://unstructured.io/blog/unlock-customer-support-knowledge-unstructured-integrates-with-zendesk)