# Mozilla Data Collective

---

# Source: https://datacollective.mozillafoundation.org/api-reference/docs

#### Base URL

All API requests should be made to the following base URL:

```
https://datacollective.mozillafoundation.org/api
```

#### Authentication

All authenticated endpoints require an API key in the `Authorization` header:

```
Authorization: Bearer YOUR_API_KEY
```

You can create and manage your API keys in your [profile settings](/profile/credentials).

### API Endpoints

[GET] `/datasets/:datasetId`

##### Get Dataset Details

Retrieves the details of a specific dataset.

###### Authentication

[Required] Bearer token in the `Authorization` header

###### Path Parameters

- `datasetId` string [Required]: The ID of the dataset

###### Success Response (200 OK)

```
{
  ...
  "license": "CC0-1.0",
  "task": "ASR",
  "format": "MP3",
  "datasetUrl": "https://datacollective.mozillafoundation.org/datasets/dataset-1"
}
```

###### Error Responses

- [404] Dataset not found
- [403] Access denied. Private dataset requires organization membership

[POST] `/datasets/:datasetId/download`

##### Create Download Session

Creates a download session and returns a download token. The user must have previously agreed to the dataset's terms of use through the web interface.

###### Authentication

[Required] Bearer token in the `Authorization` header

###### Path Parameters

- `datasetId` string [Required]: The ID of the dataset

###### Success Response (200 OK)

Returns a download token for the new session.

###### Error Responses

- [403] You must agree to the terms of use before downloading this dataset
- [404] Dataset not found
- [401] Authentication required
- [429] Rate limit exceeded

[GET] `/datasets/:datasetId/download/:downloadToken`

##### Download Dataset File

Downloads the dataset file.

###### Authentication

[Required] Bearer token in the `Authorization` header

###### Request Headers

- `Range` string [Optional]: Byte range for partial downloads, e.g. `Range: bytes=0-100`

###### Path Parameters

- `datasetId` string [Required]: The ID of the dataset
- `downloadToken` string [Required]: The temporary download token

###### Success Response (200 OK)

Response headers:

```
Content-Length: 268435456000
Content-Type: application/zip
Content-Disposition: attachment; filename="common-voice-corpus-22.zip"
```

```
Binary file data
```

###### Success Response (206 Partial Content)

Returned when a `Range` header is sent. Byte ranges are inclusive, so `bytes=0-100` covers 101 bytes.

Response headers:

```
Content-Length: 101
Content-Type: application/zip
Content-Range: bytes 0-100/268435456000
Content-Disposition: attachment; filename="common-voice-corpus-22.zip"
```

```
Partial binary file data
```

###### Error Responses

- [401] Invalid or expired download token
- [404] Dataset or download session not found
- [416] Requested Range Not Satisfiable
- [429] Bandwidth limit exceeded

### Rate Limiting

The API enforces rate limits per organization to ensure fair usage and stability. Limits apply to both the number of API requests and bandwidth consumption.

##### Request Rate Limiting

When request limits are exceeded, the API responds with status code 429 and includes these headers:

- `X-RateLimit-Limit`: Total requests allowed in the current window
- `X-RateLimit-Remaining`: Requests remaining in the current window
- `Retry-After`: Seconds until the next request is allowed

##### Bandwidth Rate Limiting

Download endpoints enforce bandwidth limits at the organization level. When the limit is exceeded, the connection is terminated with a 429 error.

### Implementation Notes

###### Single-Use Downloads

Each download token can be used for only one complete download session. Once a file has been fully downloaded, the token is invalidated.

###### Proxied Downloads

All downloads are proxied through the API server to support real-time rate limiting, access control, and analytics tracking.

###### Terms Agreement Required

Users must agree to a dataset's terms through the web interface before downloading it. Terms cannot be accepted through the API.
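The `Range` request header, the 206 `Content-Range` format, and the `Retry-After` header described in the API reference fit together when you resume an interrupted download. The sketch below covers only the bookkeeping and uses the Python standard library. The helper names (`resume_range_header`, `parse_content_range`, `expected_length`, `retry_delay`) are hypothetical and are not part of the API or the `datacollective` library. The docs also do not say whether a resumed request may reuse the same download token after a dropped connection, so treat that as an assumption to verify.

```python
import re
from typing import Mapping, Optional, Tuple

# A satisfied byte range as sent with 206 responses, e.g. "bytes 0-100/268435456000".
# The total after the slash may be "*" when the full length is unknown.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def resume_range_header(bytes_on_disk: int) -> str:
    """Range header value asking for everything after the bytes already saved.

    Byte positions are zero-based, so a partial file holding N bytes resumes
    at offset N; the open-ended form "bytes=N-" requests the rest of the file.
    """
    if bytes_on_disk < 0:
        raise ValueError("bytes_on_disk must be non-negative")
    return f"bytes={bytes_on_disk}-"


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Split a Content-Range value into (first_byte, last_byte, total_size)."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


def expected_length(content_range: str) -> int:
    """Bytes a 206 body should contain; both ends of the range are inclusive."""
    first, last, _ = parse_content_range(content_range)
    return last - first + 1


def retry_delay(headers: Mapping[str, str], attempt: int, cap: float = 300.0) -> float:
    """Seconds to wait before retrying after a 429 response.

    Uses Retry-After when it holds a number of seconds, as the API docs describe.
    Otherwise falls back to capped exponential backoff (1, 2, 4, ... seconds),
    which is this sketch's own choice, not documented API behaviour.
    Pass headers with case-insensitive lookup (e.g. http.client response headers)
    or a dict keyed exactly "Retry-After".
    """
    value = headers.get("Retry-After")
    if value is not None and value.strip().isdigit():
        return float(value.strip())
    return min(2.0 ** attempt, cap)
```

For example, `parse_content_range("bytes 0-100/268435456000")` returns `(0, 100, 268435456000)`, and `expected_length` for the same value is 101 because both end points are included.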
### Error Handling

##### Common Error Responses

###### [400] Bad Request

Malformed request or invalid parameters

###### [401] Unauthorized

Missing or invalid authentication

###### [429] Too Many Requests

Rate limit exceeded

---

# Source: https://datacollective.mozillafoundation.org/api-reference

## Harness community-driven datasets with our API

**Version:** Beta

The Mozilla Data Collective API gives developers access to community-created datasets while empowering contributors to maintain control over their data.

### Mozilla Data Collective API at a glance

###### Create access credentials

Manage your API credentials under **Profile > API**.

###### Secure your key

Store your API credentials securely as a secret.

###### Authentication

Provide your API key in the `Authorization` request header to authenticate.

###### Select your dataset

Choose from over 300 datasets from around the world.

###### Agree to dataset terms

You can download a dataset only after accepting its terms.

###### Download

Use the REST endpoints or the MDC Python library to get started.

[Create API credentials](/profile/credentials)

### API Overview

Power your projects with diverse, ethically created datasets that are just one REST call away.

```python
# Test code for connecting to the Mozilla Data Collective API
from datacollective import DataCollective

client = DataCollective()
client.get_dataset("mdc-dataset-id")
```

### Give it a try

Get up and running with datacollective-python, a Python library for authenticating and interacting with the MDC API.
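The documented REST endpoints can also be called without the library. As a minimal sketch using only Python's standard library, the helpers below build, but do not send, the three requests in the download flow: dataset details, download session, and file download. The helper names are hypothetical. The download token is passed in as a plain string because these docs do not show the exact shape of the download-session response.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://datacollective.mozillafoundation.org/api"


def _authorized(url: str, api_key: str, method: str) -> urllib.request.Request:
    """Attach the documented Bearer token to a request for the given URL."""
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {api_key}"}
    )


def dataset_details_request(api_key: str, dataset_id: str) -> urllib.request.Request:
    """GET /datasets/:datasetId"""
    path = f"/datasets/{urllib.parse.quote(dataset_id, safe='')}"
    return _authorized(BASE_URL + path, api_key, "GET")


def download_session_request(api_key: str, dataset_id: str) -> urllib.request.Request:
    """POST /datasets/:datasetId/download (terms must already be accepted on the web)."""
    path = f"/datasets/{urllib.parse.quote(dataset_id, safe='')}/download"
    return _authorized(BASE_URL + path, api_key, "POST")


def file_download_request(
    api_key: str,
    dataset_id: str,
    download_token: str,
    range_header: Optional[str] = None,
) -> urllib.request.Request:
    """GET /datasets/:datasetId/download/:downloadToken, optionally with a Range header."""
    path = "/datasets/{}/download/{}".format(
        urllib.parse.quote(dataset_id, safe=""),
        urllib.parse.quote(download_token, safe=""),
    )
    request = _authorized(BASE_URL + path, api_key, "GET")
    if range_header is not None:
        # e.g. "bytes=0-100" for a partial (206) download
        request.add_header("Range", range_header)
    return request
```

Passing one of these requests to `urllib.request.urlopen` performs the call. Error statuses such as 403 or 429 raise `urllib.error.HTTPError`, whose `headers` attribute exposes values like `Retry-After`.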
[Python Library](https://pypi.org/project/datacollective/)

### Links & Docs

- [Get API Access](/profile/credentials)
- [Browse API Docs](/api-reference/docs)
- [Python Library](https://pypi.org/project/datacollective/)
- [Python Library Source](https://github.com/Mozilla-Data-Collective/datacollective-python)

- [Privacy Policy](/privacy)
- [Terms](/terms)
- [Cookies](https://www.mozilla.org/en-US/privacy/websites/#cookies)
- [FAQs](https://community.mozilladatacollective.com/tag/faq/)
- [Participation Guidelines](https://www.mozilla.org/en-US/about/governance/policies/participation/)
- [mozilladatacollective@mozillafoundation.org](mailto:mozilladatacollective@mozillafoundation.org)

Brought to you by [Mozilla Foundation](https://www.mozillafoundation.org)

---

# Source: https://datacollective.mozillafoundation.org/datasets

# Datasets

## Explore Datasets

### Featured Datasets

- **Common Voice Kinyarwanda**
  - High-quality speech data for machine learning applications
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmjk758i00cfumk070r7nwve7)
- **Common Voice Chinese**
  - High-quality speech data for machine learning applications
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmjhe0xap09gamb078g9loi3q)
- **Common Voice Spanish**
  - High-quality speech data for machine learning applications
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmjepxo6t08nmmk07iauvua6v)
- **Common Voice Catalan**
  - High-quality speech data for machine learning applications
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmjcc6g9z06c7mk07yolcdyjr)

## Datasets

- **Bamun-French Parallel Corpus**
  - This dataset is a parallel corpus of Bamun (Shupamem) and French texts. The texts were obtained by transcribing raw audio files, and translations were added to enrich the original corpus. The Bamun and French texts were aligned in the process of creating this dataset.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmjk758i00cfumk070r7nwve7)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmjk758i00cfumk070r7nwve7#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmjk758i00cfumk070r7nwve7#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmjk758i00cfumk070r7nwve7#size)
- **Surmiran Newspaper Corpus**
  - 2.9 million tokens in the Surmiran variety of Romansh from the daily newspaper “La Quotidiana”.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmjhe0xap09gamb078g9loi3q)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmjhe0xap09gamb078g9loi3q#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmjhe0xap09gamb078g9loi3q#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmjhe0xap09gamb078g9loi3q#size)
- **DhoNam: Dholuo Speech dataset**
  - DhoNam: Dholuo Speech dataset is a speech corpus designed to supercharge Automatic Speech Recognition (ASR) and other speech technologies for Dholuo, one of Kenya’s major indigenous languages.
  - This dataset contains native-speaker audio recordings collected through a platform where users read a displayed sentence aloud. The dataset includes the audio recordings and the corresponding prompt sentence that was read.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmjepxo6t08nmmk07iauvua6v)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmjepxo6t08nmmk07iauvua6v#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmjepxo6t08nmmk07iauvua6v#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmjepxo6t08nmmk07iauvua6v#size)
- **Archivo de la Comisionada María de los Ángeles Guzmán García (COTAI Nuevo León / InfoNL)**
  - This archive preserves the institutional and academic record of Dr. María de los Ángeles Guzmán García's tenure as Commissioner of the Transparency and Access to Information Commission of the State of Nuevo León (COTAI / INFONL) during 2018-2025.
  - The dataset brings together the documentary legacy of one of the most technical and academic profiles in the National Transparency System. Holding a doctorate in Constitutional Law from the Universidad Complutense de Madrid, Commissioner Guzmán García
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmjcc6g9z06c7mk07yolcdyjr)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmjcc6g9z06c7mk07yolcdyjr#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmjcc6g9z06c7mk07yolcdyjr#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmjcc6g9z06c7mk07yolcdyjr#size)
- **Common Voice Spontaneous Speech 2.0 - Kenyah**
  - A collection of spontaneous spoken phrases in Kenyah.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48hr006pnxzp3s43beqr)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48hr006pnxzp3s43beqr#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48hr006pnxzp3s43beqr#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48hr006pnxzp3s43beqr#size)
- **Common Voice Spontaneous Speech 2.0 - Ushojo**
  - A collection of spontaneous spoken phrases in Ushojo.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48hj006lnxzpnj14uhpz)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48hj006lnxzpnj14uhpz#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48hj006lnxzpnj14uhpz#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48hj006lnxzpnj14uhpz#size)
- **Common Voice Spontaneous Speech 2.0 - Kuku**
  - A collection of spontaneous spoken phrases in Kuku.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48hc006hnxzprn4k1cxx)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48hc006hnxzprn4k1cxx#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48hc006hnxzprn4k1cxx#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48hc006hnxzprn4k1cxx#size)
- **Common Voice Spontaneous Speech 2.0 - Rutoro**
  - A collection of spontaneous spoken phrases in Rutoro.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48h7006dnxzp3y4uqb69)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48h7006dnxzp3y4uqb69#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48h7006dnxzp3y4uqb69#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48h7006dnxzp3y4uqb69#size)
- **Common Voice Spontaneous Speech 2.0 - Turkish**
  - A collection of spontaneous spoken phrases in Turkish.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48h10069nxzpo6tghopr)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48h10069nxzpo6tghopr#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48h10069nxzpo6tghopr#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48h10069nxzpo6tghopr#size)
- **Common Voice Spontaneous Speech 2.0 - Amba**
  - A collection of spontaneous spoken phrases in Amba.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48gq0061nxzpb3y7vi7v)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48gq0061nxzpb3y7vi7v#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48gq0061nxzpb3y7vi7v#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48gq0061nxzpb3y7vi7v#size)
- **Common Voice Spontaneous Speech 2.0 - Ruuli**
  - A collection of spontaneous spoken phrases in Ruuli.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48fu005hnxzp78hiv9ll)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48fu005hnxzp78hiv9ll#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48fu005hnxzp78hiv9ll#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48fu005hnxzp78hiv9ll#size)
- **Common Voice Spontaneous Speech 2.0 - Russian**
  - A collection of spontaneous spoken phrases in Russian.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48ey004xnxzpphv4udzz)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48ey004xnxzpphv4udzz#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48ey004xnxzpphv4udzz#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48ey004xnxzpphv4udzz#size)
- **Common Voice Spontaneous Speech 2.0 - Puno Quechua**
  - A collection of spontaneous spoken phrases in Puno Quechua.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48et004tnxzps28psruc)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48et004tnxzps28psruc#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48et004tnxzps28psruc#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48et004tnxzps28psruc#size)
- **Common Voice Spontaneous Speech 2.0 - Western Penan**
  - A collection of spontaneous spoken phrases in Western Penan.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48eo004pnxzp991piql1)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48eo004pnxzp991piql1#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48eo004pnxzp991piql1#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48eo004pnxzp991piql1#size)
- **Common Voice Spontaneous Speech 2.0 - Sabah Malay**
  - A collection of spontaneous spoken phrases in Sabah Malay.
  - [Download](https://datacollective.mozillafoundation.org/datasets/cmj8u48ej004lnxzp8sdt5z8c)
  - [License](https://datacollective.mozillafoundation.org/datasets/cmj8u48ej004lnxzp8sdt5z8c#license)
  - [Format](https://datacollective.mozillafoundation.org/datasets/cmj8u48ej004lnxzp8sdt5z8c#format)
  - [Size](https://datacollective.mozillafoundation.org/datasets/cmj8u48ej004lnxzp8sdt5z8c#size)

---

# Source: https://raw.githubusercontent.com/Mozilla-Data-Collective/datacollective-python/main/README.md