# Kaggle API

CAUTION: Most of the files in this directory are obsolete and will be deleted. This file is valid, as are `model_card.md`, `modelinstance_usage.md`, and `models_metadata.md`.

Official API for https://www.kaggle.com, accessible using a command line tool implemented in Python 3.

Beta release - Kaggle reserves the right to modify the API functionality currently offered.

IMPORTANT: Competitions submissions using an API version prior to 1.5.0 may not work. If you are encountering difficulties with submitting to competitions, please check your version with `kaggle --version`. If it is below 1.5.0, please update with `pip install kaggle --upgrade`.

## Installation

Ensure you have Python 3 and the package manager `pip` installed. Run the following command to access the Kaggle API using the command line:

`pip install kaggle` (You may need to do `pip install --user kaggle` on Mac/Linux, which is recommended if problems come up during the installation process.) Installations done through the root user (i.e. `sudo pip install kaggle`) will not work correctly unless you understand what you're doing. Even then, they still might not work. User installs are strongly recommended in the case of permissions errors.

You can now use the `kaggle` command as shown in the examples below.

If you run into a `kaggle: command not found` error, ensure that your Python binaries are on your path. You can see where `kaggle` is installed by doing `pip uninstall kaggle` and seeing where the binary is (then cancel the uninstall when prompted). For a local user install on Linux, the default location is `~/.local/bin`. On Windows, the default location is `$PYTHON_HOME/Scripts`.

IMPORTANT: We do not offer Python 2 support. Please ensure that you are using Python 3 before reporting any issues.

## API credentials

To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (`https://www.kaggle.com/<username>/account`) and select 'Generate New Token'. Copy the generated token and save it in an environment variable named `KAGGLE_API_TOKEN`, or place the value in a file named `~/.kaggle/access_token`. (The file may optionally have a `.txt` extension.)

Note: If you already have a `~/.kaggle/kaggle.json` file with legacy API credentials, it will continue to work. Also note that generating a new token no longer expires your existing tokens, so you do not have to update places where an older token is already in use; previously, old tokens expired whenever a new token was generated.
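For example, on Mac/Linux you could store the token like this (a minimal sketch; `XXXXXXXX` is a placeholder for the token value you copied):

```sh
# Save the copied token where the CLI looks for it by default.
mkdir -p ~/.kaggle
echo 'XXXXXXXX' > ~/.kaggle/access_token

# Credential files should not be readable by other users.
chmod 600 ~/.kaggle/access_token

# Alternatively, export it as an environment variable for the current session only.
export KAGGLE_API_TOKEN='XXXXXXXX'
```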
## Commands

The command line tool supports the following commands:

```
kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init, metadata, status}
kaggle kernels {get, list, init, push, pull, output, update, status}
kaggle models {get, list, init, create, delete, update}
kaggle models instances {get, init, create, delete, update}
kaggle models instances versions {create, download, delete}
kaggle config {view, set, unset}
```

See more details below for using each of these commands.

### Competitions

The API supports the following commands for Kaggle Competitions.

```
usage: kaggle competitions [-h] {list,files,download,submit,submissions,leaderboard} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {list,files,download,submit,submissions,leaderboard}
    list                List available competitions
    files               List competition files
    download            Download competition files
    submit              Make a new competition submission
    submissions         Show your competition submissions
    leaderboard         Get competition leaderboard information
```

##### List competitions

```
usage: kaggle competitions list [-h] [--group GROUP] [--category CATEGORY] [--sort-by SORT_BY] [-p PAGE] [-s SEARCH] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --group GROUP         Search for competitions in a specific group. Default is 'general'. Valid options are 'general', 'entered', and 'inClass'
  --category CATEGORY   Search for competitions of a specific category. Default is 'all'. Valid options are 'all', 'featured', 'research', 'recruitment', 'gettingStarted', 'masters', and 'playground'
  --sort-by SORT_BY     Sort list results. Default is 'latestDeadline'. Valid options are 'grouped', 'prize', 'earliestDeadline', 'latestDeadline', 'numberOfTeams', and 'recentlyCreated'
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -v, --csv             Print results in CSV format (if not set then print in table format)
```

Examples:

`kaggle competitions list -s health`

`kaggle competitions list --category gettingStarted`

##### List competition files

```
usage: kaggle competitions files [-h] [-v] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -v, --csv             Print results in CSV format (if not set then print in table format)
  -q, --quiet           Suppress printing information about the upload/download progress
```

Example:

`kaggle competitions files favorita-grocery-sales-forecasting`

##### Download competition files

```
usage: kaggle competitions download [-h] [-f FILE_NAME] [-p PATH] [-w] [-o] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided (use "kaggle competitions files -c <competition>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
```

Examples:

`kaggle competitions download favorita-grocery-sales-forecasting`

`kaggle competitions download favorita-grocery-sales-forecasting -f test.csv.7z`

Note: you will need to accept competition rules at `https://www.kaggle.com/c/<competition>/rules`.
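Putting these commands together, a typical download flow looks like this (a sketch reusing the competition slug from the examples above):

```sh
# Find a competition, inspect its files, then fetch a single file into ./data.
kaggle competitions list -s favorita
kaggle competitions files favorita-grocery-sales-forecasting
kaggle competitions download favorita-grocery-sales-forecasting -f test.csv.7z -p ./data

# Or download all files for the competition into ./data.
kaggle competitions download favorita-grocery-sales-forecasting -p ./data
```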
##### Submit to a competition

```
usage: kaggle competitions submit [-h] [-f FILE_NAME] [-k KERNEL] -m MESSAGE [-v VERSION] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -f FILE_NAME, --file FILE_NAME
                        File for upload (full path), or the name of the output file produced by a kernel (for code competitions)
  -k KERNEL, --kernel KERNEL
                        Name of kernel (notebook) to submit to a code competition
  -m MESSAGE, --message MESSAGE
                        Message describing this submission
  -v VERSION, --version VERSION
                        Version of kernel to submit to a code competition, e.g. "Version 1"
  -q, --quiet           Suppress printing information about the upload/download progress
```

Examples:

`kaggle competitions submit favorita-grocery-sales-forecasting -f sample_submission_favorita.csv.7z -m "My submission message"`

`kaggle competitions submit llms-you-cant-please-them-all -k username/llms-can-t-please-all-submission -v 9 -f submission.csv -m "My submission message"`

Note: you will need to accept competition rules at `https://www.kaggle.com/c/<competition>/rules`.

##### List competition submissions

```
usage: kaggle competitions submissions [-h] [-v] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -v, --csv             Print results in CSV format (if not set then print in table format)
  -q, --quiet           Suppress printing information about the upload/download progress
```

Example:

`kaggle competitions submissions favorita-grocery-sales-forecasting`

Note: you will need to accept competition rules at `https://www.kaggle.com/c/<competition>/rules`.
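For example, to submit a predictions file and then verify how it scored (a sketch based on the examples above):

```sh
# Upload a submission, then list your submissions to check its status and score.
kaggle competitions submit favorita-grocery-sales-forecasting \
  -f sample_submission_favorita.csv.7z -m "Baseline submission"
kaggle competitions submissions favorita-grocery-sales-forecasting
```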
##### Get competition leaderboard

```
usage: kaggle competitions leaderboard [-h] [-s] [-d] [-p PATH] [-v] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -s, --show            Show the top of the leaderboard
  -d, --download        Download entire leaderboard
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -v, --csv             Print results in CSV format (if not set then print in table format)
  -q, --quiet           Suppress printing information about the upload/download progress
```

Example:

`kaggle competitions leaderboard favorita-grocery-sales-forecasting -s`

### Datasets

The API supports the following commands for Kaggle Datasets.

```
usage: kaggle datasets [-h] {list,files,download,create,version,init,metadata,status} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {list,files,download,create,version,init,metadata,status}
    list                List available datasets
    files               List dataset files
    download            Download dataset files
    create              Create a new dataset
    version             Create a new dataset version
    init                Initialize metadata file for dataset creation
    metadata            Download metadata about a dataset
    status              Get the creation status for a dataset
```

##### List datasets

```
usage: kaggle datasets list [-h] [--sort-by SORT_BY] [--min-size MIN_SIZE] [--max-size MAX_SIZE] [--file-type FILE_TYPE] [--license LICENSE_NAME] [--tags TAG_IDS] [-s SEARCH] [-m] [--user USER] [-p PAGE] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --sort-by SORT_BY     Sort list results. Default is 'hottest'. Valid options are 'hottest', 'votes', 'updated', and 'active'
  --max-size MAX_SIZE   Specify the maximum size of the dataset to return (bytes)
  --min-size MIN_SIZE   Specify the minimum size of the dataset to return (bytes)
  --file-type FILE_TYPE
                        Search for datasets with a specific file type. Default is 'all'. Valid options are 'all', 'csv', 'sqlite', 'json', 'parquet', and 'bigQuery'. Please note that bigQuery datasets cannot be downloaded
  --license LICENSE_NAME
                        Search for datasets with a specific license. Default is 'all'. Valid options are 'all', 'cc', 'gpl', 'odb', and 'other'
  --tags TAG_IDS        Search for datasets that have specific tags. Tag list should be comma separated
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -m, --mine            Display only my items
  --user USER           Find public datasets owned by a specific user or organization
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  -v, --csv             Print results in CSV format (if not set then print in table format)
```

Examples:

`kaggle datasets list -s demographics`

`kaggle datasets list --sort-by votes`

##### List files for a dataset

```
usage: kaggle datasets files [-h] [-v] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options), or <owner>/<dataset-name>/<version-number> for a specific version

optional arguments:
  -h, --help            show this help message and exit
  -v, --csv             Print results in CSV format (if not set then print in table format)
```

Examples:

`kaggle datasets files zillow/zecon`

`kaggle datasets files zillow/zecon/3`

##### Download dataset files

```
usage: kaggle datasets download [-h] [-f FILE_NAME] [-p PATH] [-w] [--unzip] [-o] [-q] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options), or <owner>/<dataset-name>/<version-number> for a specific version

optional arguments:
  -h, --help            show this help message and exit
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided (use "kaggle datasets files -d <dataset>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  --unzip               Unzip the downloaded file. Will delete the zip file when completed.
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
```

Examples:

`kaggle datasets download zillow/zecon`

`kaggle datasets download zillow/zecon/3`

`kaggle datasets download zillow/zecon -f State_time_series.csv`

Please note that BigQuery datasets cannot be downloaded.
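The download flags can also be combined; for instance, the following sketch fetches a dataset into a specific folder and extracts it in one step:

```sh
# Download the dataset into ./zecon and unzip it; the zip file is deleted afterwards.
kaggle datasets download zillow/zecon -p ./zecon --unzip
```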
##### Initialize metadata file for dataset creation

```
usage: kaggle datasets init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder where the special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata) will be created. Defaults to current working directory
```

Example:

`kaggle datasets init -p /path/to/dataset`

##### Create a new dataset

If you want to create a new dataset, you first need to initialize a metadata file. You can do so by running `kaggle datasets init` as described above.

```
usage: kaggle datasets create [-h] [-p FOLDER] [-u] [-q] [-t] [-r {skip,zip,tar}]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
  -u, --public          Create publicly (default is private)
  -q, --quiet           Suppress printing information about the upload/download progress
  -t, --keep-tabular    Do not convert tabular files to CSV (default is to convert)
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
```

Example:

`kaggle datasets create -p /path/to/dataset`

##### Create a new dataset version

```
usage: kaggle datasets version [-h] -m VERSION_NOTES [-p FOLDER] [-q] [-t] [-r {skip,zip,tar}] [-d]

required arguments:
  -m VERSION_NOTES, --message VERSION_NOTES
                        Message describing the new version

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
  -q, --quiet           Suppress printing information about the upload/download progress
  -t, --keep-tabular    Do not convert tabular files to CSV (default is to convert)
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
  -d, --delete-old-versions
                        Delete old versions of this dataset
```

Example:

`kaggle datasets version -p /path/to/dataset -m "Updated data"`

##### Download metadata for an existing dataset

```
usage: kaggle datasets metadata [-h] [-p PATH] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Location to download dataset metadata to. Defaults to current working directory
```

Example:

`kaggle datasets metadata -p /path/to/download zillow/zecon`

##### Get dataset creation status

```
usage: kaggle datasets status [-h] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle datasets status zillow/zecon`

##### Delete a dataset

```
usage: kaggle datasets delete [-h] [-y] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

optional arguments:
  -h, --help            show this help message and exit
  -y, --yes             Sets any confirmation values to "yes" automatically. Users will not be asked to confirm.
```

Example:

`kaggle datasets delete -y lastplacelarry/testdataset`
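Putting the dataset commands together, a first upload followed by a new version might look like this (a sketch; the paths and the `<owner>/<dataset-slug>` handle are placeholders):

```sh
# 1. Generate the dataset-metadata.json template in the dataset folder.
kaggle datasets init -p /path/to/dataset

# 2. Edit dataset-metadata.json (title, slug, etc.), then create the dataset.
kaggle datasets create -p /path/to/dataset

# 3. Check the processing status, and publish an updated version later.
kaggle datasets status <owner>/<dataset-slug>
kaggle datasets version -p /path/to/dataset -m "Updated data"
```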
### Kernels

The API supports the following commands for Kaggle Kernels.

```
usage: kaggle kernels [-h] {get,list,init,push,pull,output,update,status} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {get,list,init,push,pull,output,update,status}
    get                 Get the code for a kernel (formerly pull)
    list                List available kernels
    init                Initialize metadata file for a kernel
    push                Deprecated by update: Push new code to a kernel and run the kernel
    pull                Deprecated by get: Pull down code from a kernel
    output              Get data output from the latest kernel run
    update              Update a kernel with new code and run it (formerly push)
    status              Display the status of the latest kernel run
```

##### Get a kernel

```
usage: kaggle kernels get [-h] [-p PATH] [-w] [-m] [kernel]

optional arguments:
  -h, --help            show this help message and exit
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -m, --metadata        Generate metadata when pulling kernel
```

Example:

`kaggle kernels get rtatman/list-of-5-day-challenges -p /path/to/dest`

##### List kernels

```
usage: kaggle kernels list [-h] [-m] [-p PAGE] [--page-size PAGE_SIZE] [-s SEARCH] [-v] [--parent PARENT] [--competition COMPETITION] [--dataset DATASET] [--user USER] [--language LANGUAGE] [--kernel-type KERNEL_TYPE] [--output-type OUTPUT_TYPE] [--sort-by SORT_BY]

optional arguments:
  -h, --help            show this help message and exit
  -m, --mine            Display only my items
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  --page-size PAGE_SIZE
                        Number of items to show on a page. Default size is 20, max is 100
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -v, --csv             Print results in CSV format (if not set then print in table format)
  --parent PARENT       Find children of the specified parent kernel
  --competition COMPETITION
                        Find kernels for a given competition
  --dataset DATASET     Find kernels for a given dataset
  --user USER           Find kernels created by a given user
  --language LANGUAGE   Specify the language the kernel is written in. Default is 'all'. Valid options are 'all', 'python', 'r', 'sqlite', and 'julia'
  --kernel-type KERNEL_TYPE
                        Specify the type of kernel. Default is 'all'. Valid options are 'all', 'script', and 'notebook'
  --output-type OUTPUT_TYPE
                        Search for specific kernel output types. Default is 'all'. Valid options are 'all', 'visualizations', and 'data'
  --sort-by SORT_BY     Sort list results. Default is 'hotness'. Valid options are 'hotness', 'commentCount', 'dateCreated', 'dateRun', 'relevance', 'scoreAscending', 'scoreDescending', 'viewCount', and 'voteCount'. 'relevance' is only applicable if a search term is specified.
```

Examples:

`kaggle kernels list -s titanic`

`kaggle kernels list --language python`

##### Initialize metadata file for a kernel

```
usage: kaggle kernels init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder where the special kernel-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Kernel-Metadata) will be created. Defaults to current working directory
```

Example:

`kaggle kernels init -p /path/to/folder`
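A typical edit-run-fetch cycle, using the `update`, `status`, and `output` commands documented below, might look like this (a sketch; the folder and `<owner>/<kernel-name>` are placeholders):

```sh
# 1. Create kernel-metadata.json in the working folder and edit it.
kaggle kernels init -p /path/to/folder

# 2. Upload the code and run it, then poll the run status.
kaggle kernels update -p /path/to/folder
kaggle kernels status <owner>/<kernel-name>

# 3. Once the run completes, fetch its output files.
kaggle kernels output <owner>/<kernel-name> -p /path/to/dest
```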
##### Push a kernel (deprecated, use update)

```
usage: kaggle kernels push [-h] [-t N] -p FOLDER

optional arguments:
  -h, --help            show this help message and exit
  -t N, --timeout N     Limit the run time of a kernel to the given number of seconds. The global maximum time will not be exceeded.
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special kernel-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Kernel-Metadata). Defaults to current working directory
```

Example:

`kaggle kernels push -p /path/to/folder`

##### Pull a kernel (deprecated, use get)

```
usage: kaggle kernels pull [-h] [-p PATH] [-w] [-m] [kernel]

optional arguments:
  -h, --help            show this help message and exit
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -m, --metadata        Generate metadata when pulling kernel
```

Example:

`kaggle kernels pull rtatman/list-of-5-day-challenges -p /path/to/dest`

##### Retrieve a kernel's output

```
usage: kaggle kernels output [-h] [-p PATH] [-w] [-o] [-q] [kernel]

required arguments:
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
```

Example:

`kaggle kernels output mrisdal/exploring-survival-on-the-titanic -p /path/to/dest`

##### Update a kernel

This command should only be used after "get". Use "create" to create a new kernel.

```
usage: kaggle kernels update [-h] [-t N] -p FOLDER

optional arguments:
  -h, --help            show this help message and exit
  -t N, --timeout N     Limit the run time of a kernel to the given number of seconds. The global maximum time will not be exceeded.
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special kernel-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Kernel-Metadata). Defaults to current working directory
```

Example:

`kaggle kernels update -p /path/to/folder`

##### Get the status of the latest kernel run

```
usage: kaggle kernels status [-h] [kernel]

required arguments:
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle kernels status mrisdal/exploring-survival-on-the-titanic`

### Models

The API supports the following commands for Kaggle Models.

```
usage: kaggle models [-h] {get, list, init, create, delete, update} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {get, list, init, create, delete, update}
    get                 Get the model
    list                List models
    init                Initialize metadata file for model creation
    create              Create a new model
    delete              Delete a model
    update              Update a model
```

##### Get model

```
usage: kaggle models get [-h] [-p PATH] [model]

required arguments:
  model                 Model URL suffix in format <owner>/<model-slug>

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where the special model-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata) will be downloaded (if specified).
```

Example:

`kaggle models get tensorflow/toxicity`
##### List models

```
usage: kaggle models list [-h] [--sort-by SORT_BY] [-s SEARCH] [--owner OWNER] [--page-token PAGE_TOKEN] [--page-size PAGE_SIZE] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --sort-by SORT_BY     Sort list results. Default is 'hotness'. Valid options are 'hotness', 'downloadCount', 'voteCount', 'notebookCount' and 'createTime'
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  --owner OWNER         Find models owned by a specific user or organization
  --page-token PAGE_TOKEN
                        Page token for pagination
  --page-size PAGE_SIZE
                        Number of items to show on a page. Default size is 20, max is 50
  -v, --csv             Print results in CSV format (if not set then print in table format)
```

Examples:

`kaggle models list -s llm`

`kaggle models list --sort-by downloadCount`

##### Initialize metadata file for a model

```
usage: kaggle models init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder to create the model-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models init -p /path/to/model`

##### Create a new model

If you want to create a new model, you first need to initialize a metadata file. You can do so by running `kaggle models init` as described above.

```
usage: kaggle models create [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the special model-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models create -p /path/to/model`

##### Delete model

```
usage: kaggle models delete [-h] [model]

required arguments:
  model                 Model URL suffix in format <owner>/<model-slug>

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle models delete tensorflow/toxicity`

##### Update a model

If you want to update a model, you first need a metadata file. You can fetch it by running `kaggle models get owner/slug -p folder`.

```
usage: kaggle models update [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the special model-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models update -p /path/to/model`
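Combining these commands, creating a model and later amending its metadata could look like this (a sketch; paths and `<owner>/<model-slug>` are placeholders):

```sh
# 1. Generate model-metadata.json, fill in ownerSlug/title/slug, then create the model.
kaggle models init -p /path/to/model
kaggle models create -p /path/to/model

# 2. To change the metadata later: fetch it, edit the file, and push the update.
kaggle models get <owner>/<model-slug> -p /path/to/model
kaggle models update -p /path/to/model
```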
#### Model Instances

The API supports the following commands for Kaggle Model Instances.

```
usage: kaggle models instances [-h] {get, init, create, delete, update} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {get, init, create, delete, update}
    get                 Get a model instance
    init                Initialize metadata file for model instance creation
    create              Create a new model instance
    delete              Delete a model instance
    update              Update a model instance
```

##### Get model instance

```
usage: kaggle models instances get [-h] [-p PATH] [modelInstance]

required arguments:
  modelInstance         Model Instance URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where the special model-instance-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata) will be downloaded (if specified).
```

Example:

`kaggle models instances get tensorflow/toxicity/tfjs/default`

##### Initialize metadata file for a model instance

```
usage: kaggle models instances init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder to create the model-instance-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models instances init -p /path/to/modelinstance`

##### Create a new model instance

If you want to create a new model instance, you first need to initialize a metadata file. You can do so by running `kaggle models instances init` as described above.

```
usage: kaggle models instances create [-h] [-p FOLDER] [-q] [-r {skip,zip,tar}]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the special model-instance-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
  -q, --quiet           Suppress printing information about the upload progress
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
```

Example:

`kaggle models instances create -p /path/to/modelinstance`

##### Delete model instance

```
usage: kaggle models instances delete [-h] [modelInstance]

required arguments:
  modelInstance         Model Instance URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle models instances delete tensorflow/toxicity/tfjs/default`

##### Update a model instance

If you want to update a model instance, you first need a metadata file. You can fetch it by running `kaggle models instances get owner-slug/model-slug/framework/instance-slug -p folder`.

```
usage: kaggle models instances update [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the special model-instance-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models instances update -p /path/to/modelinstance`
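The fetch-edit-update cycle described above can be scripted as follows (a sketch; the slugs and folder are placeholders):

```sh
# Fetch the current model-instance-metadata.json into a local folder.
kaggle models instances get <owner>/<model-slug>/<framework>/<instance-slug> -p /path/to/folder

# Edit the file as needed, then apply the changes.
kaggle models instances update -p /path/to/folder
```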
#### Model Instance Versions

The API supports the following commands for Kaggle Model Instance Versions.

```
usage: kaggle models instances versions [-h] {create, download, delete} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {create, download, delete}
    create              Create a new model instance version
    download            Download a model instance version
    delete              Delete a model instance version
```

##### Create a new model instance version

```
usage: kaggle models instances versions create [-h] [-p FOLDER] [-n NOTES] [-q] [-r {skip,zip,tar}] [modelInstance]

required arguments:
  modelInstance         Model Instance URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the model files to upload
  -n, --version-notes NOTES
                        Version notes to record for this new version
  -q, --quiet           Suppress printing information about the upload progress
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
```

Example:

`kaggle models instances versions create tensorflow/toxicity/tfjs/default -p /path/to/files -n "updated weights"`

##### Download a model instance version

```
usage: kaggle models instances versions download [-h] [-p PATH] [--untar] [-f] [-q] [modelInstanceVersion]

required arguments:
  modelInstanceVersion  Model Instance Version URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>/<version-number>

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  --untar               Untar the downloaded file. Will delete the tar file when completed.
  -f, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the download progress
```

Example:

`kaggle models instances versions download tensorflow/toxicity/tfjs/default/1`

##### Delete a model instance version

```
usage: kaggle models instances versions delete [-h] [modelInstanceVersion]

required arguments:
  modelInstanceVersion  Model Instance Version URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>/<version-number>

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle models instances versions delete tensorflow/toxicity/tfjs/default/1`

### Config

The API supports the following commands for configuration.

```
usage: kaggle config [-h] {view,set,unset} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {view,set,unset}
    view                View current config values
    set                 Set a configuration value
    unset               Clear a configuration value
```

##### View the path where files are downloaded

```
usage: kaggle config path [-h] [-p PATH]

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
```

Example:

`kaggle config path -p C:\`

##### View current config values

```
usage: kaggle config view [-h]

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle config view`

##### Set a configuration value

```
usage: kaggle config set [-h] -n NAME -v VALUE

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter (one of competition, path, proxy)
  -v VALUE, --value VALUE
                        Value of the configuration parameter, valid values depending on name
                        - competition: Competition URL suffix (use "kaggle competitions list" to show options)
                        - path: Folder where file(s) will be downloaded, defaults to current working directory
                        - proxy: Proxy for HTTP requests
```

Example:

`kaggle config set -n competition -v titanic`

##### Clear a configuration value

```
usage: kaggle config unset [-h] -n NAME

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter (one of competition, path, proxy)
```

Example:

`kaggle config unset -n competition`
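For example, setting a default competition lets you omit the competition argument from later commands (a sketch using the `titanic` slug from the example above):

```sh
# Set the default competition once...
kaggle config set -n competition -v titanic

# ...then competition commands no longer need an explicit slug.
kaggle competitions files
kaggle competitions submissions

# Clear the default when you are done.
kaggle config unset -n competition
```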
## License

The Kaggle API is released under the [Apache 2.0 license](../LICENSE.txt).

---

# Kaggle API

Official API for https://www.kaggle.com, accessible using a command line tool implemented in Python 3.

[User documentation](docs/README.md)

## Installation

Ensure you have Python 3 and the package manager `pip` installed. Run the following command to access the Kaggle API using the command line:

```sh
pip install kaggle
```

## Development

### Kaggle Internal

Obviously, this depends on Kaggle services. When you're extending the API and modifying or adding to those services, you should be working in your Kaggle mid-tier development environment. You'll run Kaggle locally, in the container, and test the Python code by running it in the container so it can connect to your local testing environment.

### Prerequisites

We use [hatch](https://hatch.pypa.io) to manage this project. Follow these [instructions](https://hatch.pypa.io/latest/install/) to install it.

If you are working in a managed environment, you may want to use `pipx`. If it isn't already installed, try `sudo apt install pipx`. Then you should be able to proceed with `pipx install hatch`.

### Dependencies

```sh
hatch run install-deps
```

### Compile

```sh
hatch run compile
```

The compiled files are generated in the `kaggle/` directory from the `src/` directory. All changes must be made in the `src/` directory.

### Run

Use `hatch run install` to compile the program and install it in the default `hatch` environment. To run that version locally for testing, use hatch: `hatch run kaggle -v`. If you'd rather not type `hatch run` every time, launch a new shell in the hatch environment: `hatch shell`.

You can also run the code in Python directly:

```sh
hatch run python
```

```python
import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
api.model_list_cli()
# Output:
# Next Page Token = [...]
# [...]
```

Or in a single command:

```sh
hatch run python -c "import kaggle; from kaggle.api.kaggle_api_extended import KaggleApi; api = KaggleApi(); api.authenticate(); api.model_list_cli()"
```

### Example

Let's change the `model_list_cli` method in the source file:

```sh
❯ git diff src/kaggle/api/kaggle_api_extended.py
[...]
+        print('hello Kaggle CLI update')^M
         models = self.model_list(sort_by, search, owner, page_size, page_token)
[...]

❯ hatch run compile
[...]

❯ hatch run python -c "import kaggle; from kaggle.api.kaggle_api_extended import KaggleApi; api = KaggleApi(); api.authenticate(); api.model_list_cli()"
hello Kaggle CLI update
Next Page Token = [...]
```

### Integration Tests

To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in one of the two ways described in [this doc](docs/README.md). Refer to the sections:

- Using environment variables
- Using credentials file

After setting up your credentials by any of these methods, you can run the integration tests as follows:

```sh
# Run all tests
hatch run integration-test
```

## License

The Kaggle API is released under the [Apache 2.0 license](LICENSE).

---

# kagglehub

The `kagglehub` library provides a simple way to interact with Kaggle resources such as datasets, models, and notebook outputs in Python. This library also integrates natively with the Kaggle notebook environment. This means the behavior differs when you download a Kaggle resource with `kagglehub` in the Kaggle notebook environment:

* In a Kaggle notebook:
  * The resource is automatically attached to your Kaggle notebook.
  * The resource will be shown under the "Input" panel in the Kaggle notebook editor.
  * The resource files are served from the shared Kaggle resources cache (not using the VM's disk).
* Outside a Kaggle notebook:
  * The resource files are downloaded to a local [cache folder](#change-the-default-cache-folder).

## Installation

Install the `kagglehub` package with pip:

```
pip install kagglehub
```

## Usage

### Authenticate

> [!NOTE]
> `kagglehub` is authenticated by default when running in a Kaggle notebook. Authenticating is **only** needed to access public resources requiring user consent or private resources.

First, you will need a Kaggle account. You can sign up [here](https://www.kaggle.com/account/login).

After login, you can download your Kaggle API credentials at https://www.kaggle.com/settings by clicking on the "Create New Token" button under the "API" section.

You have four different options to authenticate. Note that if you use `kaggle-api` (the `kaggle` command-line tool) you have already done Option 3 and can skip this.
#### Option 1: Calling kagglehub.login()

This will prompt you to enter your username and token:

```python
import kagglehub

kagglehub.login()
```

#### Option 2: Read credentials from environment variables

You can also choose to export your Kaggle username and token to the environment:

```sh
export KAGGLE_USERNAME=datadinosaur
export KAGGLE_KEY=xxxxxxxxxxxxxx
```

#### Option 3: Read credentials from `kaggle.json`

Store your `kaggle.json` credentials file at `~/.kaggle/kaggle.json`.

Alternatively, you can set the `KAGGLE_CONFIG_DIR` environment variable to change this location to `$KAGGLE_CONFIG_DIR/kaggle.json`.

Note for Windows users: The default directory is `%HOMEPATH%/kaggle.json`.

#### Option 4: Read credentials from Google Colab secrets

Store your username and key token as Colab secrets `KAGGLE_USERNAME` and `KAGGLE_KEY`.

Instructions on adding secrets in both Colab and Colab Enterprise can be found in [this article](https://www.googlecloudcommunity.com/gc/Cloud-Hub/How-do-I-add-secrets-in-Google-Colab-Enterprise/m-p/784866).

### Download Model

The following examples download the `answer-equivalence-bem` variation of this Kaggle model: https://www.kaggle.com/models/google/bert/tensorFlow2/answer-equivalence-bem

```python
import kagglehub

# Download the latest version.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem')

# Download a specific version.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem/1')

# Download a single file.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem', path='variables/variables.index')

# Download a model or file, even if previously downloaded to cache.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem', force_download=True)
```
### Upload Model

Uploads a new variation (or a new variation's version if it already exists).

```python
import kagglehub

# For example, to upload a new variation to this model:
# - https://www.kaggle.com/models/google/bert/tensorFlow2/answer-equivalence-bem
#
# You would use the following handle: `google/bert/tensorFlow2/answer-equivalence-bem`
handle = '<owner_slug>/<model_slug>/<framework>/<variation_slug>'
local_model_dir = 'path/to/local/model/dir'

kagglehub.model_upload(handle, local_model_dir)

# You can also specify some version notes (optional)
kagglehub.model_upload(handle, local_model_dir, version_notes='improved accuracy')

# You can also specify a license (optional)
kagglehub.model_upload(handle, local_model_dir, license_name='Apache 2.0')

# You can also specify a list of patterns for files/dirs to ignore.
# These patterns are combined with `kagglehub.models.DEFAULT_IGNORE_PATTERNS`
# to determine which files and directories to exclude.
# To ignore entire directories, include a trailing slash (/) in the pattern.
kagglehub.model_upload(handle, local_model_dir, ignore_patterns=["original/", "*.tmp"])
```

### Load Dataset

Loads a file from a Kaggle Dataset into a python object based on the selected `KaggleDatasetAdapter`:

- `KaggleDatasetAdapter.PANDAS` → [pandas DataFrame](https://pandas.pydata.org/docs/reference/frame.html) (or multiple given certain files/settings)
- `KaggleDatasetAdapter.HUGGING_FACE` → [Hugging Face Dataset](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset)
- `KaggleDatasetAdapter.POLARS` → polars [LazyFrame](https://docs.pola.rs/api/python/stable/reference/lazyframe/index.html) or [DataFrame](https://docs.pola.rs/api/python/stable/reference/dataframe/index.html) (or multiple given certain files/settings)

**NOTE: To use these adapters, you must install the optional dependencies (or already have them available in your environment)**

- `KaggleDatasetAdapter.PANDAS` → `pip install kagglehub[pandas-datasets]`
- `KaggleDatasetAdapter.HUGGING_FACE` → `pip install kagglehub[hf-datasets]`
- `KaggleDatasetAdapter.POLARS` → `pip install kagglehub[polars-datasets]`

#### `KaggleDatasetAdapter.PANDAS`

This adapter supports the following file types, which map to a corresponding `pandas.read_*` method:

| File Extension | `pandas` Method |
| --- | --- |
| .csv, .tsv[^1] | [`pandas.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) |
| .json, .jsonl[^2] | [`pandas.read_json`](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) |
| .xml | [`pandas.read_xml`](https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html) |
| .parquet | [`pandas.read_parquet`](https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html) |
| .feather | [`pandas.read_feather`](https://pandas.pydata.org/docs/reference/api/pandas.read_feather.html) |
| .sqlite, .sqlite3, .db, .db3, .s3db, .dl3[^3] | [`pandas.read_sql_query`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html) |
| .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt[^4] | [`pandas.read_excel`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) |

[^1]: For TSV files, `\t` is automatically supplied for the `sep` parameter, but may be overridden with `pandas_kwargs`
[^2]: For JSONL files, `True` is supplied for the `lines` parameter
[^3]: For SQLite files, a `sql_query` must be provided to generate the `DataFrame`(s)
[^4]: The specific file extension will dictate which optional `engine` dependency needs to be installed to read the file

`dataset_load` also supports `pandas_kwargs`, which will be passed as keyword arguments to the `pandas.read_*` method. Some examples include:
```python
import kagglehub
from kagglehub import KaggleDatasetAdapter

# Load a DataFrame with a specific version of a CSV
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "unsdsn/world-happiness/versions/1",
    "2016.csv",
)

# Load a DataFrame with specific columns from a parquet file
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "robikscube/textocr-text-extraction-from-images-dataset",
    "annot.parquet",
    pandas_kwargs={"columns": ["image_id", "bbox", "points", "area"]}
)

# Load a dictionary of DataFrames from an Excel file where the keys are sheet names
# and the values are DataFrames for each sheet's data. NOTE: As written, this requires
# installing the default openpyxl engine.
df_dict = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "theworldbank/education-statistics",
    "edstats-excel-zip-72-mb-/EdStatsEXCEL.xlsx",
    pandas_kwargs={"sheet_name": None},
)

# Load a DataFrame using an XML file (with the natively available etree parser)
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "parulpandey/covid19-clinical-trials-dataset",
    "COVID-19 CLinical trials studies/COVID-19 CLinical trials studies/NCT00571389.xml",
    pandas_kwargs={"parser": "etree"},
)

# Load a DataFrame by executing a SQL query against a SQLite DB
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "wyattowalsh/basketball",
    "nba.sqlite",
    sql_query="SELECT person_id, player_name FROM draft_history",
)
```

#### `KaggleDatasetAdapter.HUGGING_FACE`

The Hugging Face `Dataset` provided by this adapter is built exclusively using [`Dataset.from_pandas`](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.from_pandas). As a result, all of the file type and `pandas_kwargs` support is the same as [`KaggleDatasetAdapter.PANDAS`](#kaggledatasetadapterpandas). Some important things to note about this:

1. Because `Dataset.from_pandas` cannot accept a collection of `DataFrame`s, any attempts to load a file with `pandas_kwargs` that produce a collection of `DataFrame`s will result in a raised exception
2. `hf_kwargs` may be provided, which will be passed as keyword arguments to `Dataset.from_pandas`
3. Because the use of `pandas` is transparent when `pandas_kwargs` are not needed, we default to `False` for `preserve_index`; this can be overridden using `hf_kwargs`

Some examples include:

```python
import kagglehub
from kagglehub import KaggleDatasetAdapter

# Load a Dataset with a specific version of a CSV, then remove a column
dataset = kagglehub.dataset_load(
    KaggleDatasetAdapter.HUGGING_FACE,
    "unsdsn/world-happiness/versions/1",
    "2016.csv",
)
dataset = dataset.remove_columns('Region')

# Load a Dataset with specific columns from a parquet file, then split into test/train splits
dataset = kagglehub.dataset_load(
    KaggleDatasetAdapter.HUGGING_FACE,
    "robikscube/textocr-text-extraction-from-images-dataset",
    "annot.parquet",
    pandas_kwargs={"columns": ["image_id", "bbox", "points", "area"]}
)
dataset_with_splits = dataset.train_test_split(test_size=0.8, train_size=0.2)

# Load a Dataset by executing a SQL query against a SQLite DB, then rename a column
dataset = kagglehub.dataset_load(
    KaggleDatasetAdapter.HUGGING_FACE,
    "wyattowalsh/basketball",
    "nba.sqlite",
    sql_query="SELECT person_id, player_name FROM draft_history",
)
dataset = dataset.rename_column('season', 'year')
```
#### `KaggleDatasetAdapter.POLARS`

This adapter supports the following file types, which map to a corresponding `polars.scan_*` or `polars.read_*` method:

| File Extension | `polars` Method |
| --- | --- |
| .csv, .tsv[^5] | [`polars.scan_csv`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_csv.html#polars.scan_csv) or [`polars.read_csv`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_csv.html) |
| .json | [`polars.read_json`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_json.html) |
| .jsonl | [`polars.scan_ndjson`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_ndjson.html) or [`polars.read_ndjson`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_ndjson.html) |
| .parquet | [`polars.scan_parquet`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_parquet.html) or [`polars.read_parquet`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_parquet.html) |
| .feather | [`polars.scan_ipc`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_ipc.html) or [`polars.read_ipc`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_ipc.html) |
| .sqlite, .sqlite3, .db, .db3, .s3db, .dl3[^6] | [`polars.read_database`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_database.html) |
| .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt[^7] | [`polars.read_excel`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_excel.html) |

[^5]: For TSV files, `\t` is automatically supplied for the `separator` parameter, but may be overridden with `polars_kwargs`
[^6]: For SQLite files, a `sql_query` must be provided to generate the `DataFrame`(s)
[^7]: The specific file extension may dictate which optional `engine` dependency needs to be installed to read the file

`dataset_load` also supports `polars_kwargs`, which will be passed as keyword arguments to the `polars.scan_*` or `polars.read_*` method.

##### `LazyFrame` vs `DataFrame`

Per the polars documentation, [LazyFrame](https://docs.pola.rs/api/python/stable/reference/lazyframe/index.html) "allows for whole-query optimisation in addition to parallelism, and is the preferred (and highest-performance) mode of operation for polars." As such, `scan_*` methods are used by default whenever possible; when not possible, the result of the `read_*` method is returned after calling [`.lazy()`](https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.lazy.html).

If a [DataFrame](https://docs.pola.rs/api/python/stable/reference/dataframe/index.html) is preferred, `dataset_load` supports an optional `polars_frame_type`, and `PolarsFrameType.DATA_FRAME` may be passed in. This will force a `read_*` method to be used with no `.lazy()` call.

**NOTE:** For file types that support `scan_*`, changing the `polars_frame_type` may affect which `polars_kwargs` are acceptable to the underlying method, since it will force a `read_*` method to be used rather than a `scan_*` method.

Some examples include:
```python
import kagglehub
from kagglehub import KaggleDatasetAdapter, PolarsFrameType

# Load a LazyFrame with a specific version of a CSV
lf = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "unsdsn/world-happiness/versions/1",
    "2016.csv",
)

# Load a LazyFrame from a parquet file, then select specific columns
lf = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "robikscube/textocr-text-extraction-from-images-dataset",
    "annot.parquet",
)
lf.select(["image_id", "bbox", "points", "area"]).collect()

# Load a DataFrame with specific columns from a parquet file
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "robikscube/textocr-text-extraction-from-images-dataset",
    "annot.parquet",
    polars_frame_type=PolarsFrameType.DATA_FRAME,
    polars_kwargs={"columns": ["image_id", "bbox", "points", "area"]}
)

# Load a dictionary of LazyFrames from an Excel file where the keys are sheet names
# and the values are LazyFrames for each sheet's data. NOTE: As written, this requires
# installing the default fastexcel engine.
lf_dict = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "theworldbank/education-statistics",
    "edstats-excel-zip-72-mb-/EdStatsEXCEL.xlsx",
    # sheet_id of 0 returns all sheets
    polars_kwargs={"sheet_id": 0},
)

# Load a LazyFrame by executing a SQL query against a SQLite DB
lf = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "wyattowalsh/basketball",
    "nba.sqlite",
    sql_query="SELECT person_id, player_name FROM draft_history",
)
```

### Download Dataset

The following examples download the `Spotify Recommendation` Kaggle dataset: https://www.kaggle.com/datasets/bricevergnou/spotify-recommendation

```python
import kagglehub

# Download the latest version.
kagglehub.dataset_download('bricevergnou/spotify-recommendation')

# Download a specific version.
kagglehub.dataset_download('bricevergnou/spotify-recommendation/versions/1')

# Download a single file.
kagglehub.dataset_download('bricevergnou/spotify-recommendation', path='data.csv')

# Download a dataset or file, even if previously downloaded to cache.
kagglehub.dataset_download('bricevergnou/spotify-recommendation', force_download=True)
```

### Upload Dataset

Uploads a new dataset (or a new version if it already exists).

```python
import kagglehub

# For example, to upload a new dataset (or version) at:
# - https://www.kaggle.com/datasets/bricevergnou/spotify-recommendation
#
# You would use the following handle: `bricevergnou/spotify-recommendation`
handle = '<owner_slug>/<dataset_slug>'
local_dataset_dir = 'path/to/local/dataset/dir'

# Create a new dataset
kagglehub.dataset_upload(handle, local_dataset_dir)

# You can then create a new version of this existing dataset and include version notes (optional).
kagglehub.dataset_upload(handle, local_dataset_dir, version_notes='improved data')

# You can also specify a list of patterns for files/dirs to ignore.
# These patterns are combined with `kagglehub.datasets.DEFAULT_IGNORE_PATTERNS`
# to determine which files and directories to exclude.
# To ignore entire directories, include a trailing slash (/) in the pattern.
kagglehub.dataset_upload(handle, local_dataset_dir, ignore_patterns=["original/", "*.tmp"])
```

### Download Competition

The following examples download the `Digit Recognizer` Kaggle competition: https://www.kaggle.com/competitions/digit-recognizer

```python
import kagglehub

# Download the latest version.
kagglehub.competition_download('digit-recognizer')

# Download a single file.
kagglehub.competition_download('digit-recognizer', path='train.csv')

# Download a competition or file, even if previously downloaded to cache.
kagglehub.competition_download('digit-recognizer', force_download=True)
```

### Download Notebook Outputs

The following examples download the `Titanic Tutorial` notebook output: https://www.kaggle.com/code/alexisbcook/titanic-tutorial

```python
import kagglehub

# Download the latest version.
kagglehub.notebook_output_download('alexisbcook/titanic-tutorial')

# Download a specific version of the notebook output.
kagglehub.notebook_output_download('alexisbcook/titanic-tutorial/versions/1')

# Download a single file.
kagglehub.notebook_output_download('alexisbcook/titanic-tutorial', path='submission.csv')
```
### Install Utility Script

The following example installs the `Physionet Challenge Utility Script` Utility Script: https://www.kaggle.com/code/bjoernjostein/physionet-challenge-utility-script. Using this command makes the code from this script available in your python environment.

```python
import kagglehub

# Install the latest version.
kagglehub.utility_script_install('bjoernjostein/physionet-challenge-utility-script')
```

### Options

#### Change the default cache folder

By default, `kagglehub` downloads files to your home folder at `~/.cache/kagglehub/`. You can override this path by setting the `KAGGLEHUB_CACHE` environment variable.

## Development

### Prerequisites

We use [hatch](https://hatch.pypa.io) to manage this project. Follow these [instructions](https://hatch.pypa.io/latest/install/) to install it.

### Tests

```sh
# Run all tests for current Python version.
hatch test

# Run all tests for all Python versions.
hatch test --all

# Run all tests for a specific Python version.
hatch test -py 3.11

# Run a single test file
hatch test tests/test_<file>.py
```

### Integration Tests

To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in one of the two ways described in the earlier sections of this document. Refer to the sections:

- [Using environment variables](#option-2-read-credentials-from-environment-variables)
- [Using credentials file](#option-3-read-credentials-from-kagglejson)

After setting up your credentials by any of these methods, you can run the integration tests as follows:

```sh
# Run all tests
hatch test integration_tests
```

### Run `kagglehub` from source

#### Option 1: Execute a one-liner of code from the command line

```sh
# Download a model & print the path
hatch run python -c "import kagglehub; print('path: ', kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem'))"
```

#### Option 2: Run a saved script from the /tools/scripts directory

```sh
# This runs the same code as the one-liner above, but reads it from a
# checked in script located at tools/scripts/download_model.py
hatch run python tools/scripts/download_model.py
```

#### Option 3: Run a temporary script from the root of the repo

Any script created at the root of the repo is gitignore'd, so they're just temporary scripts for testing in development. Placing temporary scripts at the root makes the run command easier to use during local development.

```sh
# Test out some new changes
hatch run python test_new_feature.py
```

### Lint / Format

```sh
# Lint check
hatch run lint:style
hatch run lint:typing
hatch run lint:all # for both

# Format
hatch run lint:fmt
```

### Coverage report

```sh
hatch test --cover
```

### Build

```sh
hatch build
```

### Running `hatch` commands inside Docker

This is useful to run in a consistent environment and easily switch between Python versions. The following shows how to run `hatch run lint:all`, but this also works for any other hatch commands:

```
# Use default Python version
./docker-hatch run lint:all

# Use specific Python version (Must be a valid tag from: https://hub.docker.com/_/python)
./docker-hatch -v 3.10 run lint:all

# Run test in docker with specific Python version
./docker-hatch -v 3.10 test

# Run python from specific environment (e.g. one with optional dependencies installed)
./docker-hatch run extra-deps-env:python -c "print('hello world')"

# Run commands with other root-level hatch options (everything after -- gets passed to hatch)
./docker-hatch -v 3.10 -- -v env create debug-env-with-verbose-logging
```

## VS Code setup

### Prerequisites

Install the recommended extensions.

### Instructions

Configure hatch to create virtual envs in the project folder:

```
hatch config set dirs.env.virtual .env
```

After that, create all the Python environments needed by running `hatch test --all`.

Finally, configure vscode to use one of the created environments: `cmd + shift + p` -> `python: Select Interpreter` -> Pick one of the folders in `./.env`

## Support

The `kagglehub` library automatically logs to the console. For file-based logging, setting the `KAGGLE_LOGGING_ENABLED=1` environment variable will output logs to a directory. The default log destination is resolved via [`os.path.expanduser`](https://docs.python.org/3/library/os.path.html#os.path.expanduser). The table below contains possible locations:

| os | log path |
| ------- | ------------------------------------------------ |
| osx | /Users/$USERNAME/.kaggle/logs/kagglehub.log |
| linux | ~/.kaggle/logs/kagglehub.log |
| windows | C:\Users\\%USERNAME%\\.kaggle\logs\kagglehub.log |

If needed, the root log directory can be overridden using the `KAGGLE_LOGGING_ROOT_DIR` environment variable.

Please include the log to help troubleshoot issues.
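For example, to capture a file log while reproducing an issue, you might run the following (a sketch; the log path shown is the Linux default from the table above):

```sh
# Enable file-based logging for this shell, then reproduce the issue.
export KAGGLE_LOGGING_ENABLED=1
python -c "import kagglehub; kagglehub.dataset_download('bricevergnou/spotify-recommendation')"

# Inspect the log written to the default location (Linux).
cat ~/.kaggle/logs/kagglehub.log
```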
## Development

### Prerequisites

We use [hatch](https://hatch.pypa.io) to manage this project. Follow these [instructions](https://hatch.pypa.io/latest/install/) to install it.

### Tests

```sh
# Run all tests for current Python version.
hatch test

# Run all tests for all Python versions.
hatch test --all

# Run all tests for a specific Python version.
hatch test -py 3.11

# Run a single test file.
hatch test tests/test_<file>.py
```

### Integration Tests

To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in either of the two ways described in the earlier sections of this document:

- [Using environment variables](#option-2-read-credentials-from-environment-variables)
- [Using credentials file](#option-3-read-credentials-from-kagglejson)

After setting up your credentials with either method, you can run the integration tests as follows:

```sh
# Run all tests
hatch test integration_tests
```

### Run `kagglehub` from source

#### Option 1: Execute a one-liner of code from the command line

```sh
# Download a model & print the path
hatch run python -c "import kagglehub; print('path: ', kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem'))"
```

#### Option 2: Run a saved script from the `tools/scripts` directory

```sh
# This runs the same code as the one-liner above, but reads it from a
# checked-in script located at tools/scripts/download_model.py
hatch run python tools/scripts/download_model.py
```

#### Option 3: Run a temporary script from the root of the repo

Any script created at the root of the repo is gitignored, so these are just temporary scripts for testing during development. Placing temporary scripts at the root also keeps the run command short.

```sh
# Test out some new changes
hatch run python test_new_feature.py
```

### Lint / Format

```sh
# Lint check
hatch run lint:style
hatch run lint:typing
hatch run lint:all # for both

# Format
hatch run lint:fmt
```

### Coverage report

```sh
hatch test --cover
```

### Build

```sh
hatch build
```

### Running `hatch` commands inside Docker

This is useful for running in a consistent environment and for switching easily between Python versions. The following shows how to run `hatch run lint:all`, but this also works for any other hatch command:

```
# Use default Python version
./docker-hatch run lint:all

# Use specific Python version (must be a valid tag from: https://hub.docker.com/_/python)
./docker-hatch -v 3.10 run lint:all

# Run tests in docker with specific Python version
./docker-hatch -v 3.10 test

# Run python from specific environment (e.g. one with optional dependencies installed)
./docker-hatch run extra-deps-env:python -c "print('hello world')"

# Run commands with other root-level hatch options (everything after -- gets passed to hatch)
./docker-hatch -v 3.10 -- -v env create debug-env-with-verbose-logging
```

## VS Code setup

### Prerequisites

Install the recommended extensions.

### Instructions

Configure hatch to create virtual environments in the project folder:

```
hatch config set dirs.env.virtual .env
```

Afterwards, create all the Python environments needed by running `hatch test --all`.

Finally, configure VS Code to use one of the created environments: `cmd + shift + p` -> `Python: Select Interpreter` -> pick one of the folders in `./.env`.

## Support

The kagglehub library logs to the console automatically. For file-based logging, setting the `KAGGLE_LOGGING_ENABLED=1` environment variable will write logs to a directory. The default log destination is resolved via [os.path.expanduser](https://docs.python.org/3/library/os.path.html#os.path.expanduser). The table below lists the possible locations:

| os      | log path                                         |
| ------- | ------------------------------------------------ |
| osx     | /Users/$USERNAME/.kaggle/logs/kagglehub.log      |
| linux   | ~/.kaggle/logs/kagglehub.log                     |
| windows | C:\Users\\%USERNAME%\\.kaggle\logs\kagglehub.log |

If needed, the root log directory can be overridden using the `KAGGLE_LOGGING_ROOT_DIR` environment variable.

Please include the log when reporting issues to help with troubleshooting.
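For example, a minimal sketch of enabling file-based logging from the shell before running your script (the override directory shown is just an illustration):

```sh
# Turn on file-based logging.
export KAGGLE_LOGGING_ENABLED=1

# Optional: redirect logs away from the default ~/.kaggle/logs location.
export KAGGLE_LOGGING_ROOT_DIR=/tmp/kagglehub-logs
```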
## Contributing

If you'd like to contribute to `kagglehub`, please make sure to take a look at [CONTRIBUTING.md](CONTRIBUTING.md).

---

# Model Summary

Provide a brief overview of the model including details about its architecture, how it can be used, characteristics of the model, training data, and evaluation results.

## Usage

How can this model be used? You should provide a code snippet that demonstrates how to load and/or fine-tune your model, and you should define the shape of both the inputs and the outputs. Are there known and preventable failures to be aware of?

## System

Is this a standalone model or part of a system? What are the input requirements? What are the downstream dependencies when using the model outputs?

## Implementation requirements

What hardware and software were used for training the model? Describe the compute requirements for training and inference (e.g., # of chips, training time, total computation, measured performance, energy consumption).

# Model Characteristics

## Model initialization

Was the model trained from scratch or fine-tuned from a pre-trained model?

## Model stats

What's the size of the model? Provide information about size, weights, layers, and latency.

## Other details

Is the model pruned? Is it quantized? Describe any techniques used to preserve differential privacy.

# Data Overview

Provide more details about the data used to train this model.

## Training data

Describe the data that was used to train the model. How was it collected? What pre-processing was done?

## Demographic groups

Describe any demographic data or attributes that suggest demographic groups.

## Evaluation data

What was the train / test / dev split? Are there notable differences between training and test data?

# Evaluation Results

## Summary

Summarize and link to evaluation results for this analysis.

## Subgroup evaluation results

Did you do any subgroup analysis? Describe the results and any assumptions about disaggregating data. Are there any known and preventable failures with this model?

## Fairness

How did you define fairness? What metrics and baselines did you use? What were the results of your analysis?

## Usage limitations

Are there sensitive use cases? What factors might limit model performance, and what conditions should be satisfied to use this model?

## Ethics

What ethical factors did the model developers consider? Were any risks identified? What mitigations or remediations were undertaken?

---

# Model Format

Describe the format for the model (e.g. a SavedModel file for TF 2.0).

# Training Data

Describe the data that the model instance was trained on.

# Model Inputs

Describe the type and the shape of the model inputs.

# Model Outputs

Describe the type and the shape of the model outputs.

# Model Usage

Provide code snippets that demonstrate how to load and make use of the model instance.

# Fine-tuning

Provide code snippets that demonstrate how to fine-tune the model instance (if applicable).

# Changelog

Describe the differences between the versions of this specific model instance (if applicable).

---

A full model is composed of 3 types of entities:

1. The model
2. The instances
3. The instance versions

Let's take the example of [efficientnet](https://www.kaggle.com/models/tensorflow/efficientnet) to explain these entities. A model like `efficientnet` contains multiple instances. An instance is a specific variation of the model (e.g. B0, B1, ...) with a certain framework (e.g. TensorFlow2).

## Model

To create a model, a special `model-metadata.json` file must be specified. Here's a basic example for `model-metadata.json`:

```
{
  "ownerSlug": "INSERT_OWNER_SLUG_HERE",
  "title": "INSERT_TITLE_HERE",
  "slug": "INSERT_SLUG_HERE",
  "subtitle": "",
  "isPrivate": true,
  "description": "Model Card Markdown, see below",
  "publishTime": "",
  "provenanceSources": ""
}
```

You can also use the API command `kaggle models init -p /path/to/model` to have the API create this file for you for a new model. If you wish to get the metadata for an existing model, you can use `kaggle models get username/model-slug`.
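A minimal sketch of that workflow using the `kaggle models` commands listed earlier (the `./my-model` folder is hypothetical; edit the generated metadata before creating):

```
# Have the API generate a model-metadata.json template.
kaggle models init -p ./my-model

# After editing ./my-model/model-metadata.json, create the model.
kaggle models create -p ./my-model
```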
### Contents

We currently support the following metadata fields for models.

* `ownerSlug`: the slug of the user or organization
* `title`: the model's title
* `slug`: the model's slug (unique per owner)
* `licenseName`: the name of the license (see the list below)
* `subtitle`: the model's subtitle
* `isPrivate`: whether or not the model should be private (only visible by the owners). If not specified, will be `true`
* `description`: the model's card in markdown syntax (see the template below)
* `publishTime`: the original publishing time of the model
* `provenanceSources`: the provenance of the model

### Description

You can find a template of the model card on this wiki page: https://github.com/Kaggle/kaggle-api/wiki/Model-Card

## Model Instance

To create a model instance, a special `model-instance-metadata.json` file must be specified. Here's a basic example for `model-instance-metadata.json`:

```
{
  "ownerSlug": "INSERT_OWNER_SLUG_HERE",
  "modelSlug": "INSERT_EXISTING_MODEL_SLUG_HERE",
  "instanceSlug": "INSERT_INSTANCE_SLUG_HERE",
  "framework": "INSERT_FRAMEWORK_HERE",
  "overview": "",
  "usage": "Usage Markdown, see below",
  "licenseName": "Apache 2.0",
  "fineTunable": false,
  "trainingData": [],
  "modelInstanceType": "Unspecified",
  "baseModelInstance": "",
  "externalBaseModelUrl": ""
}
```

You can also use the API command `kaggle models instances init -p /path/to/model-instance` to have the API create this file for you for a new model instance.

### Contents

We currently support the following metadata fields for model instances.

* `ownerSlug`: the slug of the user or organization of the model
* `modelSlug`: the existing model's slug
* `instanceSlug`: the slug of the instance
* `framework`: the instance's framework (possible options: `tensorFlow1`, `tensorFlow2`, `tfLite`, `tfJs`, `pyTorch`, `jax`, `coral`, ...)
* `overview`: a short overview of the instance
* `usage`: the instance's usage in markdown syntax (see the template below)
* `fineTunable`: whether the instance is fine-tunable
* `trainingData`: a list of training data in the form of strings, URLs, Kaggle Datasets, etc.
* `modelInstanceType`: whether the model instance is a base model, an external variant, an internal variant, or unspecified
* `baseModelInstance`: if this is an internal variant, the `{owner-slug}/{model-slug}/{framework}/{instance-slug}` of the base model instance
* `externalBaseModelUrl`: if this is an external variant, a URL to the base model

### Licenses

Here is a list of the available licenses for models:

- Apache 2.0
- Attribution 3.0 IGO (CC BY 3.0 IGO)
- Attribution 3.0 Unported (CC BY 3.0)
- Attribution 4.0 International (CC BY 4.0)
- Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0)
- Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
- Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
- Attribution-NonCommercial-ShareAlike 3.0 IGO (CC BY-NC-SA 3.0 IGO)
- BSD-3-Clause
- CC BY-NC-SA 4.0
- CC BY-SA 3.0
- CC BY-SA 4.0
- CC0: Public Domain
- Community Data License Agreement - Permissive - Version 1.0
- Community Data License Agreement - Sharing - Version 1.0
- GNU Affero General Public License 3.0
- GNU Free Documentation License 1.3
- GNU Lesser General Public License 3.0
- GPL 2
- MIT
- ODC Attribution License (ODC-By)
- ODC Public Domain Dedication and Licence (PDDL)
- GPL 3

### Usage

You can find a template of the Usage markdown on this wiki page: https://github.com/Kaggle/kaggle-api/wiki/ModelInstance-Usage

The following template variables can be used in this markdown:

- `${VERSION_NUMBER}` is replaced by the version number when rendered
- `${VARIATION_SLUG}` is replaced by the variation slug when rendered
- `${FRAMEWORK}` is replaced by the framework name
- `${PATH}` is replaced by `/kaggle/input/{owner-slug}/{model-slug}/{framework}/{instance-slug}`
- `${FILEPATH}` is replaced by `/kaggle/input/{owner-slug}/{model-slug}/{framework}/{instance-slug}/{file-path}`. This value is only defined if the databundle contains a single file
- `${URL}` is replaced by the absolute URL of the model
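For illustration, a hypothetical Usage markdown snippet using these variables (the wording is made up; start from the wiki template above for real instances):

```
# Usage

Load version ${VERSION_NUMBER} of the `${VARIATION_SLUG}` variation (framework: ${FRAMEWORK}) in a Kaggle Notebook from:

    ${PATH}

The full model page is available at ${URL}.
```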