# Kaggle API

CAUTION: Most of the files in this directory are obsolete and will be deleted. This file is valid, as are `model_card.md`, `modelinstance_usage.md`, and `models_metadata.md`.

Official API for https://www.kaggle.com, accessible using a command line tool implemented in Python 3.

Beta release - Kaggle reserves the right to modify the API functionality currently offered.

IMPORTANT: Competitions submissions using an API version prior to 1.5.0 may not work. If you are encountering difficulties with submitting to competitions, please check your version with `kaggle --version`. If it is below 1.5.0, please update with `pip install kaggle --upgrade`.

## Installation

Ensure you have Python 3 and the package manager `pip` installed. Run the following command to access the Kaggle API using the command line:

`pip install kaggle` (You may need to do `pip install --user kaggle` on Mac/Linux, which is recommended if problems come up during the installation process.) Installations done through the root user (i.e. `sudo pip install kaggle`) will not work correctly unless you understand what you're doing. Even then, they still might not work. User installs are strongly recommended in the case of permissions errors.

You can now use the `kaggle` command as shown in the examples below.

If you run into a `kaggle: command not found` error, ensure that your Python binaries are on your path. You can see where `kaggle` is installed by doing `pip uninstall kaggle` and seeing where the binary is (then cancel the uninstall when prompted). For a local user install on Linux, the default location is `~/.local/bin`. On Windows, the default location is `$PYTHON_HOME/Scripts`.

IMPORTANT: We do not offer Python 2 support. Please ensure that you are using Python 3 before reporting any issues.

## API credentials

To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (`https://www.kaggle.com/<username>/account`) and select 'Generate New Token'. Copy the generated token and save it in an environment variable named `KAGGLE_API_TOKEN`, or place the value in a file named `~/.kaggle/access_token`. (The file may optionally have a `.txt` extension.)

Note: If you already have a `~/.kaggle/kaggle.json` file with legacy API credentials, it will continue to work. Also note that generating a new token no longer expires your existing tokens, so you do not have to update places where an older token is already in use; previously, old tokens expired whenever a new token was generated.
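For example, on Mac/Linux you could store the token like this (a minimal sketch; `XXXXXXXX` is a placeholder for the token value you copied):

```sh
# Save the copied token where the CLI looks for it by default.
mkdir -p ~/.kaggle
echo 'XXXXXXXX' > ~/.kaggle/access_token

# Credential files should not be readable by other users.
chmod 600 ~/.kaggle/access_token

# Alternatively, export it as an environment variable for the current session only.
export KAGGLE_API_TOKEN='XXXXXXXX'
```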
## Commands

The command line tool supports the following commands:

```
kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init, metadata, status}
kaggle kernels {get, list, init, push, pull, output, update, status}
kaggle models {get, list, init, create, delete, update}
kaggle models instances {get, init, create, delete, update}
kaggle models instances versions {create, download, delete}
kaggle config {view, set, unset}
```

See more details below for using each of these commands.

### Competitions

The API supports the following commands for Kaggle Competitions.

```
usage: kaggle competitions [-h] {list,files,download,submit,submissions,leaderboard} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {list,files,download,submit,submissions,leaderboard}
    list                List available competitions
    files               List competition files
    download            Download competition files
    submit              Make a new competition submission
    submissions         Show your competition submissions
    leaderboard         Get competition leaderboard information
```

##### List competitions

```
usage: kaggle competitions list [-h] [--group GROUP] [--category CATEGORY] [--sort-by SORT_BY] [-p PAGE] [-s SEARCH] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --group GROUP         Search for competitions in a specific group. Default is 'general'. Valid options are 'general', 'entered', and 'inClass'
  --category CATEGORY   Search for competitions of a specific category. Default is 'all'. Valid options are 'all', 'featured', 'research', 'recruitment', 'gettingStarted', 'masters', and 'playground'
  --sort-by SORT_BY     Sort list results. Default is 'latestDeadline'. Valid options are 'grouped', 'prize', 'earliestDeadline', 'latestDeadline', 'numberOfTeams', and 'recentlyCreated'
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -v, --csv             Print results in CSV format (if not set then print in table format)
```

Examples:

`kaggle competitions list -s health`

`kaggle competitions list --category gettingStarted`

##### List competition files

```
usage: kaggle competitions files [-h] [-v] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -v, --csv             Print results in CSV format (if not set then print in table format)
  -q, --quiet           Suppress printing information about the upload/download progress
```

Example:

`kaggle competitions files favorita-grocery-sales-forecasting`

##### Download competition files

```
usage: kaggle competitions download [-h] [-f FILE_NAME] [-p PATH] [-w] [-o] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided (use "kaggle competitions files -c <competition>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
```

Examples:

`kaggle competitions download favorita-grocery-sales-forecasting`

`kaggle competitions download favorita-grocery-sales-forecasting -f test.csv.7z`

Note: you will need to accept competition rules at `https://www.kaggle.com/c/<competition>/rules`.
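Putting these commands together, a typical download flow looks like this (a sketch reusing the competition slug from the examples above):

```sh
# Find a competition, inspect its files, then fetch a single file into ./data.
kaggle competitions list -s favorita
kaggle competitions files favorita-grocery-sales-forecasting
kaggle competitions download favorita-grocery-sales-forecasting -f test.csv.7z -p ./data

# Or download all files for the competition into ./data.
kaggle competitions download favorita-grocery-sales-forecasting -p ./data
```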
##### Submit to a competition

```
usage: kaggle competitions submit [-h] [-f FILE_NAME] [-k KERNEL] -m MESSAGE [-v VERSION] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -f FILE_NAME, --file FILE_NAME
                        File for upload (full path), or the name of the output file produced by a kernel (for code competitions)
  -k KERNEL, --kernel KERNEL
                        Name of kernel (notebook) to submit to a code competition
  -m MESSAGE, --message MESSAGE
                        Message describing this submission
  -v VERSION, --version VERSION
                        Version of kernel to submit to a code competition, e.g. "Version 1"
  -q, --quiet           Suppress printing information about the upload/download progress
```

Examples:

`kaggle competitions submit favorita-grocery-sales-forecasting -f sample_submission_favorita.csv.7z -m "My submission message"`

`kaggle competitions submit llms-you-cant-please-them-all -k username/llms-can-t-please-all-submission -v 9 -f submission.csv -m "My submission message"`

Note: you will need to accept competition rules at `https://www.kaggle.com/c/<competition>/rules`.

##### List competition submissions

```
usage: kaggle competitions submissions [-h] [-v] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -v, --csv             Print results in CSV format (if not set then print in table format)
  -q, --quiet           Suppress printing information about the upload/download progress
```

Example:

`kaggle competitions submissions favorita-grocery-sales-forecasting`

Note: you will need to accept competition rules at `https://www.kaggle.com/c/<competition>/rules`.
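For example, to submit a predictions file and then verify how it scored (a sketch based on the examples above):

```sh
# Upload a submission, then list your submissions to check its status and score.
kaggle competitions submit favorita-grocery-sales-forecasting \
  -f sample_submission_favorita.csv.7z -m "Baseline submission"
kaggle competitions submissions favorita-grocery-sales-forecasting
```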
##### Get competition leaderboard

```
usage: kaggle competitions leaderboard [-h] [-s] [-d] [-p PATH] [-v] [-q] [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options). If empty, the default competition will be used (use "kaggle config set competition")
  -s, --show            Show the top of the leaderboard
  -d, --download        Download entire leaderboard
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -v, --csv             Print results in CSV format (if not set then print in table format)
  -q, --quiet           Suppress printing information about the upload/download progress
```

Example:

`kaggle competitions leaderboard favorita-grocery-sales-forecasting -s`

### Datasets

The API supports the following commands for Kaggle Datasets.

```
usage: kaggle datasets [-h] {list,files,download,create,version,init,metadata,status} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {list,files,download,create,version,init,metadata,status}
    list                List available datasets
    files               List dataset files
    download            Download dataset files
    create              Create a new dataset
    version             Create a new dataset version
    init                Initialize metadata file for dataset creation
    metadata            Download metadata about a dataset
    status              Get the creation status for a dataset
```

##### List datasets

```
usage: kaggle datasets list [-h] [--sort-by SORT_BY] [--min-size MIN_SIZE] [--max-size MAX_SIZE] [--file-type FILE_TYPE] [--license LICENSE_NAME] [--tags TAG_IDS] [-s SEARCH] [-m] [--user USER] [-p PAGE] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --sort-by SORT_BY     Sort list results. Default is 'hottest'. Valid options are 'hottest', 'votes', 'updated', and 'active'
  --max-size MAX_SIZE   Specify the maximum size of the dataset to return (bytes)
  --min-size MIN_SIZE   Specify the minimum size of the dataset to return (bytes)
  --file-type FILE_TYPE
                        Search for datasets with a specific file type. Default is 'all'. Valid options are 'all', 'csv', 'sqlite', 'json', 'parquet', and 'bigQuery'. Please note that bigQuery datasets cannot be downloaded
  --license LICENSE_NAME
                        Search for datasets with a specific license. Default is 'all'. Valid options are 'all', 'cc', 'gpl', 'odb', and 'other'
  --tags TAG_IDS        Search for datasets that have specific tags. Tag list should be comma separated
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -m, --mine            Display only my items
  --user USER           Find public datasets owned by a specific user or organization
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  -v, --csv             Print results in CSV format (if not set then print in table format)
```

Examples:

`kaggle datasets list -s demographics`

`kaggle datasets list --sort-by votes`

##### List files for a dataset

```
usage: kaggle datasets files [-h] [-v] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options), or <owner>/<dataset-name>/<version-number> for a specific version

optional arguments:
  -h, --help            show this help message and exit
  -v, --csv             Print results in CSV format (if not set then print in table format)
```

Examples:

`kaggle datasets files zillow/zecon`

`kaggle datasets files zillow/zecon/3`

##### Download dataset files

```
usage: kaggle datasets download [-h] [-f FILE_NAME] [-p PATH] [-w] [--unzip] [-o] [-q] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options), or <owner>/<dataset-name>/<version-number> for a specific version

optional arguments:
  -h, --help            show this help message and exit
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided (use "kaggle datasets files -d <dataset>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  --unzip               Unzip the downloaded file. Will delete the zip file when completed.
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
```

Examples:

`kaggle datasets download zillow/zecon`

`kaggle datasets download zillow/zecon/3`

`kaggle datasets download zillow/zecon -f State_time_series.csv`

Please note that BigQuery datasets cannot be downloaded.
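The download flags can also be combined; for instance, the following sketch fetches a dataset into a specific folder and extracts it in one step:

```sh
# Download the dataset into ./zecon and unzip it; the zip file is deleted afterwards.
kaggle datasets download zillow/zecon -p ./zecon --unzip
```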
##### Initialize metadata file for dataset creation

```
usage: kaggle datasets init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder where the special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata) will be created. Defaults to current working directory
```

Example:

`kaggle datasets init -p /path/to/dataset`

##### Create a new dataset

If you want to create a new dataset, you first need to initialize a metadata file. You can do so by running `kaggle datasets init` as described above.

```
usage: kaggle datasets create [-h] [-p FOLDER] [-u] [-q] [-t] [-r {skip,zip,tar}]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
  -u, --public          Create publicly (default is private)
  -q, --quiet           Suppress printing information about the upload/download progress
  -t, --keep-tabular    Do not convert tabular files to CSV (default is to convert)
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
```

Example:

`kaggle datasets create -p /path/to/dataset`

##### Create a new dataset version

```
usage: kaggle datasets version [-h] -m VERSION_NOTES [-p FOLDER] [-q] [-t] [-r {skip,zip,tar}] [-d]

required arguments:
  -m VERSION_NOTES, --message VERSION_NOTES
                        Message describing the new version

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special dataset-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata). Defaults to current working directory
  -q, --quiet           Suppress printing information about the upload/download progress
  -t, --keep-tabular    Do not convert tabular files to CSV (default is to convert)
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
  -d, --delete-old-versions
                        Delete old versions of this dataset
```

Example:

`kaggle datasets version -p /path/to/dataset -m "Updated data"`

##### Download metadata for an existing dataset

```
usage: kaggle datasets metadata [-h] [-p PATH] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Location to download dataset metadata to. Defaults to current working directory
```

Example:

`kaggle datasets metadata -p /path/to/download zillow/zecon`

##### Get dataset creation status

```
usage: kaggle datasets status [-h] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle datasets status zillow/zecon`

##### Delete a dataset

```
usage: kaggle datasets delete [-h] [-y] [dataset]

required arguments:
  dataset               Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

optional arguments:
  -h, --help            show this help message and exit
  -y, --yes             Sets any confirmation values to "yes" automatically. Users will not be asked to confirm.
```

Example:

`kaggle datasets delete -y lastplacelarry/testdataset`
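Putting the dataset commands together, a first upload followed by a new version might look like this (a sketch; the paths and the `<owner>/<dataset-slug>` handle are placeholders):

```sh
# 1. Generate the dataset-metadata.json template in the dataset folder.
kaggle datasets init -p /path/to/dataset

# 2. Edit dataset-metadata.json (title, slug, etc.), then create the dataset.
kaggle datasets create -p /path/to/dataset

# 3. Check the processing status, and publish an updated version later.
kaggle datasets status <owner>/<dataset-slug>
kaggle datasets version -p /path/to/dataset -m "Updated data"
```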
### Kernels

The API supports the following commands for Kaggle Kernels.

```
usage: kaggle kernels [-h] {get,list,init,push,pull,output,update,status} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {get,list,init,push,pull,output,update,status}
    get                 Get the code for a kernel (formerly pull)
    list                List available kernels
    init                Initialize metadata file for a kernel
    push                Deprecated by update: Push new code to a kernel and run the kernel
    pull                Deprecated by get: Pull down code from a kernel
    output              Get data output from the latest kernel run
    update              Update a kernel with new code and run it (formerly push)
    status              Display the status of the latest kernel run
```

##### Get a kernel

```
usage: kaggle kernels get [-h] [-p PATH] [-w] [-m] [kernel]

optional arguments:
  -h, --help            show this help message and exit
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -m, --metadata        Generate metadata when pulling kernel
```

Example:

`kaggle kernels get rtatman/list-of-5-day-challenges -p /path/to/dest`

##### List kernels

```
usage: kaggle kernels list [-h] [-m] [-p PAGE] [--page-size PAGE_SIZE] [-s SEARCH] [-v] [--parent PARENT] [--competition COMPETITION] [--dataset DATASET] [--user USER] [--language LANGUAGE] [--kernel-type KERNEL_TYPE] [--output-type OUTPUT_TYPE] [--sort-by SORT_BY]

optional arguments:
  -h, --help            show this help message and exit
  -m, --mine            Display only my items
  -p PAGE, --page PAGE  Page number for results paging. Page size is 20 by default
  --page-size PAGE_SIZE
                        Number of items to show on a page. Default size is 20, max is 100
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -v, --csv             Print results in CSV format (if not set then print in table format)
  --parent PARENT       Find children of the specified parent kernel
  --competition COMPETITION
                        Find kernels for a given competition
  --dataset DATASET     Find kernels for a given dataset
  --user USER           Find kernels created by a given user
  --language LANGUAGE   Specify the language the kernel is written in. Default is 'all'. Valid options are 'all', 'python', 'r', 'sqlite', and 'julia'
  --kernel-type KERNEL_TYPE
                        Specify the type of kernel. Default is 'all'. Valid options are 'all', 'script', and 'notebook'
  --output-type OUTPUT_TYPE
                        Search for specific kernel output types. Default is 'all'. Valid options are 'all', 'visualizations', and 'data'
  --sort-by SORT_BY     Sort list results. Default is 'hotness'. Valid options are 'hotness', 'commentCount', 'dateCreated', 'dateRun', 'relevance', 'scoreAscending', 'scoreDescending', 'viewCount', and 'voteCount'. 'relevance' is only applicable if a search term is specified.
```

Examples:

`kaggle kernels list -s titanic`

`kaggle kernels list --language python`

##### Initialize metadata file for a kernel

```
usage: kaggle kernels init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder where the special kernel-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Kernel-Metadata) will be created. Defaults to current working directory
```

Example:

`kaggle kernels init -p /path/to/folder`
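A typical edit-run-fetch cycle, using the `update`, `status`, and `output` commands documented below, might look like this (a sketch; the folder and `<owner>/<kernel-name>` are placeholders):

```sh
# 1. Create kernel-metadata.json in the working folder and edit it.
kaggle kernels init -p /path/to/folder

# 2. Upload the code and run it, then poll the run status.
kaggle kernels update -p /path/to/folder
kaggle kernels status <owner>/<kernel-name>

# 3. Once the run completes, fetch its output files.
kaggle kernels output <owner>/<kernel-name> -p /path/to/dest
```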
##### Push a kernel (deprecated, use update)

```
usage: kaggle kernels push [-h] [-t N] -p FOLDER

optional arguments:
  -h, --help            show this help message and exit
  -t N, --timeout N     Limit the run time of a kernel to the given number of seconds. The global maximum time will not be exceeded.
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special kernel-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Kernel-Metadata). Defaults to current working directory
```

Example:

`kaggle kernels push -p /path/to/folder`

##### Pull a kernel (deprecated, use get)

```
usage: kaggle kernels pull [-h] [-p PATH] [-w] [-m] [kernel]

optional arguments:
  -h, --help            show this help message and exit
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -m, --metadata        Generate metadata when pulling kernel
```

Example:

`kaggle kernels pull rtatman/list-of-5-day-challenges -p /path/to/dest`

##### Retrieve a kernel's output

```
usage: kaggle kernels output [-h] [-p PATH] [-w] [-o] [-q] [kernel]

required arguments:
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
```

Example:

`kaggle kernels output mrisdal/exploring-survival-on-the-titanic -p /path/to/dest`

##### Update a kernel

This command should only be used after "get". Use "create" to create a new kernel.

```
usage: kaggle kernels update [-h] [-t N] -p FOLDER

optional arguments:
  -h, --help            show this help message and exit
  -t N, --timeout N     Limit the run time of a kernel to the given number of seconds. The global maximum time will not be exceeded.
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special kernel-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Kernel-Metadata). Defaults to current working directory
```

Example:

`kaggle kernels update -p /path/to/folder`

##### Get the status of the latest kernel run

```
usage: kaggle kernels status [-h] [kernel]

required arguments:
  kernel                Kernel URL suffix in format <owner>/<kernel-name> (use "kaggle kernels list" to show options)

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle kernels status mrisdal/exploring-survival-on-the-titanic`

### Models

The API supports the following commands for Kaggle Models.

```
usage: kaggle models [-h] {get, list, init, create, delete, update} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {get, list, init, create, delete, update}
    get                 Get the model
    list                List models
    init                Initialize metadata file for model creation
    create              Create a new model
    delete              Delete a model
    update              Update a model
```

##### Get model

```
usage: kaggle models get [-h] [-p PATH] [model]

required arguments:
  model                 Model URL suffix in format <owner>/<model-slug>

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where the special model-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata) will be downloaded (if specified).
```

Example:

`kaggle models get tensorflow/toxicity`
##### List models

```
usage: kaggle models list [-h] [--sort-by SORT_BY] [-s SEARCH] [--owner OWNER] [--page-token PAGE_TOKEN] [--page-size PAGE_SIZE] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --sort-by SORT_BY     Sort list results. Default is 'hotness'. Valid options are 'hotness', 'downloadCount', 'voteCount', 'notebookCount' and 'createTime'
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  --owner OWNER         Find models owned by a specific user or organization
  --page-token PAGE_TOKEN
                        Page token for pagination
  --page-size PAGE_SIZE
                        Number of items to show on a page. Default size is 20, max is 50
  -v, --csv             Print results in CSV format (if not set then print in table format)
```

Examples:

`kaggle models list -s llm`

`kaggle models list --sort-by downloadCount`

##### Initialize metadata file for a model

```
usage: kaggle models init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder to create the model-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models init -p /path/to/model`

##### Create a new model

If you want to create a new model, you first need to initialize a metadata file. You can do so by running `kaggle models init` as described above.

```
usage: kaggle models create [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the special model-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models create -p /path/to/model`

##### Delete model

```
usage: kaggle models delete [-h] [model]

required arguments:
  model                 Model URL suffix in format <owner>/<model-slug>

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle models delete tensorflow/toxicity`

##### Update a model

If you want to update a model, you first need a metadata file. You can fetch it by running `kaggle models get owner/slug -p folder`.

```
usage: kaggle models update [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the special model-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models update -p /path/to/model`
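Combining these commands, creating a model and later amending its metadata could look like this (a sketch; paths and `<owner>/<model-slug>` are placeholders):

```sh
# 1. Generate model-metadata.json, fill in ownerSlug/title/slug, then create the model.
kaggle models init -p /path/to/model
kaggle models create -p /path/to/model

# 2. To change the metadata later: fetch it, edit the file, and push the update.
kaggle models get <owner>/<model-slug> -p /path/to/model
kaggle models update -p /path/to/model
```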
#### Model Instances

The API supports the following commands for Kaggle Model Instances.

```
usage: kaggle models instances [-h] {get, init, create, delete, update} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {get, init, create, delete, update}
    get                 Get a model instance
    init                Initialize metadata file for model instance creation
    create              Create a new model instance
    delete              Delete a model instance
    update              Update a model instance
```

##### Get model instance

```
usage: kaggle models instances get [-h] [-p PATH] [modelInstance]

required arguments:
  modelInstance         Model Instance URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where the special model-instance-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata) will be downloaded (if specified).
```

Example:

`kaggle models instances get tensorflow/toxicity/tfjs/default`

##### Initialize metadata file for a model instance

```
usage: kaggle models instances init [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder to create the model-instance-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models instances init -p /path/to/modelinstance`

##### Create a new model instance

If you want to create a new model instance, you first need to initialize a metadata file. You can do so by running `kaggle models instances init` as described above.

```
usage: kaggle models instances create [-h] [-p FOLDER] [-q] [-r {skip,zip,tar}]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the special model-instance-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
  -q, --quiet           Suppress printing information about the upload progress
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
```

Example:

`kaggle models instances create -p /path/to/modelinstance`

##### Delete model instance

```
usage: kaggle models instances delete [-h] [modelInstance]

required arguments:
  modelInstance         Model Instance URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle models instances delete tensorflow/toxicity/tfjs/default`

##### Update a model instance

If you want to update a model instance, you first need a metadata file. You can fetch it by running `kaggle models instances get owner-slug/model-slug/framework/instance-slug -p folder`.

```
usage: kaggle models instances update [-h] [-p FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the special model-instance-metadata.json file (https://github.com/Kaggle/kaggle-api/wiki/Model-Metadata). Defaults to current working directory
```

Example:

`kaggle models instances update -p /path/to/modelinstance`
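The fetch-edit-update cycle described above can be scripted as follows (a sketch; the slugs and folder are placeholders):

```sh
# Fetch the current model-instance-metadata.json into a local folder.
kaggle models instances get <owner>/<model-slug>/<framework>/<instance-slug> -p /path/to/folder

# Edit the file as needed, then apply the changes.
kaggle models instances update -p /path/to/folder
```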
#### Model Instance Versions

The API supports the following commands for Kaggle Model Instance Versions.

```
usage: kaggle models instances versions [-h] {create, download, delete} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {create, download, delete}
    create              Create a new model instance version
    download            Download a model instance version
    delete              Delete a model instance version
```

##### Create a new model instance version

```
usage: kaggle models instances versions create [-h] [-p FOLDER] [-n NOTES] [-q] [-r {skip,zip,tar}] [modelInstance]

required arguments:
  modelInstance         Model Instance URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder containing the model files to upload
  -n, --version-notes NOTES
                        Version notes to record for this new version
  -q, --quiet           Suppress printing information about the upload progress
  -r {skip,zip,tar}, --dir-mode {skip,zip,tar}
                        What to do with directories: "skip" - ignore; "zip" - compressed upload; "tar" - uncompressed upload
```

Example:

`kaggle models instances versions create tensorflow/toxicity/tfjs/default -p /path/to/files -n "updated weights"`

##### Download a model instance version

```
usage: kaggle models instances versions download [-h] [-p PATH] [--untar] [-f] [-q] [modelInstanceVersion]

required arguments:
  modelInstanceVersion  Model Instance Version URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>/<version-number>

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  --untar               Untar the downloaded file. Will delete the tar file when completed.
  -f, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the download progress
```

Example:

`kaggle models instances versions download tensorflow/toxicity/tfjs/default/1`

##### Delete a model instance version

```
usage: kaggle models instances versions delete [-h] [modelInstanceVersion]

required arguments:
  modelInstanceVersion  Model Instance Version URL suffix in format <owner>/<model-slug>/<framework>/<instance-slug>/<version-number>

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle models instances versions delete tensorflow/toxicity/tfjs/default/1`

### Config

The API supports the following commands for configuration.

```
usage: kaggle config [-h] {view,set,unset} ...

optional arguments:
  -h, --help            show this help message and exit

commands:
  {view,set,unset}
    view                View current config values
    set                 Set a configuration value
    unset               Clear a configuration value
```

##### View the path where files are downloaded

```
usage: kaggle config path [-h] [-p PATH]

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
```

Example:

`kaggle config path -p C:\`

##### View current config values

```
usage: kaggle config view [-h]

optional arguments:
  -h, --help            show this help message and exit
```

Example:

`kaggle config view`

##### Set a configuration value

```
usage: kaggle config set [-h] -n NAME -v VALUE

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter (one of competition, path, proxy)
  -v VALUE, --value VALUE
                        Value of the configuration parameter, valid values depending on name
                        - competition: Competition URL suffix (use "kaggle competitions list" to show options)
                        - path: Folder where file(s) will be downloaded, defaults to current working directory
                        - proxy: Proxy for HTTP requests
```

Example:

`kaggle config set -n competition -v titanic`

##### Clear a configuration value

```
usage: kaggle config unset [-h] -n NAME

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter (one of competition, path, proxy)
```

Example:

`kaggle config unset -n competition`
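For example, setting a default competition lets you omit the competition argument from later commands (a sketch using the `titanic` slug from the example above):

```sh
# Set the default competition once...
kaggle config set -n competition -v titanic

# ...then competition commands no longer need an explicit slug.
kaggle competitions files
kaggle competitions submissions

# Clear the default when you are done.
kaggle config unset -n competition
```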
## License

The Kaggle API is released under the [Apache 2.0 license](../LICENSE.txt).

---

# Kaggle API

Official API for https://www.kaggle.com, accessible using a command line tool implemented in Python 3.

[User documentation](docs/README.md)

## Installation

Ensure you have Python 3 and the package manager `pip` installed. Run the following command to access the Kaggle API using the command line:

```sh
pip install kaggle
```

## Development

### Kaggle Internal

Obviously, this depends on Kaggle services. When you're extending the API and modifying or adding to those services, you should be working in your Kaggle mid-tier development environment. You'll run Kaggle locally, in the container, and test the Python code by running it in the container so it can connect to your local testing environment.

### Prerequisites

We use [hatch](https://hatch.pypa.io) to manage this project. Follow these [instructions](https://hatch.pypa.io/latest/install/) to install it.

If you are working in a managed environment, you may want to use `pipx`. If it isn't already installed, try `sudo apt install pipx`. Then you should be able to proceed with `pipx install hatch`.

### Dependencies

```sh
hatch run install-deps
```

### Compile

```sh
hatch run compile
```

The compiled files are generated in the `kaggle/` directory from the `src/` directory. All changes must be made in the `src/` directory.

### Run

Use `hatch run install` to compile the program and install it in the default `hatch` environment. To run that version locally for testing, use hatch: `hatch run kaggle -v`. If you'd rather not type `hatch run` every time, launch a new shell in the hatch environment: `hatch shell`.

You can also run the code in Python directly:

```sh
hatch run python
```

```python
import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
api.model_list_cli()
# Output:
# Next Page Token = [...]
# [...]
```

Or in a single command:

```sh
hatch run python -c "import kaggle; from kaggle.api.kaggle_api_extended import KaggleApi; api = KaggleApi(); api.authenticate(); api.model_list_cli()"
```

### Example

Let's change the `model_list_cli` method in the source file:

```sh
❯ git diff src/kaggle/api/kaggle_api_extended.py
[...]
+        print('hello Kaggle CLI update')^M
         models = self.model_list(sort_by, search, owner, page_size, page_token)
[...]

❯ hatch run compile
[...]

❯ hatch run python -c "import kaggle; from kaggle.api.kaggle_api_extended import KaggleApi; api = KaggleApi(); api.authenticate(); api.model_list_cli()"
hello Kaggle CLI update
Next Page Token = [...]
```

### Integration Tests

To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in one of the two ways described in [this doc](docs/README.md). Refer to the sections:

- Using environment variables
- Using credentials file

After setting up your credentials by any of these methods, you can run the integration tests as follows:

```sh
# Run all tests
hatch run integration-test
```

## License

The Kaggle API is released under the [Apache 2.0 license](LICENSE).

---

# kagglehub

The `kagglehub` library provides a simple way to interact with Kaggle resources such as datasets, models, and notebook outputs in Python. This library also integrates natively with the Kaggle notebook environment. This means the behavior differs when you download a Kaggle resource with `kagglehub` in the Kaggle notebook environment:

* In a Kaggle notebook:
  * The resource is automatically attached to your Kaggle notebook.
  * The resource will be shown under the "Input" panel in the Kaggle notebook editor.
  * The resource files are served from the shared Kaggle resources cache (not using the VM's disk).
* Outside a Kaggle notebook:
  * The resource files are downloaded to a local [cache folder](#change-the-default-cache-folder).

## Installation

Install the `kagglehub` package with pip:

```
pip install kagglehub
```

## Usage

### Authenticate

> [!NOTE]
> `kagglehub` is authenticated by default when running in a Kaggle notebook. Authenticating is **only** needed to access public resources requiring user consent or private resources.

First, you will need a Kaggle account. You can sign up [here](https://www.kaggle.com/account/login).

After login, you can download your Kaggle API credentials at https://www.kaggle.com/settings by clicking on the "Create New Token" button under the "API" section.

You have four different options to authenticate. Note that if you use `kaggle-api` (the `kaggle` command-line tool) you have already done Option 3 and can skip this.
#### Option 1: Calling kagglehub.login()

This will prompt you to enter your username and token:

```python
import kagglehub

kagglehub.login()
```

#### Option 2: Read credentials from environment variables

You can also choose to export your Kaggle username and token to the environment:

```sh
export KAGGLE_USERNAME=datadinosaur
export KAGGLE_KEY=xxxxxxxxxxxxxx
```

#### Option 3: Read credentials from `kaggle.json`

Store your `kaggle.json` credentials file at `~/.kaggle/kaggle.json`.

Alternatively, you can set the `KAGGLE_CONFIG_DIR` environment variable to change this location to `$KAGGLE_CONFIG_DIR/kaggle.json`.

Note for Windows users: The default directory is `%HOMEPATH%/kaggle.json`.

#### Option 4: Read credentials from Google Colab secrets

Store your username and key token as Colab secrets `KAGGLE_USERNAME` and `KAGGLE_KEY`.

Instructions on adding secrets in both Colab and Colab Enterprise can be found in [this article](https://www.googlecloudcommunity.com/gc/Cloud-Hub/How-do-I-add-secrets-in-Google-Colab-Enterprise/m-p/784866).

### Download Model

The following examples download the `answer-equivalence-bem` variation of this Kaggle model: https://www.kaggle.com/models/google/bert/tensorFlow2/answer-equivalence-bem

```python
import kagglehub

# Download the latest version.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem')

# Download a specific version.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem/1')

# Download a single file.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem', path='variables/variables.index')

# Download a model or file, even if previously downloaded to cache.
kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem', force_download=True)
```
### Upload Model

Uploads a new variation (or a new variation's version if it already exists).

```python
import kagglehub

# For example, to upload a new variation to this model:
# - https://www.kaggle.com/models/google/bert/tensorFlow2/answer-equivalence-bem
#
# You would use the following handle: `google/bert/tensorFlow2/answer-equivalence-bem`
handle = '<owner_slug>/<model_slug>/<framework>/<variation_slug>'
local_model_dir = 'path/to/local/model/dir'

kagglehub.model_upload(handle, local_model_dir)

# You can also specify some version notes (optional)
kagglehub.model_upload(handle, local_model_dir, version_notes='improved accuracy')

# You can also specify a license (optional)
kagglehub.model_upload(handle, local_model_dir, license_name='Apache 2.0')

# You can also specify a list of patterns for files/dirs to ignore.
# These patterns are combined with `kagglehub.models.DEFAULT_IGNORE_PATTERNS`
# to determine which files and directories to exclude.
# To ignore entire directories, include a trailing slash (/) in the pattern.
kagglehub.model_upload(handle, local_model_dir, ignore_patterns=["original/", "*.tmp"])
```

### Load Dataset

Loads a file from a Kaggle Dataset into a python object based on the selected `KaggleDatasetAdapter`:

- `KaggleDatasetAdapter.PANDAS` → [pandas DataFrame](https://pandas.pydata.org/docs/reference/frame.html) (or multiple given certain files/settings)
- `KaggleDatasetAdapter.HUGGING_FACE` → [Hugging Face Dataset](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset)
- `KaggleDatasetAdapter.POLARS` → polars [LazyFrame](https://docs.pola.rs/api/python/stable/reference/lazyframe/index.html) or [DataFrame](https://docs.pola.rs/api/python/stable/reference/dataframe/index.html) (or multiple given certain files/settings)

**NOTE: To use these adapters, you must install the optional dependencies (or already have them available in your environment)**

- `KaggleDatasetAdapter.PANDAS` → `pip install kagglehub[pandas-datasets]`
- `KaggleDatasetAdapter.HUGGING_FACE` → `pip install kagglehub[hf-datasets]`
- `KaggleDatasetAdapter.POLARS` → `pip install kagglehub[polars-datasets]`

#### `KaggleDatasetAdapter.PANDAS`

This adapter supports the following file types, which map to a corresponding `pandas.read_*` method:

| File Extension | `pandas` Method |
| --- | --- |
| .csv, .tsv[^1] | [`pandas.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) |
| .json, .jsonl[^2] | [`pandas.read_json`](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) |
| .xml | [`pandas.read_xml`](https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html) |
| .parquet | [`pandas.read_parquet`](https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html) |
| .feather | [`pandas.read_feather`](https://pandas.pydata.org/docs/reference/api/pandas.read_feather.html) |
| .sqlite, .sqlite3, .db, .db3, .s3db, .dl3[^3] | [`pandas.read_sql_query`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html) |
| .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt[^4] | [`pandas.read_excel`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) |

[^1]: For TSV files, `\t` is automatically supplied for the `sep` parameter, but may be overridden with `pandas_kwargs`
[^2]: For JSONL files, `True` is supplied for the `lines` parameter
[^3]: For SQLite files, a `sql_query` must be provided to generate the `DataFrame`(s)
[^4]: The specific file extension will dictate which optional `engine` dependency needs to be installed to read the file

`dataset_load` also supports `pandas_kwargs`, which will be passed as keyword arguments to the `pandas.read_*` method. Some examples include:
```python
import kagglehub
from kagglehub import KaggleDatasetAdapter

# Load a DataFrame with a specific version of a CSV
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "unsdsn/world-happiness/versions/1",
    "2016.csv",
)

# Load a DataFrame with specific columns from a parquet file
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "robikscube/textocr-text-extraction-from-images-dataset",
    "annot.parquet",
    pandas_kwargs={"columns": ["image_id", "bbox", "points", "area"]}
)

# Load a dictionary of DataFrames from an Excel file where the keys are sheet names
# and the values are DataFrames for each sheet's data. NOTE: As written, this requires
# installing the default openpyxl engine.
df_dict = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "theworldbank/education-statistics",
    "edstats-excel-zip-72-mb-/EdStatsEXCEL.xlsx",
    pandas_kwargs={"sheet_name": None},
)

# Load a DataFrame using an XML file (with the natively available etree parser)
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "parulpandey/covid19-clinical-trials-dataset",
    "COVID-19 CLinical trials studies/COVID-19 CLinical trials studies/NCT00571389.xml",
    pandas_kwargs={"parser": "etree"},
)

# Load a DataFrame by executing a SQL query against a SQLite DB
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "wyattowalsh/basketball",
    "nba.sqlite",
    sql_query="SELECT person_id, player_name FROM draft_history",
)
```

#### `KaggleDatasetAdapter.HUGGING_FACE`

The Hugging Face `Dataset` provided by this adapter is built exclusively using [`Dataset.from_pandas`](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.from_pandas). As a result, all of the file type and `pandas_kwargs` support is the same as [`KaggleDatasetAdapter.PANDAS`](#kaggledatasetadapterpandas). Some important things to note about this:

1. Because `Dataset.from_pandas` cannot accept a collection of `DataFrame`s, any attempts to load a file with `pandas_kwargs` that produce a collection of `DataFrame`s will result in a raised exception
2. `hf_kwargs` may be provided, which will be passed as keyword arguments to `Dataset.from_pandas`
3. Because the use of `pandas` is transparent when `pandas_kwargs` are not needed, we default to `False` for `preserve_index`; this can be overridden using `hf_kwargs`

Some examples include:

```python
import kagglehub
from kagglehub import KaggleDatasetAdapter

# Load a Dataset with a specific version of a CSV, then remove a column
dataset = kagglehub.dataset_load(
    KaggleDatasetAdapter.HUGGING_FACE,
    "unsdsn/world-happiness/versions/1",
    "2016.csv",
)
dataset = dataset.remove_columns('Region')

# Load a Dataset with specific columns from a parquet file, then split into test/train splits
dataset = kagglehub.dataset_load(
    KaggleDatasetAdapter.HUGGING_FACE,
    "robikscube/textocr-text-extraction-from-images-dataset",
    "annot.parquet",
    pandas_kwargs={"columns": ["image_id", "bbox", "points", "area"]}
)
dataset_with_splits = dataset.train_test_split(test_size=0.8, train_size=0.2)

# Load a Dataset by executing a SQL query against a SQLite DB, then rename a column
dataset = kagglehub.dataset_load(
    KaggleDatasetAdapter.HUGGING_FACE,
    "wyattowalsh/basketball",
    "nba.sqlite",
    sql_query="SELECT person_id, player_name FROM draft_history",
)
dataset = dataset.rename_column('season', 'year')
```
#### `KaggleDatasetAdapter.POLARS`

This adapter supports the following file types, which map to a corresponding `polars.scan_*` or `polars.read_*` method:

| File Extension | `polars` Method |
| --- | --- |
| .csv, .tsv[^5] | [`polars.scan_csv`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_csv.html#polars.scan_csv) or [`polars.read_csv`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_csv.html) |
| .json | [`polars.read_json`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_json.html) |
| .jsonl | [`polars.scan_ndjson`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_ndjson.html) or [`polars.read_ndjson`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_ndjson.html) |
| .parquet | [`polars.scan_parquet`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_parquet.html) or [`polars.read_parquet`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_parquet.html) |
| .feather | [`polars.scan_ipc`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_ipc.html) or [`polars.read_ipc`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_ipc.html) |
| .sqlite, .sqlite3, .db, .db3, .s3db, .dl3[^6] | [`polars.read_database`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_database.html) |
| .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt[^7] | [`polars.read_excel`](https://docs.pola.rs/api/python/stable/reference/api/polars.read_excel.html) |

[^5]: For TSV files, `\t` is automatically supplied for the `separator` parameter, but may be overridden with `polars_kwargs`
[^6]: For SQLite files, a `sql_query` must be provided to generate the `DataFrame`(s)
[^7]: The specific file extension may dictate which optional `engine` dependency needs to be installed to read the file

`dataset_load` also supports `polars_kwargs`, which will be passed as keyword arguments to the `polars.scan_*` or `polars.read_*` method.

##### `LazyFrame` vs `DataFrame`

Per the polars documentation, [LazyFrame](https://docs.pola.rs/api/python/stable/reference/lazyframe/index.html) "allows for whole-query optimisation in addition to parallelism, and is the preferred (and highest-performance) mode of operation for polars." As such, `scan_*` methods are used by default whenever possible; when not possible, the result of the `read_*` method is returned after calling [`.lazy()`](https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.lazy.html).

If a [DataFrame](https://docs.pola.rs/api/python/stable/reference/dataframe/index.html) is preferred, `dataset_load` supports an optional `polars_frame_type`, and `PolarsFrameType.DATA_FRAME` may be passed in. This will force a `read_*` method to be used with no `.lazy()` call.

**NOTE:** For file types that support `scan_*`, changing the `polars_frame_type` may affect which `polars_kwargs` are acceptable to the underlying method, since it will force a `read_*` method to be used rather than a `scan_*` method.

Some examples include:
```python
import kagglehub
from kagglehub import KaggleDatasetAdapter, PolarsFrameType

# Load a LazyFrame with a specific version of a CSV
lf = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "unsdsn/world-happiness/versions/1",
    "2016.csv",
)

# Load a LazyFrame from a parquet file, then select specific columns
lf = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "robikscube/textocr-text-extraction-from-images-dataset",
    "annot.parquet",
)
lf.select(["image_id", "bbox", "points", "area"]).collect()

# Load a DataFrame with specific columns from a parquet file
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "robikscube/textocr-text-extraction-from-images-dataset",
    "annot.parquet",
    polars_frame_type=PolarsFrameType.DATA_FRAME,
    polars_kwargs={"columns": ["image_id", "bbox", "points", "area"]}
)

# Load a dictionary of LazyFrames from an Excel file where the keys are sheet names
# and the values are LazyFrames for each sheet's data. NOTE: As written, this requires
# installing the default fastexcel engine.
lf_dict = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "theworldbank/education-statistics",
    "edstats-excel-zip-72-mb-/EdStatsEXCEL.xlsx",
    # sheet_id of 0 returns all sheets
    polars_kwargs={"sheet_id": 0},
)

# Load a LazyFrame by executing a SQL query against a SQLite DB
lf = kagglehub.dataset_load(
    KaggleDatasetAdapter.POLARS,
    "wyattowalsh/basketball",
    "nba.sqlite",
    sql_query="SELECT person_id, player_name FROM draft_history",
)
```

### Download Dataset

The following examples download the `Spotify Recommendation` Kaggle dataset: https://www.kaggle.com/datasets/bricevergnou/spotify-recommendation

```python
import kagglehub

# Download the latest version.
kagglehub.dataset_download('bricevergnou/spotify-recommendation')

# Download a specific version.
kagglehub.dataset_download('bricevergnou/spotify-recommendation/versions/1')

# Download a single file.
kagglehub.dataset_download('bricevergnou/spotify-recommendation', path='data.csv')

# Download a dataset or file, even if previously downloaded to cache.
kagglehub.dataset_download('bricevergnou/spotify-recommendation', force_download=True)
```

### Upload Dataset

Uploads a new dataset (or a new version if it already exists).

```python
import kagglehub

# For example, to upload a new dataset (or version) at:
# - https://www.kaggle.com/datasets/bricevergnou/spotify-recommendation
#
# You would use the following handle: `bricevergnou/spotify-recommendation`
handle = '<owner_slug>/<dataset_slug>'
local_dataset_dir = 'path/to/local/dataset/dir'

# Create a new dataset
kagglehub.dataset_upload(handle, local_dataset_dir)

# You can then create a new version of this existing dataset and include version notes (optional).
kagglehub.dataset_upload(handle, local_dataset_dir, version_notes='improved data')

# You can also specify a list of patterns for files/dirs to ignore.
# These patterns are combined with `kagglehub.datasets.DEFAULT_IGNORE_PATTERNS`
# to determine which files and directories to exclude.
# To ignore entire directories, include a trailing slash (/) in the pattern.
kagglehub.dataset_upload(handle, local_dataset_dir, ignore_patterns=["original/", "*.tmp"])
```

### Download Competition

The following examples download the `Digit Recognizer` Kaggle competition: https://www.kaggle.com/competitions/digit-recognizer

```python
import kagglehub

# Download the latest version.
kagglehub.competition_download('digit-recognizer')

# Download a single file.
kagglehub.competition_download('digit-recognizer', path='train.csv')

# Download a competition or file, even if previously downloaded to cache.
kagglehub.competition_download('digit-recognizer', force_download=True)
```

### Download Notebook Outputs

The following examples download the `Titanic Tutorial` notebook output: https://www.kaggle.com/code/alexisbcook/titanic-tutorial

```python
import kagglehub

# Download the latest version.
kagglehub.notebook_output_download('alexisbcook/titanic-tutorial')

# Download a specific version of the notebook output.
kagglehub.notebook_output_download('alexisbcook/titanic-tutorial/versions/1')

# Download a single file.
kagglehub.notebook_output_download('alexisbcook/titanic-tutorial', path='submission.csv')
```
### Install Utility Script

The following example installs the `Physionet Challenge Utility Script` Utility Script: https://www.kaggle.com/code/bjoernjostein/physionet-challenge-utility-script. Using this command makes the code from this script available in your python environment.

```python
import kagglehub

# Install the latest version.
kagglehub.utility_script_install('bjoernjostein/physionet-challenge-utility-script')
```

### Options

#### Change the default cache folder

By default, `kagglehub` downloads files to your home folder at `~/.cache/kagglehub/`. You can override this path by setting the `KAGGLEHUB_CACHE` environment variable.

## Development

### Prerequisites

We use [hatch](https://hatch.pypa.io) to manage this project. Follow these [instructions](https://hatch.pypa.io/latest/install/) to install it.

### Tests

```sh
# Run all tests for current Python version.
hatch test

# Run all tests for all Python versions.
hatch test --all

# Run all tests for a specific Python version.
hatch test -py 3.11

# Run a single test file
hatch test tests/test_<file>.py
```

### Integration Tests

To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in one of the two ways described in the earlier sections of this document. Refer to the sections:

- [Using environment variables](#option-2-read-credentials-from-environment-variables)
- [Using credentials file](#option-3-read-credentials-from-kagglejson)

After setting up your credentials by any of these methods, you can run the integration tests as follows:

```sh
# Run all tests
hatch test integration_tests
```

### Run `kagglehub` from source

#### Option 1: Execute a one-liner of code from the command line

```sh
# Download a model & print the path
hatch run python -c "import kagglehub; print('path: ', kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem'))"
```

#### Option 2: Run a saved script from the /tools/scripts directory

```sh
# This runs the same code as the one-liner above, but reads it from a
# checked in script located at tools/scripts/download_model.py
hatch run python tools/scripts/download_model.py
```

#### Option 3: Run a temporary script from the root of the repo

Any script created at the root of the repo is gitignore'd, so they're just temporary scripts for testing in development. Placing temporary scripts at the root makes the run command easier to use during local development.

```sh
# Test out some new changes
hatch run python test_new_feature.py
```

### Lint / Format

```sh
# Lint check
hatch run lint:style
hatch run lint:typing
hatch run lint:all # for both

# Format
hatch run lint:fmt
```

### Coverage report

```sh
hatch test --cover
```

### Build

```sh
hatch build
```

### Running `hatch` commands inside Docker

This is useful to run in a consistent environment and easily switch between Python versions. The following shows how to run `hatch run lint:all`, but this also works for any other hatch commands:

```
# Use default Python version
./docker-hatch run lint:all

# Use specific Python version (Must be a valid tag from: https://hub.docker.com/_/python)
./docker-hatch -v 3.10 run lint:all

# Run test in docker with specific Python version
./docker-hatch -v 3.10 test

# Run python from specific environment (e.g. one with optional dependencies installed)
./docker-hatch run extra-deps-env:python -c "print('hello world')"

# Run commands with other root-level hatch options (everything after -- gets passed to hatch)
./docker-hatch -v 3.10 -- -v env create debug-env-with-verbose-logging
```

## VS Code setup

### Prerequisites

Install the recommended extensions.

### Instructions

Configure hatch to create virtual envs in the project folder:

```
hatch config set dirs.env.virtual .env
```

After that, create all the Python environments needed by running `hatch test --all`.

Finally, configure vscode to use one of the created environments: `cmd + shift + p` -> `python: Select Interpreter` -> Pick one of the folders in `./.env`

## Support

The `kagglehub` library automatically logs to the console. For file-based logging, setting the `KAGGLE_LOGGING_ENABLED=1` environment variable will output logs to a directory. The default log destination is resolved via [`os.path.expanduser`](https://docs.python.org/3/library/os.path.html#os.path.expanduser). The table below contains possible locations:

| os | log path |
| ------- | ------------------------------------------------ |
| osx | /Users/$USERNAME/.kaggle/logs/kagglehub.log |
| linux | ~/.kaggle/logs/kagglehub.log |
| windows | C:\Users\\%USERNAME%\\.kaggle\logs\kagglehub.log |

If needed, the root log directory can be overridden using the `KAGGLE_LOGGING_ROOT_DIR` environment variable.

Please include the log to help troubleshoot issues.
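For example, to capture a file log while reproducing an issue, you might run the following (a sketch; the log path shown is the Linux default from the table above):

```sh
# Enable file-based logging for this shell, then reproduce the issue.
export KAGGLE_LOGGING_ENABLED=1
python -c "import kagglehub; kagglehub.dataset_download('bricevergnou/spotify-recommendation')"

# Inspect the log written to the default location (Linux).
cat ~/.kaggle/logs/kagglehub.log
```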
## Development

### Prerequisites

We use [hatch](https://hatch.pypa.io) to manage this project. Follow these [instructions](https://hatch.pypa.io/latest/install/) to install it.

### Tests

```sh
# Run all tests for current Python version.
hatch test

# Run all tests for all Python versions.
hatch test --all

# Run all tests for a specific Python version.
hatch test -py 3.11

# Run a single test file.
hatch test tests/test_<file>.py
```

### Integration Tests

To run integration tests on your local machine, you need to set up your Kaggle API credentials. You can do this in either of the two ways described in the earlier sections of this document:

- [Using environment variables](#option-2-read-credentials-from-environment-variables)
- [Using credentials file](#option-3-read-credentials-from-kagglejson)

After setting up your credentials with either method, you can run the integration tests as follows:

```sh
# Run all tests
hatch test integration_tests
```

### Run `kagglehub` from source

#### Option 1: Execute a one-liner of code from the command line

```sh
# Download a model & print the path
hatch run python -c "import kagglehub; print('path: ', kagglehub.model_download('google/bert/tensorFlow2/answer-equivalence-bem'))"
```

#### Option 2: Run a saved script from the `tools/scripts` directory

```sh
# This runs the same code as the one-liner above, but reads it from a
# checked-in script located at tools/scripts/download_model.py
hatch run python tools/scripts/download_model.py
```

#### Option 3: Run a temporary script from the root of the repo

Any script created at the root of the repo is gitignored, so these are just temporary scripts for testing during development. Placing temporary scripts at the root also keeps the run command short.

```sh
# Test out some new changes
hatch run python test_new_feature.py
```

### Lint / Format

```sh
# Lint check
hatch run lint:style
hatch run lint:typing
hatch run lint:all # for both

# Format
hatch run lint:fmt
```

### Coverage report

```sh
hatch test --cover
```

### Build

```sh
hatch build
```

### Running `hatch` commands inside Docker

This is useful for running in a consistent environment and for switching easily between Python versions. The following shows how to run `hatch run lint:all`, but this also works for any other hatch command:

```
# Use default Python version
./docker-hatch run lint:all

# Use specific Python version (must be a valid tag from: https://hub.docker.com/_/python)
./docker-hatch -v 3.10 run lint:all

# Run tests in docker with specific Python version
./docker-hatch -v 3.10 test

# Run python from specific environment (e.g. one with optional dependencies installed)
./docker-hatch run extra-deps-env:python -c "print('hello world')"

# Run commands with other root-level hatch options (everything after -- gets passed to hatch)
./docker-hatch -v 3.10 -- -v env create debug-env-with-verbose-logging
```

## VS Code setup

### Prerequisites

Install the recommended extensions.

### Instructions

Configure hatch to create virtual environments in the project folder:

```
hatch config set dirs.env.virtual .env
```

Afterwards, create all the Python environments needed by running `hatch test --all`.

Finally, configure VS Code to use one of the created environments: `cmd + shift + p` -> `Python: Select Interpreter` -> pick one of the folders in `./.env`.

## Support

The kagglehub library logs to the console automatically. For file-based logging, setting the `KAGGLE_LOGGING_ENABLED=1` environment variable will write logs to a directory. The default log destination is resolved via [os.path.expanduser](https://docs.python.org/3/library/os.path.html#os.path.expanduser). The table below lists the possible locations:

| os      | log path                                         |
| ------- | ------------------------------------------------ |
| osx     | /Users/$USERNAME/.kaggle/logs/kagglehub.log      |
| linux   | ~/.kaggle/logs/kagglehub.log                     |
| windows | C:\Users\\%USERNAME%\\.kaggle\logs\kagglehub.log |

If needed, the root log directory can be overridden using the `KAGGLE_LOGGING_ROOT_DIR` environment variable.

Please include the log when reporting issues to help with troubleshooting.
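For example, a minimal sketch of enabling file-based logging from the shell before running your script (the override directory shown is just an illustration):

```sh
# Turn on file-based logging.
export KAGGLE_LOGGING_ENABLED=1

# Optional: redirect logs away from the default ~/.kaggle/logs location.
export KAGGLE_LOGGING_ROOT_DIR=/tmp/kagglehub-logs
```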
## Contributing

If you'd like to contribute to `kagglehub`, please make sure to take a look at [CONTRIBUTING.md](CONTRIBUTING.md).

---

# Model Summary

Provide a brief overview of the model including details about its architecture, how it can be used, characteristics of the model, training data, and evaluation results.

## Usage

How can this model be used? You should provide a code snippet that demonstrates how to load and/or fine-tune your model, and you should define the shape of both the inputs and the outputs. Are there known and preventable failures to be aware of?

## System

Is this a standalone model or part of a system? What are the input requirements? What are the downstream dependencies when using the model outputs?

## Implementation requirements

What hardware and software were used for training the model? Describe the compute requirements for training and inference (e.g., # of chips, training time, total computation, measured performance, energy consumption).

# Model Characteristics

## Model initialization

Was the model trained from scratch or fine-tuned from a pre-trained model?

## Model stats

What's the size of the model? Provide information about size, weights, layers, and latency.

## Other details

Is the model pruned? Is it quantized? Describe any techniques used to preserve differential privacy.

# Data Overview

Provide more details about the data used to train this model.

## Training data

Describe the data that was used to train the model. How was it collected? What pre-processing was done?

## Demographic groups

Describe any demographic data or attributes that suggest demographic groups.

## Evaluation data

What was the train / test / dev split? Are there notable differences between training and test data?

# Evaluation Results

## Summary

Summarize and link to evaluation results for this analysis.

## Subgroup evaluation results

Did you do any subgroup analysis? Describe the results and any assumptions about disaggregating data. Are there any known and preventable failures with this model?

## Fairness

How did you define fairness? What metrics and baselines did you use? What were the results of your analysis?

## Usage limitations

Are there sensitive use cases? What factors might limit model performance, and what conditions should be satisfied to use this model?

## Ethics

What ethical factors did the model developers consider? Were any risks identified? What mitigations or remediations were undertaken?

---

# Model Format

Describe the format for the model (e.g. a SavedModel file for TF 2.0).

# Training Data

Describe the data that the model instance was trained on.

# Model Inputs

Describe the type and the shape of the model inputs.

# Model Outputs

Describe the type and the shape of the model outputs.

# Model Usage

Provide code snippets that demonstrate how to load and make use of the model instance.

# Fine-tuning

Provide code snippets that demonstrate how to fine-tune the model instance (if applicable).

# Changelog

Describe the differences between the versions of this specific model instance (if applicable).

---

A full model is composed of 3 types of entities:

1. The model
2. The instances
3. The instance versions

Let's take the example of [efficientnet](https://www.kaggle.com/models/tensorflow/efficientnet) to explain these entities. A model like `efficientnet` contains multiple instances. An instance is a specific variation of the model (e.g. B0, B1, ...) with a certain framework (e.g. TensorFlow2).

## Model

To create a model, a special `model-metadata.json` file must be specified. Here's a basic example for `model-metadata.json`:

```
{
  "ownerSlug": "INSERT_OWNER_SLUG_HERE",
  "title": "INSERT_TITLE_HERE",
  "slug": "INSERT_SLUG_HERE",
  "subtitle": "",
  "isPrivate": true,
  "description": "Model Card Markdown, see below",
  "publishTime": "",
  "provenanceSources": ""
}
```

You can also use the API command `kaggle models init -p /path/to/model` to have the API create this file for you for a new model. If you wish to get the metadata for an existing model, you can use `kaggle models get username/model-slug`.
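A minimal sketch of that workflow using the `kaggle models` commands listed earlier (the `./my-model` folder is hypothetical; edit the generated metadata before creating):

```
# Have the API generate a model-metadata.json template.
kaggle models init -p ./my-model

# After editing ./my-model/model-metadata.json, create the model.
kaggle models create -p ./my-model
```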
### Contents

We currently support the following metadata fields for models.

* `ownerSlug`: the slug of the user or organization
* `title`: the model's title
* `slug`: the model's slug (unique per owner)
* `licenseName`: the name of the license (see the list below)
* `subtitle`: the model's subtitle
* `isPrivate`: whether or not the model should be private (only visible by the owners). If not specified, will be `true`
* `description`: the model's card in markdown syntax (see the template below)
* `publishTime`: the original publishing time of the model
* `provenanceSources`: the provenance of the model

### Description

You can find a template of the model card on this wiki page: https://github.com/Kaggle/kaggle-api/wiki/Model-Card

## Model Instance

To create a model instance, a special `model-instance-metadata.json` file must be specified. Here's a basic example for `model-instance-metadata.json`:

```
{
  "ownerSlug": "INSERT_OWNER_SLUG_HERE",
  "modelSlug": "INSERT_EXISTING_MODEL_SLUG_HERE",
  "instanceSlug": "INSERT_INSTANCE_SLUG_HERE",
  "framework": "INSERT_FRAMEWORK_HERE",
  "overview": "",
  "usage": "Usage Markdown, see below",
  "licenseName": "Apache 2.0",
  "fineTunable": false,
  "trainingData": [],
  "modelInstanceType": "Unspecified",
  "baseModelInstance": "",
  "externalBaseModelUrl": ""
}
```

You can also use the API command `kaggle models instances init -p /path/to/model-instance` to have the API create this file for you for a new model instance.

### Contents

We currently support the following metadata fields for model instances.

* `ownerSlug`: the slug of the user or organization of the model
* `modelSlug`: the existing model's slug
* `instanceSlug`: the slug of the instance
* `framework`: the instance's framework (possible options: `tensorFlow1`, `tensorFlow2`, `tfLite`, `tfJs`, `pyTorch`, `jax`, `coral`, ...)
* `overview`: a short overview of the instance
* `usage`: the instance's usage in markdown syntax (see the template below)
* `fineTunable`: whether the instance is fine-tunable
* `trainingData`: a list of training data in the form of strings, URLs, Kaggle Datasets, etc.
* `modelInstanceType`: whether the model instance is a base model, an external variant, an internal variant, or unspecified
* `baseModelInstance`: if this is an internal variant, the `{owner-slug}/{model-slug}/{framework}/{instance-slug}` of the base model instance
* `externalBaseModelUrl`: if this is an external variant, a URL to the base model

### Licenses

Here is a list of the available licenses for models:

- Apache 2.0
- Attribution 3.0 IGO (CC BY 3.0 IGO)
- Attribution 3.0 Unported (CC BY 3.0)
- Attribution 4.0 International (CC BY 4.0)
- Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0)
- Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
- Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
- Attribution-NonCommercial-ShareAlike 3.0 IGO (CC BY-NC-SA 3.0 IGO)
- BSD-3-Clause
- CC BY-NC-SA 4.0
- CC BY-SA 3.0
- CC BY-SA 4.0
- CC0: Public Domain
- Community Data License Agreement - Permissive - Version 1.0
- Community Data License Agreement - Sharing - Version 1.0
- GNU Affero General Public License 3.0
- GNU Free Documentation License 1.3
- GNU Lesser General Public License 3.0
- GPL 2
- MIT
- ODC Attribution License (ODC-By)
- ODC Public Domain Dedication and Licence (PDDL)
- GPL 3

### Usage

You can find a template of the Usage markdown on this wiki page: https://github.com/Kaggle/kaggle-api/wiki/ModelInstance-Usage

The following template variables can be used in this markdown:

- `${VERSION_NUMBER}` is replaced by the version number when rendered
- `${VARIATION_SLUG}` is replaced by the variation slug when rendered
- `${FRAMEWORK}` is replaced by the framework name
- `${PATH}` is replaced by `/kaggle/input/{owner-slug}/{model-slug}/{framework}/{instance-slug}`
- `${FILEPATH}` is replaced by `/kaggle/input/{owner-slug}/{model-slug}/{framework}/{instance-slug}/{file-path}`. This value is only defined if the databundle contains a single file
- `${URL}` is replaced by the absolute URL of the model
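For illustration, a hypothetical Usage markdown snippet using these variables (the wording is made up; start from the wiki template above for real instances):

```
# Usage

Load version ${VERSION_NUMBER} of the `${VARIATION_SLUG}` variation (framework: ${FRAMEWORK}) in a Kaggle Notebook from:

    ${PATH}

The full model page is available at ${URL}.
```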