# Firecrawl

> Documentation for Firecrawl

## Pages

- [Firecrawl Documentation](firecrawl-documentation.md)
- [http://firecrawl.dev llms-full.txt](httpfirecrawldev-llms-fulltxt.md): Introducing /extract - Get web data with a prompt [Try now](https://www.firecrawl.dev/extract)
- [Turn websites into LLM-ready data](turn-websites-into-llm-ready-data.md): Power your AI apps with clean web data from any website. [It's also open source.](https://github.com/firecrawl/firecr...
- [Scrape a website:](scrape-a-website.md): scrape_result = app.scrape_url('firecrawl.dev',
- [Get web data with a prompt](get-web-data-with-a-prompt.md): Turn entire websites into structured data with AI
- [Preview](preview.md): Take a look at the API response (Preview limited to 5 pages)
- [PRIVACY POLICY](privacy-policy.md): Date of last revision: December 26, 2024
- [Launch Week II](launch-week-ii.md): Follow us on your favorite platform to hear about every new Firecrawl launch during the week!
- [l](l.md): Turn any website into an API with AI.
- [TERMS OF USE / SERVICE AGREEMENT](terms-of-use-service-agreement.md): Date of last revision: November 5, 2024
- [Build an agent that checks for website contradictions](build-an-agent-that-checks-for-website-contradictions.md): In this quick tutorial you will learn how to use Firecrawl and Claude to scrape your website’s data and look for cont...
- [Preview](preview-2.md): Take a look at the API response (Preview limited to 5 pages)
- [Build a 'Chat with website' using Groq Llama 3](build-a-chat-with-website-using-groq-llama-3.md): Install our Python dependencies, including langchain, groq, faiss, ollama, and firecrawl-py.
- [Evaluating Web Data Extraction with CrawlBench](evaluating-web-data-extraction-with-crawlbench.md): The most common AI Engineering task, after you have a really good web scraper/crawler like Firecrawl, is to feed it i...
- [Preview](preview-3.md): Take a look at the API response (Preview limited to 5 pages)
- [l](l-2.md): Turn any website into an API with AI.
- [Extract website data using LLMs](extract-website-data-using-llms.md): Install our Python dependencies, including groq and firecrawl-py.
- [Here we define the fields we want to extract from the page content](here-we-define-the-fields-we-want-to-extract-from-the-page-content.md): extract = ["summary","date","companies_building_with_quest","title_of_the_article","people_testimonials"]
- [Pretty print the JSON response](pretty-print-the-json-response.md): dataExtracted = json.dumps(str(completion.choices[0].message.content), indent=4)
- [OpenAI Swarm Tutorial: Create Marketing Campaigns for Any Website](openai-swarm-tutorial-create-marketing-campaigns-for-any-website.md): OpenAI Swarm Tutorial: Create Marketing Campaigns for Any Website with AI - YouTube
- [Firecrawl July 2024 Updates](firecrawl-july-2024-updates.md): We are excited to share our latest updates from July!
- [Handling 300k requests per day: an adventure in scaling](handling-300k-requests-per-day-an-adventure-in-scaling.md): When I joined the Firecrawl team in early July, we spent most of our time working on new features and minor bugfixes....
- [BeautifulSoup4 vs. Scrapy - A Comprehensive Comparison for Web Scraping in Python](beautifulsoup4-vs-scrapy-a-comprehensive-comparison-for-web-scraping-in-python.md): Web scraping has become an essential tool for gathering data from the internet. Whether you’re tracking prices, colle...
- [Get a webpage](get-a-webpage.md): response = requests.get('
- [Find all article titles](find-all-article-titles.md): titles = soup.find_all('span', class_='titleline')
- [hackernews_spider.py](hackernews-spiderpy.md): import scrapy
- [To run the spider, we need to use the Scrapy command line](to-run-the-spider-we-need-to-use-the-scrapy-command-line.md)
- [scrapy runspider hackernews_spider.py -o results.json](scrapy-runspider-hackernews-spiderpy-o-resultsjson.md): This code defines a simple Scrapy spider that crawls Hacker News. The spider starts at the homepage, extracts story t...
- [Requires additional setup and command-line usage as seen above](requires-additional-setup-and-command-line-usage-as-seen-above.md): Here’s a simple breakdown of key features:
- [Working scraper](working-scraper.md): soup.find('div', class_='product-price').text # Returns: "$99.99"
- [Import required libraries](import-required-libraries.md): from firecrawl import FirecrawlApp
- [Load environment variables from .env file](load-environment-variables-from-env-file.md): load_dotenv()
- [Define Pydantic model for a single GitHub repository](define-pydantic-model-for-a-single-github-repository.md): class Repository(BaseModel):
- [Define model for the full response containing list of repositories](define-model-for-the-full-response-containing-list-of-repositories.md): class Repositories(BaseModel):
- [Scrape GitHub trending page using our defined schema](scrape-github-trending-page-using-our-defined-schema.md): trending_repos = app.scrape_url(
- [Loop through the first 3 repositories and print their details](loop-through-the-first-3-repositories-and-print-their-details.md): for idx, repo in enumerate(trending_repos['extract']['repositories']):
- [15 Python Web Scraping Projects: From Beginner to Advanced](15-python-web-scraping-projects-from-beginner-to-advanced.md): Web scraping is one of the most powerful tools in a programmer’s arsenal, allowing you to gather data from across the...
- [Create a new virtual environment](create-a-new-virtual-environment.md): python -m venv scraping-env
- [On macOS/Linux:](on-macoslinux.md): source scraping-env/bin/activate
- [How to Use Firecrawl's Scrape API: Complete Web Scraping Tutorial](how-to-use-firecrawls-scrape-api-complete-web-scraping-tutorial.md): Traditional web scraping presents unique challenges. Relevant information is often scattered across multiple pages cont...
- [Convert list of WeatherData objects into dictionaries](convert-list-of-weatherdata-objects-into-dictionaries.md): data_dicts = [city.model_dump() for city in data_full]
- [Convert list of dictionaries into DataFrame](convert-list-of-dictionaries-into-dataframe.md): df = pd.DataFrame(data_dicts)
- [Building an Intelligent Code Documentation RAG Assistant with DeepSeek and Firecrawl](building-an-intelligent-code-documentation-rag-assistant-with-deepseek-and-firec.md)
- [Building an Intelligent Code Documentation Assistant: RAG-Powered DeepSeek Implementation](building-an-intelligent-code-documentation-assistant-rag-powered-deepseek-implem.md): DeepSeek R1’s release made waves in the AI community, with countless demos highlighting its impressive capabilities. ...
- [Get logger for the scraper module](get-logger-for-the-scraper-module.md): logger = logging.getLogger(__name__)
- [Configure logging](configure-logging.md): logging.basicConfig(
- [Introducing /extract: Get structured web data with just a prompt](introducing-extract-get-structured-web-data-with-just-a-prompt.md): /extract by Firecrawl - Get structured web data with just a prompt (Open Beta) - YouTube
- [Introducing Fire Engine for Firecrawl](introducing-fire-engine-for-firecrawl.md): Firecrawl handles web scraping orchestration but doesn’t do the actual scraping. It initially relied on third-party s...
- [Launch Week I Recap](launch-week-i-recap.md): Last week marked an exciting milestone for Firecrawl as we kicked off our inaugural Launch Week, unveiling a series o...
- [How to Use Prompt Caching and Cache Control with Anthropic Models](how-to-use-prompt-caching-and-cache-control-with-anthropic-models.md): Anthropic recently launched prompt caching and cache control in beta, allowing you to cache large context prompts up ...
- [Scraping Company Data and Funding Information in Bulk With Firecrawl and Claude](scraping-company-data-and-funding-information-in-bulk-with-firecrawl-and-claude.md): In today’s data-driven business world, having access to accurate information about companies and their funding histor...
- [Define the data structure we want to extract](define-the-data-structure-we-want-to-extract.md): class CompanyData(BaseModel):
- [Scrape company data from Crunchbase](scrape-company-data-from-crunchbase.md): data = app.extract(
- [Access the extracted data](access-the-extracted-data.md): company = CompanyData(**data["data"])
- [from scraper import CrunchbaseScraper](from-scraper-import-crunchbasescraper.md): load_dotenv()
- [Crunchbase Company Data Scraper](crunchbase-company-data-scraper.md): A Streamlit app that scrapes company information and funding data from Crunchbase.
- [Building an Automated Price Tracking Tool](building-an-automated-price-tracking-tool.md): There is a lot to be said about the psychology of discounts. For example, buying a discounted item we don’t need isn’...
- [Set up sidebar](set-up-sidebar.md): with st.sidebar:
- [Main content](main-content.md): st.title("Price Tracker Dashboard")
- [utils.py](utilspy.md): from urllib.parse import urlparse
- [Set up sidebar](set-up-sidebar-2.md): with st.sidebar:
- [Main content](main-content-2.md): ...
- [Set up sidebar](set-up-sidebar-3.md): with st.sidebar:
- [Main content](main-content-3.md): st.title("Price Tracker Dashboard")
- [Get all products](get-all-products.md): products = db.get_all_products()
- [Create a card for each product](create-a-card-for-each-product.md): for product in products:
- [Create a card for each product](create-a-card-for-each-product-2.md): for product in products:
- [Threshold percentage for price drop alerts (e.g., 5% = 0.05)](threshold-percentage-for-price-drop-alerts-eg-5-005.md): PRICE_DROP_THRESHOLD = 0.05
- [Web Scraping Automation: How to Run Scrapers on a Schedule](web-scraping-automation-how-to-run-scrapers-on-a-schedule.md): Web scraping is an essential skill for programmers in this data-driven world. Whether you’re tracking prices, monitor...
- [firecrawl_scraper.py](firecrawl-scraperpy.md): import json
- [Schedule it](schedule-it.md): schedule.every(3).seconds.do(job)
- [Schedule the scraper to run every hour](schedule-the-scraper-to-run-every-hour.md): schedule.every().hour.do(save_firecrawl_news_data)
- [cron_scraper.py](cron-scraperpy.md): import sys
- [Set up logging](set-up-logging.md): log_dir = Path("logs")
- [Run every minute](run-every-minute.md): */1 * * * * cd /absolute/path/to/project && /absolute/path/to/.venv/bin/python cron_scraper.py >> ~/cron.log 2>&1
- [Run every hour](run-every-hour.md): 0 * * * * cd /absolute/path/to/project && /absolute/path/to/.venv/bin/python cron_scraper.py >> ~/cron.log 2>&1
- [Initialize git in your project directory](initialize-git-in-your-project-directory.md): git init
- [Create a new repo on GitHub.com, then:](create-a-new-repo-on-githubcom-then.md): git remote add origin
- [Scraping Job Boards Using Firecrawl Actions and OpenAI](scraping-job-boards-using-firecrawl-actions-and-openai.md): Scraping job boards to extract structured data can be a complex task, especially when dealing with dynamic websites a...
- [Initialize API keys](initialize-api-keys.md): firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
- [URL of the jobs page to scrape](url-of-the-jobs-page-to-scrape.md): jobs_page_url = "
- [Candidate's resume (as a string)](candidates-resume-as-a-string.md): resume_paste = """
- [Extract apply links using OpenAI](extract-apply-links-using-openai.md): apply_links = []
- [Initialize a list to store job data](initialize-a-list-to-store-job-data.md): extracted_data = []
- [Define the extraction schema](define-the-extraction-schema.md): schema = {
- [Extract job details for each link](extract-job-details-for-each-link.md): for link in apply_links:
- [Prepare the prompt](prepare-the-prompt.md): prompt = f"""
- [Get recommendations from OpenAI](get-recommendations-from-openai.md): completion = openai.ChatCompletion.create(
- [Extract recommended jobs](extract-recommended-jobs.md): recommended_jobs = json.loads(completion.choices[0].message.content.strip())
- [Output the recommended jobs](output-the-recommended-jobs.md): print(json.dumps(recommended_jobs, indent=2))
- [Using LLM Extraction for Customer Insights](using-llm-extraction-for-customer-insights.md): Understanding our customers, not just who they are but what they do, is crucial to tailoring our products and servic...
- [Launch Week I / Day 7: Crawl Webhooks (v1)](launch-week-i-day-7-crawl-webhooks-v1.md): Welcome to Day 7 of Firecrawl’s Launch Week! We’re excited to introduce new /crawl webhook support.
- [Getting Started with OpenAI's Predicted Outputs for Faster LLM Responses](getting-started-with-openais-predicted-outputs-for-faster-llm-responses.md): Leveraging the full potential of Large Language Models (LLMs) often involves balancing between response accuracy and ...
- [Retrieve API keys from environment variables](retrieve-api-keys-from-environment-variables.md): firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
- [Initialize the FirecrawlApp and OpenAI client](initialize-the-firecrawlapp-and-openai-client.md): app = FirecrawlApp(api_key=firecrawl_api_key)
- [Get the blog URL (you can input your own)](get-the-blog-url-you-can-input-your-own.md): blog_url = "
- [Scrape the blog content in markdown format](scrape-the-blog-content-in-markdown-format.md): blog_scrape_result = app.scrape_url(blog_url, params={'formats': ['markdown']})
- [Extract the top-level domain](extract-the-top-level-domain.md): top_level_domain = '/'.join(blog_url.split('/')[:3])
- [Map the website to get all internal links](map-the-website-to-get-all-internal-links.md): site_map = app.map_url(top_level_domain)
- [Scrape and Analyze Airbnb Data with Firecrawl and E2B](scrape-and-analyze-airbnb-data-with-firecrawl-and-e2b.md): This cookbook demonstrates how to scrape Airbnb data and analyze it using [Firecrawl](https://www.firecrawl.dev/) and...
- [TODO: Get your E2B API key from https://e2b.dev/docs](todo-get-your-e2b-api-key-from-httpse2bdevdocs.md): E2B_API_KEY=""
- [TODO: Get your Firecrawl API key from https://firecrawl.dev](todo-get-your-firecrawl-api-key-from-httpsfirecrawldev.md): FIRECRAWL_API_KEY=""
- [TODO: Get your Anthropic API key from https://anthropic.com](todo-get-your-anthropic-api-key-from-httpsanthropiccom.md): ANTHROPIC_API_KEY=""
- [Mastering Firecrawl's Crawl Endpoint: A Complete Web Scraping Guide](mastering-firecrawls-crawl-endpoint-a-complete-web-scraping-guide.md): Web scraping and data extraction have become essential tools as businesses race to convert unprecedented amounts of o...
- [Womens Fiction](womens-fiction.md): **17** results.
- [Crawl the first 5 pages of the stripe API documentation](crawl-the-first-5-pages-of-the-stripe-api-documentation.md): stripe_crawl_result = app.crawl_url(
- [Crawl the first 5 pages of the stripe API documentation](crawl-the-first-5-pages-of-the-stripe-api-documentation-2.md): stripe_crawl_result = app.crawl_url(
- [Example of URL control parameters](example-of-url-control-parameters.md): url_control_result = app.crawl_url(
- [Print the total number of pages crawled](print-the-total-number-of-pages-crawled.md): print(f"Total pages crawled: {url_control_result['total']}")
- [Example of URL control parameters](example-of-url-control-parameters-2.md): url_control_result = app.crawl_url(
- [Print the total number of pages crawled](print-the-total-number-of-pages-crawled-2.md): print(f"Total pages crawled: {url_control_result['total']}")
- [Query the database](query-the-database.md): conn = sqlite3.connect("crawl_results.db")
- [Start the crawl](start-the-crawl.md): crawl_status = app.async_crawl_url(url="
- [Save results incrementally](save-results-incrementally.md): save_incremental_results(app, crawl_status["id"])
- [Start the crawl](start-the-crawl-2.md): docs = loader.load()
- [Add text splitting before creating the vector store](add-text-splitting-before-creating-the-vector-store.md): text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
- [Split the documents](split-the-documents.md): split_docs = text_splitter.split_documents(docs)
- [Create embeddings for the documents](create-embeddings-for-the-documents.md): embeddings = OpenAIEmbeddings()
- [Create a vector store from the loaded documents](create-a-vector-store-from-the-loaded-documents.md): docs = filter_complex_metadata(docs)
- [Initialize the language model](initialize-the-language-model.md): llm = ChatAnthropic(model="claude-3-5-sonnet-20240620", streaming=True)
- [Create a QA chain](create-a-qa-chain.md): qa_chain = RetrievalQA.from_chain_type(
- [Example question](example-question.md): query = "What is the main topic of the website?"
- [Building an AI Resume Job Matching App With Firecrawl And Claude](building-an-ai-resume-job-matching-app-with-firecrawl-and-claude.md): Finding the perfect job can feel like searching for a needle in a haystack. As a developer, you might spend hours sca...
- [How Gamma Supercharges Onboarding with Firecrawl](how-gamma-supercharges-onboarding-with-firecrawl.md): At [Gamma](https://gamma.app/), we recently launched Gamma Sites, which allows anyone to build a website as easily as...
- [A Complete Guide Scraping Authenticated Websites with cURL and Firecrawl](a-complete-guide-scraping-authenticated-websites-with-curl-and-firecrawl.md): Scraping authenticated websites is often a key requirement for developers and data analysts. While many graphical too...
- [Getting Started with Grok-2: Setup and Web Crawler Example](getting-started-with-grok-2-setup-and-web-crawler-example.md): Grok-2, the latest language model from x.ai, brings advanced language understanding capabilities to developers, enabl...
- [Load environment variables from .env file](load-environment-variables-from-env-file-2.md): load_dotenv()
- [Retrieve API keys](retrieve-api-keys.md): grok_api_key = os.getenv("GROK_API_KEY")
- [Initialize FirecrawlApp](initialize-firecrawlapp.md): app = FirecrawlApp(api_key=firecrawl_api_key)
- [How to quickly install BeautifulSoup with Python](how-to-quickly-install-beautifulsoup-with-python.md): [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is a Python library for pulling data out of H...
- ['Example Domain'](example-domain.md): We use the requests library to fetch the HTML from a URL, then pass it to BeautifulSoup to parse. This allows us to n...
- [Launch Week I / Day 6: LLM Extract (v1)](launch-week-i-day-6-llm-extract-v1.md): Welcome to Day 6 of Firecrawl’s Launch Week! We’re excited to introduce v1 support for LLM Extract.
- [Cloudflare Error 1015: How to solve it?](cloudflare-error-1015-how-to-solve-it.md): Cloudflare Error 1015 is a rate limiting error that occurs when Cloudflare detects that you are exceeding the request...
- [Launch Week I / Day 1: Introducing Teams](launch-week-i-day-1-introducing-teams.md): Welcome to Firecrawl’s first ever Launch Week! Over the course of the next five days, we’ll be bringing you an exciti...
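The scheduling guide above lists crontab entries for running `cron_scraper.py`. As a sketch of the standard crontab syntax (the project paths are placeholders from the guide): `*/1` in the minute field fires every minute, while a literal `0` fires once per hour, at minute zero.

```
# Run every minute
*/1 * * * * cd /absolute/path/to/project && /absolute/path/to/.venv/bin/python cron_scraper.py >> ~/cron.log 2>&1

# Run every hour, at minute 0
0 * * * * cd /absolute/path/to/project && /absolute/path/to/.venv/bin/python cron_scraper.py >> ~/cron.log 2>&1
```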
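Several tutorials indexed above (for example the Predicted Outputs guide) derive a site's root before calling `map_url`, using the snippet `top_level_domain = '/'.join(blog_url.split('/')[:3])`. A minimal self-contained sketch of that step; the example URL is illustrative, not taken from the tutorial:

```python
def top_level_domain(url: str) -> str:
    """Keep only scheme + host from a full page URL.

    Splitting "https://host/path" on "/" yields
    ["https:", "", "host", "path", ...]; the first three parts
    joined back with "/" reconstruct "https://host".
    """
    return '/'.join(url.split('/')[:3])

root = top_level_domain("https://www.firecrawl.dev/blog/some-post")
print(root)  # https://www.firecrawl.dev
```

This root is what the tutorial then passes to `app.map_url(...)` to enumerate internal links.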
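The automated price tracking tutorial above defines `PRICE_DROP_THRESHOLD = 0.05` as the alert cutoff. A minimal sketch of how such a threshold might gate alerts; the `should_alert` helper is hypothetical, not code from the tutorial:

```python
PRICE_DROP_THRESHOLD = 0.05  # alert when the price falls by 5% or more

def should_alert(old_price: float, new_price: float,
                 threshold: float = PRICE_DROP_THRESHOLD) -> bool:
    """Return True when the relative price drop meets the threshold."""
    if old_price <= 0:
        return False  # avoid division by zero on bad data
    drop = (old_price - new_price) / old_price
    return drop >= threshold

print(should_alert(100.0, 94.0))  # True: a 6% drop crosses the 5% threshold
print(should_alert(100.0, 97.0))  # False: only a 3% drop
```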