# Firecrawl

> Documentation for Firecrawl

## Pages

- [Firecrawl Documentation](firecrawl-documentation.md)
- [http://firecrawl.dev llms-full.txt](httpfirecrawldev-llms-fulltxt.md): Introducing /extract - Get web data with a prompt [Try now](https://www.firecrawl.dev/extract)
- [Turn websites into LLM-ready data](turn-websites-into-llm-ready-data.md): Power your AI apps with clean web data from any website. [It's also open source.](https://github.com/firecrawl/firecr...
- [Scrape a website:](scrape-a-website.md): scrape_result = app.scrape_url('firecrawl.dev',
- [Get web data with a prompt](get-web-data-with-a-prompt.md): Turn entire websites into structured data with AI
- [Preview](preview.md): Take a look at the API response (Preview limited to 5 pages)
- [PRIVACY POLICY](privacy-policy.md): Date of last revision: December 26, 2024
- [Launch Week II](launch-week-ii.md): Follow us on your favorite platform to hear about every new Firecrawl launch during the week!
- [l](l.md): Turn any website into an API with AI.
- [TERMS OF USE / SERVICE AGREEMENT](terms-of-use-service-agreement.md): Date of last revision: November 5, 2024
- [Build an agent that checks for website contradictions](build-an-agent-that-checks-for-website-contradictions.md): In this quick tutorial you will learn how to use Firecrawl and Claude to scrape your website’s data and look for cont...
- [Preview](preview-2.md): Take a look at the API response (Preview limited to 5 pages)
- [Build a 'Chat with website' using Groq Llama 3](build-a-chat-with-website-using-groq-llama-3.md): Install our Python dependencies, including langchain, groq, faiss, ollama, and firecrawl-py.
- [Evaluating Web Data Extraction with CrawlBench](evaluating-web-data-extraction-with-crawlbench.md): The most common AI Engineering task, after you have a really good web scraper/crawler like Firecrawl, is to feed it i...
- [Preview](preview-3.md): Take a look at the API response (Preview limited to 5 pages)
- [l](l-2.md): Turn any website into an API with AI.
- [Extract website data using LLMs](extract-website-data-using-llms.md): Install our Python dependencies, including groq and firecrawl-py.
- [Here we define the fields we want to extract from the page content](here-we-define-the-fields-we-want-to-extract-from-the-page-content.md): extract = ["summary","date","companies_building_with_quest","title_of_the_article","people_testimonials"]
- [Pretty print the JSON response](pretty-print-the-json-response.md): dataExtracted = json.dumps(str(completion.choices[0].message.content), indent=4)
- [OpenAI Swarm Tutorial: Create Marketing Campaigns for Any Website](openai-swarm-tutorial-create-marketing-campaigns-for-any-website.md): OpenAI Swarm Tutorial: Create Marketing Campaigns for Any Website with AI - YouTube
- [Firecrawl July 2024 Updates](firecrawl-july-2024-updates.md): We are excited to share our latest updates from July!
- [Handling 300k requests per day: an adventure in scaling](handling-300k-requests-per-day-an-adventure-in-scaling.md): When I joined the Firecrawl team in early July, we spent most of our time working on new features and minor bugfixes....
- [BeautifulSoup4 vs. Scrapy - A Comprehensive Comparison for Web Scraping in Python](beautifulsoup4-vs-scrapy-a-comprehensive-comparison-for-web-scraping-in-python.md): Web scraping has become an essential tool for gathering data from the internet. Whether you’re tracking prices, colle...
- [Get a webpage](get-a-webpage.md): response = requests.get('
- [Find all article titles](find-all-article-titles.md): titles = soup.find_all('span', class_='titleline')
- [hackernews_spider.py](hackernews-spiderpy.md): import scrapy
- [To run the spider, we need to use the Scrapy command line](to-run-the-spider-we-need-to-use-the-scrapy-command-line.md)
- [scrapy runspider hackernews_spider.py -o results.json](scrapy-runspider-hackernews-spiderpy-o-resultsjson.md): This code defines a simple Scrapy spider that crawls Hacker News. The spider starts at the homepage, extracts story t...
- [Requires additional setup and command-line usage as seen above](requires-additional-setup-and-command-line-usage-as-seen-above.md): Here’s a simple breakdown of key features:
- [Working scraper](working-scraper.md): soup.find('div', class_='product-price').text # Returns: "$99.99"
- [Import required libraries](import-required-libraries.md): from firecrawl import FirecrawlApp
- [Load environment variables from .env file](load-environment-variables-from-env-file.md): load_dotenv()
- [Define Pydantic model for a single GitHub repository](define-pydantic-model-for-a-single-github-repository.md): class Repository(BaseModel):
- [Define model for the full response containing list of repositories](define-model-for-the-full-response-containing-list-of-repositories.md): class Repositories(BaseModel):
- [Scrape GitHub trending page using our defined schema](scrape-github-trending-page-using-our-defined-schema.md): trending_repos = app.scrape_url(
- [Loop through the first 3 repositories and print their details](loop-through-the-first-3-repositories-and-print-their-details.md): for idx, repo in enumerate(trending_repos['extract']['repositories']):
- [15 Python Web Scraping Projects: From Beginner to Advanced](15-python-web-scraping-projects-from-beginner-to-advanced.md): Web scraping is one of the most powerful tools in a programmer’s arsenal, allowing you to gather data from across the...
- [Create a new virtual environment](create-a-new-virtual-environment.md): python -m venv scraping-env
- [On macOS/Linux:](on-macoslinux.md): source scraping-env/bin/activate
- [How to Use Firecrawl's Scrape API: Complete Web Scraping Tutorial](how-to-use-firecrawls-scrape-api-complete-web-scraping-tutorial.md): Traditional web scraping presents unique challenges. Relevant information is often scattered across multiple pages cont...
- [Convert list of WeatherData objects into dictionaries](convert-list-of-weatherdata-objects-into-dictionaries.md): data_dicts = [city.model_dump() for city in data_full]
- [Convert list of dictionaries into DataFrame](convert-list-of-dictionaries-into-dataframe.md): df = pd.DataFrame(data_dicts)
- [Building an Intelligent Code Documentation RAG Assistant with DeepSeek and Firecrawl](building-an-intelligent-code-documentation-rag-assistant-with-deepseek-and-firec.md)
- [Building an Intelligent Code Documentation Assistant: RAG-Powered DeepSeek Implementation](building-an-intelligent-code-documentation-assistant-rag-powered-deepseek-implem.md): DeepSeek R1’s release made waves in the AI community, with countless demos highlighting its impressive capabilities. ...
- [Get logger for the scraper module](get-logger-for-the-scraper-module.md): logger = logging.getLogger(__name__)
- [Configure logging](configure-logging.md): logging.basicConfig(
- [Introducing /extract: Get structured web data with just a prompt](introducing-extract-get-structured-web-data-with-just-a-prompt.md): /extract by Firecrawl - Get structured web data with just a prompt (Open Beta) - YouTube
- [Introducing Fire Engine for Firecrawl](introducing-fire-engine-for-firecrawl.md): Firecrawl handles web scraping orchestration but doesn’t do the actual scraping. It initially relied on third-party s...
- [Launch Week I Recap](launch-week-i-recap.md): Last week marked an exciting milestone for Firecrawl as we kicked off our inaugural Launch Week, unveiling a series o...
- [How to Use Prompt Caching and Cache Control with Anthropic Models](how-to-use-prompt-caching-and-cache-control-with-anthropic-models.md): Anthropic recently launched prompt caching and cache control in beta, allowing you to cache large context prompts up ...
- [Scraping Company Data and Funding Information in Bulk With Firecrawl and Claude](scraping-company-data-and-funding-information-in-bulk-with-firecrawl-and-claude.md): In today’s data-driven business world, having access to accurate information about companies and their funding histor...
- [Define the data structure we want to extract](define-the-data-structure-we-want-to-extract.md): class CompanyData(BaseModel):
- [Scrape company data from Crunchbase](scrape-company-data-from-crunchbase.md): data = app.extract(
- [Access the extracted data](access-the-extracted-data.md): company = CompanyData(**data["data"])
- [from scraper import CrunchbaseScraper](from-scraper-import-crunchbasescraper.md): load_dotenv()
- [Crunchbase Company Data Scraper](crunchbase-company-data-scraper.md): A Streamlit app that scrapes company information and funding data from Crunchbase.
- [Building an Automated Price Tracking Tool](building-an-automated-price-tracking-tool.md): There is a lot to be said about the psychology of discounts. For example, buying a discounted item we don’t need isn’...
- [Set up sidebar](set-up-sidebar.md): with st.sidebar:
- [Main content](main-content.md): st.title("Price Tracker Dashboard")
- [utils.py](utilspy.md): from urllib.parse import urlparse
- [Set up sidebar](set-up-sidebar-2.md): with st.sidebar:
- [Main content](main-content-2.md): ...
- [Set up sidebar](set-up-sidebar-3.md): with st.sidebar:
- [Main content](main-content-3.md): st.title("Price Tracker Dashboard")
- [Get all products](get-all-products.md): products = db.get_all_products()
- [Create a card for each product](create-a-card-for-each-product.md): for product in products:
- [Create a card for each product](create-a-card-for-each-product-2.md): for product in products:
- [Threshold percentage for price drop alerts (e.g., 5% = 0.05)](threshold-percentage-for-price-drop-alerts-eg-5-005.md): PRICE_DROP_THRESHOLD = 0.05
- [Web Scraping Automation: How to Run Scrapers on a Schedule](web-scraping-automation-how-to-run-scrapers-on-a-schedule.md): Web scraping is an essential skill for programmers in this data-driven world. Whether you’re tracking prices, monitor...
- [firecrawl_scraper.py](firecrawl-scraperpy.md): import json
- [Schedule it](schedule-it.md): schedule.every(3).seconds.do(job)
- [Schedule the scraper to run every hour](schedule-the-scraper-to-run-every-hour.md): schedule.every().hour.do(save_firecrawl_news_data)
- [cron_scraper.py](cron-scraperpy.md): import sys
- [Set up logging](set-up-logging.md): log_dir = Path("logs")
- [Run every minute](run-every-minute.md): */1 * * * * cd /absolute/path/to/project && /absolute/path/to/.venv/bin/python cron_scraper.py >> ~/cron.log 2>&1
- [Run every hour](run-every-hour.md): 0 * * * * cd /absolute/path/to/project && /absolute/path/to/.venv/bin/python cron_scraper.py >> ~/cron.log 2>&1
- [Initialize git in your project directory](initialize-git-in-your-project-directory.md): git init
- [Create a new repo on GitHub.com, then:](create-a-new-repo-on-githubcom-then.md): git remote add origin
- [Scraping Job Boards Using Firecrawl Actions and OpenAI](scraping-job-boards-using-firecrawl-actions-and-openai.md): Scraping job boards to extract structured data can be a complex task, especially when dealing with dynamic websites a...
- [Initialize API keys](initialize-api-keys.md): firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
- [URL of the jobs page to scrape](url-of-the-jobs-page-to-scrape.md): jobs_page_url = "
- [Candidate's resume (as a string)](candidates-resume-as-a-string.md): resume_paste = """
- [Extract apply links using OpenAI](extract-apply-links-using-openai.md): apply_links = []
- [Initialize a list to store job data](initialize-a-list-to-store-job-data.md): extracted_data = []
- [Define the extraction schema](define-the-extraction-schema.md): schema = {
- [Extract job details for each link](extract-job-details-for-each-link.md): for link in apply_links:
- [Prepare the prompt](prepare-the-prompt.md): prompt = f"""
- [Get recommendations from OpenAI](get-recommendations-from-openai.md): completion = openai.ChatCompletion.create(
- [Extract recommended jobs](extract-recommended-jobs.md): recommended_jobs = json.loads(completion.choices[0].message.content.strip())
- [Output the recommended jobs](output-the-recommended-jobs.md): print(json.dumps(recommended_jobs, indent=2))
- [Using LLM Extraction for Customer Insights](using-llm-extraction-for-customer-insights.md): Understanding our customers, not just who they are but what they do, is crucial to tailoring our products and servic...
- [Launch Week I / Day 7: Crawl Webhooks (v1)](launch-week-i-day-7-crawl-webhooks-v1.md): Welcome to Day 7 of Firecrawl’s Launch Week! We’re excited to introduce new /crawl webhook support.
- [Getting Started with OpenAI's Predicted Outputs for Faster LLM Responses](getting-started-with-openais-predicted-outputs-for-faster-llm-responses.md): Leveraging the full potential of Large Language Models (LLMs) often involves balancing between response accuracy and ...
- [Retrieve API keys from environment variables](retrieve-api-keys-from-environment-variables.md): firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
- [Initialize the FirecrawlApp and OpenAI client](initialize-the-firecrawlapp-and-openai-client.md): app = FirecrawlApp(api_key=firecrawl_api_key)
- [Get the blog URL (you can input your own)](get-the-blog-url-you-can-input-your-own.md): blog_url = "
- [Scrape the blog content in markdown format](scrape-the-blog-content-in-markdown-format.md): blog_scrape_result = app.scrape_url(blog_url, params={'formats': ['markdown']})
- [Extract the top-level domain](extract-the-top-level-domain.md): top_level_domain = '/'.join(blog_url.split('/')[:3])
- [Map the website to get all internal links](map-the-website-to-get-all-internal-links.md): site_map = app.map_url(top_level_domain)
- [Scrape and Analyze Airbnb Data with Firecrawl and E2B](scrape-and-analyze-airbnb-data-with-firecrawl-and-e2b.md): This cookbook demonstrates how to scrape Airbnb data and analyze it using [Firecrawl](https://www.firecrawl.dev/) and...
- [TODO: Get your E2B API key from https://e2b.dev/docs](todo-get-your-e2b-api-key-from-httpse2bdevdocs.md): E2B_API_KEY=""
- [TODO: Get your Firecrawl API key from https://firecrawl.dev](todo-get-your-firecrawl-api-key-from-httpsfirecrawldev.md): FIRECRAWL_API_KEY=""
- [TODO: Get your Anthropic API key from https://anthropic.com](todo-get-your-anthropic-api-key-from-httpsanthropiccom.md): ANTHROPIC_API_KEY=""
- [Mastering Firecrawl's Crawl Endpoint: A Complete Web Scraping Guide](mastering-firecrawls-crawl-endpoint-a-complete-web-scraping-guide.md): Web scraping and data extraction have become essential tools as businesses race to convert unprecedented amounts of o...
- [Womens Fiction](womens-fiction.md): **17** results.
- [Crawl the first 5 pages of the stripe API documentation](crawl-the-first-5-pages-of-the-stripe-api-documentation.md): stripe_crawl_result = app.crawl_url(
- [Crawl the first 5 pages of the stripe API documentation](crawl-the-first-5-pages-of-the-stripe-api-documentation-2.md): stripe_crawl_result = app.crawl_url(
- [Example of URL control parameters](example-of-url-control-parameters.md): url_control_result = app.crawl_url(
- [Print the total number of pages crawled](print-the-total-number-of-pages-crawled.md): print(f"Total pages crawled: {url_control_result['total']}")
- [Example of URL control parameters](example-of-url-control-parameters-2.md): url_control_result = app.crawl_url(
- [Print the total number of pages crawled](print-the-total-number-of-pages-crawled-2.md): print(f"Total pages crawled: {url_control_result['total']}")
- [Query the database](query-the-database.md): conn = sqlite3.connect("crawl_results.db")
- [Start the crawl](start-the-crawl.md): crawl_status = app.async_crawl_url(url="
- [Save results incrementally](save-results-incrementally.md): save_incremental_results(app, crawl_status["id"])
- [Start the crawl](start-the-crawl-2.md): docs = loader.load()
- [Add text splitting before creating the vector store](add-text-splitting-before-creating-the-vector-store.md): text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
- [Split the documents](split-the-documents.md): split_docs = text_splitter.split_documents(docs)
- [Create embeddings for the documents](create-embeddings-for-the-documents.md): embeddings = OpenAIEmbeddings()
- [Create a vector store from the loaded documents](create-a-vector-store-from-the-loaded-documents.md): docs = filter_complex_metadata(docs)
- [Initialize the language model](initialize-the-language-model.md): llm = ChatAnthropic(model="claude-3-5-sonnet-20240620", streaming=True)
- [Create a QA chain](create-a-qa-chain.md): qa_chain = RetrievalQA.from_chain_type(
- [Example question](example-question.md): query = "What is the main topic of the website?"
- [Building an AI Resume Job Matching App With Firecrawl And Claude](building-an-ai-resume-job-matching-app-with-firecrawl-and-claude.md): Finding the perfect job can feel like searching for a needle in a haystack. As a developer, you might spend hours sca...
- [How Gamma Supercharges Onboarding with Firecrawl](how-gamma-supercharges-onboarding-with-firecrawl.md): At [Gamma](https://gamma.app/), we recently launched Gamma Sites, which allows anyone to build a website as easily as...
- [A Complete Guide Scraping Authenticated Websites with cURL and Firecrawl](a-complete-guide-scraping-authenticated-websites-with-curl-and-firecrawl.md): Scraping authenticated websites is often a key requirement for developers and data analysts. While many graphical too...
- [Getting Started with Grok-2: Setup and Web Crawler Example](getting-started-with-grok-2-setup-and-web-crawler-example.md): Grok-2, the latest language model from x.ai, brings advanced language understanding capabilities to developers, enabl...
- [Load environment variables from .env file](load-environment-variables-from-env-file-2.md): load_dotenv()
- [Retrieve API keys](retrieve-api-keys.md): grok_api_key = os.getenv("GROK_API_KEY")
- [Initialize FirecrawlApp](initialize-firecrawlapp.md): app = FirecrawlApp(api_key=firecrawl_api_key)
- [How to quickly install BeautifulSoup with Python](how-to-quickly-install-beautifulsoup-with-python.md): [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is a Python library for pulling data out of H...
- ['Example Domain'](example-domain.md): We use the requests library to fetch the HTML from a URL, then pass it to BeautifulSoup to parse. This allows us to n...
- [Launch Week I / Day 6: LLM Extract (v1)](launch-week-i-day-6-llm-extract-v1.md): Welcome to Day 6 of Firecrawl’s Launch Week! We’re excited to introduce v1 support for LLM Extract.
- [Cloudflare Error 1015: How to solve it?](cloudflare-error-1015-how-to-solve-it.md): Cloudflare Error 1015 is a rate limiting error that occurs when Cloudflare detects that you are exceeding the request...
- [Launch Week I / Day 1: Introducing Teams](launch-week-i-day-1-introducing-teams.md): Welcome to Firecrawl’s first ever Launch Week! Over the course of the next five days, we’ll be bringing you an exciti...
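The scheduling guide above lists crontab entries for running `cron_scraper.py`. As a sketch of the standard crontab syntax (the project paths are placeholders from the guide): `*/1` in the minute field fires every minute, while a literal `0` fires once per hour, at minute zero.

```
# Run every minute
*/1 * * * * cd /absolute/path/to/project && /absolute/path/to/.venv/bin/python cron_scraper.py >> ~/cron.log 2>&1

# Run every hour, at minute 0
0 * * * * cd /absolute/path/to/project && /absolute/path/to/.venv/bin/python cron_scraper.py >> ~/cron.log 2>&1
```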
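Several tutorials indexed above (for example the Predicted Outputs guide) derive a site's root before calling `map_url`, using the snippet `top_level_domain = '/'.join(blog_url.split('/')[:3])`. A minimal self-contained sketch of that step; the example URL is illustrative, not taken from the tutorial:

```python
def top_level_domain(url: str) -> str:
    """Keep only scheme + host from a full page URL.

    Splitting "https://host/path" on "/" yields
    ["https:", "", "host", "path", ...]; the first three parts
    joined back with "/" reconstruct "https://host".
    """
    return '/'.join(url.split('/')[:3])

root = top_level_domain("https://www.firecrawl.dev/blog/some-post")
print(root)  # https://www.firecrawl.dev
```

This root is what the tutorial then passes to `app.map_url(...)` to enumerate internal links.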
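The automated price tracking tutorial above defines `PRICE_DROP_THRESHOLD = 0.05` as the alert cutoff. A minimal sketch of how such a threshold might gate alerts; the `should_alert` helper is hypothetical, not code from the tutorial:

```python
PRICE_DROP_THRESHOLD = 0.05  # alert when the price falls by 5% or more

def should_alert(old_price: float, new_price: float,
                 threshold: float = PRICE_DROP_THRESHOLD) -> bool:
    """Return True when the relative price drop meets the threshold."""
    if old_price <= 0:
        return False  # avoid division by zero on bad data
    drop = (old_price - new_price) / old_price
    return drop >= threshold

print(should_alert(100.0, 94.0))  # True: a 6% drop crosses the 5% threshold
print(should_alert(100.0, 97.0))  # False: only a 3% drop
```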