Firecrawl

Turn any website into clean, LLM-ready data — scrape, crawl, search.

Categories: Data Ops, Search
Pricing: FREEMIUM
Source: Open core
Hosting: Hybrid
Platforms: API, Web
Models: Model-agnostic
Verified: Jun 6, 2026

A web data API for AI — scrape, crawl, map, and search pages into clean markdown or structured JSON, handling proxies, anti-bot, and JS rendering for you. Open-source core (AGPL) plus a hosted service; a default web-ingestion layer for agents and RAG pipelines.
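As a rough illustration of what a scrape call against the hosted service could look like, the sketch below only builds the HTTP request and does not send it. The endpoint `https://api.firecrawl.dev/v1/scrape`, the `url` and `formats` body fields, and the bearer-token header are assumptions not stated in this listing, and `build_scrape_request` and the placeholder key are hypothetical names; check the current API reference before relying on any of them.

```python
import json
import urllib.request

# Assumed endpoint for the hosted service; verify against the current API reference.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(url, api_key, formats=("markdown",)):
    """Build (but do not send) a POST request asking for a page in the given formats."""
    # The body field names "url" and "formats" are assumptions.
    payload = {"url": url, "formats": list(formats)}
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical usage: inspect the request without touching the network.
req = build_scrape_request("https://example.com", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen`; the response shape is left out here because this listing does not describe it.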

Capabilities (3)

What it actually does — grouped by capability family.

Web scraping (primary capability)

Unified search (secondary capability)

Structured extraction (secondary capability)

Pros & cons

Pros:
- Clean markdown / structured JSON output
- Manages proxies and JS rendering for you
- AGPL core, self-hostable
- Scrape, crawl, map, search in one API

Cons:
- AGPL license constrains redistribution
- Hosted usage priced by credits
- Heavy sites can still need tuning

Further reading

Crawl4AI
Data Ops, FREE, OSS
Open-source crawler that turns the web into clean, LLM-ready Markdown.
Crawl4AI is an open-source (Apache 2.0) web crawler and scraper built for AI pipelines, converting pages into clean Markdown or structured data for RAG, agents, and data pipelines. The core runs locally with no API key, handles JS rendering, and supports optional LLM-based extraction with any provider. It installs as a Python library/CLI or deploys as a Dockerized FastAPI server; a hosted Cloud API is in closed beta.
Pro: Core runs fully locally
Con: You run the infra
- web-scraping
- crawling
- open-source
- markdown

Apify
Data Ops, FREEMIUM
Full-stack web scraping and browser automation platform for AI data.
A cloud platform for web scraping, data extraction, and browser automation built around 'Actors' — serverless programs that crawl sites and return structured data. Its store offers tens of thousands of ready-made Actors, and outputs clean Markdown or JSON that feed LLMs, vector databases, and RAG pipelines via LangChain and LlamaIndex. The company also maintains the open-source Crawlee crawling library for local development.
Pro: Serverless 'Actors' scale automatically
Con: Usage-based costs add up at scale
- web-scraping
- crawling
- automation
- rag

ScrapeGraphAI
Data Ops, FREEMIUM, Open core
Turn any webpage into structured data with one prompt-driven API call.
ScrapeGraphAI is an AI web-scraping tool that extracts structured data from pages and documents using natural-language prompts instead of CSS selectors or XPath, orchestrating LLMs in graph-style pipelines (single-page, multi-page, search, crawl). The core library is open-source under the MIT license with Python and Node SDKs; a hosted API adds a credit-based free tier and paid plans, plus integrations with LangChain, LlamaIndex, n8n, and an MCP server.
Pro: Prompt-driven, selector-free extraction
Con: LLM cost incurred on every extracted page
- web-scraping
- extraction
- open-source
- rag

Jina AI
Search, FREEMIUM, Open core
Search-foundation APIs — Reader, embeddings, and reranker — for grounding LLMs.
A suite of search-foundation APIs for retrieval and RAG: a Reader that turns any URL or web search into LLM-ready markdown, multilingual multimodal embeddings, and a reranker. One key spans every service, the Reader is open source, and the embedding models are also released as open weights for self-hosting.
Pro: One key spans Reader, embeddings, reranker
Con: Acquired by Elastic (Oct 2025); roadmap may shift
- search
- embeddings
- reranker
- rag
