Installation & Setup (2023 Edition)

1. Basic Installation

pip install crawl4ai

This installs the core Crawl4AI library along with essential dependencies. No advanced features (like transformers or PyTorch) are included yet.

2. Initial Setup & Diagnostics

2.1 Run the Setup Command

After installing, call:

crawl4ai-setup

What does it do?
- Installs or updates required browser dependencies for both regular and undetected modes
- Performs OS-level checks (e.g., missing libs on Linux)
- Confirms your environment is ready to crawl

2.2 Diagnostics

Optionally, you can run diagnostics to confirm everything is functioning:

crawl4ai-doctor

This command attempts to:
- Check Python version compatibility
- Verify Playwright installation
- Inspect environment variables or library conflicts

If any issues arise, follow its suggestions (e.g., installing additional system packages) and re-run crawl4ai-setup.


3. Verifying Installation: A Simple Crawl (Skip this step if you already ran crawl4ai-doctor)

Below is a minimal Python script demonstrating a basic crawl. It imports our new BrowserConfig and CrawlerRunConfig for clarity, though no custom settings are passed in this example:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.example.com",
        )
        print(result.markdown[:300])  # Show the first 300 characters of extracted text

if __name__ == "__main__":
    asyncio.run(main())

Expected outcome:
- A headless browser session loads example.com
- Crawl4AI returns ~300 characters of markdown.

If errors occur, rerun crawl4ai-doctor or manually ensure Playwright is installed correctly.
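
When you do want custom settings, the two config objects can be passed in directly. Below is a minimal sketch; the specific options shown (headless, cache_mode) and the config= parameters are illustrative and assume the current config-based API:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

async def main():
    browser_cfg = BrowserConfig(headless=True)                # run the browser without a visible window
    run_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)   # skip the cache and always fetch fresh content

    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        result = await crawler.arun(url="https://www.example.com", config=run_cfg)
        print(result.markdown[:300])

if __name__ == "__main__":
    asyncio.run(main())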


4. Advanced Installation (Optional)

Warning: Only install these if you truly need them. They bring in larger dependencies, including big models, which can increase disk usage and memory load significantly.

4.1 Torch, Transformers, or All

  • Text Clustering (Torch)

    pip install crawl4ai[torch]
    crawl4ai-setup

    Installs PyTorch-based features (e.g., cosine similarity or advanced semantic chunking); a rough usage sketch follows after this list.

  • Transformers

    pip install crawl4ai[transformer]
    crawl4ai-setup

    Adds Hugging Face-based summarization or generation strategies.

  • All Features

    pip install crawl4ai[all]
    crawl4ai-setup
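
As a rough illustration of what the torch extra enables, the sketch below clusters page content around a topic with CosineStrategy. The constructor argument (semantic_filter) and wiring the strategy through CrawlerRunConfig are assumptions about the current API and may differ between versions:

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.extraction_strategy import CosineStrategy

async def main():
    # Topic-focused clustering; requires the [torch] extra to be installed.
    strategy = CosineStrategy(semantic_filter="machine learning")  # assumed parameter name
    run_cfg = CrawlerRunConfig(extraction_strategy=strategy)       # assumed config field

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://www.example.com", config=run_cfg)
        print(result.extracted_content)

if __name__ == "__main__":
    asyncio.run(main())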
    

(Optional) Pre-Fetching Models

crawl4ai-download-models

This step caches large models locally (if needed). Only do this if your workflow requires them.


5. Docker (Experimental)

We provide a temporary Docker approach for testing. It’s not stable and may break with future releases. A major Docker revamp is planned for the stable release in Q1 2025. If you still want to try:

docker pull unclecode/crawl4ai:basic
docker run -p 11235:11235 unclecode/crawl4ai:basic

You can then make POST requests to http://localhost:11235/crawl to perform crawls. Production usage is discouraged until our new Docker approach is ready (planned for January or February 2025).
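
A minimal request sketch using Python's requests library is shown below; the JSON payload shape (a "urls" field) is an assumption and may vary between image versions, so check the container's interactive API docs if it exposes them:

import requests

# Ask the experimental Docker server to crawl a page.
resp = requests.post(
    "http://localhost:11235/crawl",
    json={"urls": "https://www.example.com"},  # assumed payload shape
)
print(resp.status_code)
print(resp.json())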


6. Local Server Mode (Legacy)

Some older docs mention running Crawl4AI as a local server. This approach has been partially replaced by the new Docker-based prototype and upcoming stable server release. You can experiment, but expect major changes. Official local server instructions will arrive once the new Docker architecture is finalized.


Summary

1. Install with pip install crawl4ai and run crawl4ai-setup.
2. Diagnose with crawl4ai-doctor if you see errors.
3. Verify by crawling example.com with minimal BrowserConfig + CrawlerRunConfig.
4. Advanced features (Torch, Transformers) are optional—avoid them if you don’t need them (they significantly increase resource usage).
5. Docker is experimental—use at your own risk until the stable version is released.
6. Local server references in older docs are largely deprecated; a new solution is in progress.

Got questions? Check GitHub issues for updates or ask the community!

