Crawl4AI Cache System and Migration Guide

Overview

Starting from version 0.5.0, Crawl4AI introduces a new caching system that replaces the old boolean flags with a more intuitive CacheMode enum. This change simplifies cache control and makes behavior more predictable.

Old vs. New Approach

Old Approach (Deprecated)

The old system used multiple boolean flags:
- bypass_cache: Skip the cache entirely
- disable_cache: Disable all caching
- no_cache_read: Don't read from the cache
- no_cache_write: Don't write to the cache

New Approach

The new system uses a single CacheMode enum:
- CacheMode.ENABLED: Normal caching (read/write)
- CacheMode.DISABLED: No caching at all
- CacheMode.READ_ONLY: Only read from the cache
- CacheMode.WRITE_ONLY: Only write to the cache
- CacheMode.BYPASS: Skip the cache for this operation
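Each mode is passed to the crawler through CrawlerRunConfig, as the migration example below shows. The following minimal sketch only illustrates how a few of the modes are constructed (the variable names are illustrative, not part of the API):

from crawl4ai import CacheMode
from crawl4ai.async_configs import CrawlerRunConfig

# Normal read/write caching
enabled_config = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)

# Serve results from the cache only; never write new entries
read_only_config = CrawlerRunConfig(cache_mode=CacheMode.READ_ONLY)

# Write fresh results to the cache but never read from it
write_only_config = CrawlerRunConfig(cache_mode=CacheMode.WRITE_ONLY)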

Migration Example

Old Code (Deprecated)

import asyncio
from crawl4ai import AsyncWebCrawler

async def use_proxy():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            bypass_cache=True  # Old way
        )
        print(len(result.markdown))

async def main():
    await use_proxy()

if __name__ == "__main__":
    asyncio.run(main())
New Code

import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.async_configs import CrawlerRunConfig

async def use_proxy():
    # Use CacheMode in CrawlerRunConfig
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)  
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            config=config  # Pass the configuration object
        )
        print(len(result.markdown))

async def main():
    await use_proxy()

if __name__ == "__main__":
    asyncio.run(main())

Common Migration Patterns

Old Flag              New Mode
bypass_cache=True     cache_mode=CacheMode.BYPASS
disable_cache=True    cache_mode=CacheMode.DISABLED
no_cache_read=True    cache_mode=CacheMode.WRITE_ONLY
no_cache_write=True   cache_mode=CacheMode.READ_ONLY
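Applying the table above, a call that previously disabled caching entirely could be migrated as in the following minimal sketch; it mirrors the structure of the migration example, and the function name and URL are placeholders only:

import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.async_configs import CrawlerRunConfig

async def migrate_disable_cache():
    # Old way (deprecated): crawler.arun(url=..., disable_cache=True)
    # New way: express the same intent with CacheMode.DISABLED
    config = CrawlerRunConfig(cache_mode=CacheMode.DISABLED)
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",  # placeholder URL
            config=config
        )
        print(len(result.markdown))

if __name__ == "__main__":
    asyncio.run(migrate_disable_cache())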
