# Crawl4AI Cache System and Migration Guide

## Overview

Starting with version 0.5.0, Crawl4AI introduces a new caching system built around a more intuitive `CacheMode` enum. This change simplifies cache control and makes the behavior more predictable.
## Old vs. New Approach

### Old Approach (Deprecated)

The old system used multiple boolean flags:

- `bypass_cache`: Skip the cache entirely
- `disable_cache`: Disable all caching
- `no_cache_read`: Don't read from the cache
- `no_cache_write`: Don't write to the cache
### New Approach (Recommended)

The new system uses a single `CacheMode` enum:

- `CacheMode.ENABLED`: Normal caching (read/write)
- `CacheMode.DISABLED`: No caching at all
- `CacheMode.READ_ONLY`: Only read from the cache
- `CacheMode.WRITE_ONLY`: Only write to the cache
- `CacheMode.BYPASS`: Skip the cache for this operation
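Each mode is passed to the crawler through a `CrawlerRunConfig` object, as the migration example below shows in full. A minimal sketch of building configurations for the non-default modes (the variable names here are illustrative):

```python
from crawl4ai import CacheMode
from crawl4ai.async_configs import CrawlerRunConfig

# Read from the cache only; never store new results
read_only_config = CrawlerRunConfig(cache_mode=CacheMode.READ_ONLY)

# Always fetch fresh pages, but store the results for later runs
write_only_config = CrawlerRunConfig(cache_mode=CacheMode.WRITE_ONLY)

# Ignore the cache entirely for this call
bypass_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
```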
## Migration Examples

### Old Code (Deprecated)
```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def use_proxy():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            bypass_cache=True  # Old way
        )
        print(len(result.markdown))

async def main():
    await use_proxy()

if __name__ == "__main__":
    asyncio.run(main())
```
### New Code (Recommended)
```python
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.async_configs import CrawlerRunConfig

async def use_proxy():
    # Use CacheMode in CrawlerRunConfig
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            config=config  # Pass the configuration object
        )
        print(len(result.markdown))

async def main():
    await use_proxy()

if __name__ == "__main__":
    asyncio.run(main())
```
## Common Migration Patterns

| Old Flag | New Mode |
|---|---|
| `bypass_cache=True` | `cache_mode=CacheMode.BYPASS` |
| `disable_cache=True` | `cache_mode=CacheMode.DISABLED` |
| `no_cache_read=True` | `cache_mode=CacheMode.WRITE_ONLY` |
| `no_cache_write=True` | `cache_mode=CacheMode.READ_ONLY` |
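The last two rows can look inverted at first glance: turning off cache reads leaves only writes, and turning off cache writes leaves only reads. A minimal sketch of the `no_cache_read=True` migration, reusing the structure of the example above (the function name `refresh_cache` is illustrative):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.async_configs import CrawlerRunConfig

async def refresh_cache():
    # Old: no_cache_read=True -- always fetch fresh, but keep storing results.
    # New: WRITE_ONLY, since only the read side of the cache is disabled.
    config = CrawlerRunConfig(cache_mode=CacheMode.WRITE_ONLY)
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            config=config
        )
        print(len(result.markdown))

if __name__ == "__main__":
    asyncio.run(refresh_cache())
```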