Chonkie

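Chonkie is a no-nonsense, ultra-lightweight, and fast chunking library designed for RAG (retrieval-augmented generation) applications.

Chonkie integrates with Qdrant through the QdrantHandshake class, so you can chunk, embed, and store text without leaving the Chonkie SDK.

Setup

Install Chonkie with Qdrant support: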

pip install "chonkie[qdrant]"

Basic usage

QdrantHandshake provides a simple interface for storing and searching chunks:

from chonkie import QdrantHandshake, SemanticChunker

# Initialize handshake with custom embedding model
handshake = QdrantHandshake(
    url="http://localhost:6333",
    collection_name="my_documents",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"
)

# Create and write chunks
chunker = SemanticChunker()
chunks = chunker.chunk("Your text content here...")
handshake.write(chunks)

# Search using natural language
results = handshake.search(query="your search query", limit=5)
for result in results:
    print(f"{result['score']}: {result['text']}")
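
To make the write-then-search flow above more concrete, here is a toy, standard-library-only sketch of the same idea: embed each chunk, keep the vectors, and rank them against an embedded query. It is not how Chonkie or Qdrant work internally. The `embed` function is a bag-of-words stand-in for a real embedding model, and `ToyStore` is a hypothetical class invented for illustration. The result dictionaries reuse the `score` and `text` keys from the example above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyStore:
    """Hypothetical in-memory store that mimics a write/search interface."""

    def __init__(self):
        self._points = []  # list of (vector, text) pairs

    def write(self, chunks):
        # Embed every chunk once at write time.
        for text in chunks:
            self._points.append((embed(text), text))

    def search(self, query, limit=5):
        # Embed the query, score every stored chunk, return the best matches.
        q = embed(query)
        scored = [{"score": cosine(q, vec), "text": text} for vec, text in self._points]
        scored.sort(key=lambda r: r["score"], reverse=True)
        return scored[:limit]


store = ToyStore()
store.write(["qdrant stores vectors", "chonkie splits text into chunks"])
top = store.search("how are vectors stored", limit=1)
```

A real deployment replaces the word counts with dense vectors from the configured embedding model and the linear scan with Qdrant's vector index.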

Qdrant Cloud

handshake = QdrantHandshake(
    url="https://your-cluster.qdrant.io",
    api_key="your-api-key",
    collection_name="my_collection",
    embedding_model="BAAI/bge-small-en-v1.5"  # Change to your preferred model
)
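
Hardcoding an API key in source is easy to leak, so one option is to read the cluster URL and key from the environment. The sketch below assumes two environment variable names, `QDRANT_URL` and `QDRANT_API_KEY`, which are conventions chosen here for illustration rather than anything Chonkie reads on its own. The helper `qdrant_cloud_settings` is likewise hypothetical.

```python
import os


def qdrant_cloud_settings(env=os.environ):
    """Collect Qdrant Cloud connection settings from environment variables.

    QDRANT_URL and QDRANT_API_KEY are assumed names, not a Chonkie convention.
    """
    url = env.get("QDRANT_URL")
    api_key = env.get("QDRANT_API_KEY")
    if not url or not api_key:
        raise RuntimeError("Set QDRANT_URL and QDRANT_API_KEY before connecting")
    return {"url": url, "api_key": api_key}


# Possible usage with the handshake shown above (requires chonkie[qdrant]):
# handshake = QdrantHandshake(
#     **qdrant_cloud_settings(),
#     collection_name="my_collection",
#     embedding_model="BAAI/bge-small-en-v1.5",
# )
```

Failing fast with a clear error when a variable is missing tends to be easier to debug than an authentication failure at the first write.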

Complete RAG pipeline

Build an end-to-end RAG pipeline with Chonkie's fluent Pipeline API:

from chonkie import Pipeline

# Process documents and store in Qdrant with custom embedding model
docs = (Pipeline()
    .fetch_from("file", dir="./knowledge_base", ext=[".txt", ".md"])
    .process_with("text")
    .chunk_with("semantic", chunk_size=512)
    .store_in("qdrant",
              collection_name="knowledge",
              url="http://localhost:6333",
              embedding_model="sentence-transformers/all-MiniLM-L6-v2")
    .run())

print(f"Ingested {len(docs)} documents into Qdrant")

Pipeline with refinements

from chonkie import Pipeline

# Advanced pipeline with overlapping context and custom embeddings
docs = (Pipeline()
    .fetch_from("file", dir="./docs")
    .process_with("text")
    .chunk_with("semantic", threshold=0.8)
    .refine_with("overlap", context_size=100)
    .store_in("qdrant",
              url="https://your-cluster.qdrant.io",
              api_key="your-api-key",
              collection_name="knowledge_base",
              embedding_model="BAAI/bge-small-en-v1.5")
    .run())
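
The overlap refinement step in the pipeline above carries context across chunk boundaries, so a chunk retrieved on its own still includes some of its neighbor's text. The following is a rough sketch of that idea only. It assumes context is measured in characters and copied from the end of the previous chunk, whereas Chonkie's overlap refinery may count tokens and attach context differently. `add_prefix_overlap` is a hypothetical helper, not part of Chonkie.

```python
def add_prefix_overlap(chunks, context_size):
    """Prepend the last `context_size` characters of the previous chunk.

    Illustrative only: character-based, prefix-style overlap.
    """
    if context_size <= 0:
        return list(chunks)
    refined = []
    for i, chunk in enumerate(chunks):
        # The first chunk has no predecessor, so it is left unchanged.
        context = chunks[i - 1][-context_size:] if i > 0 else ""
        refined.append(context + chunk)
    return refined


refined = add_prefix_overlap(["abcdef", "ghij", "kl"], context_size=2)
# refined == ["abcdef", "efghij", "ijkl"]
```

Larger context sizes improve continuity at the cost of storing and embedding more duplicated text.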
