DLT (Data Load Tool)

DLT is an open-source library that you can add to your Python scripts to load data from various, often messy, data sources into well-structured, live datasets.

With the DLT-Qdrant integration, you can now select Qdrant as a DLT destination to load data into.

DLT Features

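Automated maintenance: with schema inference, alerts, and short declarative code, maintenance becomes simple.
Runs wherever Python runs: for example in Airflow, serverless functions, or notebooks. It scales easily on both micro and large infrastructure.
User-friendly, declarative interface that removes knowledge barriers for beginners while empowering senior professionals.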

Usage

To get started, install dlt with the qdrant extra.

pip install "dlt[qdrant]"

Configure the destination in the DLT secrets file. By default, the file is located at ~/.dlt/secrets.toml. Add the following section to the secrets file.

[destination.qdrant.credentials]
location = "https://your-qdrant-url"
api_key = "your-qdrant-api-key"

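The location defaults to http://localhost:6333 and api_key is not defined, which are the defaults for a local Qdrant instance. More information about DLT configuration is available in the DLT documentation.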
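If you would rather not keep credentials in secrets.toml (for example in CI or in a serverless function), dlt can also read configuration from environment variables. The sketch below assumes dlt's naming convention of upper-casing each config section and joining the path with double underscores; the helper dlt_env_var and the placeholder values are hypothetical and only illustrate how the [destination.qdrant.credentials] keys map to variable names.

```python
import os


def dlt_env_var(*path: str) -> str:
    """Hypothetical helper: build a dlt-style environment variable name.

    Assumes dlt's convention of upper-casing each config section and
    joining the sections with a double underscore.
    """
    return "__".join(part.upper() for part in path)


# Same keys as the [destination.qdrant.credentials] section in secrets.toml.
# Placeholder values: replace them with your own Qdrant URL and API key.
qdrant_credentials = {
    "location": "https://your-qdrant-url",
    "api_key": "your-qdrant-api-key",
}

for key, value in qdrant_credentials.items():
    name = dlt_env_var("destination", "qdrant", "credentials", key)
    os.environ[name] = value  # set for this process and its children only
    print(name)
```

Running it prints DESTINATION__QDRANT__CREDENTIALS__LOCATION and DESTINATION__QDRANT__CREDENTIALS__API_KEY. In practice you would export these variables in your shell or deployment settings rather than set them from Python.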

Define the data source.

import dlt
from dlt.destinations.qdrant import qdrant_adapter

movies = [
    {
        "title": "Blade Runner",
        "year": 1982,
        "description": "The film is about a dystopian vision of the future that combines noir elements with sci-fi imagery."
    },
    {
        "title": "Ghost in the Shell",
        "year": 1995,
        "description": "The film is about a cyborg policewoman and her partner who set out to find the main culprit behind brain hacking, the Puppet Master."
    },
    {
        "title": "The Matrix",
        "year": 1999,
        "description": "The movie is set in the 22nd century and tells the story of a computer hacker who joins an underground group fighting the powerful computers that rule the earth."
    }
]

Define the pipeline.

pipeline = dlt.pipeline(
    pipeline_name="movies",
    destination="qdrant",
    dataset_name="movies_dataset",
)

Run the pipeline.

info = pipeline.run(
    qdrant_adapter(
        movies,
        embed=["title", "description"]
    )
)

The data is now loaded into Qdrant.

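To use vector search after the data is loaded, you must specify which fields Qdrant should generate embeddings for. You do this by wrapping the data (or a DLT resource) with the qdrant_adapter function.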
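To make embed=["title", "description"] concrete, the sketch below shows the text each movie record would contribute for embedding. It assumes the selected field values are simply joined with a space; how the dlt Qdrant destination actually combines the fields, and which embedding model it uses, should be confirmed in the dlt documentation. The helper text_to_embed is hypothetical.

```python
from __future__ import annotations

# Same list that is passed to qdrant_adapter(embed=...) in the pipeline above.
EMBED_FIELDS = ["title", "description"]

movies = [
    {
        "title": "Blade Runner",
        "year": 1982,
        "description": "The film is about a dystopian vision of the future that combines noir elements with sci-fi imagery.",
    },
]


def text_to_embed(record: dict, embed_fields: list[str]) -> str:
    """Hypothetical helper: join the fields selected for embedding into one string.

    Assumes a plain space-separated concatenation. Raises KeyError if the
    record lacks one of the fields, so it also works as a pre-load check.
    """
    return " ".join(str(record[field]) for field in embed_fields)


for movie in movies:
    # "year" is not in EMBED_FIELDS, so it does not contribute to the embedded text.
    print(text_to_embed(movie, EMBED_FIELDS))
```

Fields left out of the embed list, such as year, do not affect the vector but are still part of the loaded record.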

Write disposition

A DLT write disposition defines how data is written to the destination. The Qdrant destination supports all write dispositions.
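dlt's write dispositions are append, replace, and merge. The in-memory simulation below is not dlt code; it only illustrates what each disposition means for rows already in the destination, with merge matching rows on a primary key (here, hypothetically, "title"). In a real pipeline the disposition is passed to pipeline.run, for example pipeline.run(..., write_disposition="merge", primary_key="title").

```python
from __future__ import annotations


def apply_write_disposition(existing: list[dict], new: list[dict],
                            disposition: str,
                            primary_key: str | None = None) -> list[dict]:
    """Simulate the effect of a dlt write disposition on an in-memory table."""
    if disposition == "append":
        # Keep everything already loaded and add the new rows (duplicates allowed).
        return existing + new
    if disposition == "replace":
        # Discard what was there and keep only the newly loaded rows.
        return list(new)
    if disposition == "merge":
        if primary_key is None:
            raise ValueError("merge needs a primary key to match rows")
        # Rows with a matching key are updated; rows with new keys are inserted.
        merged = {row[primary_key]: row for row in existing}
        for row in new:
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError(f"unknown write disposition: {disposition}")


existing = [{"title": "Blade Runner", "year": 1982}]
new = [
    {"title": "Blade Runner", "year": 1982, "director": "Ridley Scott"},
    {"title": "The Matrix", "year": 1999},
]

for disposition in ("append", "replace", "merge"):
    rows = apply_write_disposition(existing, new, disposition, primary_key="title")
    print(disposition, len(rows))
```

This prints append 3, replace 2, and merge 2: append keeps the duplicate Blade Runner row, while merge updates it in place.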

DLT Sync

The Qdrant destination supports syncing the DLT state.

Next steps

The full documentation for the Qdrant DLT destination is available in the DLT docs.
Source code
