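Day 0

Implement Basic Vector Search

Follow along as we build your first collection, insert vectors, and run a similarity search. This tutorial walks you through each step.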

Step 1: Install the Qdrant Client

To interact with Qdrant, we need the Python client. It lets us communicate with the Qdrant service, manage collections, and run vector searches.

!pip install qdrant-client

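Step 2: Import the Required Libraries

Import the necessary modules from the qdrant-client package. The QdrantClient class establishes the connection to Qdrant, while the models module provides configuration types such as Distance, VectorParams, and PointStruct.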

from qdrant_client import QdrantClient, models

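Step 3: Connect to Qdrant Cloud

To connect to Qdrant Cloud, you need the cluster URL and API key from the Qdrant Cloud dashboard. Supply your own credentials, read here from the QDRANT_URL and QDRANT_API_KEY environment variables: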

import os

client = QdrantClient(url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY"))

# For Colab:
# from google.colab import userdata
# client = QdrantClient(url=userdata.get("QDRANT_URL"), api_key=userdata.get("QDRANT_API_KEY"))

Note: You can also use in-memory mode for testing: client = QdrantClient(":memory:"). Data stored this way does not persist after a restart.
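If either environment variable is unset, os.getenv returns None and the client is not configured for your cluster. A minimal sketch of a fail-fast check follows, assuming the QDRANT_URL and QDRANT_API_KEY variable names used above. load_qdrant_settings is a hypothetical helper for illustration, not part of qdrant-client.

```python
import os


def load_qdrant_settings(env=os.environ):
    """Return (url, api_key), or raise if either variable is unset or empty.

    Hypothetical helper: the variable names follow this tutorial's convention.
    """
    missing = [name for name in ("QDRANT_URL", "QDRANT_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["QDRANT_URL"], env["QDRANT_API_KEY"]
```

The returned pair can then be passed on as QdrantClient(url=url, api_key=api_key).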

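Step 4: Create a Collection

A collection in Qdrant is similar to a table in a relational database: a container for storing vectors and their metadata. When creating a collection, specify:

Name: a unique identifier for the collection
Vector configuration:
- Size: the dimensionality of the vectors
- Distance metric: the method used to measure similarity between vectors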

# Define the collection name
collection_name = "my_first_collection"

# Create the collection with specified vector parameters
client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=4,  # Dimensionality of the vectors
        distance=models.Distance.COSINE  # Distance metric for similarity search
    )
)

Expected output: True (the collection was created successfully)

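Distance Metrics Explained

Euclidean: measures the straight-line distance between points in space
Cosine: measures the angle between vectors, focusing on direction rather than magnitude
Dot product: measures the dot product of the vectors, capturing both magnitude and direction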
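To make the difference concrete, here is a small pure-Python sketch of the three metrics applied to two vectors that point in the same direction but differ in length. It is only an illustration of the formulas and does not use Qdrant's own implementation.

```python
import math


def dot(a, b):
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine of the angle between two vectors; ignores magnitude."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the magnitude

print(cosine(a, b))     # 1.0: the direction is identical
print(dot(a, b))        # 2.0: grows with magnitude
print(euclidean(a, b))  # 1.0: straight-line gap between the points
```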

Step 5: Verify Collection Creation

Confirm that your collection was created by retrieving the list of existing collections:

# Retrieve and display the list of collections
collections = client.get_collections()
print("Existing collections:", collections)

The get_collections() method returns every collection in your Qdrant instance, which is useful when you manage multiple collections dynamically.
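The response object exposes a collections attribute whose entries each carry a name. A minimal sketch for pulling out just the names, assuming the client created in step 3. collection_names is a hypothetical helper, not a qdrant-client function.

```python
def collection_names(client):
    """Return the names of all collections visible to the client."""
    response = client.get_collections()
    return [collection.name for collection in response.collections]
```

For example, "my_first_collection" in collection_names(client) should be True after step 4.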

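Step 6: Insert Points into the Collection

Points are the core data entity in Qdrant. Each point contains:

ID: a unique identifier
Vector data: an array of numbers representing the data point in vector space
Payload (optional): additional metadata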

# Define the vectors to be inserted
points = [
    models.PointStruct(
        id=1,
        vector=[0.1, 0.2, 0.3, 0.4],  # 4D vector
        payload={"category": "example"}  # Metadata (optional)
    ),
    models.PointStruct(
        id=2,
        vector=[0.2, 0.3, 0.4, 0.5],
        payload={"category": "demo"}
    )
]

# Insert vectors into the collection
client.upsert(
    collection_name=collection_name,
    points=points
)

Expected output (the operation_id may differ): UpdateResult(operation_id=2, status=<UpdateStatus.COMPLETED: 'completed'>)
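Each vector's length has to match the size configured for the collection in step 4 (4 in this tutorial). A minimal sketch of a client-side check that can run before upserting follows. check_dimensions is a hypothetical helper for illustration, not part of qdrant-client.

```python
def check_dimensions(vectors, expected_size):
    """Raise ValueError if any vector's length differs from expected_size."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected_size:
            raise ValueError(
                f"vector at index {index} has {len(vector)} dimensions, "
                f"expected {expected_size}"
            )


vectors = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]]
check_dimensions(vectors, expected_size=4)  # passes silently for the tutorial's vectors
```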

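Step 7: Retrieve Collection Details

Now that we have inserted vectors, let's confirm they were stored correctly by fetching the collection information: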

collection_info = client.get_collection(collection_name)
print("Collection info:", collection_info)

Expected output: detailed collection information showing points_count=2, the vector configuration, and the HNSW settings.
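The printed object is verbose, so it can help to read individual fields instead. The sketch below assumes a collection with a single unnamed vector configuration, as created in step 4. summarize_collection is a hypothetical helper, not part of qdrant-client.

```python
def summarize_collection(info):
    """Pick out the fields most relevant to this tutorial from a collection info object."""
    # Assumption: one unnamed vector config, so `vectors` has size and distance directly.
    vectors = info.config.params.vectors
    return {
        "points_count": info.points_count,
        "vector_size": vectors.size,
        "distance": vectors.distance,
    }
```

Calling summarize_collection(collection_info) after step 6 should report two points and a vector size of 4.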

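Step 8: Run Your First Similarity Search

Use Qdrant's search functionality to find the vectors most similar to a given query.

How Similarity Search Works

Qdrant searches the collection for the vectors closest to your query vector.
Results are ranked by similarity score, with the best match first.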

query_vector = [0.08, 0.14, 0.33, 0.28]

search_results = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    limit=1  # Return the top 1 most similar vector
)

print("Search results:", search_results)

Expected output: points=[ScoredPoint(id=1, score=0.97642946, payload={'category': 'example'})]
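To see why point 1 ranks first, the scoring can be reproduced by hand. The sketch below is a brute-force illustration in plain Python using the two vectors from step 6. It is not how Qdrant searches internally (the collection uses an HNSW index), but for a collection this small the ranking should agree.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# The same IDs and vectors inserted in step 6.
stored = {
    1: [0.1, 0.2, 0.3, 0.4],
    2: [0.2, 0.3, 0.4, 0.5],
}
query_vector = [0.08, 0.14, 0.33, 0.28]

# Score every stored vector, then sort so the best match comes first.
ranked = sorted(
    ((point_id, cosine(query_vector, vector)) for point_id, vector in stored.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for point_id, score in ranked:
    print(point_id, score)
```

Point 1 comes out on top with a score of roughly 0.976, matching the search result, and limit=1 corresponds to keeping only the first entry of ranked.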
