Build a Semantic Search Engine in 5 Minutes

Time: 5 - 15 min | Level: Beginner

There are two versions of this tutorial:

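  • The version on this page uses Qdrant Cloud. You will deploy a cluster and generate vector embeddings in the cloud using Qdrant Cloud's forever-free tier (no credit card required).
  • Alternatively, you can run Qdrant on your own machine. This requires you to manage the cluster and the vector embedding infrastructure yourself. If you prefer this option, see the local deployment version of this tutorial.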

Overview

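If you are new to vector search engines, this tutorial is for you. In 5 minutes, you will build a semantic search engine for science fiction books. Once it is set up, you can ask the engine about impending alien threats, and the system you built will recommend books to help you prepare for a potential space attack.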

If you use Python, you can follow along in this Google Colab notebook.

1. Create a Qdrant Cluster

If you don't have a Qdrant cluster yet, follow these steps to create one:

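  1. Sign up for a Qdrant Cloud account with your email, Google, or GitHub credentials.
  2. Under Create a Free Cluster, enter a cluster name and select your preferred cloud provider and region.
  3. Click Create Free Cluster.
  4. When prompted, copy the API key and store it somewhere safe, because it will not be shown again.
  5. Copy the Cluster Endpoint. It should look similar to https://xxx.cloud.qdrant.io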

2. Set Up the Client Connection

First, install the Qdrant client for your programming language:

  • Python: qdrant-client
  • TypeScript/JavaScript: @qdrant/js-client-rest
  • Rust: qdrant-client
  • Java: io.qdrant:client
  • .NET: Qdrant.Client
  • Go: github.com/qdrant/go-client

The library lets you interact with Qdrant from your code.

Next, create a client connection to your Qdrant cluster using the endpoint and API key.

from qdrant_client import QdrantClient, models

client = QdrantClient(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    cloud_inference=True
)
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({
    url: QDRANT_URL,
    apiKey: QDRANT_API_KEY,
});
use qdrant_client::Qdrant;

let client = Qdrant::from_url(QDRANT_URL)
    .api_key(QDRANT_API_KEY)
    .build()?;
import io.qdrant.client.QdrantClient;
import io.qdrant.client.QdrantGrpcClient;

QdrantClient client =
    new QdrantClient(
        QdrantGrpcClient.newBuilder(QDRANT_URL, 6334, true)
            .withApiKey(QDRANT_API_KEY)
            .build());
using Qdrant.Client;

var client = new QdrantClient(
	host: QDRANT_URL,
	port: 6334,
	https: true,
	apiKey: QDRANT_API_KEY
);
import "github.com/qdrant/go-client/qdrant"

client, err := qdrant.NewClient(&qdrant.Config{
	Host:   QDRANT_URL,
	APIKey: QDRANT_API_KEY,
	UseTLS: true,
})

QDRANT_URLQDRANT_API_KEY 替换为在上一步中获得的集群端点和 API 密钥。cloud_inference=True 参数启用了 Qdrant Cloud 的推理功能,允许集群在无需管理你自己的嵌入基础设施的情况下生成向量嵌入。

3. Create a Collection

All data in Qdrant is organized into collections. Since you are storing books, create a collection named my_books.

COLLECTION_NAME="my_books"

client.create_collection(
    collection_name=COLLECTION_NAME,
    vectors_config=models.VectorParams(
        size=384,  # Vector size is defined by the embedding model
        distance=models.Distance.COSINE,
    ),
)
const collectionName = "my_books";

await client.createCollection(collectionName, {
    vectors: {
        size: 384, // Vector size is defined by the embedding model
        distance: "Cosine",
    },
});
let collection_name = "my_books";

client
    .create_collection(
        CreateCollectionBuilder::new(collection_name)
            .vectors_config(VectorParamsBuilder::new(384, Distance::Cosine)), // Vector size is defined by the embedding model
    )
    .await?;
String COLLECTION_NAME = "my_books";

client.createCollectionAsync(COLLECTION_NAME,
        VectorParams.newBuilder().setDistance(Distance.Cosine).setSize(384).build()).get();
string COLLECTION_NAME = "my_books";

await client.CreateCollectionAsync(
	collectionName: COLLECTION_NAME,
	vectorsConfig: new VectorParams { Size = 384, Distance = Distance.Cosine }
);
collectionName := "my_books"

client.CreateCollection(context.Background(), &qdrant.CreateCollection{
	CollectionName: collectionName,
	VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{
		Size:     384, // Vector size is defined by the embedding model
		Distance: qdrant.Distance_Cosine,
	}),
})
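  • The size parameter defines the dimensionality of the collection's vectors. 384 matches the output dimension of the embedding model used in this tutorial.
  • The distance parameter specifies the function used to measure how close two vectors are. With Cosine, similarity depends on the angle between the vectors, not on their length.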
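To make the Cosine choice concrete, here is a plain-Python sketch of cosine similarity on toy 2-D vectors (real vectors in this collection have 384 components). It illustrates the formula only; it is not how Qdrant computes scores internally.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = dot(a, b) / (|a| * |b|); ranges from -1 to 1."""
    if len(a) != len(b):
        # Every vector in a collection must have the configured `size` (384 here).
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # scaled copy: about 1.0, length is ignored
```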

4. Upload Data to the Cluster

The dataset is a collection of science fiction books. Each entry has a name, an author, a publication year, and a short description.

documents = [
    {
        "name": "The Time Machine",
        "description": "A man travels through time and witnesses the evolution of humanity.",
        "author": "H.G. Wells",
        "year": 1895,
    },
    {
        "name": "Ender's Game",
        "description": "A young boy is trained to become a military leader in a war against an alien race.",
        "author": "Orson Scott Card",
        "year": 1985,
    },
    {
        "name": "Brave New World",
        "description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.",
        "author": "Aldous Huxley",
        "year": 1932,
    },
    {
        "name": "The Hitchhiker's Guide to the Galaxy",
        "description": "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.",
        "author": "Douglas Adams",
        "year": 1979,
    },
    {
        "name": "Dune",
        "description": "A desert planet is the site of political intrigue and power struggles.",
        "author": "Frank Herbert",
        "year": 1965,
    },
    {
        "name": "Foundation",
        "description": "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.",
        "author": "Isaac Asimov",
        "year": 1951,
    },
    {
        "name": "Snow Crash",
        "description": "A futuristic world where the internet has evolved into a virtual reality metaverse.",
        "author": "Neal Stephenson",
        "year": 1992,
    },
    {
        "name": "Neuromancer",
        "description": "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.",
        "author": "William Gibson",
        "year": 1984,
    },
    {
        "name": "The War of the Worlds",
        "description": "A Martian invasion of Earth throws humanity into chaos.",
        "author": "H.G. Wells",
        "year": 1898,
    },
    {
        "name": "The Hunger Games",
        "description": "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.",
        "author": "Suzanne Collins",
        "year": 2008,
    },
    {
        "name": "The Andromeda Strain",
        "description": "A deadly virus from outer space threatens to wipe out humanity.",
        "author": "Michael Crichton",
        "year": 1969,
    },
    {
        "name": "The Left Hand of Darkness",
        "description": "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.",
        "author": "Ursula K. Le Guin",
        "year": 1969,
    },
    {
        "name": "The Three-Body Problem",
        "description": "Humans encounter an alien civilization that lives in a dying system.",
        "author": "Liu Cixin",
        "year": 2008,
    },
]
const documents = [
    { name: "The Time Machine", description: "A man travels through time and witnesses the evolution of humanity.", author: "H.G. Wells", year: 1895 },
    { name: "Ender's Game", description: "A young boy is trained to become a military leader in a war against an alien race.", author: "Orson Scott Card", year: 1985 },
    { name: "Brave New World", description: "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.", author: "Aldous Huxley", year: 1932 },
    { name: "The Hitchhiker's Guide to the Galaxy", description: "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.", author: "Douglas Adams", year: 1979 },
    { name: "Dune", description: "A desert planet is the site of political intrigue and power struggles.", author: "Frank Herbert", year: 1965 },
    { name: "Foundation", description: "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.", author: "Isaac Asimov", year: 1951 },
    { name: "Snow Crash", description: "A futuristic world where the internet has evolved into a virtual reality metaverse.", author: "Neal Stephenson", year: 1992 },
    { name: "Neuromancer", description: "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.", author: "William Gibson", year: 1984 },
    { name: "The War of the Worlds", description: "A Martian invasion of Earth throws humanity into chaos.", author: "H.G. Wells", year: 1898 },
    { name: "The Hunger Games", description: "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.", author: "Suzanne Collins", year: 2008 },
    { name: "The Andromeda Strain", description: "A deadly virus from outer space threatens to wipe out humanity.", author: "Michael Crichton", year: 1969 },
    { name: "The Left Hand of Darkness", description: "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.", author: "Ursula K. Le Guin", year: 1969 },
    { name: "The Three-Body Problem", description: "Humans encounter an alien civilization that lives in a dying system.", author: "Liu Cixin", year: 2008 },
];
let documents = [
    ("The Time Machine", "A man travels through time and witnesses the evolution of humanity.", "H.G. Wells", 1895),
    ("Ender's Game", "A young boy is trained to become a military leader in a war against an alien race.", "Orson Scott Card", 1985),
    ("Brave New World", "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.", "Aldous Huxley", 1932),
    ("The Hitchhiker's Guide to the Galaxy", "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.", "Douglas Adams", 1979),
    ("Dune", "A desert planet is the site of political intrigue and power struggles.", "Frank Herbert", 1965),
    ("Foundation", "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.", "Isaac Asimov", 1951),
    ("Snow Crash", "A futuristic world where the internet has evolved into a virtual reality metaverse.", "Neal Stephenson", 1992),
    ("Neuromancer", "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.", "William Gibson", 1984),
    ("The War of the Worlds", "A Martian invasion of Earth throws humanity into chaos.", "H.G. Wells", 1898),
    ("The Hunger Games", "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.", "Suzanne Collins", 2008),
    ("The Andromeda Strain", "A deadly virus from outer space threatens to wipe out humanity.", "Michael Crichton", 1969),
    ("The Left Hand of Darkness", "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.", "Ursula K. Le Guin", 1969),
    ("The Three-Body Problem", "Humans encounter an alien civilization that lives in a dying system.", "Liu Cixin", 2008),
];
List<Map<String, Value>> payloads = List.of(
    Map.of(
        "name", value("The Time Machine"),
        "description", value("A man travels through time and witnesses the evolution of humanity."),
        "author", value("H.G. Wells"),
        "year", value(1895)),
    Map.of(
        "name", value("Ender's Game"),
        "description",
            value("A young boy is trained to become a military leader in a war against an alien race."),
        "author", value("Orson Scott Card"),
        "year", value(1985)),
    Map.of(
        "name", value("Brave New World"),
        "description",
            value(
                "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy."),
        "author", value("Aldous Huxley"),
        "year", value(1932)),
    Map.of(
        "name", value("The Hitchhiker's Guide to the Galaxy"),
        "description",
            value(
                "A comedic science fiction series following the misadventures of an unwitting human and his alien friend."),
        "author", value("Douglas Adams"),
        "year", value(1979)),
    Map.of(
        "name", value("Dune"),
        "description", value("A desert planet is the site of political intrigue and power struggles."),
        "author", value("Frank Herbert"),
        "year", value(1965)),
    Map.of(
        "name", value("Foundation"),
        "description",
            value(
                "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse."),
        "author", value("Isaac Asimov"),
        "year", value(1951)),
    Map.of(
        "name", value("Snow Crash"),
        "description",
            value("A futuristic world where the internet has evolved into a virtual reality metaverse."),
        "author", value("Neal Stephenson"),
        "year", value(1992)),
    Map.of(
        "name", value("Neuromancer"),
        "description",
            value(
                "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue."),
        "author", value("William Gibson"),
        "year", value(1984)),
    Map.of(
        "name", value("The War of the Worlds"),
        "description", value("A Martian invasion of Earth throws humanity into chaos."),
        "author", value("H.G. Wells"),
        "year", value(1898)),
    Map.of(
        "name", value("The Hunger Games"),
        "description",
            value("A dystopian society where teenagers are forced to fight to the death in a televised spectacle."),
        "author", value("Suzanne Collins"),
        "year", value(2008)),
    Map.of(
        "name", value("The Andromeda Strain"),
        "description", value("A deadly virus from outer space threatens to wipe out humanity."),
        "author", value("Michael Crichton"),
        "year", value(1969)),
    Map.of(
        "name", value("The Left Hand of Darkness"),
        "description",
            value(
                "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will."),
        "author", value("Ursula K. Le Guin"),
        "year", value(1969)),
    Map.of(
        "name", value("The Three-Body Problem"),
        "description", value("Humans encounter an alien civilization that lives in a dying system."),
        "author", value("Liu Cixin"),
        "year", value(2008)));
var payloads = new List<Dictionary<string, Value>>
{
	new() { ["name"] = "The Time Machine", ["description"] = "A man travels through time and witnesses the evolution of humanity.", ["author"] = "H.G. Wells", ["year"] = 1895 },
	new() { ["name"] = "Ender's Game", ["description"] = "A young boy is trained to become a military leader in a war against an alien race.", ["author"] = "Orson Scott Card", ["year"] = 1985 },
	new() { ["name"] = "Brave New World", ["description"] = "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.", ["author"] = "Aldous Huxley", ["year"] = 1932 },
	new() { ["name"] = "The Hitchhiker's Guide to the Galaxy", ["description"] = "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.", ["author"] = "Douglas Adams", ["year"] = 1979 },
	new() { ["name"] = "Dune", ["description"] = "A desert planet is the site of political intrigue and power struggles.", ["author"] = "Frank Herbert", ["year"] = 1965 },
	new() { ["name"] = "Foundation", ["description"] = "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.", ["author"] = "Isaac Asimov", ["year"] = 1951 },
	new() { ["name"] = "Snow Crash", ["description"] = "A futuristic world where the internet has evolved into a virtual reality metaverse.", ["author"] = "Neal Stephenson", ["year"] = 1992 },
	new() { ["name"] = "Neuromancer", ["description"] = "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.", ["author"] = "William Gibson", ["year"] = 1984 },
	new() { ["name"] = "The War of the Worlds", ["description"] = "A Martian invasion of Earth throws humanity into chaos.", ["author"] = "H.G. Wells", ["year"] = 1898 },
	new() { ["name"] = "The Hunger Games", ["description"] = "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.", ["author"] = "Suzanne Collins", ["year"] = 2008 },
	new() { ["name"] = "The Andromeda Strain", ["description"] = "A deadly virus from outer space threatens to wipe out humanity.", ["author"] = "Michael Crichton", ["year"] = 1969 },
	new() { ["name"] = "The Left Hand of Darkness", ["description"] = "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.", ["author"] = "Ursula K. Le Guin", ["year"] = 1969 },
	new() { ["name"] = "The Three-Body Problem", ["description"] = "Humans encounter an alien civilization that lives in a dying system.", ["author"] = "Liu Cixin", ["year"] = 2008 }
};
documents := []map[string]any{
	{
		"name":        "The Time Machine",
		"description": "A man travels through time and witnesses the evolution of humanity.",
		"author":      "H.G. Wells",
		"year":        1895,
	},
	{
		"name":        "Ender's Game",
		"description": "A young boy is trained to become a military leader in a war against an alien race.",
		"author":      "Orson Scott Card",
		"year":        1985,
	},
	{
		"name":        "Brave New World",
		"description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.",
		"author":      "Aldous Huxley",
		"year":        1932,
	},
	{
		"name":        "The Hitchhiker's Guide to the Galaxy",
		"description": "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.",
		"author":      "Douglas Adams",
		"year":        1979,
	},
	{
		"name":        "Dune",
		"description": "A desert planet is the site of political intrigue and power struggles.",
		"author":      "Frank Herbert",
		"year":        1965,
	},
	{
		"name":        "Foundation",
		"description": "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.",
		"author":      "Isaac Asimov",
		"year":        1951,
	},
	{
		"name":        "Snow Crash",
		"description": "A futuristic world where the internet has evolved into a virtual reality metaverse.",
		"author":      "Neal Stephenson",
		"year":        1992,
	},
	{
		"name":        "Neuromancer",
		"description": "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.",
		"author":      "William Gibson",
		"year":        1984,
	},
	{
		"name":        "The War of the Worlds",
		"description": "A Martian invasion of Earth throws humanity into chaos.",
		"author":      "H.G. Wells",
		"year":        1898,
	},
	{
		"name":        "The Hunger Games",
		"description": "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.",
		"author":      "Suzanne Collins",
		"year":        2008,
	},
	{
		"name":        "The Andromeda Strain",
		"description": "A deadly virus from outer space threatens to wipe out humanity.",
		"author":      "Michael Crichton",
		"year":        1969,
	},
	{
		"name":        "The Left Hand of Darkness",
		"description": "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.",
		"author":      "Ursula K. Le Guin",
		"year":        1969,
	},
	{
		"name":        "The Three-Body Problem",
		"description": "Humans encounter an alien civilization that lives in a dying system.",
		"author":      "Liu Cixin",
		"year":        2008,
	},
}

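Store each book as a point in the my_books collection. Each point consists of a unique ID, a vector generated from the book's description, and a payload containing the book's metadata.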
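Conceptually, each uploaded point has the shape below. This plain-Python sketch mirrors the enumerate-based IDs used in the upload snippets; `build_points` and `toy_embed` are hypothetical names, and `toy_embed` is a placeholder for the embedding that Qdrant Cloud computes server-side, not a real model.

```python
def toy_embed(text):
    # Hypothetical stand-in for the embedding model: NOT a real embedding,
    # just a deterministic 2-D vector derived from the text.
    return [float(len(text)), float(text.count(" "))]


def build_points(documents, embed):
    """Hypothetical helper: turn book dicts into point-shaped dicts."""
    points = []
    for idx, doc in enumerate(documents):
        points.append({
            "id": idx,                            # unique integer ID: 0, 1, 2, ...
            "vector": embed(doc["description"]),  # only the description is embedded
            "payload": dict(doc),                 # full metadata, copied
        })
    return points


books = [
    {"name": "Dune",
     "description": "A desert planet is the site of political intrigue and power struggles.",
     "author": "Frank Herbert", "year": 1965},
    {"name": "The War of the Worlds",
     "description": "A Martian invasion of Earth throws humanity into chaos.",
     "author": "H.G. Wells", "year": 1898},
]

for point in build_points(books, toy_embed):
    print(point["id"], point["payload"]["name"], point["vector"])
```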

EMBEDDING_MODEL="sentence-transformers/all-minilm-l6-v2"

client.upload_points(
    collection_name=COLLECTION_NAME,
    points=[
        models.PointStruct(
            id=idx,
            vector=models.Document(
                text=doc["description"],
                model=EMBEDDING_MODEL
            ),
            payload=doc
        )
        for idx, doc in enumerate(documents)
    ],
)
const embeddingModel = "sentence-transformers/all-minilm-l6-v2";

const points = documents.map((doc, idx) => ({
    id: idx,
    vector: {
        text: doc.description,
        model: embeddingModel,
    },
    payload: doc,
}));

await client.upsert(collectionName, { points });
let embedding_model = "sentence-transformers/all-minilm-l6-v2";

let points: Vec<PointStruct> = documents
    .iter()
    .enumerate()
    .map(|(idx, (name, description, author, year))| {
        PointStruct::new(
            idx as u64,
            Document::new(*description, embedding_model),
            [
                ("name", (*name).into()),
                ("description", (*description).into()),
                ("author", (*author).into()),
                ("year", (*year).into()),
            ],
        )
    })
    .collect();

client
    .upsert_points(UpsertPointsBuilder::new(collection_name, points))
    .await?;
String EMBEDDING_MODEL = "sentence-transformers/all-minilm-l6-v2";

List<PointStruct> points = new ArrayList<>();

for (int idx = 0; idx < payloads.size(); idx++) {
    Map<String, Value> payload = payloads.get(idx);
    String description = payload.get("description").getStringValue();

    PointStruct point =
        PointStruct.newBuilder()
            .setId(id((long) idx))
            .setVectors(
                vectors(
                    vector(
                        Document.newBuilder()
                            .setText(description)
                            .setModel(EMBEDDING_MODEL)
                            .build())))
            .putAllPayload(payload)
            .build();

    points.add(point);
}

client.upsertAsync(COLLECTION_NAME, points).get();
string EMBEDDING_MODEL = "sentence-transformers/all-minilm-l6-v2";

var points = new List<PointStruct>();

for (ulong idx = 0; idx < (ulong)payloads.Count; idx++)
{
	var payload = payloads[(int)idx];
	string description = payload["description"].StringValue;

	var point = new PointStruct
	{
		Id = idx,
		Vectors = new Document
		{
			Text = description,
			Model = EMBEDDING_MODEL
		},
		Payload = { payload }
	};

	points.Add(point);
}

await client.UpsertAsync(
	collectionName: COLLECTION_NAME,
	points: points
);
embeddingModel := "sentence-transformers/all-minilm-l6-v2"

points := make([]*qdrant.PointStruct, len(documents))
for idx, doc := range documents {
	points[idx] = &qdrant.PointStruct{
		Id: qdrant.NewIDNum(uint64(idx)),
		Vectors: qdrant.NewVectorsDocument(&qdrant.Document{
			Text:  doc["description"].(string),
			Model: embeddingModel,
		}),
		Payload: qdrant.NewValueMap(doc),
	}
}

client.Upsert(context.Background(), &qdrant.UpsertPoints{
	CollectionName: collectionName,
	Points:         points,
})

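This code tells Qdrant Cloud to generate vector embeddings from the book descriptions using the sentence-transformers/all-minilm-l6-v2 embedding model, one of the free models available on Qdrant Cloud. For the list of available free and paid models, see the Inference tab on the Cluster Details page of the Qdrant Cloud console.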
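Because the collection was created with size=384, the model you choose here must produce 384-dimensional vectors. If you switch models, a small local guard like the hypothetical one below can catch a mismatch before upload. The lookup table is an assumption you maintain yourself; only the all-MiniLM-L6-v2 entry (384 dimensions) comes from this tutorial.

```python
# Output dimensions of models you plan to use. all-MiniLM-L6-v2 produces
# 384-dimensional vectors; add entries for any other model you enable.
MODEL_DIMENSIONS = {
    "sentence-transformers/all-minilm-l6-v2": 384,
}


def check_model_matches_collection(model, collection_size):
    """Hypothetical guard: fail early if the model and collection disagree."""
    dim = MODEL_DIMENSIONS.get(model.lower())
    if dim is None:
        raise KeyError(f"unknown output dimension for {model!r}")
    if dim != collection_size:
        raise ValueError(
            f"{model} produces {dim}-d vectors, but the collection expects {collection_size}"
        )
    return dim


EMBEDDING_MODEL = "sentence-transformers/all-minilm-l6-v2"
print(check_model_matches_collection(EMBEDDING_MODEL, 384))
```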

5. Query the Engine

Now that the data is stored in Qdrant, you can query it and receive semantically relevant results.

hits = client.query_points(
    collection_name=COLLECTION_NAME,
    query=models.Document(
        text="alien invasion",
        model=EMBEDDING_MODEL
    ),
    limit=3,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)
const queryResult = await client.query(collectionName, {
    query: {
        text: "alien invasion",
        model: embeddingModel,
    },
    limit: 3,
});

for (const hit of queryResult.points) {
    console.log(hit.payload, "score:", hit.score);
}
let query_result = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .query(Query::new_nearest(Document::new(
                "alien invasion",
                embedding_model,
            )))
            .limit(3)
            .with_payload(true),
    )
    .await?;

for hit in query_result.result {
    println!("{:?} score: {}", hit.payload, hit.score);
}
QueryPoints request =
    QueryPoints.newBuilder()
        .setCollectionName(COLLECTION_NAME)
        .setQuery(
            nearest(
                Document.newBuilder()
                    .setText("alien invasion")
                    .setModel(EMBEDDING_MODEL)
                    .build()))
        .setLimit(3)
        .build();

var hits = client.queryAsync(request).get();

for (var hit : hits) {
    System.out.println(hit.getPayloadMap() + " score: " + hit.getScore());
}
var hits = await client.QueryAsync(
	collectionName: COLLECTION_NAME,
	query: new Document
	{
		Text = "alien invasion",
		Model = EMBEDDING_MODEL
	},
	limit: 3
);

foreach (var hit in hits)
{
	Console.WriteLine($"{hit.Payload} score: {hit.Score}");
}
queryResult, err := client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Query: qdrant.NewQueryDocument(&qdrant.Document{
		Text:  "alien invasion",
		Model: embeddingModel,
	}),
	Limit: qdrant.PtrOf(uint64(3)),
})

for _, hit := range queryResult {
	fmt.Println(hit.Payload, "score:", hit.Score)
}

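This query uses the same embedding model to generate a vector for the query "alien invasion". The search engine then finds the three most similar vectors in the collection and returns their payloads and similarity scores.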
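The ranking step can be pictured as the exact, brute-force search below: score every stored vector against the query vector and keep the three highest. This is only a mental model on toy 2-D vectors with hypothetical names; Qdrant itself uses an approximate HNSW index rather than scanning every point.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vector, points, k=3):
    """Exact search: score every point against the query and keep the k best."""
    scored = [(cosine(query_vector, p["vector"]), p) for p in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [{"payload": p["payload"], "score": s} for s, p in scored[:k]]


# Toy 2-D vectors standing in for real 384-d embeddings.
points = [
    {"id": 0, "vector": [1.0, 0.0], "payload": {"name": "A"}},
    {"id": 1, "vector": [0.6, 0.8], "payload": {"name": "B"}},
    {"id": 2, "vector": [0.0, 1.0], "payload": {"name": "C"}},
    {"id": 3, "vector": [-1.0, 0.0], "payload": {"name": "D"}},
]

for hit in top_k([1.0, 0.0], points, k=3):
    print(hit["payload"], "score:", round(hit["score"], 3))
```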

Response

The search engine returns the three books most relevant to alien invasion. Each book is assigned a score indicating its similarity to the query:

{'name': 'The War of the Worlds', 'description': 'A Martian invasion of Earth throws humanity into chaos.', 'author': 'H.G. Wells', 'year': 1898} score: 0.570093257022374
{'name': "The Hitchhiker's Guide to the Galaxy", 'description': 'A comedic science fiction series following the misadventures of an unwitting human and his alien friend.', 'author': 'Douglas Adams', 'year': 1979} score: 0.5040468703143637
{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216

Narrow Down the Query

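What about more recent books from the early 21st century? Qdrant lets you narrow down query results by applying a filter. To keep only books published in 2000 or later, filter on the year field of the payload.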
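The range condition behaves like the plain-Python predicate sketched below: gte is inclusive, so a book from exactly 2000 would also match. The bound names gt, gte, lt and lte mirror the fields of Qdrant's Range; the helper functions themselves are hypothetical and only illustrate the semantics, not how Qdrant evaluates filters.

```python
def matches_range(value, gt=None, gte=None, lt=None, lte=None):
    """True if `value` satisfies every bound that is set; None means unbounded."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


def filter_must_range(documents, key, **bounds):
    """Keep documents whose payload[key] exists and falls inside the range."""
    return [d for d in documents if key in d and matches_range(d[key], **bounds)]


books = [
    {"name": "The War of the Worlds", "year": 1898},
    {"name": "Snow Crash", "year": 1992},
    {"name": "The Hunger Games", "year": 2008},
    {"name": "The Three-Body Problem", "year": 2008},
]

print([b["name"] for b in filter_must_range(books, "year", gte=2000)])
```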

Before filtering on a payload field, create a payload index for that field:

client.create_payload_index(
    collection_name=COLLECTION_NAME,
    field_name="year",
    field_schema=models.PayloadSchemaType.INTEGER,
)
await client.createPayloadIndex(collectionName, {
    field_name: "year",
    field_schema: "integer",
});
client
    .create_field_index(
        CreateFieldIndexCollectionBuilder::new(collection_name, "year", FieldType::Integer)
            .wait(true),
    )
    .await?;
client
    .createPayloadIndexAsync(
        COLLECTION_NAME,
        "year",
        PayloadSchemaType.Integer,
        null,
        true,
        null,
        null)
    .get();
await client.CreatePayloadIndexAsync(
	collectionName: COLLECTION_NAME,
	fieldName: "year",
	schemaType: PayloadSchemaType.Integer
);
client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{
	CollectionName: collectionName,
	FieldName:      "year",
	FieldType:      qdrant.FieldType_FieldTypeInteger.Enum(),
})

In production, create payload indexes before uploading data to get the most benefit from them.
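Why does an index help? One way to picture it is a list of (year, point ID) pairs kept in sorted order, so a range lookup becomes a binary search instead of a scan over every payload. The class below is a hypothetical illustration of that general idea built on Python's bisect module; it is not how Qdrant stores its payload indexes.

```python
import bisect


class IntegerRangeIndex:
    """Toy payload index: (value, point_id) pairs kept sorted by value."""

    def __init__(self):
        self._entries = []

    def insert(self, value, point_id):
        bisect.insort(self._entries, (value, point_id))  # keeps the list sorted

    def ids_gte(self, lower):
        # Binary search for the first entry whose value is >= lower;
        # float("-inf") sorts before any real point ID with the same value.
        start = bisect.bisect_left(self._entries, (lower, float("-inf")))
        return [point_id for _, point_id in self._entries[start:]]


# Publication years of the 13 books, in upload order (IDs 0-12).
years = [1895, 1985, 1932, 1979, 1965, 1951, 1992, 1984, 1898, 2008, 1969, 1969, 2008]

index = IntegerRangeIndex()
for point_id, year in enumerate(years):
    index.insert(year, point_id)

print(index.ids_gte(2000))  # [9, 12]: The Hunger Games and The Three-Body Problem
```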

Now you can apply the filter to the query:

hits = client.query_points(
    collection_name=COLLECTION_NAME,
    query=models.Document(
        text="alien invasion",
        model=EMBEDDING_MODEL
    ),
    query_filter=models.Filter(
        must=[models.FieldCondition(key="year", range=models.Range(gte=2000))]
    ),
    limit=1,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)
const queryResultFiltered = await client.query(collectionName, {
    query: {
        text: "alien invasion",
        model: embeddingModel,
    },
    filter: {
        must: [
            {
                key: "year",
                range: {
                    gte: 2000,
                },
            },
        ],
    },
    limit: 1,
});

for (const hit of queryResultFiltered.points) {
    console.log(hit.payload, "score:", hit.score);
}
let query_result_filtered = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .query(Query::new_nearest(Document::new(
                "alien invasion",
                embedding_model,
            )))
            .filter(Filter::must([Condition::range(
                "year",
                Range {
                    gte: Some(2000.0),
                    ..Default::default()
                },
            )]))
            .limit(1)
            .with_payload(true),
    )
    .await?;

for hit in query_result_filtered.result {
    println!("{:?} score: {}", hit.payload, hit.score);
}
QueryPoints filteredRequest =
    QueryPoints.newBuilder()
        .setCollectionName(COLLECTION_NAME)
        .setQuery(
            nearest(
                Document.newBuilder()
                    .setText("alien invasion")
                    .setModel(EMBEDDING_MODEL)
                    .build()))
        .setFilter(
            Filter.newBuilder()
                .addMust(range("year", Range.newBuilder().setGte(2000.0).build()))
                .build())
        .setLimit(1)
        .build();

var filteredHits = client.queryAsync(filteredRequest).get();

for (var hit : filteredHits) {
    System.out.println(hit.getPayloadMap() + " score: " + hit.getScore());
}
var filteredHits = await client.QueryAsync(
	collectionName: COLLECTION_NAME,
	query: new Document
	{
		Text = "alien invasion",
		Model = EMBEDDING_MODEL
	},
	filter: new Filter
	{
		Must = { Range("year", new Qdrant.Client.Grpc.Range { Gte = 2000.0 }) }
	},
	limit: 1
);

foreach (var hit in filteredHits)
{
	Console.WriteLine($"{hit.Payload} score: {hit.Score}");
}
queryResultFiltered, err := client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Query: qdrant.NewQueryDocument(&qdrant.Document{
		Text:  "alien invasion",
		Model: embeddingModel,
	}),
	Filter: &qdrant.Filter{
		Must: []*qdrant.Condition{
			qdrant.NewRange("year", &qdrant.Range{
				Gte: qdrant.PtrOf(2000.0),
			}),
		},
	},
	Limit: qdrant.PtrOf(uint64(1)),
})

for _, hit := range queryResultFiltered {
	fmt.Println(hit.Payload, "score:", hit.Score)
}

Response

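Two books in the dataset were published in 2000 or later (both in 2008). Because limit is set to 1, only the higher-scoring of the two is returned: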

{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216

Next Steps

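Congratulations, you have just built your first search engine! Trust us, the rest of Qdrant isn't complicated either. For your next tutorial, try building your own hybrid search service, or take the free Qdrant basics course.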
