Qdrant Hybrid Search with Reranking

Estimated time: 40 minutes
Difficulty: Intermediate

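Reranking is a powerful technique for improving search precision: rather than running an expensive model over the entire corpus, you apply it to a small set of candidates that a faster method has already retrieved. This keeps latency low while still surfacing the most relevant results.

Reranking pairs well with hybrid search, which casts a wide net to maximize recall across multiple retrieval paths. Reranking then orders the hybrid search results using deeper relevance signals. For example, a late interaction model represents both the query and the document as multiple vectors, which allows a finer-grained, token-level comparison than a single embedding does.

In this tutorial, you will build a hybrid search engine that combines dense embeddings for semantic search, sparse embeddings for keyword search, and late interaction embeddings for reranking. The result is a search engine that draws on the strengths of each embedding type to return highly relevant results.

You will use Qdrant Cloud Inference to generate the vector embeddings. All three embedding models used in this tutorial (dense, sparse, and late interaction) are available for free on Qdrant Cloud. If you prefer to manage your own embedding infrastructure, the same principles apply, but you will need to adapt the code examples to call your own embedding service.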

Overview

Let's start by breaking down the architecture.

Ingestion Stage

Figure: Processing dense, sparse, and late interaction embeddings in Qdrant

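First, you will ingest a CSV file containing information about science fiction books. Each row is a document (one book) with title, author, and description fields. Each book's description is processed to generate three types of embeddings:

  • Dense embeddings: capture the deeper semantic meaning of the text.
  • Sparse embeddings: support a more traditional keyword-based retrieval approach. Specifically, you will use BM25, a probabilistic retrieval model. BM25 ranks documents by how relevant their terms are to a given query, taking into account term frequency, document length, and how common each term is across all documents. It is well suited to search scenarios where exact keywords carry a lot of weight.
  • Late interaction embeddings: capture fine-grained interactions between query tokens and document tokens. You will use a ColBERT model, which takes a two-stage approach: it first uses BERT to generate contextual embeddings for the query and the document, then performs late interaction, efficiently matching those embeddings to refine relevance. To learn more about late interaction models, see the "Multivector Representations for Reranking in Qdrant" tutorial and the "Multi-Vector Search" course.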
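
To make the BM25 description above concrete, here is a minimal pure-Python sketch of a textbook BM25 scorer. It is not the qdrant/bm25 model itself: the function name bm25_scores, the whitespace tokenizer, the parameter values k1=1.2 and b=0.75, and the toy documents are all assumptions chosen for illustration. Unlike the model used later in this tutorial, the sketch does no stemming.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Terms that are rare across the corpus get a higher IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes by document length.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy documents (hypothetical, not taken from the tutorial's CSV).
docs = [
    "a scientist builds a machine to travel through time",
    "a crew travels between distant stars",
    "robots learn to dream",
]
scores = bm25_scores("time travel", docs)
```

Only the first document contains the exact tokens "time" and "travel", so it is the only one with a non-zero score. The second document contains "travels", which this sketch does not match because it has no stemming step.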

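The data, including all embeddings, is stored in Qdrant, a vector search engine. This lets you efficiently search, retrieve, and rerank documents based on multiple layers of relevance.

Retrieval Stage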

Figure: Query retrieval and reranking process in Qdrant

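When a user submits a query, it is converted, just like the documents, into the three embedding types described above: a dense vector for semantic search, a sparse vector for keyword search, and late interaction vectors for precise reranking.

Next, hybrid search uses the dense and sparse embeddings to find candidate documents: dense embeddings handle semantic search, and sparse embeddings handle keyword search. The resulting set of documents is then reranked with the late interaction embeddings, so the final results are not only relevant but also prioritize the documents that truly match the user's intent.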

Implementation

Install and Initialize the Qdrant Client

First, install the Qdrant client:

Python: qdrant-client
TypeScript: @qdrant/js-client-rest
Rust: qdrant-client
Java: io.qdrant:client
C#: Qdrant.Client
Go: github.com/qdrant/go-client

Next, initialize the client:

from qdrant_client import QdrantClient

client = QdrantClient(
    url="https://xyz-example.eu-central.aws.cloud.qdrant.io:6333",
    api_key="<your-api-key>",
    cloud_inference=True,
)
const client = new QdrantClient({
    url: QDRANT_URL,
    apiKey: QDRANT_API_KEY,
});
let client = Qdrant::from_url(qdrant_url)
    .api_key(qdrant_api_key)
    .build()?;
QdrantClient client =
    new QdrantClient(
        QdrantGrpcClient.newBuilder(QDRANT_URL, 6334, true)
            .withApiKey(QDRANT_API_KEY)
            .build());
var client = new QdrantClient(
	host: QDRANT_URL,
	https: true,
	apiKey: QDRANT_API_KEY
);
client, err := qdrant.NewClient(&qdrant.Config{
	Host:   QDRANT_URL,
	APIKey: QDRANT_API_KEY,
	UseTLS: true,
})

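Models

Then, define the three embedding models. You will use the 384-dimensional sentence-transformers/all-MiniLM-L6-v2 model for dense embeddings, the qdrant/bm25 model for sparse embeddings, and the 96-dimensional answerdotai/answerai-colbert-small-v1 multivector model for late interaction embeddings.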

dense_embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
sparse_embedding_model = "qdrant/bm25"
late_interaction_embedding_model = "answerdotai/answerai-colbert-small-v1"
const denseEmbeddingModel = "sentence-transformers/all-MiniLM-L6-v2";
const sparseEmbeddingModel = "qdrant/bm25";
const lateInteractionEmbeddingModel = "answerdotai/answerai-colbert-small-v1";
let dense_embedding_model = "sentence-transformers/all-MiniLM-L6-v2";
let sparse_embedding_model = "qdrant/bm25";
let late_interaction_embedding_model = "answerdotai/answerai-colbert-small-v1";
String denseEmbeddingModel = "sentence-transformers/all-MiniLM-L6-v2";
String sparseEmbeddingModel = "qdrant/bm25";
String lateInteractionEmbeddingModel = "answerdotai/answerai-colbert-small-v1";
string denseEmbeddingModel = "sentence-transformers/all-MiniLM-L6-v2";
string sparseEmbeddingModel = "qdrant/bm25";
string lateInteractionEmbeddingModel = "answerdotai/answerai-colbert-small-v1";
denseEmbeddingModel := "sentence-transformers/all-MiniLM-L6-v2"
sparseEmbeddingModel := "qdrant/bm25"
lateInteractionEmbeddingModel := "answerdotai/answerai-colbert-small-v1"

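Create the Collection

Create a new collection named hybrid-search, configured to handle all three vector types:

  • Dense embeddings (dense): use cosine distance for semantic comparison.
  • Late interaction embeddings (multi): use cosine distance with a multivector configuration and the max similarity (MaxSim) comparator. Note the m=0 setting, which disables the HNSW index. These embeddings are used for reranking rather than ANN retrieval, so they do not need an HNSW index.
  • Sparse embeddings (sparse): use the IDF modifier for keyword-based search.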
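
The max similarity comparator mentioned above can be sketched in a few lines of plain Python. This is an illustration of the general MaxSim idea (for each query token vector, take its best cosine match among the document's token vectors, then sum), not Qdrant's internal implementation. The function names and the two-dimensional toy vectors are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def max_sim(query_vectors, doc_vectors):
    """Sum, over query token vectors, of the best match among document token vectors."""
    return sum(max(cosine(q, d) for d in doc_vectors) for q in query_vectors)


# Toy multivectors (hypothetical): one vector per token.
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has an exact match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first query token

score_a = max_sim(query_vectors, doc_a)
score_b = max_sim(query_vectors, doc_b)
```

Here doc_a scores higher than doc_b because every query token finds a close counterpart in it. Computing this for every document in a collection would be expensive, which is why the tutorial applies it only to a small prefetched candidate set.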
from qdrant_client import models

collection_name = "hybrid-search"

if client.collection_exists(collection_name=collection_name):
    client.delete_collection(collection_name=collection_name)

client.create_collection(
    collection_name,
    vectors_config={
        "dense": models.VectorParams(
            size=384,
            distance=models.Distance.COSINE,
        ),
        "multi": models.VectorParams(
            size=96,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM,
            ),
            hnsw_config=models.HnswConfigDiff(m=0)  #  Disable HNSW for reranking
        ),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(modifier=models.Modifier.IDF)
    }
)
const collectionName = "hybrid-search";

if (await client.collectionExists(collectionName)) {
    await client.deleteCollection(collectionName);
}

await client.createCollection(collectionName, {
    vectors: {
        dense: {
            size: 384,
            distance: "Cosine",
        },
        multi: {
            size: 96,
            distance: "Cosine",
            multivector_config: { comparator: "max_sim" },
            hnsw_config: { m: 0 }, // Disable HNSW for reranking
        },
    },
    sparse_vectors: {
        sparse: { modifier: "idf" },
    },
});
let collection_name = "hybrid-search";

if client.collection_exists(collection_name).await? {
    client.delete_collection(collection_name).await?;
}

let mut vectors = VectorsConfigBuilder::default();
vectors.add_named_vector_params(
    "dense",
    VectorParamsBuilder::new(384, Distance::Cosine),
);
vectors.add_named_vector_params(
    "multi",
    VectorParamsBuilder::new(96, Distance::Cosine)
        .multivector_config(MultiVectorConfigBuilder::new(MultiVectorComparator::MaxSim))
        .hnsw_config(HnswConfigDiffBuilder::default().m(0)), // Disable HNSW for reranking
);

let mut sparse = SparseVectorsConfigBuilder::default();
sparse.add_named_vector_params(
    "sparse",
    SparseVectorParamsBuilder::default().modifier(Modifier::Idf),
);

client
    .create_collection(
        CreateCollectionBuilder::new(collection_name)
            .vectors_config(vectors)
            .sparse_vectors_config(sparse),
    )
    .await?;
String collectionName = "hybrid-search";

if (client.collectionExistsAsync(collectionName).get()) {
    client.deleteCollectionAsync(collectionName).get();
}

client.createCollectionAsync(
    CreateCollection.newBuilder()
        .setCollectionName(collectionName)
        .setVectorsConfig(
            VectorsConfig.newBuilder()
                .setParamsMap(
                    VectorParamsMap.newBuilder()
                        .putMap(
                            "dense",
                            VectorParams.newBuilder()
                                .setSize(384)
                                .setDistance(Distance.Cosine)
                                .build())
                        .putMap(
                            "multi",
                            VectorParams.newBuilder()
                                .setSize(96)
                                .setDistance(Distance.Cosine)
                                .setMultivectorConfig(
                                    MultiVectorConfig.newBuilder()
                                        .setComparator(MultiVectorComparator.MaxSim)
                                        .build())
                                .setHnswConfig(
                                    HnswConfigDiff.newBuilder()
                                        .setM(0) // Disable HNSW for reranking
                                        .build())
                                .build())
                        .build()))
        .setSparseVectorsConfig(
            SparseVectorConfig.newBuilder()
                .putMap(
                    "sparse",
                    SparseVectorParams.newBuilder()
                        .setModifier(Modifier.Idf)
                        .build())
                .build())
        .build()
).get();
string collectionName = "hybrid-search";

if (await client.CollectionExistsAsync(collectionName))
	await client.DeleteCollectionAsync(collectionName);

await client.CreateCollectionAsync(
	collectionName: collectionName,
	vectorsConfig: new VectorParamsMap
	{
		Map =
		{
			["dense"] = new VectorParams
			{
				Size = 384,
				Distance = Distance.Cosine,
			},
			["multi"] = new VectorParams
			{
				Size = 96,
				Distance = Distance.Cosine,
				MultivectorConfig = new() { Comparator = MultiVectorComparator.MaxSim },
				HnswConfig = new HnswConfigDiff { M = 0 }, // Disable HNSW for reranking
			},
		}
	},
	sparseVectorsConfig: new SparseVectorConfig
	{
		Map =
		{
			["sparse"] = new SparseVectorParams { Modifier = Modifier.Idf }
		}
	}
);
collectionName := "hybrid-search"

exists, err := client.CollectionExists(context.Background(), collectionName)
if exists {
	client.DeleteCollection(context.Background(), collectionName)
}

client.CreateCollection(context.Background(), &qdrant.CreateCollection{
	CollectionName: collectionName,
	VectorsConfig: qdrant.NewVectorsConfigMap(
		map[string]*qdrant.VectorParams{
			"dense": {
				Size:     384,
				Distance: qdrant.Distance_Cosine,
			},
			"multi": {
				Size:     96,
				Distance: qdrant.Distance_Cosine,
				MultivectorConfig: &qdrant.MultiVectorConfig{
					Comparator: qdrant.MultiVectorComparator_MaxSim,
				},
				HnswConfig: &qdrant.HnswConfigDiff{M: qdrant.PtrOf(uint64(0))}, // Disable HNSW for reranking
			},
		},
	),
	SparseVectorsConfig: qdrant.NewSparseVectorsConfig(
		map[string]*qdrant.SparseVectorParams{
			"sparse": {Modifier: qdrant.Modifier_Idf.Enum()},
		},
	),
})

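Ingest the Data

Now you can load the science fiction book descriptions from the CSV and insert them into the hybrid-search collection. With Cloud Inference, wrapping the text in a Document object causes the embeddings to be computed on the server side.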

from qdrant_client.models import Document, PointStruct

csv_url = 'https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv'

points = (
    PointStruct(
        id=idx,
        vector={
            "dense": Document(text=row['Description'], model=dense_embedding_model),
            "sparse": Document(text=row['Description'], model=sparse_embedding_model),
            "multi": Document(text=row['Description'], model=late_interaction_embedding_model),
        },
        payload={"title": row['Title'], "author": row['Author'], "description": row['Description']}
    )
    for idx, row in enumerate(parse_csv(csv_url))
)
client.upload_points(
    collection_name=collection_name,
    points=points,
    batch_size=25
)
const csvUrl = "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv";

const batchSize = 25;
let idx = 0;
let buffer: Schemas["PointStruct"][] = [];

for await (const { title, author, description } of parseCSV(csvUrl)) {
    buffer.push({
        id: idx++,
        vector: {
            dense: { text: description, model: denseEmbeddingModel },
            sparse: { text: description, model: sparseEmbeddingModel },
            multi: { text: description, model: lateInteractionEmbeddingModel },
        },
        payload: { title, author, description },
    });

    if (buffer.length >= batchSize) {
        await client.upsert(collectionName, { points: buffer });
        buffer = [];
    }
}

if (buffer.length > 0) {
    await client.upsert(collectionName, { points: buffer });
}
let csv_url = "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv";

let batch_size = 25;
let mut idx: u64 = 0;
let mut buffer: Vec<PointStruct> = Vec::new();

for row in parse_csv(csv_url)? {
    let row = row?;
    let title = row.title;
    let author = row.author;
    let description = row.description;

    let vectors = NamedVectors::default()
        .add_vector("dense", Document::new(&description, dense_embedding_model))
        .add_vector("sparse", Document::new(&description, sparse_embedding_model))
        .add_vector("multi", Document::new(&description, late_interaction_embedding_model));

    buffer.push(PointStruct::new(
        idx,
        vectors,
        [
            ("title", title.into()),
            ("author", author.into()),
            ("description", description.into()),
        ],
    ));
    idx += 1;

    if buffer.len() >= batch_size {
        client
            .upsert_points(UpsertPointsBuilder::new(
                collection_name,
                std::mem::take(&mut buffer),
            ))
            .await?;
    }
}

if !buffer.is_empty() {
    client
        .upsert_points(UpsertPointsBuilder::new(collection_name, buffer))
        .await?;
}
String csvUrl = "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv";

int batchSize = 25;
long idx = 0;
List<PointStruct> buffer = new ArrayList<>();

try (var stream = parseCSV(csvUrl)) {
    for (var row : (Iterable<CsvRow>) stream::iterator) {
        String title = row.title;
        String author = row.author;
        String description = row.description;

        buffer.add(
            PointStruct.newBuilder()
                .setId(io.qdrant.client.PointIdFactory.id(idx++))
                .setVectors(
                    namedVectors(
                        Map.of(
                            "dense",
                            vector(
                                Document.newBuilder()
                                    .setText(description)
                                    .setModel(denseEmbeddingModel)
                                    .build()),
                            "sparse",
                            vector(
                                Document.newBuilder()
                                    .setText(description)
                                    .setModel(sparseEmbeddingModel)
                                    .build()),
                            "multi",
                            vector(
                                Document.newBuilder()
                                    .setText(description)
                                    .setModel(lateInteractionEmbeddingModel)
                                    .build()))))
                .putAllPayload(
                    Map.of(
                        "title", value(title),
                        "author", value(author),
                        "description", value(description)))
                .build());

        if (buffer.size() >= batchSize) {
            client.upsertAsync(collectionName, buffer).get();
            buffer.clear();
        }
    }
}

if (!buffer.isEmpty()) {
    client.upsertAsync(collectionName, buffer).get();
}
string csvUrl = "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv";

int batchSize = 25;
ulong idx = 0;
var buffer = new List<PointStruct>();

await foreach (var (title, author, description) in ParseCsv(csvUrl))
{
	buffer.Add(new PointStruct
	{
		Id = idx++,
		Vectors = new Dictionary<string, Vector>
		{
			["dense"] = new Document { Text = description, Model = denseEmbeddingModel },
			["sparse"] = new Document { Text = description, Model = sparseEmbeddingModel },
			["multi"] = new Document { Text = description, Model = lateInteractionEmbeddingModel },
		},
		Payload = { ["title"] = title, ["author"] = author, ["description"] = description }
	});

	if (buffer.Count >= batchSize)
	{
		await client.UpsertAsync(collectionName: collectionName, points: buffer);
		buffer.Clear();
	}
}

if (buffer.Count > 0)
	await client.UpsertAsync(collectionName: collectionName, points: buffer);
csvUrl := "https://raw.githubusercontent.com/qdrant/examples/refs/heads/master/sci-fi-books/top_100_scifi_books_full.csv"

batchSize := 25
var idx uint64
var buffer []*qdrant.PointStruct

err = parseCSV(csvUrl, func(row CSVRow) {
	title := row.Title
	author := row.Author
	description := row.Description

	buffer = append(buffer, &qdrant.PointStruct{
		Id: qdrant.NewIDNum(idx),
		Vectors: qdrant.NewVectorsMap(map[string]*qdrant.Vector{
			"dense":  qdrant.NewVectorDocument(&qdrant.Document{Text: description, Model: denseEmbeddingModel}),
			"sparse": qdrant.NewVectorDocument(&qdrant.Document{Text: description, Model: sparseEmbeddingModel}),
			"multi":  qdrant.NewVectorDocument(&qdrant.Document{Text: description, Model: lateInteractionEmbeddingModel}),
		}),
		Payload: qdrant.NewValueMap(map[string]any{
			"title":       title,
			"author":      author,
			"description": description,
		}),
	})
	idx++

	if len(buffer) >= batchSize {
		client.Upsert(context.Background(), &qdrant.UpsertPoints{
			CollectionName: collectionName,
			Points:         buffer,
		})
		buffer = nil
	}
})

if len(buffer) > 0 {
	client.Upsert(context.Background(), &qdrant.UpsertPoints{
		CollectionName: collectionName,
		Points:         buffer,
	})
}

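This code creates one point per book, with all three vector types and a payload containing the title, author, and description. Documents are uploaded to Qdrant in batches of 25, and Cloud Inference generates all three embeddings automatically. In production, the optimal batch size depends on your data and cluster, so you may need to experiment with different sizes to get the best performance.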
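
The batching pattern used in the upload loops can be isolated as a small generator. The sketch below is a generic illustration, assuming a hypothetical helper named batched and a stand-in row generator; it does not call Qdrant and is not how upload_points is implemented internally.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items without loading everything into memory."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Stand-in for parsed CSV rows (hypothetical data).
rows = ({"id": i} for i in range(60))

# With 60 rows and a batch size of 25, the final batch is smaller than the rest.
batch_sizes = [len(batch) for batch in batched(rows, 25)]
```

Because the helper takes the batch size as a parameter, trying different sizes against your own cluster only requires changing one argument.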

This code uses a helper function to stream and parse the CSV file:

import csv
import urllib.request

def parse_csv(url):
    with urllib.request.urlopen(url) as response:
        reader = csv.DictReader(line.decode('utf-8') for line in response)
        yield from reader
function parseCsvLine(line: string): string[] {
    const fields: string[] = [];
    let i = 0;
    while (i < line.length) {
        if (line[i] === '"') {
            i++;
            let field = "";
            while (i < line.length) {
                if (line[i] === '"' && line[i + 1] === '"') { field += '"'; i += 2; }
                else if (line[i] === '"') { i++; break; }
                else { field += line[i++]; }
            }
            fields.push(field);
            if (line[i] === ",") i++;
        } else {
            const start = i;
            while (i < line.length && line[i] !== ",") i++;
            fields.push(line.slice(start, i));
            if (i < line.length) i++;
        }
    }
    return fields;
}

async function* parseCSV(url: string): AsyncGenerator<{ title: string; author: string; description: string }> {
    const response = await fetch(url);
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let remainder = "";
    let headers: string[] | null = null;
    let titleIdx = -1;
    let authorIdx = -1;
    let descriptionIdx = -1;

    while (true) {
        const { done, value } = await reader.read();
        const chunk = done ? "" : decoder.decode(value, { stream: true });
        const lines = (remainder + chunk).split("\n");
        remainder = done ? "" : lines.pop()!;

        for (const line of lines) {
            if (!line.trim()) continue;
            if (headers === null) {
                // The first non-empty line is the header row: Title, Author, Description.
                headers = line.trim().split(",");
                titleIdx = headers.indexOf("Title");
                authorIdx = headers.indexOf("Author");
                descriptionIdx = headers.indexOf("Description");
                continue;
            }
            const fields = parseCsvLine(line);
            yield { title: fields[titleIdx], author: fields[authorIdx], description: fields[descriptionIdx] };
        }

        if (done) break;
    }
}
struct CsvRow {
    title: String,
    author: String,
    description: String,
}

fn parse_csv(url: &str) -> anyhow::Result<impl Iterator<Item = anyhow::Result<CsvRow>>> {
    let reader = ureq::get(url).call()?.into_body().into_reader();
    let mut rdr = csv::Reader::from_reader(reader);
    let headers = rdr.headers()?.clone();
    // Column names follow the CSV header row: Title, Author, Description.
    let title_idx = headers.iter().position(|h| h == "Title").unwrap();
    let author_idx = headers.iter().position(|h| h == "Author").unwrap();
    let description_idx = headers.iter().position(|h| h == "Description").unwrap();
    let iter = rdr.into_records().map(move |result| {
        let record = result?;
        Ok(CsvRow {
            title: record[title_idx].to_string(),
            author: record[author_idx].to_string(),
            description: record[description_idx].to_string(),
        })
    });
    Ok(iter)
}
static class CsvRow {
    final String title;
    final String author;
    final String description;
    CsvRow(String title, String author, String description) {
        this.title = title;
        this.author = author;
        this.description = description;
    }
}

static Stream<CsvRow> parseCSV(String url) throws Exception {
    Function<String, List<String>> parseCsvLine = line -> {
        List<String> fields = new ArrayList<>();
        boolean inQuotes = false;
        var sb = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(sb.toString());
                sb.setLength(0);
            } else {
                sb.append(c);
            }
        }
        fields.add(sb.toString());
        return fields;
    };

    var reader = new BufferedReader(new InputStreamReader(new URL(url).openStream()));
    String headerLine = reader.readLine();
    List<String> headers = List.of(headerLine.split(","));
    // Column names follow the CSV header row: Title, Author, Description.
    int titleIdx = headers.indexOf("Title");
    int authorIdx = headers.indexOf("Author");
    int descriptionIdx = headers.indexOf("Description");

    return reader.lines()
        .map(line -> {
            List<String> fields = parseCsvLine.apply(line);
            return new CsvRow(fields.get(titleIdx), fields.get(authorIdx), fields.get(descriptionIdx));
        })
        .onClose(() -> { try { reader.close(); } catch (Exception ignored) {} });
}
async IAsyncEnumerable<(string title, string author, string description)> ParseCsv(string url)
{
	using var httpClient = new HttpClient();
	using var stream = await httpClient.GetStreamAsync(url);
	using var parser = new TextFieldParser(new StreamReader(stream));
	parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
	parser.SetDelimiters(",");
	string[]? headers = parser.ReadFields();
	// Column names follow the CSV header row: Title, Author, Description.
	int titleIdx = Array.IndexOf(headers!, "Title");
	int authorIdx = Array.IndexOf(headers!, "Author");
	int descriptionIdx = Array.IndexOf(headers!, "Description");
	while (!parser.EndOfData)
	{
		var fields = parser.ReadFields()!;
		yield return (fields[titleIdx], fields[authorIdx], fields[descriptionIdx]);
	}
}
type CSVRow struct {
	Title       string
	Author      string
	Description string
}

func parseCSV(url string, fn func(CSVRow)) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	csvReader := csv.NewReader(resp.Body)
	headers, err := csvReader.Read()
	if err != nil {
		return err
	}

	// Column names follow the CSV header row: Title, Author, Description.
	titleIdx, authorIdx, descriptionIdx := -1, -1, -1
	for i, h := range headers {
		switch h {
		case "Title":
			titleIdx = i
		case "Author":
			authorIdx = i
		case "Description":
			descriptionIdx = i
		}
	}

	for {
		row, err := csvReader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		fn(CSVRow{Title: row[titleIdx], Author: row[authorIdx], Description: row[descriptionIdx]})
	}
	return nil
}

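Retrieval

Before combining results, let's look at how dense and sparse retrieval each perform on their own.

At query time, wrap the query in a Document object so that Cloud Inference computes the corresponding embedding on the server side.

Dense retrieval captures semantic meaning: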

import pprint

query = "time travel"

results = client.query_points(
    collection_name,
    query=models.Document(text=query, model=dense_embedding_model),
    using="dense",
    limit=10,
)

pprint.pp(results.points)
const query = "time travel";

const denseResults = await client.query(collectionName, {
    query: { text: query, model: denseEmbeddingModel },
    using: "dense",
    limit: 10,
});

console.log(denseResults.points);
let query = "time travel";

let results = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .query(Query::new_nearest(Document::new(query, dense_embedding_model)))
            .using("dense")
            .limit(10),
    )
    .await?;

for result in results.result {
    println!("{:?}", result);
}
String query = "time travel";

var results = client.queryAsync(
    QueryPoints.newBuilder()
        .setCollectionName(collectionName)
        .setQuery(
            nearest(
                Document.newBuilder()
                    .setText(query)
                    .setModel(denseEmbeddingModel)
                    .build()))
        .setUsing("dense")
        .setLimit(10)
        .build()
).get();

for (var result : results) {
    System.out.println(result);
}
string query = "time travel";

var results = await client.QueryAsync(
	collectionName: collectionName,
	query: new Document { Text = query, Model = denseEmbeddingModel },
	usingVector: "dense",
	limit: 10
);

foreach (var result in results)
	Console.WriteLine(result);
query := "time travel"

results, err := client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Query: qdrant.NewQueryDocument(&qdrant.Document{
		Text:  query,
		Model: denseEmbeddingModel,
	}),
	Using: qdrant.PtrOf("dense"),
	Limit: qdrant.PtrOf(uint64(10)),
})

for _, result := range results {
	fmt.Println(result)
}

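Let's look at the top 5 results:

Position | Title | Description
1 | The Time Machine | A Victorian scientist travels far into the future to witness the fate of civilization.
2 | Slaughterhouse-Five | A nonlinear, time-hopping meditation on war and fate.
3 | The Peripheral | Two timelines intertwine through telepresence technology.
4 | The Space Between Worlds | A multiverse traveler uncovers dangerous secrets on parallel Earths.
5 | The Forever War | A soldier experiences extreme time dilation while fighting an interstellar war.

Each of these books has a strong semantic connection to the concept of "time travel", even when that exact phrase does not appear in the description.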

Sparse retrieval focuses on keyword matching:

results = client.query_points(
    collection_name,
    query=models.Document(text=query, model=sparse_embedding_model),
    using="sparse",
    limit=10,
)

pprint.pp(results.points)
const sparseResults = await client.query(collectionName, {
    query: { text: query, model: sparseEmbeddingModel },
    using: "sparse",
    limit: 10,
});

console.log(sparseResults.points);
let results = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .query(Query::new_nearest(Document::new(query, sparse_embedding_model)))
            .using("sparse")
            .limit(10),
    )
    .await?;

for result in results.result {
    println!("{:?}", result);
}
results = client.queryAsync(
    QueryPoints.newBuilder()
        .setCollectionName(collectionName)
        .setQuery(
            nearest(
                Document.newBuilder()
                    .setText(query)
                    .setModel(sparseEmbeddingModel)
                    .build()))
        .setUsing("sparse")
        .setLimit(10)
        .build()
).get();

for (var result : results) {
    System.out.println(result);
}
results = await client.QueryAsync(
	collectionName: collectionName,
	query: new Document { Text = query, Model = sparseEmbeddingModel },
	usingVector: "sparse",
	limit: 10
);

foreach (var result in results)
	Console.WriteLine(result);
results, err = client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Query: qdrant.NewQueryDocument(&qdrant.Document{
		Text:  query,
		Model: sparseEmbeddingModel,
	}),
	Using: qdrant.PtrOf("sparse"),
	Limit: qdrant.PtrOf(uint64(10)),
})

for _, result := range results {
	fmt.Println(result)
}

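The top 5 results are:

Position | Title | Description
1 | Station Eleven | A traveling symphony wanders post-pandemic North America.
2 | Hyperion | Travelers share haunting stories on a pilgrimage to confront the mysterious Shrike.
3 | The Space Between Worlds | A multiverse traveler uncovers dangerous secrets on parallel Earths.
4 | The Time Machine | A Victorian scientist travels far into the future to witness the fate of civilization.
5 | Slaughterhouse-Five | A nonlinear, time-hopping meditation on war and fate.

The sparse BM25 model performs keyword matching with stemming. As a result, it returns books whose descriptions contain variants of "time" and "travel". For example, Station Eleven and Hyperion mention "traveling" and "travelers", but neither is primarily about time travel.

Hybrid search prefetches the dense and sparse results and then merges them with Reciprocal Rank Fusion (RRF).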

prefetch = [
    models.Prefetch(
        query=models.Document(text=query, model=dense_embedding_model),
        using="dense",
        limit=20,
    ),
    models.Prefetch(
        query=models.Document(text=query, model=sparse_embedding_model),
        using="sparse",
        limit=20,
    ),
]

results = client.query_points(
    collection_name,
    prefetch=prefetch,
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    with_payload=True,
    limit=10,
)

pprint.pp(results.points)
const hybridResults = await client.query(collectionName, {
    prefetch: [
        {
            query: { text: query, model: denseEmbeddingModel },
            using: "dense",
            limit: 20,
        },
        {
            query: { text: query, model: sparseEmbeddingModel },
            using: "sparse",
            limit: 20,
        },
    ],
    query: { fusion: "rrf" },
    with_payload: true,
    limit: 10,
});

console.log(hybridResults.points);
let results = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(Document::new(query, dense_embedding_model)))
                    .using("dense")
                    .limit(20u64),
            )
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(Document::new(query, sparse_embedding_model)))
                    .using("sparse")
                    .limit(20u64),
            )
            .query(Query::new_fusion(Fusion::Rrf))
            .with_payload(true)
            .limit(10),
    )
    .await?;

for result in results.result {
    println!("{:?}", result);
}
results = client.queryAsync(
    QueryPoints.newBuilder()
        .setCollectionName(collectionName)
        .addPrefetch(
            PrefetchQuery.newBuilder()
                .setQuery(
                    nearest(
                        Document.newBuilder()
                            .setText(query)
                            .setModel(denseEmbeddingModel)
                            .build()))
                .setUsing("dense")
                .setLimit(20)
                .build())
        .addPrefetch(
            PrefetchQuery.newBuilder()
                .setQuery(
                    nearest(
                        Document.newBuilder()
                            .setText(query)
                            .setModel(sparseEmbeddingModel)
                            .build()))
                .setUsing("sparse")
                .setLimit(20)
                .build())
        .setQuery(Query.newBuilder().setFusion(Fusion.RRF).build())
        .setWithPayload(enable(true))
        .setLimit(10)
        .build()
).get();

for (var result : results) {
    System.out.println(result);
}
results = await client.QueryAsync(
	collectionName: collectionName,
	prefetch: new List<PrefetchQuery>
	{
		new()
		{
			Query = new Document { Text = query, Model = denseEmbeddingModel },
			Using = "dense",
			Limit = 20,
		},
		new()
		{
			Query = new Document { Text = query, Model = sparseEmbeddingModel },
			Using = "sparse",
			Limit = 20,
		},
	},
	query: Fusion.Rrf,
	payloadSelector: true,
	limit: 10
);

foreach (var result in results)
	Console.WriteLine(result);
results, err = client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Prefetch: []*qdrant.PrefetchQuery{
		{
			Query: qdrant.NewQueryDocument(&qdrant.Document{
				Text:  query,
				Model: denseEmbeddingModel,
			}),
			Using: qdrant.PtrOf("dense"),
			Limit: qdrant.PtrOf(uint64(20)),
		},
		{
			Query: qdrant.NewQueryDocument(&qdrant.Document{
				Text:  query,
				Model: sparseEmbeddingModel,
			}),
			Using: qdrant.PtrOf("sparse"),
			Limit: qdrant.PtrOf(uint64(20)),
		},
	},
	Query:       qdrant.NewQueryFusion(qdrant.Fusion_RRF),
	WithPayload: qdrant.NewWithPayload(true),
	Limit:       qdrant.PtrOf(uint64(10)),
})

for _, result := range results {
	fmt.Println(result)
}

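This runs two subqueries in parallel: one uses dense embeddings to find semantic matches, and the other uses sparse BM25 embeddings for keyword matching. The prefetch step takes the top 20 candidates from each subquery (dense and sparse), and RRF fuses the two ranked lists into a single result set.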
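
To show what fusing ranked lists means, here is a minimal pure-Python sketch of Reciprocal Rank Fusion. It follows the commonly cited formulation, where each list contributes 1 / (k + rank) to a document's score. The constant k=60 is an assumption taken from that general formulation, and the constant Qdrant uses internally may differ. The document identifiers and rankings are hypothetical and are not output from the queries above.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each list adds 1 / (k + rank) to a document's score, with ranks starting at 1.
    Only the rank matters; the original similarity scores are ignored.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a dense and a sparse subquery.
dense_ranking = ["time_machine", "slaughterhouse_five", "peripheral"]
sparse_ranking = ["station_eleven", "hyperion", "time_machine"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

A document that appears in both lists accumulates two contributions and rises to the top, while a document ranked first in only one list still lands near the top. Both lists carry equal weight in this formulation.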

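The result is a mix of books that are semantically related to time travel and books that contain the keywords, giving you a broader set of relevant documents. However, the ranking may not be optimal: by default, RRF treats both signals equally and cannot capture fine-grained interactions between query tokens and document tokens. For example, Station Eleven ranks high because of its strong keyword match, even though it has nothing to do with time travel.

Position | Title | Description
1 | The Time Machine | A Victorian scientist travels far into the future to witness the fate of civilization.
2 | Station Eleven | A traveling symphony wanders post-pandemic North America.
3 | Slaughterhouse-Five | A nonlinear, time-hopping meditation on war and fate.
4 | The Space Between Worlds | A multiverse traveler uncovers dangerous secrets on parallel Earths.
5 | Hyperion | Travelers share haunting stories on a pilgrimage to confront the mysterious Shrike.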

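Rerank

The hybrid search results can be reranked with the late interaction embeddings for maximum precision. Instead of fusing with RRF, use the ColBERT multivectors as the final ranking signal.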
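
Conceptually, replacing fusion with reranking means that the prefetched lists only decide which documents are candidates, and a separate scoring function decides the final order. The sketch below illustrates that flow in plain Python. The rerank function, the document ids, and the toy_scores lookup (standing in for a late interaction score) are all hypothetical, and the sketch does not reflect Qdrant's internal implementation.

```python
def rerank(prefetched_lists, score_fn, limit=10):
    """Pool the candidates from all prefetched lists, then order them by score_fn."""
    # Deduplicate while keeping first-seen order; prefetch ranks are discarded.
    candidates = list(dict.fromkeys(doc for hits in prefetched_lists for doc in hits))
    return sorted(candidates, key=score_fn, reverse=True)[:limit]


# Hypothetical candidates returned by the dense and sparse subqueries.
dense_hits = ["a", "b", "c"]
sparse_hits = ["d", "a", "e"]

# Stand-in for a late interaction score between the query and each document.
toy_scores = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.1, "e": 0.8}

top = rerank([dense_hits, sparse_hits], toy_scores.get, limit=3)
```

Document "d" was ranked first by the sparse subquery but drops out of the top 3 because the reranking score, not the prefetch rank, determines the final order.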

prefetch = [
    models.Prefetch(
        query=models.Document(text=query, model=dense_embedding_model),
        using="dense",
        limit=20,
    ),
    models.Prefetch(
        query=models.Document(text=query, model=sparse_embedding_model),
        using="sparse",
        limit=20,
    ),
]

results = client.query_points(
    collection_name,
    prefetch=prefetch,
    query=models.Document(text=query, model=late_interaction_embedding_model),
    using="multi",
    with_payload=True,
    limit=10,
)

pprint.pp(results.points)
const rerankedResults = await client.query(collectionName, {
    prefetch: [
        {
            query: { text: query, model: denseEmbeddingModel },
            using: "dense",
            limit: 20,
        },
        {
            query: { text: query, model: sparseEmbeddingModel },
            using: "sparse",
            limit: 20,
        },
    ],
    query: { text: query, model: lateInteractionEmbeddingModel },
    using: "multi",
    with_payload: true,
    limit: 10,
});

console.log(rerankedResults.points);
let results = client
    .query(
        QueryPointsBuilder::new(collection_name)
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(Document::new(query, dense_embedding_model)))
                    .using("dense")
                    .limit(20u64),
            )
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(Document::new(query, sparse_embedding_model)))
                    .using("sparse")
                    .limit(20u64),
            )
            .query(Query::new_nearest(Document::new(query, late_interaction_embedding_model)))
            .using("multi")
            .with_payload(true)
            .limit(10),
    )
    .await?;

for result in results.result {
    println!("{:?}", result);
}
results = client.queryAsync(
    QueryPoints.newBuilder()
        .setCollectionName(collectionName)
        .addPrefetch(
            PrefetchQuery.newBuilder()
                .setQuery(
                    nearest(
                        Document.newBuilder()
                            .setText(query)
                            .setModel(denseEmbeddingModel)
                            .build()))
                .setUsing("dense")
                .setLimit(20)
                .build())
        .addPrefetch(
            PrefetchQuery.newBuilder()
                .setQuery(
                    nearest(
                        Document.newBuilder()
                            .setText(query)
                            .setModel(sparseEmbeddingModel)
                            .build()))
                .setUsing("sparse")
                .setLimit(20)
                .build())
        .setQuery(
            nearest(
                Document.newBuilder()
                    .setText(query)
                    .setModel(lateInteractionEmbeddingModel)
                    .build()))
        .setUsing("multi")
        .setWithPayload(enable(true))
        .setLimit(10)
        .build()
).get();

for (var result : results) {
    System.out.println(result);
}
results = await client.QueryAsync(
	collectionName: collectionName,
	prefetch: new List<PrefetchQuery>
	{
		new()
		{
			Query = new Document { Text = query, Model = denseEmbeddingModel },
			Using = "dense",
			Limit = 20,
		},
		new()
		{
			Query = new Document { Text = query, Model = sparseEmbeddingModel },
			Using = "sparse",
			Limit = 20,
		},
	},
	query: new Document { Text = query, Model = lateInteractionEmbeddingModel },
	usingVector: "multi",
	payloadSelector: true,
	limit: 10
);

foreach (var result in results)
	Console.WriteLine(result);
results, err = client.Query(context.Background(), &qdrant.QueryPoints{
	CollectionName: collectionName,
	Prefetch: []*qdrant.PrefetchQuery{
		{
			Query: qdrant.NewQueryDocument(&qdrant.Document{
				Text:  query,
				Model: denseEmbeddingModel,
			}),
			Using: qdrant.PtrOf("dense"),
			Limit: qdrant.PtrOf(uint64(20)),
		},
		{
			Query: qdrant.NewQueryDocument(&qdrant.Document{
				Text:  query,
				Model: sparseEmbeddingModel,
			}),
			Using: qdrant.PtrOf("sparse"),
			Limit: qdrant.PtrOf(uint64(20)),
		},
	},
	Query: qdrant.NewQueryDocument(&qdrant.Document{
		Text:  query,
		Model: lateInteractionEmbeddingModel,
	}),
	Using:       qdrant.PtrOf("multi"),
	WithPayload: qdrant.NewWithPayload(true),
	Limit:       qdrant.PtrOf(uint64(10)),
})

for _, result := range results {
	fmt.Println(result)
}

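The prefetch step retrieves the top 20 candidates from each subquery, and the ColBERT late interaction model then reranks the combined candidate set to surface the most relevant results.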
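
As a rough illustration of what late interaction scoring does with multivectors, the sketch below implements the MaxSim operation used by ColBERT-style models: for every query token vector, take the best-matching document token vector and sum those maxima. The helper names and the tiny two-dimensional vectors are hypothetical; real models emit one vector per token with far more dimensions, and Qdrant computes this score server-side.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_sim(query_vectors, doc_vectors):
    """Late interaction (MaxSim) score.

    For each query token vector, find the highest similarity to any
    document token vector, then sum these maxima over all query tokens.
    """
    return sum(
        max(dot(q, d) for d in doc_vectors)
        for q in query_vectors
    )


# Hypothetical token embeddings: one vector per token.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # matches both query tokens
doc_b = [[1.0, 0.0], [0.7, 0.1]]              # matches only the first token

score_a = max_sim(query_tokens, doc_a)
score_b = max_sim(query_tokens, doc_b)
print(score_a > score_b)
```

A document scores highly only when each query token finds a close counterpart somewhere in it, which is the token-level comparison that a single dense vector cannot express.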

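Compare results

Let's compare the top 10 hybrid search results with and without reranking. Notice how the position of some documents changes according to their relevance as measured by the late interaction embeddings.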

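Title | Description | Reranked position | RRF position | Change
Slaughterhouse-Five | A nonlinear, time-hopping meditation on war and fate. | 1 | 3 | Up
The Forever War | A soldier experiences extreme time dilation while fighting an interstellar war. | 2 | 8 | Up
Kindred | A modern Black woman is pulled back to the antebellum South. | 3 | 7 | Up
Spin | Earth is wrapped in a time-distorting barrier by an unknown force. | 4 | 6 | Up
The Light Brigade | Soldiers are turned into light to fight a war across space and time. | 5 | 10 | Up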
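
If you want to produce this kind of comparison programmatically, a small helper such as the one below could compute the position change for each title. It is only a sketch: the function name `rank_changes` is hypothetical, and the two dictionaries simply restate the positions from the table above rather than reading them from query results.

```python
def rank_changes(before, after):
    """Return how many positions each title moved up (positive) or down
    (negative) between two rankings given as {title: position} mappings.
    Titles missing from `before` map to None."""
    changes = {}
    for title, new_position in after.items():
        old_position = before.get(title)
        changes[title] = None if old_position is None else old_position - new_position
    return changes


# Positions taken from the comparison table.
rrf_positions = {
    "Slaughterhouse-Five": 3,
    "The Forever War": 8,
    "Kindred": 7,
    "Spin": 6,
    "The Light Brigade": 10,
}
reranked_positions = {
    "Slaughterhouse-Five": 1,
    "The Forever War": 2,
    "Kindred": 3,
    "Spin": 4,
    "The Light Brigade": 5,
}

changes = rank_changes(rrf_positions, reranked_positions)
for title, delta in changes.items():
    print(f"{title}: moved up {delta} positions")
```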

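Best practices for reranking

Reranking with a late interaction model can significantly improve the relevance of search results, especially when combined with hybrid search. Here are some best practices to keep in mind:

  • Test and monitor continuously: evaluate your hybrid search pipeline regularly to avoid overfitting, and adjust it promptly to maintain performance.
  • Balance relevance and cost: reranking can be computationally expensive, and late interaction embeddings require a lot of storage. Aim for a balance between relevance and cost. For many use cases, a simple fusion method such as RRF may already be effective enough, and late interaction models can be reserved for queries that demand very high precision.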
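
To get a feel for the storage side of this trade-off, a back-of-the-envelope estimate can help. The sketch below compares the raw size of one dense vector per document with one vector per token for a multivector. All numbers are hypothetical placeholders, not the dimensions of the models used in this tutorial, and the estimate covers uncompressed float32 values only, ignoring index overhead, payloads, and quantization.

```python
def dense_bytes(num_docs, dim, bytes_per_value=4):
    """Raw size of one dense vector per document (float32 by default)."""
    return num_docs * dim * bytes_per_value


def multivector_bytes(num_docs, avg_tokens_per_doc, dim, bytes_per_value=4):
    """Raw size of one vector per token per document (float32 by default)."""
    return num_docs * avg_tokens_per_doc * dim * bytes_per_value


# Hypothetical corpus and model dimensions, chosen only for illustration.
num_docs = 1_000_000
dense_size = dense_bytes(num_docs, dim=384)
multi_size = multivector_bytes(num_docs, avg_tokens_per_doc=100, dim=128)

print(f"dense:       {dense_size / 1e9:.2f} GB")
print(f"multivector: {multi_size / 1e9:.2f} GB")
print(f"ratio:       {multi_size / dense_size:.1f}x")
```

Under these assumed numbers the multivectors need tens of times more raw storage than the dense vectors, which is one reason to reserve late interaction reranking for cases where the precision gain justifies it.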

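Conclusion

Reranking with a late interaction model is a powerful tool that can greatly improve the relevance of search results, especially when combined with hybrid search. Although its complexity may add some latency, applying it only to a small, prefetched set of candidates lets you keep both speed and relevance.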
