How to Use FastEmbed for Reranking

Rerankers

A reranker is a model designed to refine the ordering of search results. Typically, a fast, simple retrieval method (for example, BM25 or dense vectors) first fetches a subset of candidate documents; a reranker, a more powerful and precise but slower and heavier model, then re-evaluates this subset to improve how well the results match the query.

Rerankers analyze token-level interactions between the query and each document in depth, which makes them expensive to run but very precise at judging relevance. Because they trade speed for accuracy, they are best applied to a limited candidate set rather than to the entire corpus.
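Because a cross-encoder needs one forward pass per (query, document) pair, its cost grows linearly with the number of candidates. A back-of-the-envelope sketch (the latency numbers below are made up for illustration, not benchmarks):

```python
# Cost of reranking scales with the number of (query, document) pairs scored.
corpus_size = 1_000_000   # documents in the full corpus
shortlist_size = 10       # candidates passed to the reranker

# Hypothetical latencies, for illustration only.
dense_lookup_ms = 5       # one ANN query over the whole indexed corpus
rerank_pair_ms = 20       # one cross-encoder forward pass per pair

rerank_everything_ms = rerank_pair_ms * corpus_size  # reranking the full corpus
retrieve_then_rerank_ms = dense_lookup_ms + rerank_pair_ms * shortlist_size

print(rerank_everything_ms)     # → 20000000 (hours of compute per query)
print(retrieve_then_rerank_ms)  # → 205
```

This is why the tutorial below retrieves a top-10 shortlist first and reranks only that.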

Goal of This Tutorial

It is common to use cross-encoder models as rerankers. This tutorial uses Jina Reranker v2 Base Multilingual (released under the CC-BY-NC-4.0 license), a cross-encoder reranking model supported by FastEmbed.

We use the all-MiniLM-L6-v2 dense embedding model (also supported by FastEmbed) as the first-stage retriever and then refine its results with Jina Reranker v2.

Setup

Install qdrant-client with the fastembed extra:

pip install "qdrant-client[fastembed]>=1.14.1"

Import the cross-encoder and the text embedding model used for first-stage retrieval.

from fastembed import TextEmbedding
from fastembed.rerank.cross_encoder import TextCrossEncoder

You can list the cross-encoder reranking models supported by FastEmbed with the following command.

TextCrossEncoder.list_supported_models()

The command displays the available models with details such as the model description, size, license, sources, and model file.

Available models
[{'model': 'Xenova/ms-marco-MiniLM-L-6-v2',
  'size_in_GB': 0.08,
  'sources': {'hf': 'Xenova/ms-marco-MiniLM-L-6-v2'},
  'model_file': 'onnx/model.onnx',
  'description': 'MiniLM-L-6-v2 model optimized for re-ranking tasks.',
  'license': 'apache-2.0'},
 {'model': 'Xenova/ms-marco-MiniLM-L-12-v2',
  'size_in_GB': 0.12,
  'sources': {'hf': 'Xenova/ms-marco-MiniLM-L-12-v2'},
  'model_file': 'onnx/model.onnx',
  'description': 'MiniLM-L-12-v2 model optimized for re-ranking tasks.',
  'license': 'apache-2.0'},
 {'model': 'BAAI/bge-reranker-base',
  'size_in_GB': 1.04,
  'sources': {'hf': 'BAAI/bge-reranker-base'},
  'model_file': 'onnx/model.onnx',
  'description': 'BGE reranker base model for cross-encoder re-ranking.',
  'license': 'mit'},
 {'model': 'jinaai/jina-reranker-v1-tiny-en',
  'size_in_GB': 0.13,
  'sources': {'hf': 'jinaai/jina-reranker-v1-tiny-en'},
  'model_file': 'onnx/model.onnx',
  'description': 'Designed for blazing-fast re-ranking with 8K context length and fewer parameters than jina-reranker-v1-turbo-en.',
  'license': 'apache-2.0'},
 {'model': 'jinaai/jina-reranker-v1-turbo-en',
  'size_in_GB': 0.15,
  'sources': {'hf': 'jinaai/jina-reranker-v1-turbo-en'},
  'model_file': 'onnx/model.onnx',
  'description': 'Designed for blazing-fast re-ranking with 8K context length.',
  'license': 'apache-2.0'},
 {'model': 'jinaai/jina-reranker-v2-base-multilingual',
  'size_in_GB': 1.11,
  'sources': {'hf': 'jinaai/jina-reranker-v2-base-multilingual'},
  'model_file': 'onnx/model.onnx',
  'description': 'A multi-lingual reranker model for cross-encoder re-ranking with 1K context length and sliding window',
  'license': 'cc-by-nc-4.0'}]  # some of the fields are omitted for brevity
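You can also select a model from this metadata programmatically. A minimal sketch over an abridged copy of the entries above (hardcoded here so it runs without FastEmbed):

```python
# Abridged copy of the metadata returned by TextCrossEncoder.list_supported_models().
supported = [
    {"model": "Xenova/ms-marco-MiniLM-L-6-v2", "size_in_GB": 0.08, "license": "apache-2.0"},
    {"model": "BAAI/bge-reranker-base", "size_in_GB": 1.04, "license": "mit"},
    {"model": "jinaai/jina-reranker-v2-base-multilingual", "size_in_GB": 1.11, "license": "cc-by-nc-4.0"},
]

# Keep only permissively licensed models, smallest first.
permissive = sorted(
    (m for m in supported if m["license"] in {"apache-2.0", "mit"}),
    key=lambda m: m["size_in_GB"],
)
print([m["model"] for m in permissive])
# → ['Xenova/ms-marco-MiniLM-L-6-v2', 'BAAI/bge-reranker-base']
```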

Now, load the first-stage retriever and the reranker.

encoder_name = "sentence-transformers/all-MiniLM-L6-v2"
dense_embedding_model = TextEmbedding(model_name=encoder_name)
reranker = TextCrossEncoder(model_name='jinaai/jina-reranker-v2-base-multilingual')

The model files will be fetched and downloaded, with progress shown.

Embedding and Indexing Data for First-Stage Retrieval

We will vectorize a small dataset of movie descriptions with the all-MiniLM-L6-v2 model and save the embeddings in Qdrant for first-stage retrieval.

Then, we will rerank the small subset of documents retrieved in the first stage with the cross-encoder reranking model.

Movie description dataset
descriptions = ["In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions.",
 "A film projectionist longs to be a detective, and puts his meagre skills to work when he is framed by a rival for stealing his girlfriend's father's pocketwatch.",
 "A group of high-end professional thieves start to feel the heat from the LAPD when they unknowingly leave a clue at their latest heist.",
 "A petty thief with an utter resemblance to a samurai warlord is hired as the lord's double. When the warlord later dies the thief is forced to take up arms in his place.",
 "A young boy named Kubo must locate a magical suit of armour worn by his late father in order to defeat a vengeful spirit from the past.",
 "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre.",
 "When a machine that allows therapists to enter their patients' dreams is stolen, all hell breaks loose. Only a young female therapist, Paprika, can stop it.",
 "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop.",
 "A story that revolves around drug abuse in the affluent north Indian State of Punjab and how the youth there have succumbed to it en-masse resulting in a socio-economic decline.",
 "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent.",
 "Concurrent theatrical ending of the TV series Neon Genesis Evangelion (1995).",
 "During World War II, a rebellious U.S. Army Major is assigned a dozen convicted murderers to train and lead them into a mass assassination mission of German officers.",
 "The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it's up to Woody to convince the other toys that they weren't abandoned and to return home.",
 "A soldier fighting aliens gets to relive the same day over and over again, the day restarting every time he dies.",
 "After two male musicians witness a mob hit, they flee the state in an all-female band disguised as women, but further complications set in.",
 "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household.",
 "A renegade reporter trailing a young runaway heiress for a big story joins her on a bus heading from Florida to New York, and they end up stuck with each other when the bus leaves them behind at one of the stops.",
 "Story of 40-man Turkish task force who must defend a relay station.",
 "Spinal Tap, one of England's loudest bands, is chronicled by film director Marty DiBergi on what proves to be a fateful tour.",
 "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl."]
descriptions_embeddings = list(
    dense_embedding_model.embed(descriptions)
)

Let's upload the embeddings to Qdrant.

The Qdrant client provides a simple in-memory mode that lets you experiment locally with small amounts of data.
Alternatively, you can experiment with a free cluster in Qdrant Cloud.

from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")  # Qdrant is running from RAM.

Let's create a collection for our movie data.

client.create_collection(
    collection_name="movies",
    vectors_config={
        "embedding": models.VectorParams(
            size=client.get_embedding_size("sentence-transformers/all-MiniLM-L6-v2"), 
            distance=models.Distance.COSINE
        )
    }
)

And upload the embeddings to it.

client.upload_points(
    collection_name="movies",
    points=[
        models.PointStruct(
            id=idx, 
            payload={"description": description}, 
            vector={"embedding": vector}
        )
        for idx, (description, vector) in enumerate(
            zip(descriptions, descriptions_embeddings)
        )
    ],
)
Upload with implicit embedding computation
client.upload_points(
    collection_name="movies",
    points=[
        models.PointStruct(
            id=idx,
            payload={"description": description},
            vector={"embedding": models.Document(text=description, model=encoder_name)},
        )
        for idx, description in enumerate(descriptions)
    ],
)

First-Stage Retrieval

Let's see how relevant the results are when using only the dense retriever based on all-MiniLM-L6-v2.

query = "A story about a strong historically significant female figure."
query_embedded = list(dense_embedding_model.query_embed(query))[0]

initial_retrieval = client.query_points(
    collection_name="movies",
    using="embedding",
    query=query_embedded,
    with_payload=True,
    limit=10
)

description_hits = []
for i, hit in enumerate(initial_retrieval.points):
    print(f'Result number {i+1} is \"{hit.payload["description"]}\"')
    description_hits.append(hit.payload["description"])
Query points with implicit embedding computation
query = "A story about a strong historically significant female figure."

initial_retrieval = client.query_points(
    collection_name="movies",
    using="embedding",
    query=models.Document(text=query, model=encoder_name),
    with_payload=True,
    limit=10
)

The results are as follows:

Result number 1 is "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent."
Result number 2 is "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household."
...
Result number 9 is "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre."
Result number 10 is "In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions."

We can see that the best match for the query, the description of "The Messenger: The Story of Joan of Arc", comes only 10th in the results.

Let's try to refine the order of the retrieved subset with Jina Reranker v2. It takes the query and a set of documents (movie descriptions) as input and computes a relevance score between the query and each document based on their token-level interactions.

new_scores = list(
    reranker.rerank(query, description_hits)
)  # returns scores between query and each document

ranking = [
    (i, score) for i, score in enumerate(new_scores)
]  # saving document indices
ranking.sort(
    key=lambda x: x[1], reverse=True
)  # sorting them in order of relevance defined by reranker

for i, rank in enumerate(ranking):
    print(f'''Reranked result number {i+1} is \"{description_hits[rank[0]]}\"''')
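Note that rerank returns raw relevance scores, which for many cross-encoders are unbounded logits. If you want values in (0, 1), applying a sigmoid is a common convention (an extra step assumed here, not something FastEmbed does for you):

```python
import math

def sigmoid(x: float) -> float:
    # Map an unbounded score into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# Example raw scores (made-up values for illustration, not real model output).
raw_scores = [2.3, -0.7, 0.1]
probs = [sigmoid(s) for s in raw_scores]
print([round(p, 3) for p in probs])  # → [0.909, 0.332, 0.525]
```

The sigmoid is monotonic, so it does not change the ranking, only the scale of the scores.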

The reranker moved the target movie to first place based on its relevance.

Reranked result number 1 is "In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions."
Reranked result number 2 is "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household."
...
Reranked result number 9 is "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop."
Reranked result number 10 is "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre."
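One simple way to quantify the improvement is the reciprocal rank of the target document (1 divided by its position), computed here from the ranks observed above:

```python
# Rank of the Joan of Arc description before and after reranking (from the outputs above).
rank_before = 10  # dense retrieval alone
rank_after = 1    # after cross-encoder reranking

rr_before = 1 / rank_before  # 0.1
rr_after = 1 / rank_after    # 1.0
print(rr_after / rr_before)  # → 10.0, a 10x improvement in reciprocal rank
```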

Conclusion

Rerankers refine search results by reordering retrieved candidates using deeper semantic analysis. To keep them efficient, apply them only to a small subset of the retrieval results.

Harness the power of rerankers to balance search speed and accuracy!
