How to Generate Text Embeddings with FastEmbed
Install FastEmbed
pip install fastembed
For demonstration purposes only, you will use Lists and NumPy to work with the sample data.
from typing import List
import numpy as np
Load the default model
In this example, you will use the default text embedding model, BAAI/bge-small-en-v1.5.
from fastembed import TextEmbedding
Add sample data
Now, add two sample documents. Your documents must be in a list, and each document must be a string.
documents: List[str] = [
    "FastEmbed is lighter than Transformers & Sentence-Transformers.",
    "FastEmbed is supported by and maintained by Qdrant.",
]
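Because the input must be a list of strings, a quick check before embedding can catch mistakes early. The helper below is a minimal sketch and is not part of FastEmbed; the name `validate_documents` is an assumption made for illustration.

```python
from typing import List


def validate_documents(documents: object) -> List[str]:
    """Return the input unchanged if it is a list of strings, else raise TypeError.

    Hypothetical helper for illustration; it is not part of FastEmbed.
    """
    if not isinstance(documents, list):
        raise TypeError(
            f"expected a list of documents, got {type(documents).__name__}"
        )
    for index, document in enumerate(documents):
        if not isinstance(document, str):
            raise TypeError(
                f"document at index {index} is {type(document).__name__}, expected str"
            )
    return documents


sample_documents = [
    "FastEmbed is lighter than Transformers & Sentence-Transformers.",
    "FastEmbed is supported by and maintained by Qdrant.",
]
checked = validate_documents(sample_documents)
print(f"{len(checked)} documents passed the check")
```

Passing a bare string or a list that contains a non-string raises `TypeError`, so the problem surfaces before the model is ever called.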
Download and initialize the model. Print a message to verify the process.
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")
Embed data
Generate embeddings for both documents.
embeddings_generator = embedding_model.embed(documents)
embeddings_list = list(embeddings_generator)
len(embeddings_list[0])  # 384, the dimensionality of each vector
Here is the sample document list. The default model creates vectors with 384 dimensions.
Document: FastEmbed is lighter than Transformers & Sentence-Transformers.
Vector of type: <class 'numpy.ndarray'> with shape: (384,)
Document: FastEmbed is supported by and maintained by Qdrant.
Vector of type: <class 'numpy.ndarray'> with shape: (384,)
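A listing like the one above can be produced with a short loop over the documents and their vectors. This is a minimal sketch: the helper name `describe_embeddings` is an assumption, and the all-zero vectors are placeholders standing in for real embeddings so the snippet runs without downloading the model.

```python
from typing import List, Sequence

import numpy as np


def describe_embeddings(
    documents: Sequence[str], embeddings: Sequence[np.ndarray]
) -> List[str]:
    """Return two report lines per document: its text, then its vector type and shape."""
    if len(documents) != len(embeddings):
        raise ValueError("documents and embeddings must have the same length")
    lines: List[str] = []
    for document, vector in zip(documents, embeddings):
        lines.append(f"Document: {document}")
        lines.append(f"Vector of type: {type(vector)} with shape: {vector.shape}")
    return lines


sample_documents = [
    "FastEmbed is lighter than Transformers & Sentence-Transformers.",
    "FastEmbed is supported by and maintained by Qdrant.",
]
# Placeholder vectors (all zeros, NOT real model output) with 384 dimensions.
placeholder_embeddings = [np.zeros(384, dtype=np.float32) for _ in sample_documents]

for line in describe_embeddings(sample_documents, placeholder_embeddings):
    print(line)
```

With real data, call `describe_embeddings(documents, embeddings_list)` using the variables defined earlier.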
Visualize embeddings
print("Embeddings:\n", embeddings_list)
The embeddings don't look too interesting, but here is a visual.
Embeddings:
[[-0.11154681 0.00976555 0.00524559 0.01951888 -0.01934952 0.02943449
-0.10519084 -0.00890122 0.01831438 0.01486796 -0.05642502 0.02561352
-0.00120165 0.00637456 0.02633459 0.0089221 0.05313658 0.03955453
-0.04400245 -0.02929407 0.04691846 -0.02515868 0.00778646 -0.05410657
...
-0.00243012 -0.01820582 0.02938612 0.02108984 -0.02178085 0.02971899
-0.00790564 0.03561783 0.0652488 -0.04371546 -0.05550042 0.02651665
-0.01116153 -0.01682246 -0.05976734 -0.03143916 0.06522726 0.01801389
-0.02611006 0.01627177 -0.0368538 0.03968835 0.027597 0.03305927]]
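Raw vector dumps are hard to scan. One option is to stack the list of vectors into a single NumPy array and print a compact summary instead. The sketch below rests on assumptions: `summarize_embeddings` is a hypothetical helper, not a FastEmbed function, and the random vectors are placeholders rather than real model output.

```python
from typing import Any, Dict, Sequence

import numpy as np


def summarize_embeddings(
    embeddings: Sequence[np.ndarray], preview_size: int = 4
) -> Dict[str, Any]:
    """Stack equal-length vectors into one matrix and return a compact summary."""
    # Resulting shape: (number of documents, number of dimensions)
    matrix = np.stack(embeddings)
    return {
        "shape": matrix.shape,
        "dtype": str(matrix.dtype),
        # First few values of each vector, rounded for readability
        "preview": np.round(matrix[:, :preview_size], 4).tolist(),
    }


# Placeholder vectors (random values, NOT real embeddings) with 384 dimensions.
rng = np.random.default_rng(seed=0)
placeholder_embeddings = [rng.normal(size=384).astype(np.float32) for _ in range(2)]

summary = summarize_embeddings(placeholder_embeddings)
print("Shape:", summary["shape"])
print("Dtype:", summary["dtype"])
```

With real data, pass `embeddings_list` instead of the placeholders; two documents embedded by the default model would give a shape of `(2, 384)`.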