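Mixpeek Video Embeddings
Mixpeek's video processing lets you split videos into chunks and embed each chunk, while Qdrant provides efficient storage and retrieval of those embeddings.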
Prerequisites
- Python 3.7+
- A Mixpeek API key
- The Mixpeek client installed (pip install mixpeek)
- The Qdrant client installed (pip install qdrant-client)
Installation
- Install the required packages
pip install mixpeek qdrant-client
- Set your Mixpeek API key and create a client
from mixpeek import Mixpeek
mixpeek = Mixpeek('your_api_key_here')
- Initialize the Qdrant client
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)
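Hard-coding the key, as in the snippet above, is fine for a quick test. A common alternative is to read it from an environment variable. The sketch below assumes a variable named MIXPEEK_API_KEY, a hypothetical name chosen for illustration (nothing here says the Mixpeek client reads it automatically); it uses only the standard library, and you would pass the returned string to Mixpeek(...).

```python
import os


def load_api_key(env_var: str = "MIXPEEK_API_KEY") -> str:
    """Return the API key stored in env_var (a hypothetical variable name).

    Raises RuntimeError if the variable is unset or blank, so a missing key
    fails fast instead of surfacing later as an authentication error.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Mixpeek API key")
    return key


if __name__ == "__main__":
    # Intended usage (assumption): mixpeek = Mixpeek(load_api_key())
    try:
        print("API key loaded, length:", len(load_api_key()))
    except RuntimeError as err:
        print(err)
```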
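Usage
1. Create a Qdrant collection
Create the Qdrant collection before inserting any vectors. The call below creates a collection whose vector size matches the "vuse-generic-v1" model (768 dimensions) and uses cosine distance: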
from qdrant_client import models

client.create_collection(
    collection_name="video_chunks",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
)
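Because the collection is configured with size=768 and Distance.COSINE, every vector you upsert must have exactly 768 components, and search scores are cosine similarities. The standard-library sketch below illustrates both points: check_dimension is a hypothetical guard that rejects wrong-length vectors before they reach Qdrant, and cosine_similarity spells out the formula. It is for illustration only; Qdrant computes the scores server-side.

```python
import math
from typing import Sequence

VECTOR_SIZE = 768  # must match VectorParams(size=768) used for vuse-generic-v1


def check_dimension(vector: Sequence[float], expected: int = VECTOR_SIZE) -> None:
    """Hypothetical guard: raise if the embedding length does not match the collection."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    check_dimension([0.1] * 768)                 # correct length: no error
    print(cosine_similarity([1, 0], [1, 0]))     # same direction: 1.0
    print(cosine_similarity([1, 0], [0, 1]))     # orthogonal: 0.0
```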
2. Process and embed the video
First, split the video into chunks, then embed each chunk and insert it into Qdrant:
from mixpeek import Mixpeek
from qdrant_client import QdrantClient, models
mixpeek = Mixpeek('your_api_key_here')
client = QdrantClient("localhost", port=6333)
video_url = "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/starter/jurassic_park_trailer.mp4"
# Process video chunks
processed_chunks = mixpeek.tools.video.process(
    video_source=video_url,
    chunk_interval=1,  # 1 second intervals
    resolution=[720, 1280]
)

# Embed each chunk and insert into Qdrant
for index, chunk in enumerate(processed_chunks):
    print(f"Processing video chunk: {index}")
    embedding = mixpeek.embed.video(
        model_id="vuse-generic-v1",
        input=chunk['base64_chunk'],
        input_type="base64"
    )['embedding']

    # Insert into Qdrant
    client.upsert(
        collection_name="video_chunks",
        points=[models.PointStruct(
            id=index,
            vector=embedding,
            payload={
                "start_time": chunk["start_time"],
                "end_time": chunk["end_time"]
            }
        )]
    )
    print(f"  Embedding preview: {embedding[:5] + ['...'] + embedding[-5:]}")

print(f"Processed and inserted {len(processed_chunks)} chunks")
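The loop above sends one upsert request per chunk. Since client.upsert accepts a list of points, one option is to collect points and send them in batches. The standard-library sketch below shows only the grouping logic: batched and to_point_dicts are hypothetical helpers, the chunk dicts and 3-dimensional vectors are made-up stand-ins (only the start_time/end_time keys mirror the Mixpeek output above), and in real code each batch would become a list of models.PointStruct passed to client.upsert.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def to_point_dicts(chunks, embeddings):
    """Pair each chunk with its embedding as plain dicts mirroring PointStruct fields."""
    return [
        {
            "id": index,
            "vector": vector,
            "payload": {"start_time": chunk["start_time"], "end_time": chunk["end_time"]},
        }
        for index, (chunk, vector) in enumerate(zip(chunks, embeddings))
    ]


if __name__ == "__main__":
    # Made-up 1-second chunks and toy vectors, for illustration only.
    chunks = [{"start_time": t, "end_time": t + 1} for t in range(5)]
    embeddings = [[float(t), 0.0, 1.0] for t in range(5)]
    for batch in batched(to_point_dicts(chunks, embeddings), size=2):
        # Real code (assumption): client.upsert(collection_name="video_chunks",
        #     points=[models.PointStruct(**p) for p in batch])
        print([p["id"] for p in batch])
```

Fewer, larger requests usually reduce network overhead; the batch size is a tuning choice rather than a value documented here.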
3. Search for similar video chunks
You can search for similar video chunks with either a text query or a video query.
Text query
query_text = "a car chase scene"
# Embed the text query
query_embedding = mixpeek.embed.video(
model_id="vuse-generic-v1",
input=query_text,
input_type="text"
)['embedding']
# Search in Qdrant
search_results = client.query_points(
collection_name="video_chunks",
query=query_embedding,
limit=5
).points
for result in search_results:
    print(f"Chunk ID: {result.id}, Score: {result.score}")
    print(f"Time range: {result.payload['start_time']} - {result.payload['end_time']}")
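To make the search step concrete, the sketch below ranks a few stored toy vectors by cosine similarity to a query and keeps the top `limit` results, printing the same ID, score and time-range fields as the loop above. It is an exact brute-force scan written only for illustration: Qdrant does not search this way internally (it typically uses an approximate HNSW index), and the vectors and payloads are made-up 3-dimensional examples, not real vuse-generic-v1 embeddings.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: Sequence[float],
          points: Dict[int, Tuple[List[float], dict]],
          limit: int = 5) -> List[Tuple[int, float, dict]]:
    """Return (id, score, payload) for the `limit` points most similar to `query`."""
    scored = ((pid, cosine(query, vec), payload) for pid, (vec, payload) in points.items())
    return heapq.nlargest(limit, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy "collection": id -> (vector, payload with a time range in seconds).
    stored = {
        0: ([1.0, 0.0, 0.0], {"start_time": 0, "end_time": 1}),
        1: ([0.9, 0.1, 0.0], {"start_time": 1, "end_time": 2}),
        2: ([0.0, 1.0, 0.0], {"start_time": 2, "end_time": 3}),
    }
    for pid, score, payload in top_k([1.0, 0.0, 0.0], stored, limit=2):
        print(f"Chunk ID: {pid}, Score: {score:.3f}")
        print(f"Time range: {payload['start_time']} - {payload['end_time']}")
```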
Video query
query_video_url = "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/starter/jurassic_bunny.mp4"
# Embed the video query
query_embedding = mixpeek.embed.video(
model_id="vuse-generic-v1",
input=query_video_url,
input_type="url"
)['embedding']
# Search in Qdrant
search_results = client.query_points(
collection_name="video_chunks",
query=query_embedding,
limit=5
).points
for result in search_results:
    print(f"Chunk ID: {result.id}, Score: {result.score}")
    print(f"Time range: {result.payload['start_time']} - {result.payload['end_time']}")
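Because each point stores a 1-second time range, several top results may come from neighbouring chunks of the same scene. One optional post-processing step, not part of Mixpeek or Qdrant, is to merge overlapping or touching ranges into longer segments. The sketch below assumes start_time and end_time are numbers in seconds and takes plain (start, end) pairs; with real results you could build them as (r.payload["start_time"], r.payload["end_time"]) for each r in search_results. The merge_time_ranges helper and its gap tolerance are hypothetical.

```python
from typing import Iterable, List, Tuple

Range = Tuple[float, float]


def merge_time_ranges(ranges: Iterable[Range], gap: float = 0.0) -> List[Range]:
    """Merge ranges whose start is within `gap` seconds of the previous range's end."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Made-up hits: chunks at 13-14 s, 40-41 s and 12-13 s.
    hits = [(13, 14), (40, 41), (12, 13)]
    print(merge_time_ranges(hits))  # [(12, 14), (40, 41)]
```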
Resources
For more information about Mixpeek Embed, see the official documentation: https://docs.mixpeek.com/api-documentation/inference/embed