Mixpeek Video Embeddings

Mixpeek's video processing capabilities let you chunk and embed videos, while Qdrant provides efficient storage and retrieval of those embeddings.

Prerequisites

  • Python 3.7+
  • A Mixpeek API key
  • The Mixpeek client installed (pip install mixpeek)
  • The Qdrant client installed (pip install qdrant-client)

Installation

  1. Install the required packages
pip install mixpeek qdrant-client
  2. Set up your Mixpeek API key
from mixpeek import Mixpeek

mixpeek = Mixpeek('your_api_key_here')
  3. Initialize the Qdrant client
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)
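Hardcoding the key is fine for a quick test, but it is easy to commit by accident. Below is a minimal sketch of reading it from the environment instead. The variable name MIXPEEK_API_KEY and the helper load_api_key are assumptions made for illustration; neither is part of the Mixpeek client.

```python
import os


def load_api_key(env_var: str = "MIXPEEK_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention, not one required by Mixpeek.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned string can then be passed to Mixpeek(...) in place of the literal 'your_api_key_here'.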

Usage

1. Create a Qdrant collection

Before inserting vectors, make sure to create a Qdrant collection. The following creates a collection with the appropriate vector size (768 for the "vuse-generic-v1" model):

from qdrant_client import models

client.create_collection(
    collection_name="video_chunks",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE)
)
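The collection above uses cosine distance, so search scores reflect how closely two embeddings point in the same direction, regardless of their magnitude. The sketch below shows the underlying formula in plain Python on toy 2-dimensional vectors. It is for intuition only and is not how Qdrant computes scores internally; the function name cosine_similarity is chosen here for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal -> 0.0
```

A higher value means the two vectors are more similar, which matches the ordering of the search results later in this guide.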

2. Process and embed the video

First, process the video into chunks and embed each chunk:

from mixpeek import Mixpeek
from qdrant_client import QdrantClient, models

mixpeek = Mixpeek('your_api_key_here')
client = QdrantClient("localhost", port=6333)

video_url = "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/starter/jurassic_park_trailer.mp4"

# Process video chunks
processed_chunks = mixpeek.tools.video.process(
    video_source=video_url,
    chunk_interval=1,  # 1 second intervals
    resolution=[720, 1280]
)

# Embed each chunk and insert into Qdrant
for index, chunk in enumerate(processed_chunks):
    print(f"Processing video chunk: {index}")

    embedding = mixpeek.embed.video(
        model_id="vuse-generic-v1",
        input=chunk['base64_chunk'],
        input_type="base64"
    )['embedding']

    # Insert into Qdrant
    client.upsert(
        collection_name="video_chunks",
        points=[models.PointStruct(
            id=index,
            vector=embedding,
            payload={
                "start_time": chunk["start_time"],
                "end_time": chunk["end_time"]
            }
        )]
    )

    print(f"  Embedding preview: {embedding[:5] + ['...'] + embedding[-5:]}")

print(f"Processed and inserted {len(processed_chunks)} chunks")
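The loop above sends one upsert request per chunk. For longer videos, grouping points into batches may reduce the number of round trips to Qdrant. The helper below is a sketch of that grouping step only; the name batched and the batch size are assumptions for illustration, and plain integers stand in for the points.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Stand-in data: in practice these would be the points built from each chunk.
chunk_ids = list(range(5))
for batch in batched(chunk_ids, batch_size=2):
    print(batch)
```

Each batch of points can then be passed as the points argument of a single client.upsert call, which already accepts a list.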

3. Search for similar video chunks

To search for similar video chunks, you can use either a text query or a video query.

Text query

query_text = "a car chase scene"

# Embed the text query
query_embedding = mixpeek.embed.video(
    model_id="vuse-generic-v1",
    input=query_text,
    input_type="text"
)['embedding']

# Search in Qdrant
search_results = client.query_points(
    collection_name="video_chunks",
    query=query_embedding,
    limit=5
).points

for result in search_results:
    print(f"Chunk ID: {result.id}, Score: {result.score}")
    print(f"Time range: {result.payload['start_time']} - {result.payload['end_time']}")

Video query

query_video_url = "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/starter/jurassic_bunny.mp4"

# Embed the video query
query_embedding = mixpeek.embed.video(
    model_id="vuse-generic-v1",
    input=query_video_url,
    input_type="url"
)['embedding']

# Search in Qdrant
search_results = client.query_points(
    collection_name="video_chunks",
    query=query_embedding,
    limit=5
).points

for result in search_results:
    print(f"Chunk ID: {result.id}, Score: {result.score}")
    print(f"Time range: {result.payload['start_time']} - {result.payload['end_time']}")
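Because the video was split into short chunks, several of the top hits may be adjacent in time and belong to the same scene. The sketch below merges touching or overlapping time ranges into longer segments. It assumes the start_time and end_time payload values are numeric seconds, which this guide does not confirm; the helper name merge_time_ranges and the sample values are made up for illustration.

```python
from typing import Dict, List, Tuple


def merge_time_ranges(
    hits: List[Dict[str, float]], max_gap: float = 0.0
) -> List[Tuple[float, float]]:
    """Merge chunk time ranges that touch or overlap into longer segments."""
    ranges = sorted((h["start_time"], h["end_time"]) for h in hits)
    merged: List[Tuple[float, float]] = []
    for start, end in ranges:
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the previous segment instead of starting a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical payloads, shaped like result.payload in the search loops above.
payloads = [
    {"start_time": 12.0, "end_time": 13.0},
    {"start_time": 40.0, "end_time": 41.0},
    {"start_time": 13.0, "end_time": 14.0},
]
print(merge_time_ranges(payloads))  # [(12.0, 14.0), (40.0, 41.0)]
```

With real results, the input list could be built as [result.payload for result in search_results].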

Resources

For more information on Mixpeek Embed, see the official documentation: https://docs.mixpeek.com/api-documentation/inference/embed
