MongoDB 8.0 New Features in Practice: Vector Search, Time Series Collections, and Sharded Cluster Optimization

张建站

2026/5/24 22:43:45

10-minute read

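Author: Crown_22 | Developer of the Hermes Agent AI agent desktop application

Preface

MongoDB 8.0 is a major release that brings several new features aimed at AI and big-data workloads. The most eye-catching is native Vector Search, which lets MongoDB challenge dedicated vector databases directly while keeping the flexibility of a document database.

This article digs into the core new features of MongoDB 8.0: vector search, time series collections, sharding optimizations, aggregation pipeline enhancements, and more. Real, runnable code examples show how to put these features to work in production.

Chapter 1: Vector Search and MongoDB's Push into AI

1.1 Why MongoDB Is Adding Vector Search

A traditional RAG (retrieval-augmented generation) architecture has to maintain two databases: a document database for the raw data and a vector database for the embeddings. That split brings data synchronization problems, extra operational complexity, and added latency.

MongoDB 8.0's vector search lets you store documents and vectors in the same database and complete semantic retrieval in a single query.

1.2 Creating a Vector Search Index

    import time

    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    # Assumes the deployment behind this URI supports search indexes
    # (for example Atlas or a local Atlas deployment).
    client = MongoClient("mongodb://localhost:27017/")
    db = client["ai_app"]
    collection = db["documents"]

    # Define the vector search index
    index_model = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding",      # name of the vector field
                    "numDimensions": 1536,    # must match the embedding model
                    "similarity": "cosine",   # cosine / euclidean / dotProduct
                },
                {
                    "type": "filter",
                    "path": "category",       # enables pre-filtering on this field
                },
            ]
        },
        name="vector_index",
        type="vectorSearch",
    )

    # Create the index (asynchronous, so we have to wait for it)
    collection.create_search_index(model=index_model)

    # Poll until the index is ready
    while True:
        indexes = list(collection.list_search_indexes("vector_index"))
        if indexes and indexes[0].get("status") == "READY":
            print("Vector search index is ready!")
            break
        print("Waiting for index to be ready...")
        time.sleep(5)

1.3 Inserting Documents with Vectors

    from openai import OpenAI
    import numpy as np

    openai_client = OpenAI()


    def get_embedding(text: str) -> list[float]:
        """Return the embedding vector for a piece of text."""
        response = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        return response.data[0].embedding


    # Sample documents
    documents = [
        {
            "title": "A Guide to Asynchronous Programming in Python",
            "content": "Python's asyncio library provides the infrastructure for writing single-threaded concurrent code...",
            "category": "python",
            "tags": ["async", "python", "concurrency"],
        },
        {
            "title": "React Server Components in Depth",
            "content": "React Server Components allow components to be rendered on the server...",
            "category": "frontend",
            "tags": ["react", "ssr", "frontend"],
        },
        {
            "title": "Kubernetes Pod Scheduling Strategies",
            "content": "The Kubernetes scheduler is responsible for assigning Pods to suitable nodes...",
            "category": "devops",
            "tags": ["kubernetes", "scheduling", "devops"],
        },
    ]

    # Generate an embedding for each document and insert it
    for doc in documents:
        doc["embedding"] = get_embedding(doc["title"] + "\n" + doc["content"])
        collection.insert_one(doc)

    print(f"Inserted {len(documents)} documents with embeddings")

1.4 Vector Similarity Search

    from typing import Optional


    def vector_search(query: str, limit: int = 5, category: Optional[str] = None) -> list[dict]:
        """Vector similarity search with an optional category pre-filter."""
        query_embedding = get_embedding(query)

        # Build the search stage
        vector_stage = {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_embedding,
            "numCandidates": 100,  # larger is more accurate but slower
            "limit": limit,
        }

        # Optional pre-filter: it has to go inside $vectorSearch, using the
        # "category" field declared as type "filter" in the index. A $match
        # stage placed after $vectorSearch would only filter the results
        # after the top `limit` documents had already been chosen.
        if category:
            vector_stage["filter"] = {"category": {"$eq": category}}

        pipeline = [
            {"$vectorSearch": vector_stage},
            {
                "$project": {
                    "title": 1,
                    "content": 1,
                    "category": 1,
                    "score": {"$meta": "vectorSearchScore"},  # similarity score
                    "_id": 0,
                }
            },
        ]

        results = list(collection.aggregate(pipeline))
        return results


    # Example search
    results = vector_search("How to optimize Python concurrency performance", limit=3)
    for r in results:
        print(f"[{r['score']:.4f}] {r['title']}")
        print(f"{r['content'][:100]}...")
        print()

1.5 Hybrid Search: Vector + Full-Text

    def hybrid_search(query: str, limit: int = 5) -> list[dict]:
        """Hybrid search: combine vector search with full-text search."""
        query_embedding = get_embedding(query)

        pipeline = [
            # Stage 1: vector search to collect a candidate set
            {
                "$vectorSearch": {
                    "index": "vector_index",
                    "path": "embedding",
                    "queryVector": query_embedding,
                    "numCandidates": 200,
                    "limit": 50,  # fetch more candidates
                }
            },
            # Stage 2: full-text search scoring
            # Caveat: $search is documented as valid only as the first stage
            # of a pipeline, so the server may reject it in this position.
            {
                "$search": {
                    "index": "fulltext_index",
                    "text": {
                        "query": query,
                        "path": ["title", "content"],
                    },
                }
            },
            # Stage 3: combine the scores
            {
                "$addFields": {
                    "vectorScore": {"$meta": "vectorSearchScore"},
                    "textScore": {"$meta": "searchScore"},
                    "combinedScore": {
                        "$add": [
                            {"$multiply": [{"$meta": "vectorSearchScore"}, 0.7]},
                            {"$multiply": [{"$meta": "searchScore"}, 0.3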