A Practical Guide to Vector Search with Microsoft.Extensions.AI

兔尾巴老李

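1. Project Overview: Vector Search in Practice with Microsoft.Extensions.AI

While building a knowledge base for cloud services recently, I ran into a typical technical challenge: how can users quickly find the most relevant cloud service with a natural-language query? Traditional keyword matching often fails to connect a query such as "store Word documents" with a service such as "Azure Blob Storage". This is exactly where vector search shines.

With two .NET libraries, Microsoft.Extensions.AI and Microsoft.Extensions.VectorData, we can build a semantic search system. The core idea is to convert text into high-dimensional vectors (embeddings) and then find the semantically closest results by computing the similarity between vectors. The advantage of this approach is that it captures the intent of a query rather than merely matching its keywords.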
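To make "similarity between vectors" concrete, here is a minimal sketch of cosine similarity, the measure this article later selects through DistanceFunction.CosineSimilarity. The three-dimensional vectors are invented for illustration only (real embeddings have hundreds of dimensions), and the CosineSimilarity helper is a hypothetical stand-in, not part of either library.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Values close to 1 mean the two vectors point in nearly the same direction.
// Note: a zero-length vector would produce NaN; this sketch does not guard against it.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical toy "embeddings", not produced by any real model.
float[] query = { 0.9f, 0.1f, 0.0f };
float[] blobStorage = { 0.8f, 0.2f, 0.1f };
float[] serviceBus = { 0.1f, 0.2f, 0.9f };

Console.WriteLine($"query vs. storage:   {CosineSimilarity(query, blobStorage):F3}");
Console.WriteLine($"query vs. messaging: {CosineSimilarity(query, serviceBus):F3}");
```

With these toy values the storage vector scores much higher than the messaging vector. A semantic search relies on the same kind of ordering, computed over real embeddings.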

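2. Core Components

2.1 Data Model Design

First we need a data model to represent the entries in the cloud service knowledge base. The CloudServiceWiki class is the core of that model: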

using Microsoft.Extensions.VectorData;

namespace VectorDataAIDemo;

internal class CloudServiceWiki
{
    [VectorStoreKey]
    public int Key { get; set; }

    [VectorStoreData]
    public string Name { get; set; }

    [VectorStoreData]
    public string Description { get; set; }

    [VectorStoreVector(
        Dimensions: 384,
        DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> Vector { get; set; }
}

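A few key points are worth noting:

The property marked with [VectorStoreKey] serves as the unique identifier of the record
Properties marked with [VectorStoreData] are stored and can be returned in query results
The property marked with [VectorStoreVector] holds the embedding vector and must specify the dimension count and the distance function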

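Tip: 384 is a typical output dimension for many small embedding models. More demanding scenarios may call for a model with more dimensions (for example text-embedding-3-small with 1536 dimensions, or text-embedding-3-large with 3072). Whichever model you choose, the Dimensions value must match the length of the vectors that the model actually produces.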
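Because the declared dimension count and the model's output length are configured in two different places, it can be worth checking them against each other before storing anything. The following is a minimal sketch of such a guard; EnsureDimensions and DeclaredDimensions are hypothetical names introduced here, not part of Microsoft.Extensions.VectorData.

```csharp
using System;

// Hypothetical guard: fail fast when an embedding does not have the declared length.
void EnsureDimensions(ReadOnlyMemory<float> vector, int expected)
{
    if (vector.Length != expected)
        throw new InvalidOperationException(
            $"Expected a {expected}-dimensional vector but got {vector.Length}.");
}

const int DeclaredDimensions = 384; // assumed to mirror the value in [VectorStoreVector]

ReadOnlyMemory<float> matching = new float[384];
EnsureDimensions(matching, DeclaredDimensions); // passes silently

try
{
    // Simulates a model that returns 1536-dimensional vectors.
    EnsureDimensions(new float[1536], DeclaredDimensions);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}
```

Calling a check like this right after generating each embedding surfaces a mismatch immediately, with a message that names both numbers.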

2.2 Data Preparation

In a real project the data might come from a database or an API. Here we start with hard-coded data for demonstration:

List<CloudServiceWiki> cloudServices =
[
    new()
    {
        Key = 0,
        Name = "Azure App Service",
        Description = "Host .NET, Java, Node.js, and Python web applications and APIs..."
    },
    new()
    {
        Key = 1,
        Name = "Azure Service Bus",
        Description = "A fully managed enterprise message broker..."
    },
    // Other service entries...
];

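3. Embedding Generation and Vector Storage

3.1 Configuring the Embedding Generator

The embedding generator converts text into vectors. Here we use an OpenAI embedding model: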

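// These using directives belong at the top of Program.cs.
using System.ClientModel;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using OpenAI;

// Read configuration from user secrets
IConfigurationRoot config = new ConfigurationBuilder().AddUserSecrets<Program>().Build();
string model = config["ModelName"];
string key = config["OpenAIKey"];

// Create the embedding generator
IEmbeddingGenerator<string, Embedding<float>> generator =
    new OpenAIClient(new ApiKeyCredential(key))
      .GetEmbeddingClient(model: model)
      .AsIEmbeddingGenerator();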

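Note: in a real production environment, you should also consider the following:

Manage API keys with a secure solution such as Azure Key Vault
Implement a retry mechanism to handle API rate limiting
Consider a local embedding model to reduce latency and cost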
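The retry point above can be sketched as a small helper that waits longer after each failure (exponential backoff). RetryAsync is a hypothetical helper written for this article, and the flaky lambda only simulates a rate-limited embedding request; no real API is called. Production code would more likely catch only transient errors such as HTTP 429, or use a dedicated resilience library.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retry an async operation with exponential backoff.
async Task<T> RetryAsync<T>(Func<Task<T>> operation, int maxAttempts, TimeSpan baseDelay)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Wait baseDelay, then 2x, 4x, ... before the next attempt.
            // On the last attempt the filter is false and the exception propagates.
            double factor = Math.Pow(2, attempt - 1);
            await Task.Delay(TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * factor));
        }
    }
}

// Simulated flaky call standing in for an embedding request: fails twice, then succeeds.
int calls = 0;
float[] vector = await RetryAsync(() =>
{
    calls++;
    if (calls < 3) throw new InvalidOperationException("Simulated rate limit");
    return Task.FromResult(new float[] { 0.1f, 0.2f, 0.3f });
}, maxAttempts: 5, baseDelay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"Succeeded after {calls} calls; vector length {vector.Length}");
```

In the article's setting, the lambda would wrap the call that generates an embedding, so that a temporary rate limit delays indexing rather than aborting it.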

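3.2 Building the Vector Store

An in-memory vector store suits demos and small data sets; production environments should consider a persistent option: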

var vectorStore = new InMemoryVectorStore();
VectorStoreCollection<int, CloudServiceWiki> cloudServicesStore =
    vectorStore.GetCollection<int, CloudServiceWiki>("cloudServices");
await cloudServicesStore.EnsureCollectionExistsAsync();

// Generate an embedding for each record and store it
foreach (CloudServiceWiki service in cloudServices)
{
    service.Vector = await generator.GenerateVectorAsync(service.Description);
    await cloudServicesStore.UpsertAsync(service);
}

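4. Query Processing and Displaying Results

4.1 Running the Vector Search

Query processing mirrors data preparation: the query text must first be converted into a vector as well: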

string query = "Which Azure service should I use to store my Word documents?";
ReadOnlyMemory<float> queryEmbedding = await generator.GenerateVectorAsync(query);

IAsyncEnumerable<VectorSearchResult<CloudServiceWiki>> results =
    cloudServicesStore.SearchAsync(queryEmbedding, top: 1);

await foreach (VectorSearchResult<CloudServiceWiki> result in results)
{
    Console.WriteLine($"Name: {result.Record.Name}");
    Console.WriteLine($"Description: {result.Record.Description}");
    Console.WriteLine($"Vector match score: {result.Score}");
}
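Conceptually, a search with top: k over a small collection can be pictured as scoring every record against the query vector and keeping the k best. The sketch below illustrates that idea with plain LINQ. It is an assumption-laden illustration, not a description of how InMemoryVectorStore is implemented: TopK, Normalize, and Dot are hypothetical helpers, and the three-dimensional vectors are made up. Vectors are scaled to unit length first, so the dot product equals the cosine similarity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Scale a vector to unit length so that the dot product equals cosine similarity.
float[] Normalize(float[] v)
{
    double norm = Math.Sqrt(v.Sum(x => (double)x * x));
    return v.Select(x => (float)(x / norm)).ToArray();
}

double Dot(float[] a, float[] b) => a.Zip(b, (x, y) => (double)x * y).Sum();

// Brute-force top-k: score every item, sort by score descending, keep the first k.
List<(string Name, double Score)> TopK(
    IEnumerable<(string Name, float[] Vector)> items, float[] query, int k)
{
    float[] q = Normalize(query);
    return items
        .Select(item => (item.Name, Score: Dot(Normalize(item.Vector), q)))
        .OrderByDescending(scored => scored.Score)
        .Take(k)
        .ToList();
}

// Hypothetical toy vectors, not produced by any real embedding model.
var records = new List<(string Name, float[] Vector)>
{
    ("Azure App Service", new[] { 0.2f, 0.9f, 0.1f }),
    ("Azure Service Bus", new[] { 0.1f, 0.2f, 0.9f }),
    ("Azure Blob Storage", new[] { 0.9f, 0.1f, 0.2f }),
};
float[] queryVector = { 0.8f, 0.2f, 0.1f };

foreach (var (name, score) in TopK(records, queryVector, k: 2))
    Console.WriteLine($"{name}: {score:F3}");
```

With these toy values, Azure Blob Storage ranks first and Azure App Service second. A linear scan like this is fine for a handful of records, while larger collections typically rely on an indexed vector database.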