YOLOv6 and ONNX Runtime for Industrial Quality Inspection in .NET - AI智能范式网

猫球

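1. Project Overview

Object detection is one of the core tasks in computer vision. In a recent industrial quality inspection project, I ran a YOLOv6 model through ONNX Runtime together with OpenCvSharp, and found that this combination keeps the accuracy of the deep learning model while delivering solid inference performance on .NET. This article walks through how the stack is put together.

The approach is aimed at .NET developers who need to deploy AI models on Windows. Compared with a Python deployment, a C# system is easier to integrate with existing enterprise software, carries fewer runtime dependencies, and still benefits from the portability of the ONNX format. We will take the ONNX model provided by the official YOLOv6 project, load and run it with Microsoft.ML.OnnxRuntime, and use OpenCvSharp for image input and result visualization.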

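2. Environment Setup and Toolchain Selection

2.1 Development Environment

Prepare the following:

Visual Studio 2022 (the Community edition is sufficient)
.NET 6 or later
NuGet package manager

Install the three core packages through NuGet (the commands below run in the Package Manager Console):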

Install-Package Microsoft.ML.OnnxRuntime
Install-Package OpenCvSharp4
Install-Package OpenCvSharp4.runtime.win

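Why these packages:

OnnxRuntime is the ONNX inference library maintained by Microsoft, and it is well optimized for Windows
OpenCvSharp4 is, in my experience, the most mature OpenCV wrapper for .NET and is more actively maintained than EmguCV
The runtime package for your platform must be installed alongside OpenCvSharp4; otherwise loading the native DLL fails at run time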

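2.2 Model Preparation

When downloading an ONNX model from the official YOLOv6 GitHub repository, keep the following in mind:

Pick a variant that fits your hardware (nano/small/medium/large)
Confirm the model's input and output format (the input is typically a 1x3x640x640 float tensor)
Download the matching labels.txt file, which contains the class names

Important: the official .onnx file may contain operators that OnnxRuntime does not support. Running the model through onnx-simplifier first is recommended:
python -m onnxsim yolov6s.onnx yolov6s-sim.onnx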

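3. Core Implementation

3.1 Image Preprocessing

YOLOv6 preprocessing consists of:

Resizing to 640x640 while preserving the aspect ratio
Normalizing pixel values to the 0-1 range
Converting to NCHW layout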

using System;
using OpenCvSharp;

// Returns the image as a flat 1x3x640x640 NCHW float array.
float[] Preprocess(Mat image)
{
    // Resize while preserving the aspect ratio
    int target_size = 640;
    float ratio = Math.Min(target_size / (float)image.Width, target_size / (float)image.Height);
    Size new_size = new Size((int)(image.Width * ratio), (int)(image.Height * ratio));
    using Mat resized = new Mat();
    Cv2.Resize(image, resized, new_size);

    // Pad to a square canvas; the resized image sits in the top-left corner
    using Mat padded = new Mat(target_size, target_size, MatType.CV_8UC3, new Scalar(114, 114, 114));
    using (Mat roi = new Mat(padded, new Rect(0, 0, resized.Width, resized.Height)))
    {
        resized.CopyTo(roi);
    }

    // OpenCV loads images as BGR, while the stock YOLOv6 pipeline feeds RGB
    Cv2.CvtColor(padded, padded, ColorConversionCodes.BGR2RGB);

    // Convert to float and normalize to [0, 1]
    using Mat normalized = new Mat();
    padded.ConvertTo(normalized, MatType.CV_32FC3, 1.0 / 255.0);

    // Repack from interleaved HWC to planar NCHW
    var input = new float[1 * 3 * target_size * target_size];
    for (int c = 0; c < 3; c++)
    {
        for (int h = 0; h < target_size; h++)
        {
            for (int w = 0; w < target_size; w++)
            {
                input[c * target_size * target_size + h * target_size + w] = normalized.At<Vec3f>(h, w)[c];
            }
        }
    }
    return input;
}
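
The resize arithmetic in Preprocess is easy to get wrong by one pixel, so it can help to check it in isolation. The following is a minimal sketch of just the geometry, with no OpenCvSharp dependency; the `Letterbox` class and its `Compute` method are hypothetical names introduced here for illustration, not part of any library.

```csharp
using System;

// Hypothetical helper that mirrors the resize arithmetic in Preprocess.
public static class Letterbox
{
    // Returns the uniform scale factor and the resized dimensions needed to
    // fit a srcWidth x srcHeight image inside a target x target square.
    public static (float Ratio, int NewWidth, int NewHeight) Compute(
        int srcWidth, int srcHeight, int target = 640)
    {
        if (srcWidth <= 0 || srcHeight <= 0 || target <= 0)
            throw new ArgumentOutOfRangeException(nameof(srcWidth), "All dimensions must be positive.");

        // The smaller of the two per-axis ratios keeps the whole image inside the square
        float ratio = Math.Min(target / (float)srcWidth, target / (float)srcHeight);

        // Truncate like the (int) casts in Preprocess, then clamp to the canvas
        int newWidth = Math.Clamp((int)(srcWidth * ratio), 1, target);
        int newHeight = Math.Clamp((int)(srcHeight * ratio), 1, target);
        return (ratio, newWidth, newHeight);
    }
}
```

For a 1280x720 frame, `Letterbox.Compute(1280, 720)` gives a ratio of 0.5 and a resized image of 640x360, which leaves 280 rows of gray padding at the bottom of the canvas. The same ratio is needed again when mapping boxes back to the original image.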

3.2 Model Inference

Create an inference session and run the prediction:

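using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Create the session once and reuse it for every image
using var session = new InferenceSession("yolov6s-sim.onnx");

// Prepare the input. `input` is the float[] returned by Preprocess, and
// "images" must match the model's input name (check session.InputMetadata).
var inputs = new List<NamedOnnxValue> {
    NamedOnnxValue.CreateFromTensor("images", new DenseTensor<float>(input, new[] {1, 3, 640, 640}))
};

// Run inference and copy the output out before the results are disposed
using var results = session.Run(inputs);
float[] output = results.First().AsTensor<float>().ToArray();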

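3.3 Post-processing

Parsing the YOLOv6 output takes a few steps:

Filtering out low-confidence detections
Removing duplicates with NMS
Converting coordinates back to the original image size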

// Detection and NMS are assumed to be defined elsewhere in the project.
// `ratio` is the scale factor computed in Preprocess. Boxes are assumed to be
// in pixels of the 640x640 input, as in the stock YOLOv6 export.
List<Detection> ParseOutput(float[] output, float ratio, float conf_thresh = 0.5f, float iou_thresh = 0.5f)
{
    // YOLOv6 output has shape 1x8400x85
    int num_classes = 80;
    int stride = num_classes + 5;
    var detections = new List<Detection>();
    
    for (int i = 0; i < 8400; i++) {
        float conf = output[i * stride + 4];
        if (conf < conf_thresh) continue;
        
        // Find the class with the highest score
        int cls_id = -1;
        float max_cls_prob = 0;
        for (int c = 0; c < num_classes; c++) {
            float prob = output[i * stride + 5 + c] * conf;
            if (prob > max_cls_prob) {
                max_cls_prob = prob;
                cls_id = c;
            }
        }
        
        if (cls_id < 0) continue;
        
        // Read the box as center x, center y, width, height
        float cx = output[i * stride + 0];
        float cy = output[i * stride + 1];
        float w = output[i * stride + 2];
        float h = output[i * stride + 3];
        
        // Map from the letterboxed input back to the original image. The padding
        // sits at the bottom/right, so there is no offset to subtract.
        detections.Add(new Detection {
            ClassId = cls_id,
            Confidence = max_cls_prob,
            Box = new Rect(
                (int)((cx - w / 2) / ratio),
                (int)((cy - h / 2) / ratio),
                (int)(w / ratio),
                (int)(h / ratio)
            )
        });
    }
    
    // Run NMS
    return NMS(detections, iou_thresh);
}
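
ParseOutput calls an NMS helper that the article does not show. A minimal sketch of one common variant, greedy per-class NMS, could look like the following. The `Det` type is a hypothetical stand-in for the article's Detection class, with a plain x/y/width/height box so the sketch compiles without OpenCvSharp, and `Nms` is likewise a name chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's Detection type.
public sealed class Det
{
    public int ClassId { get; init; }
    public float Confidence { get; init; }
    public int X { get; init; }
    public int Y { get; init; }
    public int Width { get; init; }
    public int Height { get; init; }
}

public static class Nms
{
    // Intersection over union of two axis-aligned boxes.
    public static float IoU(Det a, Det b)
    {
        int x1 = Math.Max(a.X, b.X);
        int y1 = Math.Max(a.Y, b.Y);
        int x2 = Math.Min(a.X + a.Width, b.X + b.Width);
        int y2 = Math.Min(a.Y + a.Height, b.Y + b.Height);

        float inter = Math.Max(0, x2 - x1) * (float)Math.Max(0, y2 - y1);
        float union = a.Width * (float)a.Height + b.Width * (float)b.Height - inter;
        return union <= 0f ? 0f : inter / union;
    }

    // Greedy per-class NMS: visit boxes from highest to lowest confidence and
    // keep a box only if no already-kept box of the same class overlaps it
    // by more than iouThresh.
    public static List<Det> Apply(IEnumerable<Det> detections, float iouThresh = 0.5f)
    {
        var kept = new List<Det>();
        foreach (var candidate in detections.OrderByDescending(d => d.Confidence))
        {
            bool suppressed = kept.Any(k =>
                k.ClassId == candidate.ClassId && IoU(k, candidate) > iouThresh);
            if (!suppressed) kept.Add(candidate);
        }
        return kept;
    }
}
```

This version is quadratic in the number of candidates, which is usually acceptable after confidence filtering has reduced 8400 anchors to a few dozen boxes. Suppressing only within the same class is a design choice; class-agnostic NMS simply drops the ClassId comparison.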

4. Performance Optimization

4.1 Inference Session Configuration

Optimization options can be specified when creating the InferenceSession:

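var options = new SessionOptions {
    GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL,
    EnableMemoryPattern = true,
    // ORT_PARALLEL only pays off for graphs with parallel branches; ORT_SEQUENTIAL is the default
    ExecutionMode = ExecutionMode.ORT_PARALLEL
};

// GPU users: pick ONE execution provider.
// CUDA requires the Microsoft.ML.OnnxRuntime.Gpu package
options.AppendExecutionProvider_CUDA();
// DirectML requires the Microsoft.ML.OnnxRuntime.DirectML package, plus
// EnableMemoryPattern = false and ExecutionMode.ORT_SEQUENTIAL
// options.AppendExecutionProvider_DML();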

4.2 Batch Processing

When several images need to be processed, you can:

Preallocate the input and output buffers
Use pinned memory
Run preprocessing in parallel

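using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using OpenCvSharp;

class BatchProcessor : IDisposable
{
    private const int ImageSize = 3 * 640 * 640;
    private readonly int _batchSize;
    private readonly float[] _inputBuffer;
    private GCHandle _handle;
    
    public BatchProcessor(int batchSize)
    {
        _batchSize = batchSize;
        _inputBuffer = new float[batchSize * ImageSize];
        // Pin the buffer so the GC cannot move it while native code reads it
        _handle = GCHandle.Alloc(_inputBuffer, GCHandleType.Pinned);
    }
    
    public void ProcessBatch(Mat[] images)
    {
        if (images.Length > _batchSize)
            throw new ArgumentException("Batch exceeds the preallocated buffer.", nameof(images));
        
        // Each iteration writes to its own slice of the buffer, so no locking is needed
        Parallel.For(0, images.Length, i => 
        {
            var input = Preprocess(images[i]);
            Array.Copy(input, 0, _inputBuffer, i * ImageSize, input.Length);
        });
        
        // Batched inference...
    }
    
    public void Dispose()
    {
        // Guard against a double free if Dispose is called twice
        if (_handle.IsAllocated) _handle.Free();
    }
}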

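5. Common Problems and Solutions

5.1 Model Fails to Load

Symptom:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:InvalidGraph]...

Troubleshooting steps:

Inspect the model structure with Netron
Confirm that the input and output tensor names match
Check for unsupported operators

Fixes:

Optimize the model with onnxruntime-tools
Or specify opset_version=12 when exporting the model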

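5.2 Memory Leaks

Typical scenario:
Memory usage keeps growing during long runs

Key checks:

Make sure every IDisposable object (Mat, InferenceSession, and so on) is released properly
Avoid creating and destroying sessions frequently
Check that every GCHandle is freed

5.3 Offset Detection Boxes

Symptom:
The detection boxes do not line up with the actual position of the objects

Debugging steps:

Save the preprocessed image and check that the resize is correct
Verify the coordinate conversion formula
Check that the scale of the model input matches the scale assumed for its output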
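
When verifying the coordinate conversion, it can help to isolate the mapping in a small function and test it against hand-computed values. The sketch below assumes that the model emits center-format boxes in pixels of the 640x640 input (worth confirming for your export) and that the resized image was pasted at the top-left corner of the canvas, as Preprocess does. `BoxMapper` and `ToOriginal` are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical helper for mapping boxes out of letterboxed input space.
public static class BoxMapper
{
    // Converts a (cx, cy, w, h) box in input-space pixels to an (x, y, width, height)
    // box in original-image pixels, clamped to the image bounds.
    // `ratio` is the resize factor used during preprocessing.
    public static (int X, int Y, int Width, int Height) ToOriginal(
        float cx, float cy, float w, float h,
        float ratio, int origWidth, int origHeight)
    {
        if (ratio <= 0f) throw new ArgumentOutOfRangeException(nameof(ratio));

        // With top-left placement there is no padding offset, only a scale
        float left = (cx - w / 2f) / ratio;
        float top = (cy - h / 2f) / ratio;
        float right = (cx + w / 2f) / ratio;
        float bottom = (cy + h / 2f) / ratio;

        // Clamp so boxes that spill into the padding stay inside the image
        int x1 = (int)Math.Clamp(MathF.Round(left), 0f, (float)origWidth);
        int y1 = (int)Math.Clamp(MathF.Round(top), 0f, (float)origHeight);
        int x2 = (int)Math.Clamp(MathF.Round(right), 0f, (float)origWidth);
        int y2 = (int)Math.Clamp(MathF.Round(bottom), 0f, (float)origHeight);
        return (x1, y1, x2 - x1, y2 - y1);
    }
}
```

If preprocessing centers the image on the canvas instead, the horizontal and vertical padding must be subtracted from the coordinates before dividing by the ratio. Forgetting that step is a typical cause of boxes that are shifted by a constant amount.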

6. Extending to Real Applications

6.1 Combining Multiple Models

Several ONNX models can be chained to build a more complex pipeline:

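// Run object detection first
var detections = yolov6.Detect(image);

// Then run the classification model on each detection
var bounds = new Rect(0, 0, image.Width, image.Height);
foreach (var det in detections)
{
    // Clip the box to the image; an out-of-bounds ROI throws
    var roi = det.Box.Intersect(bounds);
    if (roi.Width <= 0 || roi.Height <= 0) continue;

    using var crop = new Mat(image, roi);
    var cls_result = classifier.Classify(crop);
    det.SubClass = cls_result.TopClass;
}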

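6.2 Video Stream Processing

Key optimizations when processing a camera video stream:

Reusing intermediate buffers
Processing through an asynchronous pipeline
Skipping frames dynamically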

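using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using OpenCvSharp;

async Task ProcessVideoAsync(VideoCapture capture, CancellationToken token)
{
    using var frameQueue = new BlockingCollection<Mat>(5);
    var processor = new YoloProcessor();
    
    // Producer: grabs frames and drops them when the queue is full
    var producer = Task.Run(() => 
    {
        using var frame = new Mat();
        while (!token.IsCancellationRequested)
        {
            if (capture.Read(frame) && !frame.Empty())
            {
                var copy = frame.Clone();
                if (!frameQueue.TryAdd(copy)) copy.Dispose();
            }
        }
    });
    
    // Consumer: runs detection off the calling thread
    while (!token.IsCancellationRequested)
    {
        if (frameQueue.TryTake(out var frame, 100))
        {
            using (frame)
            {
                var results = await Task.Run(() => processor.Detect(frame));
                RenderResults(frame, results);
            }
        }
    }
    
    // Wait for the producer to stop, then release any frames still queued
    await producer;
    while (frameQueue.TryTake(out var leftover)) leftover.Dispose();
}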
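
The article lists dynamic frame skipping without spelling out a policy. One possible policy, sketched here under the assumption that the camera frame rate is known and roughly constant, is to track a smoothed inference latency and skip as many frames as one inference spans. `FrameSkipper` is a hypothetical class written for this illustration, and the 0.8/0.2 smoothing weights are arbitrary starting values.

```csharp
using System;

// Hypothetical helper: decides which captured frames to run detection on
// so that processing keeps pace with the camera.
public sealed class FrameSkipper
{
    private readonly double _frameIntervalMs;
    private double _avgInferenceMs;   // 0 until the first measurement arrives
    private int _framesToSkip;

    public FrameSkipper(double cameraFps)
    {
        if (cameraFps <= 0) throw new ArgumentOutOfRangeException(nameof(cameraFps));
        _frameIntervalMs = 1000.0 / cameraFps;
    }

    // Call after each detection with the measured latency in milliseconds.
    public void Report(double inferenceMs)
    {
        // An exponential moving average smooths out one-off latency spikes
        _avgInferenceMs = _avgInferenceMs == 0
            ? inferenceMs
            : 0.8 * _avgInferenceMs + 0.2 * inferenceMs;
    }

    // Call once per captured frame; returns true when the frame should be processed.
    public bool ShouldProcess()
    {
        if (_framesToSkip > 0)
        {
            _framesToSkip--;
            return false;
        }

        // If one inference spans N frame intervals, skip the next N - 1 frames
        int span = (int)Math.Ceiling(_avgInferenceMs / _frameIntervalMs);
        _framesToSkip = Math.Max(0, span - 1);
        return true;
    }
}
```

In the pipeline above, the producer would call ShouldProcess before cloning a frame, and the consumer would time each Detect call with a Stopwatch and pass the result to Report. The class is not thread-safe as written, so those two calls need a lock if they run on different threads.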

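In industrial deployments, typical figures for this stack are roughly 45 FPS for the YOLOv6s model on an i7-11800H CPU, with memory usage kept under 500 MB. For scenarios that need more performance, consider enabling GPU acceleration or optimizing the model further with TensorRT.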