Developing and Optimizing a YOLOv5-Based Traffic Sign Recognition System

L 姐

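1. Project Overview: A YOLOv5-Based Traffic Sign Recognition System

Last year, while working on a driver-assistance project, I ran into the problem of traffic sign recognition. After several rounds of technology evaluation I settled on YOLOv5 as the base framework and built a detector that recognizes 45 types of Chinese traffic signs. In field tests the system reached 92.3% mAP (mean Average Precision) and sustained 25 FPS in real time on an NVIDIA Jetson Xavier NX edge device.

Traffic sign recognition looks simple, but it poses three technical challenges:

Small-target detection (a distant speed-limit sign may occupy only 20×20 pixels of the image)
Interference from complex environments (rain and snow, occlusion, lighting changes, and so on)
High inter-class similarity (for example, turn arrows pointing in different directions)

YOLOv5 copes well with these challenges mainly thanks to its multi-scale detection architecture and efficient training strategy. The YOLOv5s variant we chose keeps accuracy reasonably high while the model is only 27 MB, which makes it a good fit for embedded deployment.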
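
To make the small-target problem concrete, here is a rough sketch of how large a 20×20 pixel sign is once the frame is resized to the 640-pixel network input, and how that compares with the three detection strides (8, 16, 32) that YOLOv5 uses. The 1920×1080 camera resolution is an assumption for illustration; the article does not state the actual resolution.

```python
# Rough estimate of how big a small sign is once the frame is resized for inference.
# Assumption (not from the article): the camera delivers 1920x1080 frames.

def scaled_size(obj_px, src_w, src_h, img_size=640):
    """Side length of a square object after an aspect-preserving resize."""
    ratio = min(img_size / src_w, img_size / src_h)
    return obj_px * ratio


def cells_covered(obj_px, stride):
    """Number of feature-map cells the object spans along one axis."""
    return obj_px / stride


if __name__ == "__main__":
    side = scaled_size(20, 1920, 1080)
    print(f"a 20 px sign becomes {side:.1f} px at a 640 px input")
    for stride in (8, 16, 32):
        print(f"stride {stride}: spans {cells_covered(side, stride):.2f} cells")
```

Under this assumption the sign shrinks to under 7 pixels, which is less than a single cell even on the finest (stride 8) feature map. That is why the augmentation and loss settings later in the article are tuned with small objects in mind.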

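2. System Architecture

2.1 Technology Stack

The system is modular. Its main components are:

Module	Technology	Rationale
Detection model	YOLOv5s	Lightweight; supports ONNX export
Inference framework	PyTorch 1.8	Dynamic graph, easy to debug
Image processing	OpenCV 4.5	Hardware acceleration support
Deployment	TorchScript	Good cross-platform compatibility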

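Note: for mobile deployment, we recommend converting the model to ONNX and then running it through a hardware-accelerated runtime; in our tests this made inference on a Snapdragon 865 about 3× faster. TensorRT itself runs only on NVIDIA GPUs, so the TensorRT step applies to Jetson-class devices rather than to Snapdragon phones.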

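2.2 Data Processing Pipeline

The complete processing pipeline is:

Image acquisition: USB camera, RTSP video stream, or local video file
Preprocessing: contrast-limited adaptive histogram equalization (CLAHE) plus Gaussian blur
Inference: YOLOv5 with multi-scale feature fusion
Post-processing: a modified weighted NMS (non-maximum suppression)
Output: detection boxes with confidence scores, plus a voice prompt

For preprocessing, we found that the conventional approach of only converting BGR to RGB performed poorly in rainy scenes, so we switched to the following processing chain: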

import cv2

def enhance_image(img):
    # Apply CLAHE to the lightness channel only, so colors are not distorted
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    limg = clahe.apply(l)
    enhanced = cv2.merge((limg, a, b))
    # Convert back to BGR so downstream OpenCV code sees the usual channel order
    return cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)

3. Key Training Details

3.1 Building the Dataset

Our self-built dataset covers 45 sign classes, distributed as follows:

Category	Training set	Validation set	Test set	Capture scene
Prohibitory signs	12,345	1,382	1,500	Urban roads
Mandatory signs	8,762	978	1,000	Highways
Warning signs	6,543	732	750	Mountain roads
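
As a quick sanity check on the table, the sketch below adds up the three splits and reports their proportions. The numbers are copied from the table; the English dictionary keys are just labels chosen for this example.

```python
# Split sizes copied from the table above: (train, val, test) per category.
splits = {
    "prohibitory": (12345, 1382, 1500),
    "mandatory": (8762, 978, 1000),
    "warning": (6543, 732, 750),
}


def split_ratios(train, val, test):
    """Return the train/val/test proportions of one row or of the totals."""
    total = train + val + test
    return train / total, val / total, test / total


# Column-wise totals across the three categories.
totals = tuple(sum(col) for col in zip(*splits.values()))

if __name__ == "__main__":
    print("totals (train, val, test):", totals)
    print("ratios:", [f"{r:.1%}" for r in split_ratios(*totals)])
    for name, row in splits.items():
        print(name, [f"{r:.1%}" for r in split_ratios(*row)])
```

The overall split works out to roughly 81% training, 9% validation and 10% test, and each category follows about the same proportions.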

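Data augmentation matters a great deal. With Mosaic augmentation, we found that applying the stock parameters directly caused small signs to be excessively occluded, so we adjusted them to: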

# data/hyps/hyp.scratch-low.yaml
mosaic: 1.0  # probability of applying mosaic
mixup: 0.15  # mixup probability (reduced)
degrees: 5.0  # rotation range in degrees (reduced)
perspective: 0.0005  # perspective coefficient (reduced)

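3.2 Fine-Tuning Techniques

We made three changes on top of YOLOv5s:

Adaptive anchor computation (YOLOv5 runs its AutoAnchor check by default at the start of training; the --evolve flag in the command below additionally turns on hyperparameter evolution, which is far more expensive than a single training run):

python train.py --data traffic.yaml --cfg yolov5s.yaml --hyp hyp.scratch-low.yaml --batch 64 --epochs 300 --weights yolov5s.pt --cache --img 640 --noval --evolve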

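Loss function adjustments:

Increased the weight coefficient for small-target detection
Used CIoU loss instead of GIoU
Added label smoothing to the classification loss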
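
For readers unfamiliar with the two loss terms named above, here is a minimal plain-Python sketch of CIoU for a single pair of boxes and of label-smoothed targets. It is an illustration of the standard formulas, not the project's training code; the smoothing value eps=0.1 is an assumption, since the article does not give the value used.

```python
import math


def ciou(box1, box2, eps=1e-7):
    """Complete IoU between two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    w1, h1 = x2 - x1, y2 - y1
    w2, h2 = X2 - X1, Y2 - Y1

    # Plain IoU.
    inter_w = max(0.0, min(x2, X2) - max(x1, X1))
    inter_h = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centres.
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v


def smooth_targets(eps=0.1):
    """Label-smoothed targets for a binary cross-entropy loss: (positive, negative)."""
    return 1.0 - 0.5 * eps, 0.5 * eps


if __name__ == "__main__":
    print("CIoU loss, shifted box:", 1 - ciou((0, 0, 10, 10), (2, 2, 12, 12)))
    print("smoothed targets:", smooth_targets())
```

The box loss is 1 - CIoU. Unlike plain IoU, it still gives a useful gradient when the boxes do not overlap at all, because the centre-distance term remains non-zero.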

Training strategy:

Train with the backbone frozen for 50 epochs
Use a cosine-annealing learning rate
Early stopping with patience=100
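
The schedule and stopping rule in the list above can be sketched as follows. This is a stand-alone illustration rather than the training script itself; the final learning-rate factor of 0.01 is an assumed value, and the function names are made up for this example.

```python
import math


def cosine_lr_factor(epoch, total_epochs, final_factor=0.01):
    """Multiplier applied to the initial learning rate at a given epoch.

    Starts at 1.0 and decays along a half cosine to `final_factor`
    (assumed value; the article does not state it).
    """
    cos_term = (1 - math.cos(epoch * math.pi / total_epochs)) / 2  # goes 0 -> 1
    return 1.0 + cos_term * (final_factor - 1.0)


def should_stop(best_epoch, current_epoch, patience=100):
    """Early stopping: stop once `patience` epochs pass without improvement."""
    return current_epoch - best_epoch >= patience


if __name__ == "__main__":
    for epoch in (0, 75, 150, 225, 300):
        print(epoch, round(cosine_lr_factor(epoch, 300), 4))
```

To my knowledge, upstream YOLOv5's train.py exposes these behaviours through the --freeze, --cos-lr and --patience options, so in practice no custom code should be needed.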

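4. Deployment Optimization in Practice

4.1 Inference Acceleration

Optimization steps on the Jetson device:

Model export and quantization (the command exports an optimized TorchScript model; the reduced FP16 precision is applied in the TensorRT step that follows):

python export.py --weights best.pt --include torchscript --img 640 --optimize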

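TensorRT acceleration:

import torch
from torch2trt import torch2trt

# Load the exported model onto the GPU in inference mode
model = torch.jit.load('best.torchscript').eval().cuda()
example_input = torch.randn(1, 3, 640, 640).cuda()
# torch2trt is built around eager-mode nn.Module objects; if converting the
# TorchScript module fails, load the original PyTorch model instead
model_trt = torch2trt(model, [example_input], fp16_mode=True)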

Memory optimization:

Use fixed-size inference tensors
Enable parallel CUDA streams
Pre-allocate output buffers

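4.2 Troubleshooting Real-World Issues

Typical problems encountered in road tests and how we solved them:

False positives:

Symptom: street lamps detected as warning signs
Fix: added negative samples (road images containing no signs) to the dataset
Result: false-positive rate reduced by 37%

Missed detections:

Symptom: speed-limit signs missed in rain and fog
Fix: added weather augmentation
Result: recall improved by 22%

Latency:

Symptom: high latency when processing 4K video streams
Fix: region-of-interest (ROI) detection plus frame skipping
Result: latency reduced from 450 ms to 120 ms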
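
The ROI-plus-frame-skipping fix for the latency problem can be sketched roughly as below. This is a minimal illustration under assumptions, not the project's code: the fractional ROI, the skip interval, and the names crop_roi, roi_to_frame and FrameSkipper are all hypothetical.

```python
def crop_roi(frame_w, frame_h, roi=(0.0, 0.0, 1.0, 0.6)):
    """Convert a fractional ROI (x0, y0, x1, y1) into pixel coordinates.

    The default keeps the upper 60% of the frame (assumed value), on the
    reasoning that signs rarely appear on the road surface itself.
    """
    x0, y0, x1, y1 = roi
    return int(x0 * frame_w), int(y0 * frame_h), int(x1 * frame_w), int(y1 * frame_h)


def roi_to_frame(box, roi_px):
    """Shift a box detected inside the ROI crop back into full-frame coordinates."""
    ox, oy = roi_px[0], roi_px[1]
    x1, y1, x2, y2 = box
    return x1 + ox, y1 + oy, x2 + ox, y2 + oy


class FrameSkipper:
    """Run the detector on every `interval`-th frame and reuse the last result otherwise."""

    def __init__(self, detect_fn, interval=2):
        self.detect_fn = detect_fn
        self.interval = interval
        self.count = 0
        self.last = []

    def __call__(self, frame):
        if self.count % self.interval == 0:
            self.last = self.detect_fn(frame)
        self.count += 1
        return self.last
```

In use, the frame would be cropped to the ROI before being passed to the detector, and each returned box mapped back with roi_to_frame before drawing. Reusing stale boxes on skipped frames trades a little positional accuracy for lower average latency.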

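5. Key Code Walkthrough

5.1 Improved Detection Post-Processing

Standard NMS performs poorly on densely packed signs, so we implemented a weighted NMS variant that suppresses an overlapping box only when it belongs to the same class as the higher-scoring box: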

import numpy as np

# detections: array of shape (N, 6) with columns x1, y1, x2, y2, score, class
def weighted_nms(detections, iou_thresh=0.45):
    if len(detections) == 0:
        return []
    
    boxes = detections[:, :4]
    scores = detections[:, 4]
    classes = detections[:, 5]
    
    # Sort by confidence, highest first
    order = scores.argsort()[::-1]
    keep = []
    
    while order.size > 0:
        i = order[0]
        keep.append(i)
        
        # IoU between the current box and all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        
        w = np.maximum(0.0, xx2 - xx1)
        h = np.maximum(0.0, yy2 - yy1)
        intersection = w * h
        
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_j = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        union = area_i + area_j - intersection
        
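        iou = intersection / np.maximum(union, 1e-9)  # guard against zero-area boxes
        
        # Keep boxes that overlap little, or that overlap but belong to a different class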
        mask = (iou <= iou_thresh) | (classes[order[1:]] != classes[i])
        order = order[1:][mask]
    
    return detections[keep]
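
To see the effect of the class check on a tiny example, here is the same suppression rule written in plain Python with a few hand-made boxes. It is a simplified re-statement for illustration (the name class_aware_nms and the sample boxes are made up), not a replacement for the NumPy version above.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(dets, iou_thresh=0.45):
    """dets: list of (x1, y1, x2, y2, score, cls). Returns the kept detections."""
    remaining = sorted(dets, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop a box only if it overlaps strongly AND has the same class as `best`.
        remaining = [d for d in remaining
                     if d[5] != best[5] or iou(best[:4], d[:4]) <= iou_thresh]
    return kept


if __name__ == "__main__":
    dets = [
        (0, 0, 10, 10, 0.9, 0),    # best box of class 0
        (1, 1, 11, 11, 0.8, 0),    # same class, heavy overlap: suppressed
        (1, 1, 11, 11, 0.7, 1),    # different class, same place: kept
        (50, 50, 60, 60, 0.6, 0),  # far away: kept
    ]
    for d in class_aware_nms(dets):
        print(d)
```

The third box survives even though it overlaps the first one heavily, which is the behaviour that helps when two different signs are mounted close together on the same pole.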

5.2 Real-Time Video Processing Pipeline

The optimized video-processing class:

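import cv2
import numpy as np
import torch
from utils.general import non_max_suppression  # helper from the YOLOv5 repository

class TrafficSignDetector:
    def __init__(self, model_path, conf_thres=0.5):
        self.conf_thres = conf_thres
        self.stride = 32
        self.img_size = 640
        self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
        # Load the TorchScript model onto the same device as the input tensors
        self.model = torch.jit.load(model_path, map_location=self.device).eval()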
        
    def preprocess(self, img):
        # Resize while preserving the aspect ratio
        h, w = img.shape[:2]
        r = min(self.img_size / h, self.img_size / w)
        new_h, new_w = int(h * r), int(w * r)
        img = cv2.resize(img, (new_w, new_h))
        
        # padding
        dh, dw = self.img_size - new_h, self.img_size - new_w
        top, bottom = dh // 2, dh - (dh // 2)
        left, right = dw // 2, dw - (dw // 2)
        img = cv2.copyMakeBorder(img, top, bottom, left, right, 
                                cv2.BORDER_CONSTANT, value=(114, 114, 114))
        
        # Reorder channels and layout, then normalize to [0, 1]
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).to(self.device).float() / 255.0
        return img.unsqueeze(0)
    
    def detect(self, img):
        img_tensor = self.preprocess(img)
        with torch.no_grad():
            pred = self.model(img_tensor)[0]
        pred = non_max_suppression(pred, self.conf_thres, 0.45)
        return pred
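
One detail the class leaves to the caller: the boxes returned by detect are expressed in the padded 640×640 letterbox space, not in the original frame. A minimal sketch of the inverse mapping, mirroring the arithmetic in preprocess, might look like this (the function names are hypothetical):

```python
def letterbox_params(h, w, img_size=640):
    """Recompute the scale ratio and padding offsets used by preprocess()."""
    r = min(img_size / h, img_size / w)
    new_h, new_w = int(h * r), int(w * r)
    top = (img_size - new_h) // 2
    left = (img_size - new_w) // 2
    return r, left, top


def to_original(box, h, w, img_size=640):
    """Map an (x1, y1, x2, y2) box from letterbox space back to the h x w frame."""
    r, left, top = letterbox_params(h, w, img_size)
    x1, y1, x2, y2 = box

    def clamp(value, upper):
        # Boxes that spill into the padding are clipped to the frame border.
        return max(0.0, min(float(upper), value))

    return (
        clamp((x1 - left) / r, w),
        clamp((y1 - top) / r, h),
        clamp((x2 - left) / r, w),
        clamp((y2 - top) / r, h),
    )


if __name__ == "__main__":
    # A 1280x640 frame is scaled by 0.5 and padded by 160 px at the top and bottom.
    print(letterbox_params(640, 1280))
    print(to_original((100, 200, 200, 300), 640, 1280))
```

Applying this to each detection before drawing keeps the boxes aligned with the signs in the displayed frame.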

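6. Performance Optimization Techniques

6.1 Model Pruning

Channel pruning reduces the model size. Note that the --sr and --s flags and the utils.prune_utils module used below are not part of the upstream ultralytics/yolov5 repository, so they presumably come from a pruning-enabled fork or a local modification.

Sparsity training:

python train.py --data traffic.yaml --cfg yolov5s.yaml --weights yolov5s.pt --sr 0.001 --s 0.01 --epochs 100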

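Channel pruning:

import torch
from models.yolo import Model
from utils.prune_utils import prune_model

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = Model('yolov5s.yaml').to(device)
prune_model(model, amount=0.3)  # prune 30% of the channels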

Fine-tuning to recover accuracy:

python train.py --data traffic.yaml --cfg pruned.yaml --weights pruned.pt --epochs 50
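
The article does not show what prune_model does internally. A common recipe that fits the sparsity-training step is to rank channels by the magnitude of their batch-norm scale factors and drop the smallest ones globally. The sketch below illustrates that selection logic on plain lists, assuming this is the approach in use; the function names and sample values are made up.

```python
def global_threshold(gammas_per_layer, amount=0.3):
    """Return the |gamma| value below which channels are pruned.

    gammas_per_layer maps a layer name to the list of batch-norm scale factors
    of its channels; `amount` is the fraction of all channels to remove.
    """
    all_vals = sorted(abs(g) for layer in gammas_per_layer.values() for g in layer)
    k = int(round(len(all_vals) * amount))  # number of channels to remove
    return all_vals[k] if k < len(all_vals) else float("inf")


def keep_masks(gammas_per_layer, amount=0.3):
    """Per-layer boolean masks: True means the channel is kept."""
    thr = global_threshold(gammas_per_layer, amount)
    masks = {}
    for name, gammas in gammas_per_layer.items():
        mask = [abs(g) >= thr for g in gammas]
        if not any(mask):
            # Never remove every channel of a layer: keep the strongest one.
            best = max(range(len(gammas)), key=lambda i: abs(gammas[i]))
            mask[best] = True
        masks[name] = mask
    return masks


if __name__ == "__main__":
    gammas = {
        "conv1": [0.9, 0.01, 0.5, 0.02],
        "conv2": [0.03, 0.7, 0.8, 0.6, 0.04, 0.3],
    }
    for name, mask in keep_masks(gammas, amount=0.3).items():
        print(name, mask)
```

Sparsity training pushes many scale factors towards zero, so a global threshold like this tends to remove channels that contribute little, and the fine-tuning step then recovers most of the lost accuracy.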

6.2 Multi-Threaded Processing Framework

A producer-consumer pattern gives an efficient pipeline:

import queue
import threading

import cv2

class VideoProcessor:
    def __init__(self, model_path, max_queue=10):
        self.detector = TrafficSignDetector(model_path)
        self.frame_queue = queue.Queue(maxsize=max_queue)
        self.result_queue = queue.Queue(maxsize=max_queue)
        
    def capture_thread(self, video_source):
        cap = cv2.VideoCapture(video_source)
        while True:
            ret, frame = cap.read()
            if not ret: break
            if not self.frame_queue.full():
                self.frame_queue.put(frame)
        cap.release()
        
    def process_thread(self):
        while True:
            frame = self.frame_queue.get()
            results = self.detector.detect(frame)
            self.result_queue.put((frame, results))
            
    def show_thread(self):
        while True:
            frame, results = self.result_queue.get()
            for *xyxy, conf, cls in results[0]:  # detections for the single image in the batch
                # Draw the detection result
                pass
            cv2.imshow('Result', frame)
            if cv2.waitKey(1) == 27: break

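In deployment we found that, for 4K video, overall system latency was lowest with detection capped at 15 FPS and display held at 30 FPS. This is done by adjusting the sleep time in the producer thread:

time.sleep(1 / 15)  # cap the detection rate at roughly 15 FPS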
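
A fixed sleep of 1/15 s is added on top of the time the work itself takes, so the real rate ends up somewhat below 15 FPS. One possible refinement, sketched here with hypothetical names, is to sleep only for whatever remains of each time slot:

```python
import time


def remaining_sleep(period, elapsed):
    """Time left in the current slot after the work took `elapsed` seconds."""
    return max(0.0, period - elapsed)


def paced_loop(work, fps=15, iterations=3, sleep=time.sleep, clock=time.monotonic):
    """Call `work` at most `fps` times per second for a fixed number of iterations.

    `sleep` and `clock` are injectable so the pacing logic can be tested
    without actually waiting.
    """
    period = 1.0 / fps
    for _ in range(iterations):
        start = clock()
        work()
        sleep(remaining_sleep(period, clock() - start))


if __name__ == "__main__":
    t0 = time.monotonic()
    paced_loop(lambda: time.sleep(0.01), fps=15, iterations=5)
    print(f"5 iterations took {time.monotonic() - t0:.2f} s")
```

If the work takes longer than one slot, the loop simply does not sleep, so it degrades gracefully instead of accumulating extra delay.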

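This system has gone through more than six months of real-world road testing and has shown stable detection performance in a variety of complex environments. In particular, for the abrupt lighting changes at tunnel entrances and exits, introducing a dynamic white-balance adjustment algorithm kept the false-positive rate below 3%. Next, we plan to add a semantic segmentation module to further improve robustness in extreme weather.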