Object detection is one of the core tasks in computer vision, with wide applications in industrial quality inspection, security surveillance, and autonomous driving. The YOLO (You Only Look Once) family is prized for its real-time performance, and YOLOv8, released by Ultralytics in 2023, pushes both accuracy and speed further. This article walks through a complete training and inference workflow for YOLOv8 using the KerasCV framework; compared with the native PyTorch implementation, KerasCV offers a cleaner API and seamless integration with the TensorFlow ecosystem.
Note: this article assumes familiarity with basic Python syntax and fundamental deep learning concepts. Running the code in Colab or on a local machine with an NVIDIA GPU is recommended.
First, install the required dependencies (a Python 3.8+ environment is recommended):

```bash
pip install tensorflow keras-cv matplotlib opencv-python
```
Verify the KerasCV version (0.6.0 or higher is required):

```python
import keras_cv
print(keras_cv.__version__)  # should print 0.6.0 or higher
```
Using the COCO dataset as an example, we need to convert it into a KerasCV-compatible format. A typical workflow for building a tf.data.Dataset looks like this:
```python
import json

import tensorflow as tf

def load_dataset(split="train"):
    # Load the COCO annotation file
    with open(f"annotations/instances_{split}2017.json") as f:
        annotations = json.load(f)

    # Build a mapping from image id to its annotations
    id_to_annotations = {}
    for ann in annotations["annotations"]:
        id_to_annotations.setdefault(ann["image_id"], []).append(ann)

    # Create a tf.data.Dataset from a Python generator
    def generator():
        for img_info in annotations["images"]:
            img_id = img_info["id"]
            boxes = []
            classes = []
            for ann in id_to_annotations.get(img_id, []):
                x, y, w, h = ann["bbox"]
                boxes.append([x, y, x + w, y + h])  # convert to xyxy format
                classes.append(ann["category_id"])
            if boxes:
                yield img_info["file_name"], {
                    "boxes": tf.constant(boxes, dtype=tf.float32),
                    "classes": tf.constant(classes, dtype=tf.int32),
                }

    return tf.data.Dataset.from_generator(
        generator,
        output_signature=(
            tf.TensorSpec(shape=(), dtype=tf.string),
            {
                "boxes": tf.TensorSpec(shape=(None, 4), dtype=tf.float32),
                "classes": tf.TensorSpec(shape=(None,), dtype=tf.int32),
            },
        ),
    )

train_ds = load_dataset("train")
val_ds = load_dataset("val")
```
Note: in real projects, storing the data in TFRecord format is recommended for better IO performance; the simplified flow above is for demonstration only.
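The TFRecord approach mentioned above can be sketched as follows. The serialization scheme (field names, flattening the boxes) is an assumption for illustration; adapt it to your own pipeline:

```python
import tensorflow as tf

def serialize_example(filename, boxes, classes):
    """Serialize one image's detection annotations into a tf.train.Example."""
    feature = {
        "filename": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[filename.encode()])),
        # Flatten the (N, 4) box list; it is restored on read via VarLenFeature
        "boxes": tf.train.Feature(
            float_list=tf.train.FloatList(value=[v for box in boxes for v in box])),
        "classes": tf.train.Feature(
            int64_list=tf.train.Int64List(value=classes)),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()

def parse_example(record):
    """Parse one serialized record back into the (filename, annotations) form."""
    parsed = tf.io.parse_single_example(record, {
        "filename": tf.io.FixedLenFeature([], tf.string),
        "boxes": tf.io.VarLenFeature(tf.float32),
        "classes": tf.io.VarLenFeature(tf.int64),
    })
    boxes = tf.reshape(tf.sparse.to_dense(parsed["boxes"]), (-1, 4))
    classes = tf.cast(tf.sparse.to_dense(parsed["classes"]), tf.int32)
    return parsed["filename"], {"boxes": boxes, "classes": classes}
```

Records written with `tf.io.TFRecordWriter` can then be loaded with `tf.data.TFRecordDataset(path).map(parse_example)`, which reads sequentially from disk instead of repeatedly parsing the JSON annotations.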
KerasCV ships a pre-built YOLOv8 implementation. The backbone is built from a preset; s/m/l/x variants of different sizes are available:

```python
from keras_cv.models import YOLOV8Backbone, YOLOV8Detector

model = YOLOV8Detector(
    num_classes=80,  # number of COCO classes
    bounding_box_format="xyxy",
    backbone=YOLOV8Backbone.from_preset("yolo_v8_m_backbone"),
    fpn_depth=2
)

# Compile the model
optimizer = tf.keras.optimizers.AdamW(
    learning_rate=0.001,
    global_clipnorm=10.0
)
model.compile(
    optimizer=optimizer,
    classification_loss="binary_crossentropy",
    box_loss="ciou"
)
```
KerasCV provides a rich set of preprocessing layers. Note that Mosaic stitches several images together, so the dataset must be batched (with ragged boxes) before the augmenter is applied:

```python
from keras_cv.layers import Mosaic, RandomColorJitter, RandomCutout

augmenter = tf.keras.Sequential([
    Mosaic(bounding_box_format="xyxy"),
    RandomColorJitter(
        value_range=(0, 255),
        brightness_factor=0.2,
        contrast_factor=0.2,
        saturation_factor=(0.5, 0.9),
        hue_factor=0.2,
    ),
    RandomCutout(height_factor=0.2, width_factor=0.2),
    # more augmentation layers ...
])

def load_image(filename, bounding_boxes):
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image, channels=3)
    return {"images": image, "bounding_boxes": bounding_boxes}

train_ds = (
    train_ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    .ragged_batch(16)
    .map(lambda batch: augmenter(batch, training=True),
         num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)
```
With the data pipeline and model in place, set up the training callbacks and launch training:

```python
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=5),
    tf.keras.callbacks.ModelCheckpoint("yolov8.keras"),
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.1, patience=3)
]
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=100,
    callbacks=callbacks
)
```
Load the trained model and run inference:

```python
import cv2

def predict(image_path, confidence_threshold=0.5):
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    input_image = tf.expand_dims(tf.cast(image, tf.float32), axis=0)

    # Run prediction; the built-in decoder returns boxes, confidences and classes
    outputs = model.predict(input_image)
    boxes = outputs["boxes"][0]
    scores = outputs["confidence"][0]
    classes = outputs["classes"][0]

    # Filter out low-confidence detections
    mask = scores > confidence_threshold
    return boxes[mask], scores[mask], classes[mask]
```
The raw YOLOv8 outputs need non-max suppression (NMS):

```python
from keras_cv.layers import NonMaxSuppression

nms = NonMaxSuppression(
    bounding_box_format="xyxy",
    iou_threshold=0.5,
    confidence_threshold=0.5
)

def refined_predict(image):
    raw_pred = model(image)
    return nms(raw_pred)
```
For GPU deployment, an exported SavedModel can be converted with TensorRT:

```python
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir="yolov8_savedmodel"
)
converter.convert()
converter.save("yolov8_tensorrt")
```
Quantization-aware training uses the TensorFlow Model Optimization toolkit:

```python
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)
q_aware_model.compile(optimizer=optimizer, ...)
```
A polynomial learning-rate decay schedule can replace the fixed learning rate:

```python
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=0.0001,
    decay_steps=10000,
    end_learning_rate=0.00001
)
```
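The schedule object is called with the current step and plugs straight into the optimizer in place of a fixed number; a quick sketch verifying the decay endpoints:

```python
import tensorflow as tf

# PolynomialDecay interpolates from the initial to the end learning rate
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=0.0001,
    decay_steps=10000,
    end_learning_rate=0.00001,
)

# Calling the schedule with a step count returns that step's learning rate
start_lr = float(lr_schedule(0))      # close to 0.0001
final_lr = float(lr_schedule(10000))  # close to 0.00001

# The schedule replaces the scalar learning_rate argument
optimizer = tf.keras.optimizers.AdamW(learning_rate=lr_schedule)
```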
Anchor settings can be customized when building the detector:

```python
model = YOLOV8Detector(
    ...
    anchor_generator=keras_cv.models.YOLOV8AnchorGenerator(
        aspect_ratios=[1.0, 2.0, 0.5],
        scales=[1.0, 1.25, 0.8],
        strides=[8, 16, 32]
    )
)
```
Multi-GPU training works through MirroredStrategy with data-sharded input:

```python
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = \
    tf.data.experimental.AutoShardPolicy.DATA
train_ds = train_ds.with_options(options)

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = YOLOV8Detector(...)
```
Freeze the backbone for fine-tuning:

```python
model.backbone.trainable = False
model.compile(...)  # use a smaller learning rate
model.fit(...)

# Unfreeze the last few layers
for layer in model.backbone.layers[-10:]:
    layer.trainable = True
```
Adding a segmentation head:

```python
from keras_cv.models import YOLOV8Segmentation

seg_model = YOLOV8Segmentation(
    num_classes=80,
    bounding_box_format="xyxy",
    backbone="yolo_v8_m_backbone"
)
```
For mobile and edge deployment, convert with TFLite:

```python
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open('yolov8.tflite', 'wb') as f:
    f.write(tflite_model)
```
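Running the converted model uses `tf.lite.Interpreter`. The sketch below substitutes a toy model so it is self-contained; in practice you would point the Interpreter at the exported file with `tf.lite.Interpreter(model_path="yolov8.tflite")`, and the tensor layout depends on your export:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in model for illustration; replace with the real yolov8.tflite
toy = tf.keras.Sequential([tf.keras.Input(shape=(8,)),
                           tf.keras.layers.Dense(4)])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(toy).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed an input matching the model's expected shape and dtype
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
```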
In real projects, I have found the KerasCV YOLOv8 implementation easier to integrate into existing TensorFlow pipelines than the native version, especially when it needs to work alongside other Keras models. One practical trick: disable some of the augmentations (such as Mosaic) early in training and enable them gradually once the loss stabilizes; this noticeably improves training stability.
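The two-phase augmentation trick above can be sketched like this; the epoch split and the idea of rebuilding the pipeline between phases are assumptions to adapt to your schedule:

```python
import tensorflow as tf

def make_pipeline(ds, augmenter=None, batch_size=16):
    """Batch the dataset and optionally apply an augmenter to each batch."""
    ds = ds.ragged_batch(batch_size)
    if augmenter is not None:
        ds = ds.map(augmenter, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)

# Phase 1: train without Mosaic until the loss stabilizes
# model.fit(make_pipeline(train_ds), epochs=20)
# Phase 2: resume with the full augmenter enabled
# model.fit(make_pipeline(train_ds, augmenter=augmenter),
#           epochs=100, initial_epoch=20)
```

Using `initial_epoch` in the second call keeps the epoch counter (and any schedules or callbacks keyed to it) continuous across the two phases.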