作为计算机视觉领域的基石,卷积神经网络(CNN)在过去十年中彻底改变了图像处理的方式。今天我将带大家手动实现三大经典CNN模型:AlexNet、ResNet和VGG。不同于直接调用现成模型,我们将从底层开始构建,深入理解每个卷积层、池化层的设计原理和实现细节。
在深度学习框架高度集成的今天,我们很容易陷入"调包侠"的陷阱——只会调用现成API而不理解底层原理。手动实现经典CNN模型的价值在于:加深对网络结构、参数设计与维度变化的理解,并为后续的模型调试与改进打下基础。
下面我们以PyTorch为例,从最简单的AlexNet开始,逐步实现更复杂的ResNet和VGG。
AlexNet是2012年ImageNet竞赛冠军,开启了深度学习在计算机视觉领域的新纪元。其核心结构包含:5个卷积层、3个最大池化层和3个全连接层。
输入尺寸为224×224的RGB图像,输出1000类的分类结果。
import torch
import torch.nn as nn
import torchvision.models as models

# Print torchvision's reference AlexNet so its layers can be compared
# against the hand-written version below.
alexnet = models.alexnet()
print(alexnet)
class MyAlexNet(nn.Module):
    """Hand-written AlexNet (224x224 RGB input, 1000-class logits output).

    Fix vs. the original listing: the unconditional debug ``print`` calls in
    ``forward`` are gated behind a backward-compatible ``verbose`` flag, so
    the model no longer writes to stdout on every forward pass.
    """

    def __init__(self, verbose=False):
        super(MyAlexNet, self).__init__()
        # When True, forward() prints intermediate feature-map sizes
        # (matches the dimension-tracking walkthrough in the article).
        self.verbose = verbose
        self.relu = nn.ReLU()
        self.drop = nn.Dropout(0.5)
        # Convolutional feature extractor.
        self.conv1 = nn.Conv2d(3, 64, 11, 4, padding=2)   # 224 -> 55
        self.pool1 = nn.MaxPool2d(3, stride=2)            # 55 -> 27
        self.conv2 = nn.Conv2d(64, 192, 5, 1, 2)
        self.pool2 = nn.MaxPool2d(3, stride=2)            # 27 -> 13
        self.conv3 = nn.Conv2d(192, 384, 3, 1, 1)
        self.conv4 = nn.Conv2d(384, 256, 3, 1, 1)
        self.conv5 = nn.Conv2d(256, 256, 3, 1, 1)
        self.pool3 = nn.MaxPool2d(3, stride=2)            # 13 -> 6
        self.adapool = nn.AdaptiveAvgPool2d(6)            # force 6x6 output
        # Classifier head: 256 * 6 * 6 = 9216 flattened features.
        self.fc1 = nn.Linear(9216, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 1000)

    def forward(self, x):
        """Run the feature extractor and classifier; returns (N, 1000) logits."""
        x = self.pool1(self.relu(self.conv1(x)))
        x = self.pool2(self.relu(self.conv2(x)))
        x = self.relu(self.conv3(x))
        if self.verbose:
            print("Conv3输出尺寸:", x.size())
        x = self.relu(self.conv4(x))
        if self.verbose:
            print("Conv4输出尺寸:", x.size())
        x = self.pool3(self.relu(self.conv5(x)))
        if self.verbose:
            print("Pool3输出尺寸:", x.size())
        x = self.adapool(x)
        x = x.view(x.size()[0], -1)  # flatten to (N, 9216)
        x = self.drop(self.relu(self.fc1(x)))
        x = self.drop(self.relu(self.fc2(x)))
        x = self.fc3(x)
        return x
实现完成后,可以从三个角度验证:卷积层参数设计是否与原论文一致、前向传播中各层的维度变化是否正确,以及整体参数量统计是否与官方实现吻合。
def get_parameter_number(model):
    """Return a dict with the total and trainable parameter counts of *model*."""
    params = list(model.parameters())
    total = sum(p.numel() for p in params)
    trainable = sum(p.numel() for p in params if p.requires_grad)
    return {'Total': total, 'Trainable': trainable}
# The count should match torchvision's AlexNet (~61M parameters).
model = MyAlexNet()
print(get_parameter_number(model))
输出结果应与官方AlexNet一致(约6100万参数),验证了实现的正确性。
对于教学演示或快速验证,可以去掉Dropout和非必要的ReLU,聚焦核心结构:
class SimpleAlexNet(nn.Module):
    """Stripped-down AlexNet for teaching: the same conv/pool/FC skeleton as
    MyAlexNet, but with Dropout and all ReLU activations deliberately omitted
    so the core structure (and its dimension changes) is easier to follow."""

    def __init__(self):
        super(SimpleAlexNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 11, 4, padding=2)
        self.pool1 = nn.MaxPool2d(3, 2)
        self.conv2 = nn.Conv2d(64, 192, 5, 1, padding=2)
        self.pool2 = nn.MaxPool2d(3, 2)
        self.conv3 = nn.Conv2d(192, 384, 3, 1, 1)
        self.conv4 = nn.Conv2d(384, 256, 3, 1, 1)
        self.conv5 = nn.Conv2d(256, 256, 3, 1, 1)
        self.pool3 = nn.MaxPool2d(3, 2)
        self.adapool = nn.AdaptiveAvgPool2d(6)
        self.fc1 = nn.Linear(9216, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 1000)

    def forward(self, x):
        # Convolutional stack with three max-pool stages.
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.pool3(x)
        x = self.adapool(x)
        # Flatten, then the three fully connected layers back to back.
        x = x.view(x.size()[0], -1)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
ResNet通过引入残差连接(residual connection)解决了深层网络的梯度消失问题,其主要特点包括:残差连接、批归一化(BatchNorm),以及在维度不匹配时用1×1卷积调整的捷径分支。
class ResidualBlock(nn.Module):
    """Basic two-conv residual block (ResNet-18/34 style).

    Computes relu(bn2(conv2(relu(bn1(conv1(x))))) + shortcut(x)); the
    shortcut is the identity unless the stride or channel count changes,
    in which case a 1x1 conv + BatchNorm projects the input to match.
    """

    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        needs_projection = stride != 1 or in_channels != out_channels
        if needs_projection:
            # 1x1 conv matches the main branch's shape so the two can be added.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        identity = self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity  # residual connection
        return self.relu(out)
class MyResNet18(nn.Module):
    """ResNet-18 built from ResidualBlock: a 7x7 stem, four two-block stages
    (64/128/256/512 channels), global average pooling, and a linear head."""

    def __init__(self, num_classes=1000):
        super(MyResNet18, self).__init__()
        self.in_channels = 64
        # Stem: 7x7/2 conv then 3x3/2 max-pool.
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Four residual stages; stages 2-4 halve the spatial size via stride 2.
        self.layer1 = self._make_layer(64, 64, 2, stride=1)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)
        self.layer4 = self._make_layer(256, 512, 2, stride=2)
        # Head: pool to 1x1, then classify.
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, in_channels, out_channels, blocks, stride):
        # The first block may downsample/widen; the rest keep the shape.
        strides = [stride] + [1] * (blocks - 1)
        channels = in_channels
        stage = []
        for s in strides:
            stage.append(ResidualBlock(channels, out_channels, s))
            channels = out_channels
        return nn.Sequential(*stage)

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        for stage in (self.layer1, self.layer2, self.layer3, self.layer4):
            x = stage(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)
通过打印各层输出尺寸,验证网络设计的正确性:
# Sanity check: a (1, 3, 224, 224) input must map to 1000 logits.
model = MyResNet18()
x = torch.randn(1, 3, 224, 224)
out = model(x)
print(out.shape)  # expect torch.Size([1, 1000])
VGG的核心设计理念是:全部使用3×3小卷积核堆叠来加深网络,并用2×2最大池化逐步将特征图尺寸减半。
class VGGBlock(nn.Module):
    """One VGG stage: *num_convs* 3x3 same-padding conv+ReLU pairs, followed
    by a 2x2 max-pool that halves the spatial resolution."""

    def __init__(self, in_channels, out_channels, num_convs):
        super(VGGBlock, self).__init__()
        modules = []
        channels = in_channels
        for _ in range(num_convs):
            modules += [
                nn.Conv2d(channels, out_channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ]
            channels = out_channels  # subsequent convs keep the output width
        modules.append(nn.MaxPool2d(kernel_size=2, stride=2))
        self.block = nn.Sequential(*modules)

    def forward(self, x):
        return self.block(x)
class MyVGG13(nn.Module):
    """VGG-13: five VGGBlock stages (10 convs total) followed by the classic
    4096-4096-num_classes fully connected classifier."""

    def __init__(self, num_classes=1000):
        super(MyVGG13, self).__init__()
        # Feature extractor: channels double each stage, capped at 512.
        stage_cfg = [(3, 64), (64, 128), (128, 256), (256, 512), (512, 512)]
        self.features = nn.Sequential(
            *[VGGBlock(c_in, c_out, 2) for c_in, c_out in stage_cfg]
        )
        # Pool to a fixed 7x7 map so the classifier accepts any input size.
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
# Compare the hand-written VGG-13 against torchvision's reference model;
# the two parameter counts should agree (~133M).
vgg = MyVGG13()
print("自定义VGG13参数量:", get_parameter_number(vgg))
official_vgg = models.vgg13()
print("官方VGG13参数量:", get_parameter_number(official_vgg))
# Sigmoid demo: squashes each element independently into (0, 1).
# Fix: the originals bound their tensors to `input`/`output`, shadowing the
# builtin `input`; renamed to descriptive names.
sigmoid = nn.Sigmoid()
logits = torch.randn(4)
sig_out = sigmoid(logits)
print("Sigmoid输出:", sig_out)

# Softmax demo: normalizes each row (dim=1) into a probability distribution.
softmax = nn.Softmax(dim=1)
batch_logits = torch.randn(4, 5)
soft_out = softmax(batch_logits)
print("Softmax输出:", soft_out)
| 特性 | ReLU | Sigmoid | Softmax |
|---|---|---|---|
| 输出范围 | [0, +∞) | (0, 1) | (0, 1)且和为1 |
| 适用场景 | 隐藏层 | 二分类输出层 | 多分类输出层 |
| 梯度特性 | 正区间无衰减 | 最大梯度0.25 | 依赖输入分布 |
| 计算复杂度 | O(1) | O(1) | O(n) |
| 死亡神经元问题 | 可能存在 | 无 | 无 |
选择建议:隐藏层首选ReLU(计算简单且正区间梯度无衰减);输出层按任务选择——二分类用Sigmoid,多分类用Softmax;特殊情况下(如出现大量死亡神经元)可考虑ReLU的变体。
对每个自定义模型,都应进行前向传播测试:
def test_forward_pass(model, input_shape=(1, 3, 224, 224)):
    """Run one inference-mode forward pass on random data and report shapes.

    Returns the output shape so callers can assert on it.
    """
    model.eval()  # disable dropout / BN batch statistics
    with torch.no_grad():
        output = model(torch.randn(input_shape))
        print(f"输入形状: {input_shape}")
        print(f"输出形状: {output.shape}")
        return output.shape
# Smoke-test each hand-written model with a single forward pass
# (AlexNet, then ResNet-18, then VGG-13).
for net in (MyAlexNet(), MyResNet18(), MyVGG13()):
    test_forward_pass(net)
除了整体参数量,还应关注各层参数分布:
def print_layer_params(model):
    """Print the parameter count of every trainable tensor in *model*."""
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # skip frozen tensors
        print(f"{name}: {param.numel()}参数")

print_layer_params(MyResNet18())
常见问题与排查:维度不匹配错误(逐层打印输出尺寸定位问题层)、训练不收敛(检查学习率与归一化设置)、过拟合(加强数据增强与Dropout等正则化)。
# SGD with momentum; StepLR decays the learning rate 10x every 30 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# Excerpt from a module's __init__: the standard conv -> BatchNorm -> ReLU trio.
self.conv = nn.Conv2d(in_c, out_c, 3)
self.bn = nn.BatchNorm2d(out_c)
self.relu = nn.ReLU()
from torchvision import transforms

# Training-time augmentation: random crop/flip/color jitter, then
# normalization with the standard ImageNet channel statistics.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
# Dynamic quantization: Linear weights stored as int8, activations
# quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Static quantization: observe activation ranges on calibration data,
# then convert the whole model.
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)
# ... run calibration batches here ...
torch.quantization.convert(model, inplace=True)
# Export to ONNX with a dynamic batch dimension so one file serves any
# batch size at inference time.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch_size"},
                                "output": {0: "batch_size"}})
# Automatic mixed precision: run the forward pass in fp16 where safe and
# scale the loss to avoid fp16 gradient underflow during backward.
scaler = torch.cuda.amp.GradScaler()
with torch.cuda.amp.autocast():
    outputs = model(inputs)
    loss = criterion(outputs, labels)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
# Gradient accumulation: emulate a batch `accumulation_steps` times larger
# by stepping the optimizer only every few mini-batches.
for i, (inputs, labels) in enumerate(train_loader):
    outputs = model(inputs)
    loss = criterion(outputs, labels) / accumulation_steps  # average over steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
# Fix: the original snippet used `prune` without importing it (NameError).
from torch.nn.utils import prune

parameters_to_prune = (
    (model.conv1, 'weight'),
    (model.fc3, 'weight'),
)
# Zero out the 20% smallest-magnitude weights across both tensors
# (L1-norm criterion, unstructured sparsity).
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2,
)
# Load an ImageNet-pretrained backbone.
# NOTE(review): `pretrained=True` is deprecated in newer torchvision —
# prefer `models.resnet18(weights=models.ResNet18_Weights.DEFAULT)`.
model = models.resnet18(pretrained=True)

# Swap the 1000-way head for our 10-class task.
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)

# Freeze the backbone and fine-tune only the new head.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
# Dataset/loader built from an ImageFolder directory tree.
dataset = datasets.ImageFolder(root='data/train', transform=train_transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# Standard supervised training loop.
for epoch in range(num_epochs):
    model.train()
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
def ensemble_predict(models, input):
    """Average the predictions of several models on the same input tensor."""
    with torch.no_grad():
        stacked = torch.stack([m(input) for m in models])
        return stacked.mean(dim=0)
# Render the computation graph to model.png (requires the third-party
# torchviz package and Graphviz).
from torchviz import make_dot

x = torch.randn(1, 3, 224, 224)
y = model(x)
make_dot(y, params=dict(model.named_parameters())).render("model", format="png")
通过手动实现AlexNet、ResNet和VGG这三个经典CNN模型,我们深入理解了卷积神经网络的设计原理和实现细节。在实际项目中,我有以下几点建议:优先复用官方实现与预训练权重;用参数量对比和前向维度检查来验证自定义实现;并结合数据增强与正则化手段控制过拟合。
手动实现经典模型是深入理解深度学习的最佳途径之一。虽然现代框架提供了现成的实现,但只有亲自动手构建,才能真正掌握模型的设计精髓,为后续的模型改进和创新打下坚实基础。