In computer vision, image classification is one of the most fundamental and central tasks. PyTorch, currently the most popular deep learning framework, offers flexible tensor operations and automatic differentiation that make classification tasks straightforward to implement. In this post we take a close look at how to use PyTorch's cross-entropy loss functions to implement both multi-class classification (e.g., MNIST handwritten digit recognition) and binary classification (e.g., cat vs. dog).
Cross-entropy loss is the most common loss function for classification: it measures the discrepancy between the predicted probability distribution and the true distribution. For multi-class tasks we use nn.CrossEntropyLoss, while binary tasks can use nn.BCEWithLogitsLoss. The two implementations differ in subtle but important ways in PyTorch, which is a frequent source of confusion for beginners.
Cross-entropy originates in information theory, where it measures the difference between two probability distributions. Given a true distribution p and a predicted distribution q, it is defined as:
H(p, q) = -Σ p(x) log q(x)
In classification, p is the one-hot encoded ground-truth label and q is the probability distribution output by the model. Minimizing the cross-entropy is equivalent to maximizing the log-likelihood of the correct class.
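As a quick sanity check (my addition, not part of the original walkthrough), the formula can be computed by hand for a single sample and compared against F.cross_entropy:

```python
import torch
import torch.nn.functional as F

# One sample, three classes; the true class is index 0.
logits = torch.tensor([[2.0, 0.5, 0.1]])
target = torch.tensor([0])

# Manual cross-entropy: with one-hot p, H(p, q) reduces to -log q(true class).
q = torch.softmax(logits, dim=1)
manual = -torch.log(q[0, target[0]])

# PyTorch's fused implementation operates directly on the raw logits.
builtin = F.cross_entropy(logits, target)
print(manual.item(), builtin.item())  # both ≈ 0.3167
```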
PyTorch's implementations make two important optimizations:
- nn.CrossEntropyLoss fuses LogSoftmax and NLLLoss, so it takes raw logits directly.
- nn.BCEWithLogitsLoss has the Sigmoid built in, fusing it with BCELoss.

Which one to use:
- Multi-class tasks (e.g., 10-class MNIST classification): nn.CrossEntropyLoss
- Binary tasks (e.g., cat vs. dog classification): nn.BCEWithLogitsLoss
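Both fusion claims are easy to verify on toy tensors; a minimal sketch (my addition):

```python
import torch
import torch.nn as nn

# Multi-class: CrossEntropyLoss == LogSoftmax + NLLLoss
logits = torch.randn(4, 10)           # batch of 4, 10 classes
targets = torch.randint(0, 10, (4,))  # integer class indices
fused = nn.CrossEntropyLoss()(logits, targets)
manual = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
print(torch.allclose(fused, manual))  # True

# Binary: BCEWithLogitsLoss == Sigmoid + BCELoss (but more numerically stable)
bin_logits = torch.randn(4)
bin_targets = torch.randint(0, 2, (4,)).float()
fused_b = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)
manual_b = nn.BCELoss()(torch.sigmoid(bin_logits), bin_targets)
print(torch.allclose(fused_b, manual_b))  # True
```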
Take the CIFAR-10 dataset as an example:
```python
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu in forward()
import torch.optim as optim

# Data loading
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=32, shuffle=True)

# A simple CNN
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)  # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # note: no Softmax here
        return x
```
```python
model = Net()
criterion = nn.CrossEntropyLoss()  # multi-class loss
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)  # Softmax handled internally
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 100 == 99:
            print(f'[{epoch+1}, {i+1}] loss: {running_loss/100:.3f}')
            running_loss = 0.0
```
Important: PyTorch's CrossEntropyLoss already incorporates the Softmax, so the model's last layer does not need, and must not add, another Softmax activation; doing so hurts numerical stability.
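To see the effect concretely, here is a small illustration (my addition): feeding already-softmaxed outputs into nn.CrossEntropyLoss squashes confident logits and distorts the loss:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[5.0, -5.0]])  # very confident raw scores
target = torch.tensor([0])

loss_ok = nn.CrossEntropyLoss()(logits, target)
loss_double = nn.CrossEntropyLoss()(torch.softmax(logits, dim=1), target)
print(loss_ok.item())      # ≈ 0.0000454 — near-zero, as expected
print(loss_double.item())  # ≈ 0.3133 — the extra Softmax flattened the logits
```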
Taking binary cat-vs-dog classification as an example, note the following:
```python
import os
from PIL import Image

# Example of a custom Dataset
class CatDogDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, transform=None):
        self.img_dir = img_dir
        self.transform = transform
        self.img_names = os.listdir(img_dir)

    def __len__(self):
        return len(self.img_names)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_names[idx])
        image = Image.open(img_path).convert('RGB')
        # Assumes file names contain 'cat' or 'dog'
        label = 0.0 if 'cat' in self.img_names[idx] else 1.0
        if self.transform:
            image = self.transform(image)
        return image, torch.tensor(label, dtype=torch.float32)
```
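A hypothetical way to wire this dataset up (the './train' directory and the 32x32 resize are my assumptions, chosen so the inputs match the CNN below):

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((32, 32)),  # the model below expects 32x32 inputs
    transforms.ToTensor(),
])
trainloader = torch.utils.data.DataLoader(
    CatDogDataset('./train', transform=transform),
    batch_size=32, shuffle=True)
```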
```python
class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 1)  # binary classification needs only 1 output

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # no Sigmoid
        return x

model = BinaryClassifier()
criterion = nn.BCEWithLogitsLoss()  # Sigmoid built in
optimizer = optim.Adam(model.parameters(), lr=0.0001)

# Training loop
for epoch in range(10):
    model.train()
    for images, labels in trainloader:
        optimizer.zero_grad()
        outputs = model(images).squeeze(1)  # [N, 1] -> [N] to match the labels
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
```
Common mistakes to watch for:
- Wrong label format: nn.CrossEntropyLoss expects integer class indices (dtype long), while nn.BCEWithLogitsLoss expects float targets shaped like the logits.
- Adding Softmax (or Sigmoid) to the output layer: both losses operate on raw logits, as noted above.
- Class imbalance:
```python
# One fix: per-class weights
weights = torch.tensor([1.0, 2.0, 1.0])  # illustrative 3-class case where class 2 is rarer
criterion = nn.CrossEntropyLoss(weight=weights)
```
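A hedged sketch (my addition) of deriving such weights from assumed per-class sample counts via inverse frequency:

```python
counts = torch.tensor([500.0, 250.0, 500.0])     # hypothetical per-class counts
weights = counts.sum() / (len(counts) * counts)  # rarer classes get larger weights
criterion = nn.CrossEntropyLoss(weight=weights)
```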
- Output dimension handling: call .squeeze() so the model's [N, 1] output matches the [N] labels.
- Probability threshold selection:
```python
# At prediction time, apply the Sigmoid and threshold the probabilities
with torch.no_grad():
    outputs = model(inputs)
    probs = torch.sigmoid(outputs)
    preds = (probs > 0.5).float()  # 0.5 is the default threshold
```
- Numerical stability: prefer nn.BCEWithLogitsLoss over a separate Sigmoid followed by nn.BCELoss; the fused version computes the loss in a numerically stable way.
- Use its pos_weight parameter to handle sample imbalance, as sketched below.
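A minimal sketch (the 3:1 ratio is an assumption): with three times as many negative as positive samples, pos_weight up-weights the positive class in the loss:

```python
# Assumed ratio: 3 negatives per positive in the training set
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))
```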
- Label smoothing:
```python
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```
- Combining losses in a custom criterion:
```python
def custom_loss(outputs, targets):
    ce_loss = F.cross_entropy(outputs, targets)
    reg_loss = torch.norm(model.fc3.weight, p=2)  # L2 penalty on the last layer
    return ce_loss + 0.01 * reg_loss
```
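For plain L2 regularization over all parameters, the optimizer's built-in weight_decay argument is the more idiomatic route; a one-line sketch:

```python
optimizer = optim.Adam(model.parameters(), lr=0.0001, weight_decay=0.01)  # L2 on all params
```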
- Mixed-precision training:
```python
scaler = torch.cuda.amp.GradScaler()
# Inside the training loop:
optimizer.zero_grad()
with torch.cuda.amp.autocast():
    outputs = model(inputs)
    loss = criterion(outputs, labels)
scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)
scaler.update()
```
```python
# Multi-class evaluation: overall accuracy
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest-scoring class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy: {100 * correct / total}%')
```
```python
# Binary evaluation: precision and recall from confusion counts
model.eval()
TP, FP, TN, FN = 0, 0, 0, 0
with torch.no_grad():
    for images, labels in testloader:
        outputs = model(images).squeeze(1)  # [N, 1] -> [N] to match the labels
        preds = (torch.sigmoid(outputs) > 0.5).float()
        TP += ((preds == 1) & (labels == 1)).sum().item()
        FP += ((preds == 1) & (labels == 0)).sum().item()
        TN += ((preds == 0) & (labels == 0)).sum().item()
        FN += ((preds == 0) & (labels == 1)).sum().item()
precision = TP / (TP + FP + 1e-8)
recall = TP / (TP + FN + 1e-8)
print(f'Precision: {precision:.4f}, Recall: {recall:.4f}')
```
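If a single summary number is needed, the F1 score (the harmonic mean of the two; my addition) follows directly:

```python
f1 = 2 * precision * recall / (precision + recall + 1e-8)
print(f'F1: {f1:.4f}')
```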
```python
# Cosine-annealing learning-rate schedule
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10, eta_min=1e-6)
for epoch in range(100):
    train(...)  # one epoch of training
    scheduler.step()
```
In real projects, using cross-entropy loss correctly can often decide a model's final performance. In my experience, a few points deserve special attention:
- Always switch to model.eval() and wrap evaluation in torch.no_grad().
- For imbalanced data, prefer class weights (the weight / pos_weight arguments) over oversampling.