Rust张量库视图操作实现与优化指南

Niujiubaba

1. 从零构建Rust张量库：视图操作详解

在深度学习框架中，视图操作(view operations)是处理张量数据的基础工具。它们允许我们以不同的方式解释相同的内存数据，而无需实际复制或移动数据。本文将深入探讨如何在Rust中实现这些核心操作。

1.1 视图操作的核心概念

视图操作的关键在于理解三个核心属性：

形状(shape)：描述张量在每个维度上的大小
步幅(strides)：描述在内存中沿每个维度移动时的跳跃距离
偏移(offset)：表示子张量在原始存储中的起始位置

当我们在PyTorch或NumPy中执行transpose()或reshape()等操作时，通常就是在使用视图。原始数据保持不变，但索引逻辑发生了变化。

提示：视图操作之所以高效，是因为它们避免了数据复制，仅通过修改元数据来改变数据解释方式。

1.2 步幅的深入解析

步幅是理解视图操作的核心。考虑一个形状为[10,9,5,13]的四维张量：

rust复制let shape = vec![10, 9, 5, 13];
let strides = compute_strides(&shape); // [585, 65, 13, 1]

计算步幅的算法如下：

rust复制fn compute_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len()-1).rev() {
        strides[i] = strides[i+1] * shape[i+1];
    }
    strides
}

这个步幅数组告诉我们：

在第一维(i)移动时，每次跳跃585个内存位置
在第二维(j)移动时，每次跳跃65个位置
在第三维(k)移动时，每次跳跃13个位置
在第四维(l)移动时，每次跳跃1个位置

1.3 张量形状的结构体设计

我们需要一个结构体来封装形状和步幅信息：

rust复制#[derive(Debug, Clone, PartialEq)]
struct TensorShape {
    shape: Vec<usize>,
    strides: Vec<usize>,
    linear_offset: usize, // 新增：支持子张量
}

初始化函数需要同时计算步幅：

rust复制impl TensorShape {
    fn new(shape: Vec<usize>) -> Self {
        let strides = compute_strides(&shape);
        Self { shape, strides, linear_offset: 0 }
    }
}

2. 实现基础视图操作

2.1 转置(Permute)操作

转置是最常见的视图操作之一，特别是在处理图像数据时。图像数据通常有两种排列方式：

CHW格式：通道(channel)、高度(height)、宽度(width)
HWC格式：高度、宽度、通道

实现转置的关键是同时调整形状和步幅：

rust复制impl TensorShape {
    fn permute(&self, dims: &[usize]) -> Self {
        let shape = dims.iter().map(|&i| self.shape[i]).collect();
        let strides = dims.iter().map(|&i| self.strides[i]).collect();
        Self { shape, strides, linear_offset: self.linear_offset }
    }
}

对应的张量实现：

rust复制impl<T: Clone> Tensor<T> {
    fn permute(&self, dims: &[usize]) -> Self {
        Tensor {
            shape: self.shape.permute(dims),
            storage: self.storage.clone(),
        }
    }
}

注意：这里的clone()只是复制存储的引用，不是实际数据。真正的数据共享需要引用计数或智能指针，这将在后续文章中实现。

2.2 合并(Merge)操作

合并操作将多个相邻维度合并为一个维度，常见于将多维数据展平为一维：

rust复制impl TensorShape {
    fn merge(&self, range: RangeInclusive<usize>) -> Self {
        let (start, end) = (*range.start(), *range.end());
        assert!(start <= end && end < self.shape.len());
        
        let merged_size = self.shape[range.clone()].iter().product();
        let merged_stride = self.strides[end];
        
        let mut new_shape = self.shape[..start].to_vec();
        new_shape.push(merged_size);
        new_shape.extend_from_slice(&self.shape[end+1..]);
        
        let mut new_strides = self.strides[..start].to_vec();
        new_strides.push(merged_stride);
        new_strides.extend_from_slice(&self.strides[end+1..]);
        
        Self { shape: new_shape, strides: new_strides, linear_offset: self.linear_offset }
    }
}

使用示例：

rust复制let shape = TensorShape::new(vec![2, 3, 4, 5]);
let merged = shape.merge(1..=2); // 结果形状为[2, 12, 5]

2.3 拆分(Split)操作

拆分是合并的逆操作，将一个维度拆分为多个维度：

rust复制impl TensorShape {
    fn split(&self, dim: usize, sizes: &[usize]) -> Self {
        // 计算推断尺寸
        let total_size = self.shape[dim];
        let product: usize = sizes.iter().filter(|&&s| s != 0).product();
        let inferred = sizes.iter().position(|&s| s == 0)
            .map(|i| total_size / product);
        
        // 构建最终尺寸
        let mut final_sizes = sizes.to_vec();
        if let Some(i) = inferred {
            final_sizes[i] = total_size / product;
        }
        
        // 计算新步幅
        let mut new_strides = vec![];
        let mut current_stride = self.strides[dim];
        for &size in final_sizes.iter().rev() {
            new_strides.push(current_stride);
            current_stride *= size;
        }
        new_strides.reverse();
        
        // 构建新形状和步幅
        let mut new_shape = self.shape[..dim].to_vec();
        new_shape.extend(final_sizes);
        new_shape.extend_from_slice(&self.shape[dim+1..]);
        
        let mut new_strides_full = self.strides[..dim].to_vec();
        new_strides_full.extend(new_strides);
        new_strides_full.extend_from_slice(&self.strides[dim+1..]);
        
        Self { shape: new_shape, strides: new_strides_full, linear_offset: self.linear_offset }
    }
}

使用示例：

rust复制let shape = TensorShape::new(vec![3, 20]);
let split = shape.split(1, &[4, 5]); // 结果形状为[3, 4, 5]

3. 重塑(Reshape)操作的实现策略

重塑操作可以分解为四种基本情况：

3.1 纯合并操作

当输出形状的前缀积包含在输入形状的前缀积中时：

rust复制// 示例：[2,3,4,5] -> [6,20]
// 步骤：
// 1. 合并0-1维：[2,3,4,5] -> [6,4,5]
// 2. 合并1-2维：[6,4,5] -> [6,20]

3.2 纯拆分操作

当输入形状的前缀积包含在输出形状的前缀积中时：

rust复制// 示例：[6,20] -> [2,3,4,5]
// 步骤：
// 1. 拆分第0维：[6,20] -> [2,3,20]
// 2. 拆分第2维：[2,3,20] -> [2,3,4,5]

3.3 先拆分后合并

对于更复杂的reshape，如[6,8] -> [2,3,4,4]，我们需要：

将输入和输出分解为质因数
寻找匹配的排列组合
执行相应的拆分和合并操作

3.4 通用解决方案

当上述方法都失败时，我们可以：

将所有维度合并为一个
然后拆分为目标形状

虽然这种方法总能奏效，但会丢失原始维度的语义信息。

4. 切片(Slice)和跳跃(Skip)操作

4.1 基本切片实现

切片操作需要处理起始偏移和形状变化：

rust复制impl TensorShape {
    fn slice(&self, dim: usize, range: RangeInclusive<usize>) -> Self {
        let (start, end) = (*range.start(), *range.end());
        assert!(start <= end && end < self.shape[dim]);
        
        let mut new_shape = self.shape.clone();
        new_shape[dim] = end - start + 1;
        
        let additional_offset = start * self.strides[dim];
        
        Self {
            shape: new_shape,
            strides: self.strides.clone(),
            linear_offset: self.linear_offset + additional_offset,
        }
    }
}

4.2 跳跃(Skip)操作

跳跃操作通过调整步幅来实现元素间隔选取：

rust复制impl TensorShape {
    fn skip(&self, dim: usize, step: usize) -> Self {
        assert!(step > 0);
        let mut new_strides = self.strides.clone();
        new_strides[dim] *= step;
        
        let mut new_shape = self.shape.clone();
        new_shape[dim] = (new_shape[dim] + step - 1) / step; // 向上取整
        
        Self { shape: new_shape, strides: new_strides, linear_offset: self.linear_offset }
    }
}

使用示例：

rust复制let shape = TensorShape::new(vec![4, 4]);
let skipped = shape.skip(0, 2).skip(1, 2); // 每隔一个元素选取

5. 实际应用与性能考量

5.1 与Candle框架的对比

Candle是Rust生态中知名的张量计算库，其实现与我们的设计有许多相似之处：

rust复制// Candle中的permute实现
pub fn permute(&self, dims: &[usize]) -> Result<Self> {
    let mut new_stride = vec![0; dims.len()];
    let mut new_dims = vec![0; dims.len()];
    for (i, &dim) in dims.iter().enumerate() {
        new_stride[i] = self.stride()[dim];
        new_dims[i] = self.dims()[dim];
    }
    Ok(Self { /* ... */ })
}