Question

在我的程序中，我有一个std::vector<std::array<float, n_channels>> vecvec，其中n_channels是在编译时已知的常量整数。在程序中，vecvec随时间增长。

我现在想解除约束n_channels必须在编译时知道，因此我将定义更改为std::vector<std::vector<float>> vecvec。 n_channels仍然是一个固定值，在构造vecvec之前就已知道（vecvec的所有元素都具有相同的长度）。

但是，现在我的程序突然慢了2.5倍。

我认为这是因为vecvec的内存突然碎片化了，因为它并不“知道” vecvec的每个元素都具有相同的大小。

有没有办法我也可以吃蛋糕？

Answer 1

您也想吃蛋糕吗？今天实现自己的可调整行大小的二维数组类！！！

您可以编写自己的2D数组类。通过使行在内存中连续，您可以获得使用std::vector<std::array<...>>的所有好处，但没有固定的编译时大小！为了简化实现，可以将其包装std::vector。

为了实现全部功能，我们还应该创建两个“ helper”类。其中一个代表数组中的一行，另一个代表该行的迭代器。当我们遍历2D数组时，我们将遍历数组的行。

行类

这很简单。它只包含一个开始和结束指针。数组是连续存储的，因此我们实际上并不存储Row，但是拥有它们仍然很方便，因此我们有一个要迭代的类型。

由于Row类仅表示矩阵中一行的视图，因此 Row类不应分配或删除任何内存。此外，我将所有Row类常量的成员函数，以便可以对直接从Row返回的RowIterator进行操作。

template<class T>
struct Row {
    T* _start;
    size_t _size;
    // These are const because if we need the elements to be const
    // We just make T const
    T* begin() const noexcept { return _start; }
    T* end() const noexcept { return _start + _size; }
    size_t size() const noexcept { return _size; }
    T& operator[](size_t index) const noexcept {
        return _start[index]; 
    }
    // Implicitly convertible to Row<T const>
    operator Row<T const>() const noexcept {
        return {_start, _size}; 
    }
};

RowIterator类

这只是实现了随机访问迭代器的基本功能。您可以向前，向后移动，向其中索引，从中添加或减去整数，等等。例如，如果我减去5，它将向后移动5行。

template<class T>
struct RowIterator {
    using value_type = Row<T>; 
    using element_type = Row<T>; 
    using reference_type = Row<T>; 
    using const_reference_type = Row<T>;
    // Add other iterator traits as needed  


    Row<T> current; 
    void operator++() noexcept {
        current._start += current._size; 
    }
    void operator--() noexcept {
        current._start -= current._size; 
    }
    RowIterator<T> operator+(intptr_t rows) const noexcept {
        return { Row<T>{current._start + rows * current._size, current._size } }; 
    }
    RowIterator<T> operator-(intptr_t rows) const noexcept {
        return { Row<T>{current._start - rows * current._size, current._size } }; 
    }
    RowIterator<T>& operator+=(intptr_t rows) noexcept {
        current._start += rows * current._size; 
        return *this; 
    }
    RowIterator<T>& operator-=(intptr_t rows) noexcept {
        current._start -= rows * current._size; 
        return *this; 
    }
    Row<T> operator*() const noexcept {
        return current; 
    }
    bool operator==(RowIterator<T> other) const noexcept {
        return current._start == other.current._start && current._size == other.current._size; 
    }
    bool operator!=(RowIterator<T> other) const noexcept {
        return current._start != other.current._start || current._size != other.current._size; 
    }
    Row<T> operator[](intptr_t index) {
        return (*this + index).current; 
    }
};

vector2D类

2D向量类将其元素连续存储在向量中，但是要访问它们或对其进行迭代，它将返回Row和RowIterator。因为Row只是两个值（一个指针和一个大小），所以这样做确实很便宜，并且编译器应该能够轻松地对其进行优化。

请注意，为了保持const正确性，我使用了Row<T const>，它创建了一个带有常量元素的Row。（这大大简化了Row的实现）。

template<class T>
class vector2D : private std::vector<T> {
    size_t rows; 
    size_t columns; 
    using std::vector<T>::data; 

   public:
    size_t size() const noexcept {
        return rows; 
    }
    // Gets a particular row
    Row<T> operator[](size_t index) noexcept {
        return { data() + columns * index, columns }; 
    }
    // Get a particular row when const
    Row<T const> operator[](size_t index) const noexcept {
        return { data() + columns * index, columns }; 
    }
    RowIterator<T> begin() noexcept {
        return { Row<T>{ data() , columns } }; 
    }
    RowIterator<T> end() noexcept { 
        return { Row<T>{ data() + columns * rows, columns } }; 
    }
    RowIterator<T const> begin() const noexcept {
        return { Row<T const>{ data() , columns } }; 
    }
    RowIterator<T const> end() const noexcept { 
        return { Row<T const>{ data() + columns * rows, columns } }; 
    }

    template<size_t N>
    void push_back(std::array<T, N> const& arr) {
        if(arr.size() == columns) {
            insert(end(), arr.begin(), arr.end()); 
            rows++; 
        }
        else
            throw std::invalid_argument("Bad number of columns"); 
    }

    void push_back(Row<T> arr) {
        if(arr.size() == columns) {
            insert(end(), arr.begin(), arr.end()); 
            rows++; 
        }
        else
            throw std::invalid_argument("Bad number of columns"); 
    }
    void push_back(Row<T const> arr) {
        if(arr.size() == columns) {
            insert(end(), arr.begin(), arr.end()); 
            rows++; 
        }
        else
            throw std::invalid_argument("Bad number of columns"); 
    }
    void push_back(std::initializer_list<T> arr) {
        if(arr.size() == columns) {
            insert(end(), arr.begin(), arr.end()); 
            rows++; 
        }
        else
            throw std::invalid_argument("Bad number of columns"); 
    }
    vector2D(size_t rows, size_t columns)
        : std::vector<T>(rows * columns)
        , rows(rows)
        , columns(columns) {}

};

基准测试结果

Run the benchmark here

有了基准测试结果，vector2D的速度与使用数组向量的速度一样！！

测试

测试分为两个部分：

用值填充2D数组
所有值求和

为了使事情尽可能通用，这些是我使用的功能。它们可以与std::vector<std::vector<...>>，std::vector<std::array<...>>或我们自己的vector2D一起使用！

template<class List>
auto calculateSum2D(List const& list) {
  using elem_t = std::decay_t<decltype(list[0][0])>;
  elem_t initial = 0;

  for(auto const& row : list) {
    for(auto& elem : row) {
      initial += elem;
    }
  }
  return initial;
}

template<class List>
void fill(List& list, int rows, int cols) {
  for(int i = 0; i < rows; i++) {
    for(int j = 0; j < cols; j++) {
      list[i][j] = i * j; 
    }
  }
}

结果

我们使用Quickbench来获得结果，vector2D比使用向量载体要快 4.5 倍！

这些结果是使用相应的功能获得的，是使用快速工作台编写的！

// Benchmark using a vector of vectors
static void sumVector(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    std::vector<std::vector<double>> vect(rows, std::vector<double>(cols));
    fill(vect, rows, cols); 

    auto sum = calculateSum2D(vect); 
    benchmark::DoNotOptimize(sum);
  }
}
// Register the function as a benchmark
BENCHMARK(sumVector);

// Benchmark using a vector of arrays
static void sumArray(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    std::vector<std::array<double, cols>> vect(rows, std::array<double, cols>());
    fill(vect, rows, cols); 

    auto sum = calculateSum2D(vect); 
    benchmark::DoNotOptimize(sum);
  }
}
// Register the function as a benchmark
BENCHMARK(sumArray);

// Benchmark using vector2D implementation
static void sumvector2D(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    vector2D<double> vect(rows, cols);
    fill(vect, rows, cols); 

    auto sum = calculateSum2D(vect); 
    benchmark::DoNotOptimize(sum);
  }
}
// Register the function as a benchmark
BENCHMARK(sumvector2D);

基准v2：无重复分配

View benchmark 2 here

事实证明，在初始基准测试中，大部分成本来自重复分配（在所有情况下，每次基准测试迭代都会重新分配对象）。为了解决这个问题，我将声明移出了循环，因此声明只会出现一次。我还调整了行和列的数量，以便有更多的行和更少的列，以便获得一种更实际的方案，其中整个内容都不适合缓存。

再一次vector2D和vector<array>的表现几乎相同，但是这次vector<vector>的表现要好得多，而且差距并不那么大。

速度差异的原因是这一次，唯一的差异是高速缓存局部性差的结果，因为每个对象只分配了一次。

摘要

根据基准测试结果，vector2D应该使您的性能恢复到最初的水平。因为您的代码可能包含分配和用法的混合，所以您得到的结果介于两个基准之间（向量的向量慢2.5倍）。因为vector2D是连续的，并且避免了困扰向量向量方法的重复堆分配，所以它应该和数组向量一样快。

固定大小向量的C ++有效增长向量

1 个答案:

您也想吃蛋糕吗？今天实现自己的可调整行大小的二维数组类！！！

行类

RowIterator类

vector2D类

基准测试结果

测试

结果

基准v2：无重复分配

摘要