Question

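I have a [32678 x 10] matrix (w2c) and I want to copy 24700 of its rows into another matrix (out). The indices of the rows to copy are stored in a vector (index). To do this in MATLAB I wrote: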

out = w2c(index_im,:);

It takes about 0.002622 seconds.

In OpenCV:

Mat out(index.cols, w2c.cols, w2c.type());
for (int i = 0; i < index.cols; ++i) {
    w2c.row(index.at<int>(i) - 1).copyTo(out.row(i));
}

It takes about 0.015121 seconds.

As you can see, MATLAB is about 6 times faster. How can I make the OpenCV code more efficient?

I am using cmake-2.9, g++-4.8, opencv-2.4.9 and Ubuntu 14.04.

Update:

I ran my code in release mode; here are the results (it is still much slower than MATLAB):

RELEASE (s)   DEBUG (s)   MATLAB (s)
0.008183      0.010070    0.001604
0.009630      0.010050    0.001679
0.009120      0.009890    0.001566
0.007534      0.009567    0.001635
0.007886      0.009886    0.001840
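Averaging the five runs in each column makes the remaining gap concrete (plain arithmetic on the numbers in the table above):

```latex
\begin{aligned}
\bar{t}_{\text{release}} &= \frac{0.008183 + 0.009630 + 0.009120 + 0.007534 + 0.007886}{5} \approx 0.008471\ \text{s} \\
\bar{t}_{\text{MATLAB}} &= \frac{0.001604 + 0.001679 + 0.001566 + 0.001635 + 0.001840}{5} \approx 0.001665\ \text{s} \\
\bar{t}_{\text{release}} / \bar{t}_{\text{MATLAB}} &\approx 5.1
\end{aligned}
```

So in these runs the release build is still roughly five times slower than MATLAB.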

Answer 1

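Based on our discussion in chat, you are not compiling with optimizations enabled. If you do, you will see a significant performance gain. Also, make sure you link against a release build of OpenCV.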
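One way to confirm which flags a benchmark was really built with is to have the binary report it. The sketch below is only an illustration and not part of OpenCV: `build_mode` is a hypothetical helper. It relies on GCC and Clang predefining `__OPTIMIZE__` when optimization is enabled (for example `-O2` or `-O3`), and on CMake's default Release flags for GCC being `-O3 -DNDEBUG`. It only describes how your own code was compiled; it says nothing about whether the OpenCV library you link against is a release build.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: describes how *this* translation unit was compiled.
// __OPTIMIZE__ is predefined by GCC/Clang when optimization is enabled;
// NDEBUG is added by CMake's default Release flags (-O3 -DNDEBUG for GCC).
std::string build_mode()
{
#ifdef __OPTIMIZE__
    std::string mode = "optimized";
#else
    std::string mode = "unoptimized";
#endif
#ifdef NDEBUG
    mode += ", NDEBUG defined";
#endif
    return mode;
}
```

Printing `build_mode()` at the start of `main()` makes it obvious when a timing was taken from an unoptimized build.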

I measured the execution time of the following example with and without optimizations enabled:

main.cpp

#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>
#include <chrono>
#include <opencv2/opencv.hpp>


int main(int argc, char **argv)
{
    const int num_rows = 32678;
    const int num_cols = 10;
    const int index_size = 24700;

    const int num_runs = 1000;
    const int seed = 42;

    std::vector<int> index_vec(num_rows);

    // fill index with sequence
    std::iota (index_vec.begin(), index_vec.end(), 0);

    // randomize sequence
    std::random_device rd;
    std::mt19937 g(rd());
    g.seed(seed);
    std::shuffle(index_vec.begin(), index_vec.end(), g);

    // truncate index
    index_vec.resize(index_size);

    cv::Mat w2c(num_rows, num_cols, CV_32F);

    // copy
    cv::Mat out(index_size, w2c.cols, w2c.type());

    auto start = std::chrono::high_resolution_clock::now();
    for (int k = 0; k<num_runs; ++k)
    {
        for (int i = 0; i < index_size; ++i)
        {
            w2c.row(index_vec[i]).copyTo(out.row(i));
        }
    }

    auto end = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);

    std::cout << duration.count()/num_runs << " microseconds" << std::endl;

    return 0;
}

CMakeLists.txt

project(copy)
find_package(OpenCV REQUIRED)
add_executable(copy main.cpp)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
include_directories(${OpenCV_INCLUDE_DIRS})
target_link_libraries(copy ${OpenCV_LIBS})

Compile and run without optimizations:

cmake . -DCMAKE_BUILD_TYPE=DEBUG
make
./copy
3924 microseconds

Compile and run with optimizations:

cmake . -DCMAKE_BUILD_TYPE=RELEASE
make
./copy
2664 microseconds

I ran these tests on:

Intel Core i7-4600U CPU
Ubuntu 14.04 (x64)
GCC 4.8.2
OpenCV 3.0.0 (release build)
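The benchmark above prints `duration.count()/num_runs`, an integer division of microseconds. A minimal reusable variant, assuming C++11 and using the hypothetical helper name `mean_microseconds`, averages in floating point and uses `std::chrono::steady_clock`, which the standard guarantees to be monotonic (`high_resolution_clock` carries no such guarantee):

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Hypothetical helper: runs fn() `runs` times and returns the mean
// wall-clock time per run in microseconds, without integer truncation.
template <typename Fn>
double mean_microseconds(Fn&& fn, int runs)
{
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < runs; ++k)
        fn();
    const auto end = std::chrono::steady_clock::now();
    // Convert the elapsed time to a floating-point count of microseconds.
    const std::chrono::duration<double, std::micro> total = end - start;
    return total.count() / runs;
}
```

For example, wrapping the row-copy loop of main.cpp in a lambda and passing it together with `num_runs` times the same work as the original benchmark.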

Answer 2

So I tried different approaches to this problem, and the only way I got better performance than MATLAB was to use memcpy and copy the data directly myself.

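    // Assumes w2c is CV_32F and index holds 1-based (MATLAB) row indices.
    Mat out( index.cols, w2c.cols, w2c.type() );
    for ( int i=0;i<index.cols;++i ){
        int ind = index.at<int>(i)-1;            // 1-based -> 0-based
        const float *src = w2c.ptr<float> (ind); // start of the source row
        float *des = out.ptr<float> (i);         // start of the destination row
        memcpy(des,src,w2c.cols*sizeof(float));  // copy the whole row at once
    }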

This way the whole process takes about 0.001063 seconds, slightly faster than MATLAB.
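The loop above is valid because OpenCV stores each row of a `Mat` contiguously (any padding only appears between rows), so `w2c.ptr<float>(ind)` points at `w2c.cols` consecutive floats. Here is the same per-row `memcpy` technique in plain C++, as a sketch on a hypothetical row-major `std::vector<float>` buffer with a hypothetical helper `gather_rows`, keeping the question's 1-based indices:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the rows of a row-major rows x cols matrix `src`
// selected by the 1-based indices in `index` into a new row-major matrix with
// index.size() rows, using one memcpy per row.
std::vector<float> gather_rows(const std::vector<float>& src, int rows, int cols,
                               const std::vector<int>& index)
{
    assert(rows >= 0 && cols > 0);
    assert(src.size() == static_cast<std::size_t>(rows) * cols);

    const std::size_t n = static_cast<std::size_t>(cols);
    std::vector<float> out(index.size() * n);
    for (std::size_t i = 0; i < index.size(); ++i) {
        const int r = index[i] - 1;  // 1-based (MATLAB) -> 0-based
        assert(r >= 0 && r < rows);
        const float* from = src.data() + static_cast<std::size_t>(r) * n;  // source row
        float* to = out.data() + i * n;                                    // destination row
        std::memcpy(to, from, n * sizeof(float));  // copy the whole row at once
    }
    return out;
}
```

With a 3 x 2 matrix stored as `{1, 2, 3, 4, 5, 6}`, `gather_rows(m, 3, 2, {3, 1})` yields `{5, 6, 1, 2}`, i.e. rows 3 and 1 in that order, which is what MATLAB's `m([3 1], :)` returns.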

I also found that copying the data this way:

    Mat out;  // start empty: push_back appends each row after the existing ones
    for ( int i=0;i<index.cols;++i ){
        int ind = index.at<int>(i)-1;
        out.push_back(w2c.row(ind));
    }

is faster than copying it like this:

    Mat out( index.cols, w2c.cols, w2c.type() );
    for ( int i=0;i<index.cols;++i ){
        int ind = index.at<int>(i)-1;
        w2c.row(ind).copyTo(out.row(i));
    }

but I don't know why. Either way, both of these are slower than MATLAB.

Fastest way to copy some rows from one matrix to another in OpenCV
