When I enable -march=native, the timing of my code snippet doubles.
// -> sequential
int n = (int) 1e7;
Vector<double, 32> a;
a.init(n);
for (int i = 0; i < n; i++)
    a(i) = 1.0;
double r1;
Timer::start();
psum(a.data, n, r1);
Timer::stop();
std::cout << "timing (ms): " << Timer::get_timing() << std::endl;
std::cout << r1 << std::endl;
// <-
// -> threading simple
int n_threads = 2;
Vector<double, 32> b;
b.init(n);
for (int i = 0; i < n; i++)
    b(i) = 2.0;
double r2;
Timer::start();
std::thread t1(psum, b.data + n/2, n/2, std::ref(r1));
psum(b.data, n/2, r2);
t1.join();
Timer::stop();
std::cout << "timing (ms): " << Timer::get_timing() << std::endl;
std::cout << r1 + r2 << std::endl;
// <-
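The snippet above depends on `Vector`, `Timer`, and `psum`, none of which are shown. For anyone who wants to reproduce the measurement, here is a minimal self-contained sketch of the same two-thread split. Everything in it is an assumption rather than the original code: `psum` is taken to be a plain summation loop that accumulates locally and writes its result once, `std::vector` stands in for `Vector<double, 32>`, and `time_ms` (a hypothetical helper built on `std::chrono`) stands in for `Timer`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's psum (the real one is not shown):
// sums n doubles into a local accumulator and stores the total in result.
void psum(const double* data, std::size_t n, double& result) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i];
    result = acc;
}

// Same split as the threaded part of the question: a second thread sums the
// upper half while the calling thread sums the lower half.
double threaded_sum(const std::vector<double>& v) {
    assert(!v.empty());
    const std::size_t half = v.size() / 2;
    double lo = 0.0;
    double hi = 0.0;
    std::thread t(psum, v.data() + half, v.size() - half, std::ref(hi));
    psum(v.data(), half, lo);
    t.join();
    return lo + hi;
}

// Hypothetical replacement for Timer::start()/stop()/get_timing():
// runs f once and returns the elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling `time_ms` around `psum` and around `threaded_sum` from a `main` on a vector of `1e7` elements, built once with and once without `-march=native`, should show whether the slowdown also appears without the custom `Vector` and `Timer` classes.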
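Specifically, the threaded example jumps from 8 ms to 16 ms, and 16 ms is the timing of the sequential code.
Additional information (compile command):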
c++ -std=c++11 -O3 -pthread ...
Any idea where this comes from?
When I enable only -mtune=skylake, the timing jumps to 32 ms.