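I'm new to Eigen Tensors, so I may be doing something very wrong. I have code that computes the Z-score of the difference between two float matrices, and I found that it runs roughly 500 times slower than the same code in Python with numpy. What am I doing wrong?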
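For reference, with d = a - b an n x m matrix, the column-wise absolute Z-score that the numpy version computes is as follows (np.std divides by n by default):

```latex
\mu_j = \frac{1}{n}\sum_{i=1}^{n} d_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(d_{ij} - \mu_j\right)^2}, \qquad
z_{ij} = \frac{\lvert d_{ij} - \mu_j \rvert}{\sigma_j}
```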
The C++ code:
int scale = atoi(argv[1]);
Eigen::array<int, 2> bbcast({scale, 1});
long startTime = get_nanos();
Eigen::Tensor<float, 2> a(2, 5);
a.setRandom();
Eigen::Tensor<float, 2> b(2, 5);
b.setRandom();
Eigen::Tensor<float, 2> scaled_a = a.broadcast(bbcast);
Eigen::Tensor<float, 2> scaled_b = b.broadcast(bbcast);
Eigen::array<int, 1> dims({0 /* dimension to reduce */});
Eigen::array<int, 2> good_dims{{1,(int)scaled_a.dimension(1)}};
auto means = (scaled_a - scaled_b).mean(dims).reshape(good_dims);
std::cout << means << std::endl;
printf("Calculated means, took %f seconds\n",(float)(get_nanos() - startTime) / 1000000000L);
Eigen::array<int, 2> bcast({(int)scaled_a.dimension(0), 1});
auto submean = (scaled_a - scaled_b) - means.broadcast(bcast);
auto stds = submean.square().mean(dims).reshape(good_dims).sqrt();
std::cout << stds << std::endl;
printf("Calculated std, took %f seconds\n",(float)(get_nanos() - startTime) / 1000000000L);
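The snippet times itself with get_nanos(), which the question does not define. A minimal sketch, assuming it is meant to return a monotonic timestamp in nanoseconds (both get_nanos and the extra seconds_since helper below are hypothetical):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the question's undefined get_nanos():
// a monotonic timestamp in nanoseconds. steady_clock never goes backwards,
// so the difference between two calls is a valid duration.
// (long is 64-bit on a typical Linux x86-64 system.)
long get_nanos() {
    using namespace std::chrono;
    return static_cast<long>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Seconds elapsed since a timestamp previously taken with get_nanos().
double seconds_since(long start_nanos) {
    long now = get_nanos();
    assert(now >= start_nanos);
    return static_cast<double>(now - start_nanos) / 1e9;
}
```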
On my Linux VM the C++ code takes about 3 seconds with 20000 x 5 floats (a scale argument of 10000).
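Where the 20000 x 5 size comes from: broadcast({scale, 1}) repeats the 2 x 5 tensor scale times along dimension 0, so scale = 10000 gives 20000 rows (which are therefore 10000 copies of the same two random rows). The sketch below is an eager, plain-C++ model of that tiling on a column-major buffer, not Eigen's implementation (which is lazy); the names ColMajor and tile are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major rows x cols matrix in a flat vector, mirroring the default
// (ColMajor) layout of Eigen::Tensor<float, 2>: (i, j) is at data[i + j * rows].
struct ColMajor {
    std::size_t rows, cols;
    std::vector<float> data;
    float operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// Tiles the input reps_r times down and reps_c times across, i.e.
// out(i, j) = in(i % in.rows, j % in.cols), which is what a broadcast with
// factors {reps_r, reps_c} produces.
ColMajor tile(const ColMajor& in, std::size_t reps_r, std::size_t reps_c) {
    assert(reps_r > 0 && reps_c > 0);
    ColMajor out{in.rows * reps_r, in.cols * reps_c, {}};
    out.data.resize(out.rows * out.cols);
    for (std::size_t j = 0; j < out.cols; ++j)
        for (std::size_t i = 0; i < out.rows; ++i)
            out.data[i + j * out.rows] = in(i % in.rows, j % in.cols);
    return out;
}
```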
The Python code:
import numpy as np
import time
start = time.time()
a = np.random.rand(2*10000,5)
b = np.random.rand(2*10000,5)
stds = np.std(a - b, axis = 0)
means = np.mean(a - b, axis = 0)
#diffs = np.sum(np.abs(net_out - correct_out)/stds,axis=1)
diffs = np.abs(a - b - means)/stds
print(diffs)
print("Took", time.time() - start )
On the same VM it runs in 0.0068 seconds.
Thanks a lot, Moshe
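For comparison, a plain-loop C++ sketch of what the numpy snippet computes, using np.std's default population standard deviation (ddof=0). The function name zscore_diffs and the column-major storage are illustrative assumptions, not part of the question:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Column-wise absolute Z-score of d = a - b for n x m matrices stored
// column-major (element (i, j) at index i + j * n):
//   means = mean(d, axis=0); stds = std(d, axis=0) with ddof=0
//   diffs = |d - means| / stds
std::vector<double> zscore_diffs(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n, std::size_t m) {
    assert(n > 0 && a.size() == n * m && b.size() == n * m);
    std::vector<double> diffs(n * m);
    for (std::size_t j = 0; j < m; ++j) {
        const std::size_t off = j * n;  // start of column j
        // Pass 1: column mean of d.
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += a[off + i] - b[off + i];
        mean /= static_cast<double>(n);
        // Pass 2: population variance (divide by n, like np.std's default).
        double var = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double c = a[off + i] - b[off + i] - mean;
            var += c * c;
        }
        const double sd = std::sqrt(var / static_cast<double>(n));
        // Pass 3: absolute Z-score of every element in the column.
        for (std::size_t i = 0; i < n; ++i)
            diffs[off + i] = std::abs(a[off + i] - b[off + i] - mean) / sd;
    }
    return diffs;
}
```

The work is three linear passes per column, so the total cost is proportional to n * m.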
Answer (score: 2)
For 2D tensors you are better off using Matrix or Array, which leads to simpler code:
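#include <Eigen/Dense>
using namespace Eigen;
ArrayXXd a = ArrayXXd::Random(2*10000,5);
ArrayXXd b = ArrayXXd::Random(2*10000,5);
// Per-column mean of a-b, evaluated once into a 1 x 5 array
auto means = (a-b).colwise().mean().eval();
// Per-column sample standard deviation (divides by n-1)
auto stds = (((a-b).rowwise()-means).square().colwise().sum() / (a.rows()-1)).sqrt().eval();
// |a - b - mean| / std, column by column
ArrayXXd diffs = abs((a-b).rowwise() - means).rowwise()/stds;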
Note the use of .eval() on the lines declared with auto (see why).
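Why the .eval() calls matter: Eigen builds lazy expression templates, so a variable declared with auto holds an unevaluated expression that is recomputed wherever it is used (and it may even hold references to temporaries that no longer exist). The toy model below is not Eigen's code; the types Vec and Sum and the counter g_evaluations are hypothetical and only illustrate the effect. In the question, if the mean over 20000 rows were recomputed for each of the 20000 x 5 elements of submean, that would be about 2 x 10^9 additions, which would be consistent with a run time of seconds rather than milliseconds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many element computations the lazy expression performs.
static int g_evaluations = 0;

// A concrete, already-computed vector.
struct Vec {
    std::vector<double> v;
    double operator[](std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }
};

// A lazy "a + b": stores references to its operands and computes each
// element on demand, every time that element is read.
struct Sum {
    const Vec& lhs;
    const Vec& rhs;
    double operator[](std::size_t i) const { ++g_evaluations; return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
    // Analogue of .eval(): compute every element once into a concrete Vec.
    Vec eval() const {
        Vec out{std::vector<double>(size())};
        for (std::size_t i = 0; i < size(); ++i) out.v[i] = (*this)[i];
        return out;
    }
};

inline Sum operator+(const Vec& a, const Vec& b) { return Sum{a, b}; }

// Reads every element k times, like reusing an `auto` variable in k later expressions.
template <typename Expr>
double read_k_times(const Expr& e, int k) {
    double total = 0.0;
    for (int r = 0; r < k; ++r)
        for (std::size_t i = 0; i < e.size(); ++i) total += e[i];
    return total;
}
```

With the lazy Sum, every read repeats the addition; after eval() the cost is paid once and later reads are plain memory accesses.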
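Compiled with gcc and -O3 on an ordinary laptop, this code takes 0.000324919s (not counting the random number generation, which may be much more expensive but is not representative).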
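One difference from the numpy snippet: the Array code divides the sum of squared deviations by (a.rows()-1), the sample standard deviation, whereas np.std defaults to ddof=0 and divides by n. A small sketch with a numpy-style ddof parameter (the function stddev itself is hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard deviation that divides the squared deviations by (n - ddof),
// following numpy's ddof convention:
//   ddof = 0 -> population std (np.std default)
//   ddof = 1 -> sample std (dividing by rows() - 1, as in the Array code)
double stddev(const std::vector<double>& x, std::size_t ddof) {
    const std::size_t n = x.size();
    assert(n > ddof);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);
    double ss = 0.0;  // sum of squared deviations from the mean
    for (double v : x) ss += (v - mean) * (v - mean);
    return std::sqrt(ss / static_cast<double>(n - ddof));
}
```

For 20000 rows the two conventions differ only by a factor of sqrt(20000/19999), a relative difference of about 0.0025%.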
Here is the Tensor version I came up with; again, note the eval() calls:
int n = a.dimension(0);
Eigen::array<int, 1> dims({0 /* dimension to reduce */});
Eigen::array<int, 2> good_dims{{1,(int)a.dimension(1)}};
Eigen::array<int,2> bc({n,1});
auto means = (a - b).mean(dims).eval();
auto submean = (a - b) - means.reshape(good_dims).broadcast(bc);
auto stds = (submean.square().eval().sum(dims) * 1.f/(float(n-1))).sqrt().eval();
Eigen::Tensor<float, 2> diffs = submean.abs() / stds.reshape(good_dims).broadcast(bc);
But it seems considerably slower, at about 0.007 seconds. To treat a Tensor as an Array, you can use Map:
Map<const ArrayXXf> a(tensor_a.data(), tensor_a.dimension(0), tensor_a.dimension(1));
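Why the Map works without copying: Eigen::Tensor defaults to ColMajor layout, as does ArrayXXf, so element (i, j) lives at data()[i + j * dimension(0)] in both. A tensor declared with RowMajor layout would need a row-major Array type instead. The toy view below (ColMajorView is hypothetical, not part of Eigen) shows the same idea of reading an existing buffer in place:

```cpp
#include <cassert>
#include <cstddef>

// Minimal non-owning column-major view over an existing buffer, a toy
// analogue of Eigen::Map: nothing is copied, and element (i, j) is read
// from data[i + j * rows].
class ColMajorView {
public:
    ColMajorView(const float* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    float operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i + j * rows_];
    }

    // Column sum computed directly from the shared buffer.
    float col_sum(std::size_t j) const {
        float s = 0.0f;
        for (std::size_t i = 0; i < rows_; ++i) s += (*this)(i, j);
        return s;
    }

    const float* data() const { return data_; }

private:
    const float* data_;
    std::size_t rows_, cols_;
};
```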