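I have about 4000 vectors, each with 3000 dimensions, and I need to compute the difference vector for each pair.
This is where I run into problems. I have tried two approaches.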
differ = (double *)malloc(sizeof(double) * testNum * trainNum * featureDim);
array_view<double, 3> differAMP(testNum, trainNum, featureDim, differ);
QueryPerformanceFrequency(&tc);
QueryPerformanceCounter(&t1);
parallel_for_each(
differAMP.extent,
[=](concurrency::index<3> idx) restrict(amp) {
differAMP[idx] = (test(idx[0], idx[2]) - train(idx[1], idx[2]));
}
);
But VS throws a runtime exception, which I think is caused by the memory limit. So I changed the code to:
differ = (double *)malloc(sizeof(double) * trainNum * featureDim);
array_view<double, 2> differAMP(trainNum, featureDim, differ);
QueryPerformanceFrequency(&tc);
QueryPerformanceCounter(&t1);
parallel_for_each(
differAMP.extent,
[=](concurrency::index<2> idx) restrict(amp) {
differAMP[idx] = (test(testIndex, idx[1]) - train(idx[0], idx[1]));
}
);
I run this inside a loop, once per test case. However, VS then throws an "array_view removed" exception. Now I don't know what to do.
Answer 0 (score: 0)
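In the first case, it fails because it runs out of memory. In the second case, it is probably failing because the work exceeds the 2-second TDR (Timeout Detection and Recovery) limit. Keep in mind that parallel_for_each queues work to the GPU's DMA buffer asynchronously, so placing it inside a for loop can cause all of the work to be queued and then executed as one block. Waiting for each kernel to finish before queuing the next avoids this: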
for (int testIndex = 0; testIndex < testNum; ++testIndex)
{
differ = (double *)malloc(sizeof(double) * trainNum * featureDim);
array_view<double, 2> differAMP(trainNum, featureDim, differ);
QueryPerformanceFrequency(&tc);
QueryPerformanceCounter(&t1);
parallel_for_each(
differAMP.extent,
[=](concurrency::index<2> idx) restrict(amp) {
differAMP[idx] = (test(testIndex, idx[1]) - train(idx[0], idx[1]));
});
// Force this parallel_for_each to finish before running the next one.
differAMP.source_accelerator_view.wait();
}
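The code above prevents all of the work from being queued and executed as a single block, which is what leads to the TDR timeout and therefore the exception.
You can also change the queuing behavior by creating a view with a specific queuing mode. Immediate mode forces work to be submitted right away, which improves latency at the cost of throughput.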
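As a rough check on the out-of-memory diagnosis, the size of the difference buffer can be estimated before allocating it. The sketch below is plain C++ rather than C++ AMP, and the helper names differBytes and rowsPerBatch are hypothetical, introduced only for illustration. It also shows one way to pick how many test rows fit in an assumed memory budget, in case processing a few rows per kernel launch is preferable to one row at a time.

```cpp
#include <cstdint>

// Bytes needed for a rows x trainNum x featureDim buffer of doubles.
// (Hypothetical helper; not part of C++ AMP.)
constexpr std::uint64_t differBytes(std::uint64_t rows,
                                    std::uint64_t trainNum,
                                    std::uint64_t featureDim) {
    return rows * trainNum * featureDim * sizeof(double);
}

// Largest number of test rows whose difference block fits in budgetBytes.
// Returns 0 if even a single row does not fit (or if a row is empty).
constexpr std::uint64_t rowsPerBatch(std::uint64_t budgetBytes,
                                     std::uint64_t trainNum,
                                     std::uint64_t featureDim) {
    return differBytes(1, trainNum, featureDim) == 0
               ? 0
               : budgetBytes / differBytes(1, trainNum, featureDim);
}
```

The question does not say how the roughly 4000 vectors are split between test and train. Assuming, purely for illustration, testNum = 1000, trainNum = 3000 and featureDim = 3000, the all-pairs buffer from the first version would need 72,000,000,000 bytes (about 72 GB), while a single test row needs 72,000,000 bytes (about 72 MB). With an assumed 1 GiB budget, rowsPerBatch gives 14 rows per batch. Larger batches mean longer-running kernels, so the batch size still has to stay small enough to remain under the TDR limit.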
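accelerator acc(accelerator::default_accelerator);
// Keep the returned view; pass it as the first argument to parallel_for_each to use it.
accelerator_view immediateView = acc.create_view(queuing_mode::queuing_mode_immediate);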