C++ AMP crashing on hardware (GeForce GTX 660)

Time: 2013-03-14 18:40:33

Tags: c++ visual-c++ c++11 c++-amp

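I'm having a problem with some C++ AMP code I'm writing, and I've included a sample below. It runs fine on the emulated accelerator, but on my hardware (Windows 7, NVIDIA GeForce GTX 660, latest drivers) it crashes the display driver, and I can't see anything wrong with my code.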

Is there something wrong with my code, or is this a hardware, driver, or compiler issue?

#include "stdafx.h"

#include <vector>
#include <iostream>
#include <amp.h>

int _tmain(int argc, _TCHAR* argv[])
{
    // Prints "NVIDIA GeForce GTX 660"
    concurrency::accelerator_view target_view = concurrency::accelerator().create_view();
    std::wcout << target_view.accelerator.description << std::endl;

    // lower numbers do not cause the issue
    const int x = 2000;
    const int y = 30000;

    // 1d array for storing result
    std::vector<unsigned int> resultVector(y);
    Concurrency::array_view<unsigned int, 1> resultsArrayView(resultVector.size(), resultVector);

    // 2d array for data for processing 
    std::vector<unsigned int> dataVector(x * y);
    concurrency::array_view<unsigned int, 2> dataArrayView(y, x, dataVector);
    parallel_for_each(
        // Define the compute domain, which is the set of threads that are created.
        resultsArrayView.extent,
        // Define the code to run on each thread on the accelerator.
        [=](concurrency::index<1> idx) restrict(amp)
    {
        concurrency::array_view<unsigned int, 1> buffer = dataArrayView[idx[0]];
        unsigned int bufferSize = buffer.get_extent().size();

        // needs both loops to cause crash
        for (unsigned int outer = 0; outer < bufferSize; outer++)
        {
            for (unsigned int i = 0; i < bufferSize; i++)
            {
                // works without this line, also if I change to buffer[0] it works?
                dataArrayView[idx[0]][0] = 0;
            }
        }
        // works without this line
        resultsArrayView[0] = 0;
    });

    std::cout << "crash on next line" << std::endl; 
    resultsArrayView.synchronize();
    std::cout << "will never reach me" << std::endl; 

    system("PAUSE");
    return 0;
}

1 Answer:

Answer 0 (score: 8)

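Your computation most likely exceeds the allowed quantum time (2 seconds by default). Once that limit is passed, the operating system steps in and forcibly resets the GPU; this mechanism is called Timeout Detection and Recovery (TDR). The software adapter (the reference device) does not have TDR enabled, which is why the same computation can run past the allowed quantum time there without crashing.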
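Because TDR is triggered by elapsed time, one way to see how close a piece of work comes to the limit is to time it on the host. The following is a minimal sketch in portable C++ (no C++ AMP headers, so it builds anywhere). The 2-second budget is an assumption mirroring the default mentioned above, and `time_work` / `fits_tdr_budget` are hypothetical helper names. To time a real dispatch you would call `parallel_for_each` and then block, for example with `target_view.wait()`, inside the lambda, since `parallel_for_each` may return before the GPU work has finished.

```cpp
#include <cassert>
#include <chrono>

// Assumption: the Windows default TDR timeout of 2 seconds.
constexpr std::chrono::milliseconds kAssumedTdrBudget{2000};

// Runs `work` once and returns its elapsed time on the steady clock.
template <typename Work>
std::chrono::steady_clock::duration time_work(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    return std::chrono::steady_clock::now() - start;
}

// True if the measured duration stays below the assumed TDR budget.
inline bool fits_tdr_budget(std::chrono::steady_clock::duration elapsed)
{
    assert(elapsed.count() >= 0); // steady_clock never runs backwards
    return elapsed < kAssumedTdrBudget;
}
```

It makes sense to time a deliberately small piece of work (a few rows, say) rather than the full dispatch: a dispatch that overruns the budget gets reset by TDR before any timing comes back.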

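Does your computation really need 30,000 threads (variable y), with each thread executing 2000 * 2000 (x * x) loop iterations? You can split the computation into chunks so that each chunk takes less than 2 seconds. You could also consider disabling TDR, or increasing the allowed quantum time to suit your needs.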
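On the host side, the chunking suggestion amounts to splitting the y rows of the compute domain into consecutive slices and dispatching one slice at a time. The sketch below is portable C++ with hypothetical names (`Chunk`, `make_chunks`, `iterations_per_dispatch`). It only plans the slices and counts the inner-loop iterations each dispatch would perform for the question's kernel, which runs one thread per row and x * x iterations per thread. In C++ AMP, each (offset, count) pair could then become a sub-view via `array_view::section` and be passed to its own `parallel_for_each`, possibly flushing or waiting on the `accelerator_view` between chunks. That part is omitted here because it requires the Visual C++ AMP toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One slice of the row range [0, total_rows): rows [offset, offset + count).
struct Chunk
{
    std::size_t offset;
    std::size_t count;
};

// Splits total_rows rows into consecutive chunks of at most chunk_rows rows.
// The last chunk may be shorter than chunk_rows.
std::vector<Chunk> make_chunks(std::size_t total_rows, std::size_t chunk_rows)
{
    assert(chunk_rows > 0);
    std::vector<Chunk> chunks;
    for (std::size_t offset = 0; offset < total_rows; offset += chunk_rows)
    {
        const std::size_t remaining = total_rows - offset;
        chunks.push_back({offset, remaining < chunk_rows ? remaining : chunk_rows});
    }
    return chunks;
}

// Inner-loop iterations for one dispatch of the question's kernel:
// `rows` threads, each running bufferSize * bufferSize = x * x iterations.
std::uint64_t iterations_per_dispatch(std::size_t rows, std::size_t x)
{
    return static_cast<std::uint64_t>(rows) * x * x;
}
```

For the question's sizes, `iterations_per_dispatch(30000, 2000)` is 120,000,000,000 inner-loop iterations in a single dispatch. Chunks of 1000 rows would cut that by a factor of 30 per dispatch. A suitable chunk size can only be found by measuring on the target GPU; these numbers are illustrative, not measured.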

I strongly recommend reading this blog post on handling TDRs in C++ AMP, which explains TDR in detail: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/07/handling-tdrs-in-c-amp.aspx

There is also a separate blog post on how to disable TDR on Windows 8: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/06/disabling-tdr-on-windows-8-for-your-c-amp-algorithms.aspx