Question

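In the code below, I simply call the function foo twice in a row from main. The function just allocates device memory and then increments the pointer. It then exits and returns to main.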

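The first time foo is called, the memory is allocated correctly. But when I call foo again, as the output shows, the CUDA memory allocation fails with the error invalid device pointer.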

I tried calling cudaThreadSynchronize() between the two foo calls, but it did not help. Why does the memory allocation fail?

The error is actually caused by

matrixd += 3;

because if I remove it, the error goes away. But why does it happen, even though I am calling cudaFree()?

Please help me understand this.

Here is my output:

Calling foo for the first time
Allocation of matrixd passed:
I came back to main safely :-)
I am going back to foo again :-)
Allocation of matrixd failed, the reason is:  invalid device pointer

Here is my main():

#include<stdio.h>  
#include <cstdlib> // malloc(), free() 
#include <iostream> // cout, stream
#include <math.h>
#include <ctime> // time(), clock()
#include <bitset>
bool foo(  );

/***************************************
Main method.

****************************************/
 int main()  
 { 

    // Perform one warm-up pass and validate
    std::cout << "Calling foo for the first time"<<std::endl;
    foo();
    std::cout << "I came back to main safely :-) "<<std::endl;
    std::cout << "I am going back to foo again :-) "<<std::endl;
    foo( );    
    getchar();  
    return 0;  
 }

The definition of foo() is in this file:

#include <cuda.h>
#include <cuda_runtime_api.h>
#include <device_launch_parameters.h>
#include <iostream>

bool foo( )
{
    // Error return value
    cudaError_t status;
    // Number of bytes in the matrix.
    int bytes = 9 *sizeof(float);
        // Pointers to the device arrays
    float *matrixd=NULL; 

    // Allocate memory on the device to store matrix
    cudaMalloc((void**) &matrixd, bytes);
    status = cudaGetLastError();              //To check the error
    if (status != cudaSuccess) {                     
        std::cout << "Allocation of matrixd failed, the reason is:  " <<    cudaGetErrorString(status) << 
        std::endl;
        cudaFree(matrixd);                     //Free call for memory
        return false;
    }

    std::cout << "Allocation of matrixd passed: "<<std::endl;


    ////// Increment address 
    for (int i=0; i<3; i++){
         matrixd += 3;
    }

        // Free device memory
    cudaFree(matrixd);     

    return true;
}

Update

I added better error checking. Also, I now increment the device pointer only once. This time I get the following output:

Calling foo for the first time
Allocation of matrixd passed:
Increamented the pointer and going to free cuda memory:
GPUassert: invalid device pointer C:/Users/user/Desktop/Gauss/Gauss/GaussianEliminationGPU.cu 44

Line 44 is the cudaFree() call. Why does it still fail?

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

// GPU function for the direct Gauss-Jordan method.

bool foo( )
{

    // Error return value
    cudaError_t status;
    // Number of bytes in the matrix.
    int bytes = 9 *sizeof(float);
        // Pointers to the device arrays
    float *matrixd=NULL; 

    // Allocate memory on the device to store each matrix
    gpuErrchk( cudaMalloc((void**) &matrixd, bytes));
    //cudaMemset(outputMatrixd, 0, bytes);

    std::cout << "Allocation of matrixd passed: "<<std::endl;


    ////// Increment address 

         matrixd += 1;

         std::cout << "Increamented the pointer and going to free cuda memory: "<<std::endl;

         // Free device memory
    gpuErrchk( cudaFree(matrixd));     

    return true;
}

Answer 1

The real problem is in this code:

for (int i=0; i<3; i++){
     matrixd += 3;
}

// Free device memory
cudaFree(matrixd);

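You never allocated matrixd+9, so passing it to cudaFree is illegal and produces the invalid device pointer error. cudaFree only accepts a pointer exactly as cudaMalloc returned it, and after the loop matrixd points 9 floats past that address. Because the first call to foo never checks the return value of cudaFree, the error goes unnoticed until the next time error checking is performed, which is the cudaGetLastError call after the subsequent cudaMalloc. If you read the documentation for any of these API calls, you will notice that they can return errors from previous GPU operations. That is what is happening here: the second allocation itself did not fail.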

Error checking in the CUDA runtime API can be hard to get right. There is a robust, ready-made way of doing it described here. I suggest you use it.

CUDA: invalid device pointer error when reallocating memory
