Question

This is on a MacBookPro7,1 with a GeForce 320M (compute capability 1.2). Previously, with OS X 10.7.8, Xcode 4.x, and CUDA 5.0, CUDA code compiled and ran.

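Then I updated to OS X 10.9.2, Xcode 5.1, and CUDA 5.5. At first, deviceQuery failed. I read elsewhere that 5.5.28 (the driver that ships with CUDA 5.5) does not support compute capability 1.x (sm_10), but that 5.5.43 does. After updating the CUDA driver to the even more recent 5.5.47 (GPU driver version 8.24.11 310.90.9b01), deviceQuery does pass, with the following output.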

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 320M"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    1.2
  Total amount of global memory:                 253 MBytes (265027584 bytes)
  ( 6) Multiprocessors, (  8) CUDA Cores/MP:     48 CUDA Cores
  GPU Clock rate:                                950 MHz (0.95 GHz)
  Memory Clock rate:                             1064 Mhz
  Memory Bus Width:                              128-bit
  Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Max dimension size of a thread block (x,y,z): (512, 512, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 1)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GeForce 320M
Result = PASS

I can also compile the CUDA 5.5 samples successfully without modification, though I have not tried to compile all of them.

However, samples such as matrixMul, simpleCUFFT, and simpleCUBLAS all fail immediately when run.

$ ./matrixMul 
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce 320M" with compute capability 1.2

MatrixA(160,160), MatrixB(320,160)
cudaMalloc d_A returned error code 2, line(164)

$ ./simpleCUFFT 
[simpleCUFFT] is starting...
GPU Device 0: "GeForce 320M" with compute capability 1.2

CUDA error at simpleCUFFT.cu:105 code=2(cudaErrorMemoryAllocation) "cudaMalloc((void **)&d_signal, mem_size)"

Error code 2 is cudaErrorMemoryAllocation, but I suspect it is somehow hiding a failed CUDA initialization.

$ ./simpleCUBLAS 
GPU Device 0: "GeForce 320M" with compute capability 1.2

simpleCUBLAS test running..
!!!! CUBLAS initialization error

The actual error code is CUBLAS_STATUS_NOT_INITIALIZED, returned from the call to cublasCreate().

Has anyone run into this problem before and found a fix? Thanks in advance.

Answer 1

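I would guess you are running out of memory. Your GPU is being used by the display manager, and it only has 256 MB of RAM. The combined memory footprint of the OS X 10.9 display manager and the CUDA 5.5 runtime might be leaving you with almost no free memory. I would recommend writing and running a small test program like this: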

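#include <iostream>
#include <cuda_runtime.h>

int main(void)
{
    size_t mfree = 0, mtotal = 0;

    // Selecting the device and querying memory forces context creation
    cudaError_t err = cudaSetDevice(0);
    if (err == cudaSuccess)
        err = cudaMemGetInfo(&mfree, &mtotal);
    if (err != cudaSuccess) {
        std::cerr << "CUDA error: " << cudaGetErrorString(err) << std::endl;
        return 1;
    }

    std::cout << mfree << " bytes of " << mtotal << " available." << std::endl;

    return cudaDeviceReset();
}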

[Disclaimer: written in the browser, never compiled or tested, use at your own risk]

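This should give you a picture of how much free memory is available after the device context is established. You might be surprised at how little there is to work with.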

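EDIT: Here is an even lighter-weight alternative test that does not attempt to establish a context on the device at all. Instead, it only uses the driver API to check the device. If it succeeds, then either the runtime API shipped for OS X is broken somehow, or there is no memory available on the device for establishing a context. If it fails, then you really do have a broken CUDA installation. Either way, I would consider opening a bug report with NVIDIA: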

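#include <iostream>
#include <cuda.h>

int main(void)
{
    CUdevice d;
    size_t b = 0;

    // Driver API only: no context is created on the device
    if (cuInit(0) != CUDA_SUCCESS ||
        cuDeviceGet(&d, 0) != CUDA_SUCCESS ||
        cuDeviceTotalMem(&b, d) != CUDA_SUCCESS) {
        std::cerr << "Driver API call failed" << std::endl;
        return 1;
    }

    std::cout << "Total memory = " << b << std::endl;

    return 0;
}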

Note that you will need to explicitly link the CUDA driver library to get this to work (for example, by passing -lcuda to nvcc).

CUDA 5.5 samples compile fine on OS X 10.9, but fail immediately at run time
