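I have a GPU cluster made up of two Tesla M2050s. When I run my code, cudaGetDeviceCount returns only 1. If I try to select device 1 with cudaSetDevice, I get this error: "invalid device ordinal". Windows Device Manager lists both devices. In case it helps, here is my source code: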
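// Enumerate every CUDA device the runtime can see and print its key properties.
cutilSafeCall(cudaGetDeviceCount(&num_devices));
for (device = 0; device < num_devices; device++) {
    cudaDeviceProp properties;
    cutilSafeCall(cudaGetDeviceProperties(&properties, device));
    printf("Device ID:\t%d\n", device);
    printf("Device Name:\t%s\n", properties.name);
    // totalGlobalMem and totalConstMem are size_t, so cast them rather than print with %d.
    printf("Global memory:\t%llu\n", (unsigned long long)properties.totalGlobalMem);
    printf("Constant memory:\t%llu\n", (unsigned long long)properties.totalConstMem);
    printf("Warp size:\t%d\n", properties.warpSize);
}
devs = 0;
ParseArguments(argc, argv);
// Fails with "invalid device ordinal" when devs >= num_devices.
cutilSafeCall(cudaSetDevice(devs));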
Any help would be greatly appreciated.
Edit: output of deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "Tesla M2050"
CUDA Driver Version: 5.50
CUDA Runtime Version: 4.20
CUDA Capability Major/Minor version number: 2.0
...
...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.50, CUDA Runtime Version = 4.20, NumDevs = 1, Device = Tesla M2050
PASSED
Press <Enter> to Quit...
-----------------------------------------------------------
Answer 0 (score: 1)
If a single node contains two CUDA GPUs and deviceQuery reports only one, consider the following possibilities: