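I need to allocate all available device memory using CUDA. On a Tesla C2050, a Quadro 600, and a GeForce GTX 560 Ti I do it as follows: first, I allocate 0 bytes of global memory on the device. Second, I query the device's free memory with cudaMemGetInfo and allocate that amount. This works on the devices listed above, but it does not work on the GeForce GTX 690.
Can someone help me: what mechanism can I use to allocate the memory on a GeForce GTX 690, or is there an example of this operation?
It looks like this: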
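cudaSetDevice(deviceIndex);
int* reservedMemory = nullptr;
// Step 1: allocate 0 bytes of global memory on the device.
cudaMalloc(&reservedMemory, 0);
// Step 2: query the free memory and try to allocate all of it.
size_t freeMemory, totalMemory;
cudaMemGetInfo(&freeMemory, &totalMemory);
cudaMalloc(&reservedMemory, freeMemory);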
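On the GeForce GTX 690, one of the card's two GPUs reports 2147483648 bytes of total memory and 2050109440 bytes of free global memory, but I can allocate only 1341915136 bytes. On the Quadro 600, the single GPU reports 1073414144 bytes of total memory and 859803648 bytes of free global memory, and I can allocate all 859803648 bytes.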
Example for the Quadro 600 (showing the compile, link, and run steps):
D:\Gdmt> nvcc -arch=compute_20 -code=sm_21 -c ./Gdmt.cu -o ./Gdmt.obj
Gdmt.cu
tmpxft_00000bb4_00000000-3_Gdmt.cudafe1.gpu
tmpxft_00000bb4_00000000-8_Gdmt.cudafe2.gpu
Gdmt.cu
tmpxft_00000bb4_00000000-3_Gdmt.cudafe1.cpp
tmpxft_00000bb4_00000000-14_Gdmt.ii
D:\Gdmt> nvcc ./Gdmt.obj -o ./Gdmt.exe
D:\Gdmt> nvcc -arch=compute_20 -code=sm_21 -c ./Gdmt_additional.cu -o ./Gdmt_additional.obj
Gdmt_additional.cu
tmpxft_00000858_00000000-3_Gdmt_additional.cudafe1.gpu
tmpxft_00000858_00000000-8_Gdmt_additional.cudafe2.gpu
Gdmt_additional.cu
tmpxft_00000858_00000000-3_Gdmt_additional.cudafe1.cpp
tmpxft_00000858_00000000-14_Gdmt_additional.ii
D:\Gdmt> nvcc ./Gdmt_additional.obj -o ./Gdmt_additional.exe
D:\Gdmt> Gdmt.exe
Total amount of memory: 1073414144 Bytes;
Memory to reserve: 859803648 Bytes;
Memory reserved: 859803648 Bytes;
^C
D:\Gdmt> Gdmt_additional.exe
Allocation is succeeded on 890830848 bytes of reserved memory.
^C
D:\Gdmt>
Example for the GeForce GTX 690 (showing the compile, link, and run steps):
J:\Gdmt> nvcc -arch=compute_30 -code=sm_30 -c ./Gdmt.cu -o ./Gdmt.obj
Gdmt.cu
tmpxft_000011f0_00000000-5_Gdmt.cudafe1.gpu
tmpxft_000011f0_00000000-10_Gdmt.cudafe2.gpu
Gdmt.cu
tmpxft_000011f0_00000000-5_Gdmt.cudafe1.cpp
tmpxft_000011f0_00000000-15_Gdmt.ii
J:\Gdmt> nvcc ./Gdmt.obj -o ./Gdmt.exe
J:\Gdmt> nvcc -arch=compute_30 -code=sm_30 -c ./Gdmt_additional.cu -o ./Gdmt_additional.obj
Gdmt_additional.cu
tmpxft_00001164_00000000-5_Gdmt_additional.cudafe1.gpu
tmpxft_00001164_00000000-10_Gdmt_additional.cudafe2.gpu
Gdmt_additional.cu
tmpxft_00001164_00000000-5_Gdmt_additional.cudafe1.cpp
tmpxft_00001164_00000000-15_Gdmt_additional.ii
J:\Gdmt> nvcc ./Gdmt_additional.obj -o ./Gdmt_additional.exe
J:\Gdmt> Gdmt.exe
Total amount of memory: 2147483648 Bytes;
Memory to reserve: 2050109440 Bytes;
Warning, memory allocation process is not succeeded!
^C
J:\Gdmt> Gdmt_additional.exe
Allocation is succeeded on 1341915136 bytes of reserved memory.
^C
The examples are archived and available at:
(7z archive, 78.5 KB, 80,434 bytes) https://docs.google.com/file/d/0BzZ5q0v8n-qTTDctVDV5Mnh2ODA/edit
(zip archive, 163 KB, 167,457 bytes) https://docs.google.com/file/d/0BzZ5q0v8n-qTT2xoV3NXSzhQMDQ/edit
This topic is a copy of the topics with the same name posted on "The GeForce Lounge" and "CUDA Programming and Performance".
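Answer 0 (score: 1)
I was able to rerun your example and got the same result.
I tried to approach the problem from the other side and allocate blocks of varying size.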
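#include <cstddef>
#include <iostream>
#include <cuda_runtime.h>

int main()
{
    int* reservedMemory = nullptr;
    size_t const NBlockSize = size_t(1300) * 1024 * 1024;  // starting block size in bytes
    size_t freeMemory = 0, totalMemory = 0;
    cudaError_t nErr = cudaSuccess;
    size_t nTotalAlloc = 0;
    std::cout << "NBlockSize(MB): " << NBlockSize / (1024 * 1024) << std::endl;
    while (nErr == cudaSuccess)
    {
        cudaMemGetInfo(&freeMemory, &totalMemory);
        std::cout << "===========================================================" << std::endl;
        std::cout << "Free/Total(kB): " << freeMemory / 1024 << "/" << totalMemory / 1024 << std::endl;
        size_t nAllocSize = NBlockSize;
        // Halve the request until it fits into the reported free memory.
        while (nAllocSize > freeMemory)
            nAllocSize /= 2;
        if (nAllocSize == 0)
            break;  // no free memory reported; a zero-byte request may succeed and loop forever
        // Blocks are deliberately never freed: the goal is to reserve memory.
        nErr = cudaMalloc(&reservedMemory, nAllocSize);
        if (nErr == cudaSuccess)
            nTotalAlloc += nAllocSize;
        std::cout << "AllocSize(kB): " << nAllocSize / 1024
                  << ", percentage of freememory: " << double(nAllocSize) / freeMemory
                  << ", error: " << cudaGetErrorString(nErr) << std::endl;
    }
    std::cout << "TotalAlloc/Total (kB): " << nTotalAlloc / 1024 << "/" << totalMemory / 1024 << std::endl;
    return 0;
}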
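The program starts with a block of size NBlockSize and halves nAllocSize whenever it exceeds freeMemory. Looking at the output below, cudaMalloc seems somewhat unpredictable when the requested block is large relative to freeMemory. In one case it manages to allocate over 98% of the free memory; in another it cannot allocate 800 MB out of 1 GB of free memory.
The most interesting run is the one with a starting block size of 700 MB. In the last successful iteration it gets 1400 kB out of 1428 kB free, and in the next iteration it cannot allocate 10 kB out of 20 kB.
Depending on the starting size, the program manages to allocate all but 8 kB of the free memory in the best run and leaves roughly 1 GB unallocated in the worst.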
D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 1000
===========================================================
Free/Total(kB): 1797120/2097152
AllocSize(kB): 1024000, percentage of freememory: 0.569801, error: no error
===========================================================
Free/Total(kB): 773120/2097152
AllocSize(kB): 512000, percentage of freememory: 0.662252, error: no error
===========================================================
Free/Total(kB): 261120/2097152
AllocSize(kB): 256000, percentage of freememory: 0.980392, error: no error
===========================================================
Free/Total(kB): 5128/2097152
AllocSize(kB): 4000, percentage of freememory: 0.780031, error: no error
===========================================================
Free/Total(kB): 1032/2097152
AllocSize(kB): 1000, percentage of freememory: 0.968992, error: no error
===========================================================
Free/Total(kB): 8/2097152
AllocSize(kB): 7, percentage of freememory: 0.976563, error: out of memory
TotalAlloc/Total (kB): 1797000/2097152
D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 1200
===========================================================
Free/Total(kB): 1796864/2097152
AllocSize(kB): 1228800, percentage of freememory: 0.683858, error: no error
===========================================================
Free/Total(kB): 568072/2097152
AllocSize(kB): 307200, percentage of freememory: 0.540777, error: no error
===========================================================
Free/Total(kB): 260872/2097152
AllocSize(kB): 153600, percentage of freememory: 0.588795, error: no error
===========================================================
Free/Total(kB): 107272/2097152
AllocSize(kB): 76800, percentage of freememory: 0.715937, error: no error
===========================================================
Free/Total(kB): 30472/2097152
AllocSize(kB): 19200, percentage of freememory: 0.630087, error: no error
===========================================================
Free/Total(kB): 11272/2097152
AllocSize(kB): 9600, percentage of freememory: 0.851668, error: no error
===========================================================
Free/Total(kB): 1672/2097152
AllocSize(kB): 1200, percentage of freememory: 0.717703, error: no error
===========================================================
Free/Total(kB): 392/2097152
AllocSize(kB): 300, percentage of freememory: 0.765306, error: out of memory
TotalAlloc/Total (kB): 1796400/2097152
D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 800
===========================================================
Free/Total(kB): 1844448/2097152
AllocSize(kB): 819200, percentage of freememory: 0.444144, error: no error
===========================================================
Free/Total(kB): 1025248/2097152
AllocSize(kB): 819200, percentage of freememory: 0.799026, error: out of memory
TotalAlloc/Total (kB): 819200/2097152
D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 700
===========================================================
Free/Total(kB): 1835528/2097152
AllocSize(kB): 716800, percentage of freememory: 0.390514, error: no error
===========================================================
Free/Total(kB): 1118740/2097152
AllocSize(kB): 716800, percentage of freememory: 0.640721, error: no error
===========================================================
Free/Total(kB): 401940/2097152
AllocSize(kB): 358400, percentage of freememory: 0.891675, error: no error
===========================================================
Free/Total(kB): 43540/2097152
AllocSize(kB): 22400, percentage of freememory: 0.514469, error: no error
===========================================================
Free/Total(kB): 21140/2097152
AllocSize(kB): 11200, percentage of freememory: 0.529801, error: no error
===========================================================
Free/Total(kB): 9876/2097152
AllocSize(kB): 5600, percentage of freememory: 0.567031, error: no error
===========================================================
Free/Total(kB): 4244/2097152
AllocSize(kB): 2800, percentage of freememory: 0.659755, error: no error
===========================================================
Free/Total(kB): 1428/2097152
AllocSize(kB): 1400, percentage of freememory: 0.980392, error: no error
===========================================================
Free/Total(kB): 20/2097152
AllocSize(kB): 10, percentage of freememory: 0.546875, error: out of memory
TotalAlloc/Total (kB): 1835400/2097152
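Answer 1 (score: 0)
I recently remembered the "page-locked" memory mechanism in CUDA. I tested it and did not get satisfactory performance: computation using this mechanism is about ten times slower than the version that uses the memory reservation on the GeForce GTX 690, which Windows limits severely. I had assumed the data would be copied to the device automatically for the computation and written back afterwards, but in reality the device's memory is not involved at all.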