Question

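I have Thrust code that loads a large amount of data (2.4G) into memory, performs calculations whose results are stored on the host (~1.5G), then frees the initial data, loads the results onto the device, performs further calculations on them, and finally reloads the initial data. The Thrust code looks like this: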

thrust::host_vector<float> hostData;
// here is a code which loads ~2.4G of data into hostData
thrust::device_vector<float> deviceData = hostData;
thrust::host_vector<float> hostResult;
// here is code which performs calculations on deviceData and copies the result to hostResult (~1.5G)
free<thrust::device_vector<float> >(deviceData);
thrust::device_vector<float> deviceResult = hostResult;
// here is code which performs calculations on deviceResult and stores some results on the device as well
free<thrust::device_vector<float> >(deviceResult);
deviceData = hostData;

where the function free is defined as:

template<class T> void free(T &V) {
    V.clear();
    V.shrink_to_fit();
    size_t mem_tot;
    size_t mem_free;
    cudaMemGetInfo(&mem_free, &mem_tot);
    std::cout << "Free memory : " << mem_free << std::endl;
}

template void free<thrust::device_vector<int> >(thrust::device_vector<int>& V);
template void free<thrust::device_vector<float> >(
    thrust::device_vector<float>& V);

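However, I get a "terminate called after throwing an instance of 'thrust::system::detail::bad_alloc' what(): std::bad_alloc: out of memory" error when trying to copy hostData back to deviceData, even though cudaMemGetInfo reports at that point that my device has about 6G of free memory. Here is the complete output from the free function: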

Free memory : 6295650304
Free memory : 6063775744
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
what():  std::bad_alloc: out of memory

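This seems to indicate that the device is out of memory even though plenty is free. Is this the right way to free memory for Thrust vectors? I should also note that the code works for smaller data sizes (up to 1.5G).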

Answer 1

It would be useful to see a complete, compilable reproducer. However, you are probably running into memory fragmentation.

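Even though a large amount of memory may be reported as free, it may not be possible to allocate it as a single large contiguous block. This fragmentation then limits the maximum size of any single allocation you can request.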
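To make the distinction between "total free" and "largest contiguous block" concrete, here is a toy host-side model. It is only a sketch: the `FreeGaps` type and the gap sizes are made up for illustration, and this is not how the CUDA driver actually tracks device memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy model (hypothetical, for illustration only): a memory pool described
// purely by the sizes of its contiguous free regions, in MB.
struct FreeGaps {
    std::vector<std::size_t> gaps;

    // What a "how much is free?" query would report: the sum of all gaps.
    std::size_t total_free() const {
        return std::accumulate(gaps.begin(), gaps.end(), std::size_t{0});
    }

    // The biggest single request that can actually be satisfied.
    std::size_t largest_block() const {
        return gaps.empty() ? 0 : *std::max_element(gaps.begin(), gaps.end());
    }

    // A single allocation must fit entirely inside one gap.
    bool can_allocate(std::size_t request_mb) const {
        return request_mb <= largest_block();
    }
};
```

With an assumed layout such as `FreeGaps pool{{2000, 2300, 1700}};`, `pool.total_free()` is 6000 MB, yet `pool.can_allocate(2400)` is false while `pool.can_allocate(1500)` is true. That is the same pattern as in the question: roughly 6G reported free, a 2.4G request failing, and 1.5G requests succeeding.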

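This is probably not a question of how you are freeing memory, but rather of which overhead allocations remain after you free it. The fact that you check the memory info and get a large number back tells me that you are freeing your allocations correctly.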

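To try to work around this, one approach is to carefully manage and reuse your allocations. For example, if you need a large 2.4G working device vector of float, allocate it once and reuse it for successive operations. Also, if any allocations remain on the device immediately before you try to reallocate the 2.4G vector, try freeing those first (i.e., all allocations you have made on the device) before reallocating the 2.4G vector.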
