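I've started using OpenCL and have run into a problem freeing memory. Everything else works and the data I get back is what I expect, but I can't seem to call delete[] on one of my arrays. The code is shown below.
Calling delete[] on gpu_dst works fine, but calling it on matrix causes an error. Looking around, I found suggestions that I may have changed where the array points, but since the program only reads from it, I don't see where I would have done that.
Can someone enlighten me?
The full error is as follows: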
---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Assertion Failed!
Program: ...2014 - 2015\Parallel Systems\project-opencl\Debug\Project.exe
File: f:\dd\vctools\crt\crtw32\misc\dbgdel.cpp
Line: 52
Expression: _BLOCK_TYPE_IS_VALID(pHead->nBlockUse)
For information on how your program can cause an assertion
failure, see the Visual C++ documentation on asserts.
(Press Retry to debug the application)
Code:
#include <exception>
#include <iostream>
#include <sstream>
#include <string>
#include <cstdlib>
#include <vector>
#include <JC/util.hpp>
//#define A(x,y) a[x*width + y]
int main(int argc, char *argv[])
{
    try {
        if (argc != 4) {
            std::ostringstream oss;
            oss << "Usage: " << argv[0] << " <kernel_file> <kernel_name> <workgroup_size>";
            throw std::runtime_error(oss.str());
        }
        std::string kernel_file(argv[1]);
        std::string kernel_name(argv[2]);
        unsigned int workgroup_size = atoi(argv[3]);
        std::cout << "Workgroup size: " << workgroup_size << std::endl;
        // Initialize test matrix
        int matrix_size = 9;
        float input[9] = { 10, -7, 0, -3, 2, 6, 5, -1, 5 };
        // Allocate memory on the host and populate source
        float *gpu_dst = new float[matrix_size];
        float *matrix = input;
        // OpenCL initialization
        std::vector<cl::Platform> platforms;
        std::vector<cl::Device> devices;
        cl::Platform::get(&platforms);
        platforms[0].getDevices(CL_DEVICE_TYPE_GPU, &devices);
        cl::Context context(devices);
        cl::CommandQueue queue(context, devices[0], CL_QUEUE_PROFILING_ENABLE);
        // Allocate memory on the device
        cl::Buffer source_buf(context, CL_MEM_READ_ONLY, matrix_size*sizeof(float));
        cl::Buffer dest_buf(context, CL_MEM_WRITE_ONLY, matrix_size*sizeof(float));
        // Create the kernel
        cl::Program program = jc::buildProgram(kernel_file, context, devices);
        cl::Kernel kernel(program, kernel_name.c_str());
        // set the kernel arguments
        kernel.setArg<cl::Memory>(0, source_buf);
        kernel.setArg<cl::Memory>(1, dest_buf);
        kernel.setArg<cl_uint>(2, matrix_size);
        // transfer source data from the host to the device
        queue.enqueueWriteBuffer(source_buf, CL_TRUE, 0, matrix_size*sizeof(float), matrix);
        // execute the code on the device
        cl_ulong t;
        t = jc::runAndTimeKernel(kernel, queue, cl::NDRange(matrix_size), cl::NDRange(workgroup_size));
        // transfer destination data from the device to the host
        queue.enqueueReadBuffer(dest_buf, CL_TRUE, 0, matrix_size*sizeof(float), gpu_dst);
        // compute the data throughput in GB/s
        float throughput = (2.0*matrix_size*sizeof(float)) / t; // t is in nano seconds
        std::cout << "Achieved throughput: " << throughput << std::endl;
        for (int i = 0; i < 9; i++)
        {
            std::cout << gpu_dst[i] << matrix[i] << std::endl;
        }
        std::cout << "Deallocating memory" << std::endl;
        // Deallocate memory
        delete[] gpu_dst;
        delete[] matrix;// <-- This causes an error, for some reason..
        std::cout << "Done" << std::endl;
        return 0;
    }
    catch (cl::Error& e) {
        std::cerr << e.what() << ": " << jc::readable_status(e.err());
        return 3;
    }
    catch (std::exception& e) {
        std::cerr << e.what() << std::endl;
        return 2;
    }
    catch (...) {
        std::cerr << "Unexpected error. Aborting!\n" << std::endl;
        return 1;
    }
}
Answer 0 (score: 1)
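matrix is not dynamically allocated, so calling delete[] on it is invalid. It points at the automatic array input, which was never returned by new[].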
float input[9] = { 10, -7, 0, -3, 2, 6, 5, -1, 5 };
float *matrix = input;
//..
delete [] matrix; // wrong
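To make the rule concrete, here is a minimal sketch of the difference between a pointer that merely aliases an automatic array and one that owns heap memory. The helper name copy_to_heap is hypothetical and only for illustration; it is not part of the original program or of OpenCL.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: returns a heap copy of `src`.
// The caller owns the result and must release it with delete[] exactly once.
float *copy_to_heap(const float *src, std::size_t n)
{
    float *owned = new float[n];     // memory from new[] ...
    std::copy(src, src + n, owned);
    return owned;                    // ... so delete[] is valid on this pointer
}

// Sums n floats through a non-owning pointer. It only reads the data,
// so it never calls delete[], whatever storage `data` refers to.
float sum(const float *data, std::size_t n)
{
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

With float input[9] = {...}; float *matrix = input; it is fine to call sum(matrix, 9), but only a pointer such as the one returned by copy_to_heap(matrix, 9) may be passed to delete[]. In the question's program the simplest fix is to drop the delete[] matrix; line, because input is released automatically when it goes out of scope.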
Second, why not use std::vector instead of new[]?
std::vector<float> gpu_dst(matrix_size);
Then you don't need delete [] gpu_dst; at the end.
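As a rough sketch of how that could look, the vector's storage can be handed to any API that wants a raw pointer via data(). The function fill_from_device below is a hypothetical stand-in for the device read, so that the example compiles without OpenCL; in the question's code the equivalent step would presumably be passing gpu_dst.data() as the last argument of queue.enqueueReadBuffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C-style API that writes n floats into dst,
// the way a blocking device-to-host read would.
void fill_from_device(float *dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<float>(i);
    }
}

// Reads n results into a vector. The vector owns its storage and frees it
// in its destructor, so there is no delete[] anywhere in this code.
std::vector<float> read_results(std::size_t n)
{
    std::vector<float> gpu_dst(n);          // n floats, zero-initialized
    fill_from_device(gpu_dst.data(), n);    // data() exposes the float* buffer
    return gpu_dst;
}
```

Because the vector cleans up after itself, the memory is also released if an exception is thrown between the allocation and the end of the try block, which the new[] / delete[] version does not guarantee.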