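I have a custom kernel and a matching custom ArrayFire function, and I want to call this function from Julia. When I do this with relatively large arrays, I get: ArgumentError: cannot convert NULL to string
I understand there may be a size limit that depends on the GPU, but the trouble is that this limit changes when, for example, I switch to a different kernel.
Kernel (the Gauss_Jordan_f kernel)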
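#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void
Gauss_Jordan_f(__global double* A, __global double* B, int gsize)
{
    // One work-item per matrix row.
    int tx = get_global_id(0);
    if (tx >= gsize)
    {
        return;
    }
    // Offset of this work-item's row in the flat buffer.
    int Rfirst = tx * gsize;
    double diag;
    double fm;
    // Loop over the pivots; every row except the pivot row is reduced.
    for (int i = 0; i < gsize; i++)
    {
        diag = A[i*gsize + i];
        if (tx != i && diag != 0)
        {
            fm = A[Rfirst + i] / diag;
            B[tx] -= fm * B[i];
            for (int j = i + 1; j < gsize; j++)
            {
                A[Rfirst + j] -= fm * A[i*gsize + j];
            }
        }
        // Wait for the other work-items of this work-group.
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    // Normalize by the diagonal entry to get the solution component.
    B[tx] /= A[Rfirst + tx];
}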
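For checking the GPU result on small systems, a sequential CPU version of the same elimination can help. The following is only a sketch of what the kernel above computes, not part of my actual code: `gauss_jordan_reference` is a hypothetical name, the matrix is assumed to be stored row by row in a flat vector (matching the kernel's `A[row*gsize + col]` indexing), and, like the kernel, it does no pivoting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential CPU reference for the elimination performed by Gauss_Jordan_f.
// A is an n x n matrix stored row by row in a flat vector, B is the right-hand side.
// On return, B holds the solution. No pivoting is done, and a pivot whose
// diagonal entry is zero is skipped, exactly as in the kernel.
// gauss_jordan_reference is a hypothetical helper name.
void gauss_jordan_reference(std::vector<double>& A, std::vector<double>& B) {
    const std::size_t n = B.size();
    assert(A.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        const double diag = A[i * n + i];
        if (diag == 0.0) continue;
        for (std::size_t row = 0; row < n; ++row) {
            if (row == i) continue;  // the pivot row itself is left untouched
            const double fm = A[row * n + i] / diag;
            B[row] -= fm * B[i];
            for (std::size_t j = i + 1; j < n; ++j) {
                A[row * n + j] -= fm * A[i * n + j];
            }
        }
    }
    // Divide by the remaining diagonal entries to obtain the solution.
    for (std::size_t row = 0; row < n; ++row) {
        B[row] /= A[row * n + row];
    }
}
```

Calling it on a small system with a known solution and comparing against what the GPU returns for the same input should show whether the two agree at sizes where the kernel still runs.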
Function (takes af::array arguments and uses the Gauss_Jordan_f kernel)
void AFire::SELgj_(af::array &A, af::array &B) {
    // Reuse ArrayFire's OpenCL context, device and queue.
    static cl_context af_context = afcl::getContext();
    static cl_device_id af_device_id = afcl::getDeviceId();
    static cl_command_queue af_queue = afcl::getQueue();
    // Get the underlying cl_mem buffers (this locks the arrays until unlock()).
    cl_mem * d_A = A.device<cl_mem>();
    cl_mem * d_B = B.device<cl_mem>();
    size_t order = (int)A.dims(0);
    size_t program_length = strlen(GJordan_source);
    int status = CL_SUCCESS;
    // Build the program from source on every call; status is not checked.
    cl_program program = clCreateProgramWithSource(af_context, 1, (const char **)&GJordan_source, &program_length, &status);
    status = clBuildProgram(program, 1, &af_device_id, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, "Gauss_Jordan_f", &status);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), d_A);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), d_B);
    // order is a size_t, but only sizeof(cl_int) bytes of it are passed here.
    clSetKernelArg(kernel, 2, sizeof(cl_int), &order);
    // 1-D launch: global size is order rounded up to a multiple of the local size.
    size_t localWorkSize = BLOCK_SIZE * BLOCK_SIZE;
    size_t globalWorkSize = shrRoundUp(localWorkSize, order);
    clEnqueueNDRangeKernel(af_queue, kernel, 1, 0, &globalWorkSize, &localWorkSize,
                           0, NULL, NULL);
    // Hand the buffers back to ArrayFire.
    A.unlock();
    B.unlock();
}
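To make the launch configuration concrete, here is a small sketch of the rounding that the global work size goes through and of how many work-groups result. `round_up` and `work_group_count` are hypothetical names standing in for what a helper like `shrRoundUp` does; the value of BLOCK_SIZE is not shown above, so the local sizes used below (256 and 1024) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Round global_size up to the next multiple of group_size, so that the
// global work size is evenly divisible by the local work size.
std::size_t round_up(std::size_t group_size, std::size_t global_size) {
    assert(group_size > 0);
    const std::size_t remainder = global_size % group_size;
    return remainder == 0 ? global_size : global_size + group_size - remainder;
}

// Number of work-groups that a 1-D launch of global_size items is split into.
std::size_t work_group_count(std::size_t group_size, std::size_t global_size) {
    return round_up(group_size, global_size) / group_size;
}
```

With an assumed local size of 256, an order of 2000 is rounded up to 2048 work-items in 8 work-groups. With an assumed local size of 1024 (the device's reported maximum work-group size), an order of 1000 fits into a single work-group while 2000 needs two. This may matter because `barrier()` in OpenCL only synchronizes the work-items of one work-group, so rows handled by different work-groups are not ordered against each other by the barriers in the kernel.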
Calling from Julia
A is an AFArray: 2000×2000 Array{Float64,2} and B is a vector compatible with A (of size 2000). I know the system is consistent and I know its solution, but:
ccall((:SELgj_, "path/to/dll"), Cvoid, (Ref{af_array}, Ref{af_array}), Af.arr, Df.arr)
The call itself seems to finish fine, but then:
Df
ArgumentError: cannot convert NULL to string
Stacktrace:
[1] unsafe_string at .\strings\string.jl:56 [inlined]
[2] unsafe_string at .\c.jl:193 [inlined]
[3] get_last_error at C:\Users\user\.julia\packages\ArrayFire\4SkOz\src\util.jl:299 [inlined]
[4] _error(::UInt32) at C:\Users\user\.julia\packages\ArrayFire\4SkOz\src\util.jl:86
[5] convert_array(::AFArray{Float64,2}) at C:\Users\user\.julia\packages\ArrayFire\4SkOz\src\wrap.jl:748
[6] Type at C:\Users\user\.julia\packages\ArrayFire\4SkOz\src\array.jl:32 [inlined]
[7] toa(::AFArray{Float64,2}) at C:\Users\user\.julia\packages\ArrayFire\4SkOz\src\util.jl:37
[8] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::MIME{Symbol("text/plain")}, ::AFArray{Float64,2}) at C:\Users\user\.julia\packages\ArrayFire\4SkOz\src\util.jl:41
[9] limitstringmime(::MIME{Symbol("text/plain")}, ::AFArray{Float64,2}) at C:\Users\user\.julia\packages\IJulia\GIANC\src\inline.jl:37
[10] display_mimestring(::MIME{Symbol("text/plain")}, ::AFArray{Float64,2}) at C:\Users\user\.julia\packages\IJulia\GIANC\src\display.jl:66
[11] display_dict(::AFArray{Float64,2}) at C:\Users\user\.julia\packages\IJulia\GIANC\src\display.jl:95
[12] #invokelatest#1 at .\essentials.jl:697 [inlined]
[13] invokelatest at .\essentials.jl:696 [inlined]
[14] execute_request(::ZMQ.Socket, ::Msg) at C:\Users\user\.julia\packages\IJulia\GIANC\src\execute_request.jl:95
[15] #invokelatest#1 at .\essentials.jl:697 [inlined]
[16] invokelatest at .\essentials.jl:696 [inlined]
[17] eventloop(::ZMQ.Socket) at C:\Users\user\.julia\packages\IJulia\GIANC\src\eventloop.jl:8
[18] (::getfield(IJulia, Symbol("##15#18")))() at .\task.jl:259
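From this point on, ArrayFire objects can no longer be created or modified. This function runs fine for sizes up to 1000, but I have other functions and kernels that work fine at larger sizes, and I don't know why the limit varies.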
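For scale, here is a rough back-of-the-envelope sketch of what changes between order 1000 and order 2000. It is plain arithmetic on the kernel's loop structure, not a diagnosis; `matrix_bytes` and `inner_updates` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store an n x n matrix of doubles.
std::uint64_t matrix_bytes(std::uint64_t n) {
    return n * n * sizeof(double);
}

// Rough count of inner-loop updates in the kernel, summed over all rows:
// for each pivot i, each of the other n - 1 rows updates the n - 1 - i
// entries to the right of the pivot column.
std::uint64_t inner_updates(std::uint64_t n) {
    assert(n > 0);
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += (n - 1) * (n - 1 - i);
    }
    return total;
}
```

A 2000×2000 Float64 matrix takes 32,000,000 bytes, far below the 512 MByte CL_DEVICE_MAX_MEM_ALLOC_SIZE in the device query listed further down, so buffer size alone does not look like the limit. The update count, however, grows roughly with the cube of the order: about 0.5 billion updates at order 1000 against about 4 billion at order 2000, so the kernel does roughly eight times the work. Since the same device query reports CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE, run time as well as memory may be worth looking at.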
ArrayFire details
ArrayFire v3.6.1 (OpenCL, 64-bit Windows, build b443e14)
[0] NVIDIA: GeForce GT 630M, 2048 MB
-1- INTEL: Intel(R) HD Graphics 4000, 1400 MB
-2- INTEL: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz, 6037 MB
OpenCL details
ocldevicequery
[ocldevicequery] starting...
ocldevicequery Starting...
OpenCL SW Info:
CL_PLATFORM_NAME: NVIDIA CUDA
CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 9.1.84
OpenCL SDK Revision: 7027912
OpenCL Device Info:
1 devices found supporting OpenCL:
---------------------------------
Device GeForce GT 630M
---------------------------------
CL_DEVICE_NAME: GeForce GT 630M
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 391.35
CL_DEVICE_VERSION: OpenCL 1.1 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 2
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 950 MHz
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 512 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 2048 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma
CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 16384
2D_MAX_HEIGHT 16384
3D_MAX_WIDTH 2048
3D_MAX_HEIGHT 2048
3D_MAX_DEPTH 2048
CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing
cl_nv_copy_opts
CL_DEVICE_COMPUTE_CAPABILITY_NV: 2.1
NUMBER OF MULTIPROCESSORS: 2
NUMBER OF CUDA CORES: 96
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 32768
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1
---------------------------------
2D Image Formats Supported (71)
---------------------------------
# Channel Order Channel Type
1 CL_R CL_FLOAT
2 CL_R CL_HALF_FLOAT
3 CL_R CL_UNORM_INT8
4 CL_R CL_UNORM_INT16
5 CL_R CL_SNORM_INT16
6 CL_R CL_SIGNED_INT8
7 CL_R CL_SIGNED_INT16
8 CL_R CL_SIGNED_INT32
9 CL_R CL_UNSIGNED_INT8
10 CL_R CL_UNSIGNED_INT16
11 CL_R CL_UNSIGNED_INT32
12 CL_A CL_FLOAT
13 CL_A CL_HALF_FLOAT
14 CL_A CL_UNORM_INT8
15 CL_A CL_UNORM_INT16
16 CL_A CL_SNORM_INT16
17 CL_A CL_SIGNED_INT8
18 CL_A CL_SIGNED_INT16
19 CL_A CL_SIGNED_INT32
20 CL_A CL_UNSIGNED_INT8
21 CL_A CL_UNSIGNED_INT16
22 CL_A CL_UNSIGNED_INT32
23 CL_RG CL_FLOAT
24 CL_RG CL_HALF_FLOAT
25 CL_RG CL_UNORM_INT8
26 CL_RG CL_UNORM_INT16
27 CL_RG CL_SNORM_INT16
28 CL_RG CL_SIGNED_INT8
29 CL_RG CL_SIGNED_INT16
30 CL_RG CL_SIGNED_INT32
31 CL_RG CL_UNSIGNED_INT8
32 CL_RG CL_UNSIGNED_INT16
33 CL_RG CL_UNSIGNED_INT32
34 CL_RA CL_FLOAT
35 CL_RA CL_HALF_FLOAT
36 CL_RA CL_UNORM_INT8
37 CL_RA CL_UNORM_INT16
38 CL_RA CL_SNORM_INT16
39 CL_RA CL_SIGNED_INT8
40 CL_RA CL_SIGNED_INT16
41 CL_RA CL_SIGNED_INT32
42 CL_RA CL_UNSIGNED_INT8
43 CL_RA CL_UNSIGNED_INT16
44 CL_RA CL_UNSIGNED_INT32
45 CL_RGBA CL_FLOAT
46 CL_RGBA CL_HALF_FLOAT
47 CL_RGBA CL_UNORM_INT8
48 CL_RGBA CL_UNORM_INT16
49 CL_RGBA CL_SNORM_INT16
50 CL_RGBA CL_SIGNED_INT8
51 CL_RGBA CL_SIGNED_INT16
52 CL_RGBA CL_SIGNED_INT32
53 CL_RGBA CL_UNSIGNED_INT8
54 CL_RGBA CL_UNSIGNED_INT16
55 CL_RGBA CL_UNSIGNED_INT32
56 CL_BGRA CL_UNORM_INT8
57 CL_BGRA CL_SIGNED_INT8
58 CL_BGRA CL_UNSIGNED_INT8
59 CL_ARGB CL_UNORM_INT8
60 CL_ARGB CL_SIGNED_INT8
61 CL_ARGB CL_UNSIGNED_INT8
62 CL_INTENSITY CL_FLOAT
63 CL_INTENSITY CL_HALF_FLOAT
64 CL_INTENSITY CL_UNORM_INT8
65 CL_INTENSITY CL_UNORM_INT16
66 CL_INTENSITY CL_SNORM_INT16
67 CL_LUMINANCE CL_FLOAT
68 CL_LUMINANCE CL_HALF_FLOAT
69 CL_LUMINANCE CL_UNORM_INT8
70 CL_LUMINANCE CL_UNORM_INT16
71 CL_LUMINANCE CL_SNORM_INT16
---------------------------------
3D Image Formats Supported (71)
---------------------------------
# Channel Order Channel Type
1 CL_R CL_FLOAT
2 CL_R CL_HALF_FLOAT
3 CL_R CL_UNORM_INT8
4 CL_R CL_UNORM_INT16
5 CL_R CL_SNORM_INT16
6 CL_R CL_SIGNED_INT8
7 CL_R CL_SIGNED_INT16
8 CL_R CL_SIGNED_INT32
9 CL_R CL_UNSIGNED_INT8
10 CL_R CL_UNSIGNED_INT16
11 CL_R CL_UNSIGNED_INT32
12 CL_A CL_FLOAT
13 CL_A CL_HALF_FLOAT
14 CL_A CL_UNORM_INT8
15 CL_A CL_UNORM_INT16
16 CL_A CL_SNORM_INT16
17 CL_A CL_SIGNED_INT8
18 CL_A CL_SIGNED_INT16
19 CL_A CL_SIGNED_INT32
20 CL_A CL_UNSIGNED_INT8
21 CL_A CL_UNSIGNED_INT16
22 CL_A CL_UNSIGNED_INT32
23 CL_RG CL_FLOAT
24 CL_RG CL_HALF_FLOAT
25 CL_RG CL_UNORM_INT8
26 CL_RG CL_UNORM_INT16
27 CL_RG CL_SNORM_INT16
28 CL_RG CL_SIGNED_INT8
29 CL_RG CL_SIGNED_INT16
30 CL_RG CL_SIGNED_INT32
31 CL_RG CL_UNSIGNED_INT8
32 CL_RG CL_UNSIGNED_INT16
33 CL_RG CL_UNSIGNED_INT32
34 CL_RA CL_FLOAT
35 CL_RA CL_HALF_FLOAT
36 CL_RA CL_UNORM_INT8
37 CL_RA CL_UNORM_INT16
38 CL_RA CL_SNORM_INT16
39 CL_RA CL_SIGNED_INT8
40 CL_RA CL_SIGNED_INT16
41 CL_RA CL_SIGNED_INT32
42 CL_RA CL_UNSIGNED_INT8
43 CL_RA CL_UNSIGNED_INT16
44 CL_RA CL_UNSIGNED_INT32
45 CL_RGBA CL_FLOAT
46 CL_RGBA CL_HALF_FLOAT
47 CL_RGBA CL_UNORM_INT8
48 CL_RGBA CL_UNORM_INT16
49 CL_RGBA CL_SNORM_INT16
50 CL_RGBA CL_SIGNED_INT8
51 CL_RGBA CL_SIGNED_INT16
52 CL_RGBA CL_SIGNED_INT32
53 CL_RGBA CL_UNSIGNED_INT8
54 CL_RGBA CL_UNSIGNED_INT16
55 CL_RGBA CL_UNSIGNED_INT32
56 CL_BGRA CL_UNORM_INT8
57 CL_BGRA CL_SIGNED_INT8
58 CL_BGRA CL_UNSIGNED_INT8
59 CL_ARGB CL_UNORM_INT8
60 CL_ARGB CL_SIGNED_INT8
61 CL_ARGB CL_UNSIGNED_INT8
62 CL_INTENSITY CL_FLOAT
63 CL_INTENSITY CL_HALF_FLOAT
64 CL_INTENSITY CL_UNORM_INT8
65 CL_INTENSITY CL_UNORM_INT16
66 CL_INTENSITY CL_SNORM_INT16
67 CL_LUMINANCE CL_FLOAT
68 CL_LUMINANCE CL_HALF_FLOAT
69 CL_LUMINANCE CL_UNORM_INT8
70 CL_LUMINANCE CL_UNORM_INT16
71 CL_LUMINANCE CL_SNORM_INT16
oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.2 CUDA 9.1.84, SDK Revision = 7027912, NumDevs = 1, Device = GeForce GT 630M
System Info:
Local Time/Date = 17:37:54, 1/23/2019
CPU Arch: 9
CPU Level: 6
# of CPU processors: 8
Windows Build: 9200
Windows Ver: 6.2 (Windows Vista / Windows 7)
[ocldevicequery] test results...
PASSED