任何人都可以告诉我为什么我的gpu的Max工作项少于cpu和计算单元??? 是cpu的平均性能优于gpu
cpu:intel core i7 2.2GH gpu:amd radeon hd 6700M
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 AMD-APP (1084.2)
Platform Name: AMD Accelerated Parallel Proces
sing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callbac
k cl_amd_offline_devices cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_me
dia_sharing
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2
Platform Name: Intel(R) OpenCL
Platform Vendor: Intel(R) Corporation
Platform Extensions: cl_khr_fp64 cl_khr_icd cl_khr_g
lobal_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32
_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store
cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sh
aring cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing
Platform Name: AMD Accelerated Parallel Proces
sing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 6
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 725Mhz
Address bits: 32
Max memory allocation: 536870912
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 02843864
Name: Turks
Vendor: Advanced Micro Devices, Inc.
Driver version: 1084.2 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1084.2)
Extensions: cl_khr_global_int32_base_atomic
s cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_lo
cal_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store
cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd
_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d
x9_media_sharing
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 2195Mhz
Address bits: 32
Max memory allocation: 1073741824
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 466
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 02843864
Name: Intel(R) Core(TM) i7-2670
QM CPU @ 2.20GHz
Vendor: GenuineIntel
Driver version: 1084.2 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1084.2)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_
global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int3
2_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr
_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_at
tribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3
d10_sharing
Platform Name: Intel(R) OpenCL
Number of devices: 1
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 32902
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 2200Mhz
Address bits: 32
Max memory allocation: 536838144
Image support: Yes
Max number of images read arguments: 480
Max number of images write arguments: 480
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 480
Max size of kernel argument: 3840
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 262144
Global memory size: 2147352576
Constant buffer size: 131072
Max number of constant args: 480
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 128
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 466
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 00401218
Name: Intel(R) Core(TM) i7-2670
QM CPU @ 2.20GHz
Vendor: Intel(R) Corporation
Driver version: 3.0.1.15216
Profile: FULL_PROFILE
Version: OpenCL 1.2 (Build 80752)
Extensions: cl_khr_fp64 cl_khr_icd cl_khr_g
lobal_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32
_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store
cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sh
aring cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing
为什么看到三个设备类型二用于cpu而一个用于gpu opencl for intel for cpu或built-in gpu 我有两个显示适配器:AMD Radeon HD 6700M系列 英特尔高清图形系列
答案 0 :(得分:2)
“我的GPU有多少核心/处理元素/硬件线程?”是新GPGPU用户经常被问到的问题。我通常的回答是“你为什么关心?”。无法查询设备使用OpenCL API的处理元素的数量。究竟什么构成处理元素和计算单元在不同架构之间差别很大。
现实情况是,您的设备有多少处理元素无关紧要,因为使用此指标无论如何都是估算设备性能的一种非常糟糕的方式。如果你真的需要知道设备对你的特定应用程序有多快,那么你应该对它进行基准测试(直接使用你的应用程序,或者使用与你的应用程序具有类似属性的微基准测试)。
回答您的另一个问题:您的系统上有两个能够使用CPU,Intel和AMD的OpenCL实现。因此,两个平台都会将CPU报告为可用的OpenCL设备。