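Creating a buffer of pow(2, 24) elements with local_size_x = 64 in the input layout qualifier yields a maximum WorkGroupID of 262143, which is pow(2, 24) / 64 - 1. That is as expected, since the IDs are zero-indexed.
However, if I increase the global dimension of the problem (the number of elements) to pow(2, 25), WorkGroupID returns seemingly arbitrary values that do not match the math.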
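The arithmetic above can be checked on the host. This is only a sketch, assuming one invocation per buffer element and a dispatch whose x group count is the element count divided by local_size_x; the helper names are made up for illustration and are not part of Vulkan.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of work groups needed so that every
 * element gets one invocation (ceiling division). */
static uint32_t group_count_for(uint32_t elementCount, uint32_t localSizeX)
{
    assert(localSizeX != 0);
    return elementCount / localSizeX + (elementCount % localSizeX != 0);
}

/* Hypothetical helper: highest gl_WorkGroupID.x a shader could observe
 * for that dispatch, since the IDs are zero-indexed. */
static uint32_t last_work_group_id(uint32_t elementCount, uint32_t localSizeX)
{
    uint32_t groups = group_count_for(elementCount, localSizeX);
    assert(groups != 0);
    return groups - 1;
}
```

With these assumptions, pow(2, 24) elements give a last ID of 262143, and pow(2, 25) elements would be expected to give 524287.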
Here are some of the limits reported by the device, which I believe are the relevant ones:
maxStorageBufferRange: uint32_t = 4294967295
maxComputeSharedMemorySize: uint32_t = 32768
maxComputeWorkGroupCount: uint32_t[3] = 00000202898A8EC4
maxComputeWorkGroupCount[0]: uint32_t = 65535
maxComputeWorkGroupCount[1]: uint32_t = 65535
maxComputeWorkGroupCount[2]: uint32_t = 65535
maxComputeWorkGroupInvocations: uint32_t = 1024
maxComputeWorkGroupSize: uint32_t[3] = 00000202898A8ED4
maxComputeWorkGroupSize[0]: uint32_t = 1024
maxComputeWorkGroupSize[1]: uint32_t = 1024
maxComputeWorkGroupSize[2]: uint32_t = 1024
I am not allocating more elements than the device supports. After 2 days and 16 hours I still have not figured out what is going on...
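WorkGroupSize, WorkGroupID, LocalInvocationID and GlobalInvocationID all show the same problem once I reach a certain number of elements. GlobalInvocationID is no doubt affected because of the way it is computed from the others...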
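For reference, GLSL defines gl_GlobalInvocationID as gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID, so a wrong WorkGroupID carries straight through to it. A minimal host-side sketch of the x component, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the GLSL definition for the x component:
 * gl_GlobalInvocationID.x =
 *     gl_WorkGroupID.x * gl_WorkGroupSize.x + gl_LocalInvocationID.x */
static uint32_t global_invocation_id_x(uint32_t workGroupId,
                                       uint32_t workGroupSize,
                                       uint32_t localInvocationId)
{
    /* A local ID is always smaller than the work group size. */
    assert(localInvocationId < workGroupSize);
    return workGroupId * workGroupSize + localInvocationId;
}
```

For the last invocation of the pow(2, 24) case (work group 262143, local ID 63, size 64) this gives 16777215, the last valid index of the buffer.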
#version 450

// Size of the local work group is defined through the input layout qualifier
layout(local_size_x = 64, local_size_y = 1, local_size_z = 1) in;

layout(set = 0, binding = 0) buffer deviceBuffer
{
    uint x[];
};

void main() {
    uint i = gl_GlobalInvocationID.x;
    // Equivalent to the built-in above
    //uint i = gl_WorkGroupSize.x * gl_WorkGroupID.x + gl_LocalInvocationID.x;
    //x[i] += x[i];
    // Number of work groups dispatched in the x dimension
    //x[i] = gl_NumWorkGroups.x;
    // Size of the work group specified in the input layout qualifier
    //x[i] = gl_WorkGroupSize.x;
    // Ranges from 0 to (global dimension / work group size) - 1
    x[i] = gl_WorkGroupID.x;
    //x[i] = gl_LocalInvocationID.x;
}
Answer 0 (score: 1)
maxComputeWorkGroupCount[0]: uint32_t = 65535
maxComputeWorkGroupCount[1]: uint32_t = 65535
maxComputeWorkGroupCount[2]: uint32_t = 65535
vkCmdDispatch with a size of x = pow(2,25), y = 1, z = 1
According to the information you provided, groupCountX = 2^25 = 33554432, but the limit is maxComputeWorkGroupCount[0] = 65535 = 2^16 - 1.
The Valid Usage section for vkCmdDispatch in the Vulkan specification says:
groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0]
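Violating valid usage is undefined behavior. "Undefined behavior" means anything from "everything appears to work fine" to "your PC collapses into a black hole and destroys the solar system". For all intents and purposes, violating valid usage is a logic error in the application code.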