Question

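I wrote an OpenCL kernel that uses OpenCL-OpenGL interoperability to read vertices and indices, but that probably doesn't even matter, because all I do is simple pointer addition to fetch a specific vertex by its index.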

uint pos = (index + base)*stride;

Here I compute the absolute position in bytes. In my example, pos is 28,643,328 with a stride of 28, index = 0 and base = 1,022,976. So far that looks correct.

Unfortunately, I can't use vload3 directly, because its offset parameter is not an absolute address in bytes. So I simply add pos to the pointer void* vertices_gl:

void* new_addr = vertices_gl+pos;

In my example new_addr = 0x2f90000, and this is where the strange part begins, given that

vertices_gl = 0x303f000

The result (new_addr) should be 0x4B90000 (0x303f000 + 28,643,328).

I don't understand why the result ends up 716,800 (0xAF000) below vertices_gl instead of above it.

My target GPU: AMD Radeon HD5830

PS: for those wondering, I used printf to get these values :) (I couldn't get CodeXL to work)

Answer 1

Pointer arithmetic is not defined for void* pointers. Use a char* pointer to do byte-wise pointer calculations.
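As a minimal sketch of the byte-wise variant, here is the same calculation in plain host-side C. This is an illustration under assumptions, not the asker's kernel: the helper names `vertex_byte_pos` and `byte_offset_ptr` are hypothetical, and inside an OpenCL kernel the cast would also have to carry the address space qualifier, for example `(__global char *)vertices_gl + pos`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte position of vertex (index + base_vertex),
 * mirroring `pos = (index + base) * stride` from the question. */
static size_t vertex_byte_pos(size_t index, size_t base_vertex, size_t stride)
{
    return (index + base_vertex) * stride;
}

/* Hypothetical helper: address `byte_offset` bytes past `base`.
 * Casting to a character pointer makes the addition well-defined and
 * byte-granular; standard C defines no arithmetic on `void *` itself. */
static const void *byte_offset_ptr(const void *base, size_t byte_offset)
{
    return (const unsigned char *)base + byte_offset;
}
```

With the numbers from the question, `vertex_byte_pos(0, 1022976, 28)` gives 28,643,328, and `byte_offset_ptr(vertices_gl, pos)` then advances by exactly that many bytes.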

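Or, much better: use the real type the pointer points to, and don't multiply the index by the stride yourself. Just write vertex[index+base], assuming vertex points to a type that holds 28 bytes of data.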
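To illustrate why typed indexing needs no manual multiplication, here is a small host-side C sketch. `Vertex28` and `vertex_at` are hypothetical names, and the 28-byte size assumes 4-byte floats with 4-byte alignment, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 28-byte vertex: seven tightly packed floats. */
typedef struct {
    float attr[7];
} Vertex28;

/* Typed indexing: the compiler scales (index + base) by sizeof(Vertex28),
 * so no explicit stride multiplication is needed. */
static const Vertex28 *vertex_at(const Vertex28 *vertex, size_t index, size_t base)
{
    return &vertex[index + base];
}
```

The returned address is the same one the byte-wise formula produces: `(const char *)vertex + (index + base) * sizeof(Vertex28)`.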

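Performance consideration: align the vertex data to a power-of-two size so that memory accesses can be coalesced. Here that means adding 4 bytes of padding after each vertex entry (28 becomes 32 bytes). To get this automatically, use float8 as the vertex type if all attributes are float values. I assume you are using position and normal data or something similar, so it may be a good idea to write a custom struct that wraps both vectors in a convenient and self-explanatory way: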

// Defining a type for the vertex data. This is 32 bytes large.
// You can share this code in a header for inclusion in both OpenCL and C / C++!
typedef struct {
    float4 pos;
    float4 normal;
} VertexData;

// Example kernel
__kernel void computeNormalKernel(__global VertexData *vertex, uint base) {
    uint index = get_global_id(0);
    VertexData thisVertex = vertex[index+base];   // It can't be simpler!
    thisVertex.normal = computeNormal(...);       // Like you'd do it in C / C++!
    vertex[index+base] = thisVertex;              // Of course, the same works when writing
}

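Note: if you simply change one of the float4 members to float3, this code still won't match your stride of 28, because a float3 also occupies four floats' worth of memory. You can, however, write the struct like this, which adds no padding (but be aware that this hurts memory access bandwidth):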

typedef struct {
    float pos[4];
    float normal[3];  // Assuming you want 3 floats here
} VertexData;
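If the struct lives in a header shared between host and device, a compile-time check on the host side can catch a layout mismatch early. The following is a sketch in plain C11, assuming 4-byte floats with 4-byte alignment; the assertion messages are illustrative and not part of the original answer.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Same unpadded layout as above: 4 + 3 floats = 28 bytes. */
typedef struct {
    float pos[4];
    float normal[3];
} VertexData;

/* Fail the build if the layout drifts away from the 28-byte stride. */
static_assert(sizeof(VertexData) == 28,
              "VertexData must match the 28-byte vertex stride");
static_assert(offsetof(VertexData, normal) == 16,
              "normal must start directly after the four pos floats");
```

The padded variant with two float4 members would report 32 bytes instead, so the first assertion would have to be changed along with the stride of the buffer.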

OpenCL void pointer arithmetic - strange behavior
