Question

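I am drawing many similar 2D quads in OpenGL ES 2.0 (iOS) using vertex arrays, and I am trying to keep performance as high as possible. I understand that, for performance reasons, it is recommended to put all of the computed geometry into a VBO and to use as few glDrawArrays() calls as possible.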

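However, when there are many similar quads, each of which is transformed every frame, wouldn't it be faster to create one very small VBO once, containing for example only four vertices (two triangles), and then each frame update the transform and issue a separate glDrawArrays(GL_TRIANGLE_STRIP, 0, 4) call for each quad?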

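In that case I would expect less data to be transferred from the CPU to the GPU, and therefore better performance, because the VBO contents are small and static. The multiple glDrawArrays() calls would redraw the same geometry repeatedly, each time with a different model-view-projection matrix passed as a uniform. The following code should clarify what I am trying to do: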

/// Executed only once:

/// The quad attributes (only position to simplify the example).
NSInteger idx = 0;
attributes[idx++] = -0.5;
attributes[idx++] = -0.5;
attributes[idx++] = 0.5;
attributes[idx++] = -0.5;
attributes[idx++] = -0.5;
attributes[idx++] = 0.5;
attributes[idx++] = 0.5;
attributes[idx++] = 0.5;

/// The buffer data
if(NO == glIsVertexArrayOES(vertexArray)) {
    glGenVertexArraysOES(1, &vertexArray);
    glGenBuffers(1, &bufferObject);
}
glBindVertexArrayOES(vertexArray);
glBindBuffer(GL_ARRAY_BUFFER, bufferObject);
glBufferData(GL_ARRAY_BUFFER, sizeof(attributes), attributes, GL_STATIC_DRAW);
glEnableVertexAttribArray(positionAttributeLocation);
glVertexAttribPointer(positionAttributeLocation, 2, GL_FLOAT, GL_FALSE, 2*sizeof(float), (char *)NULL);

///...

/// Executed per frame:

glBindVertexArrayOES(vertexArray);
for(NSInteger i = 0; i < numQuads; i++) {
   quad = [quads objectAtIndex:i];
   m4 = GLKMatrix4Identity;
   m4 = GLKMatrix4MakeScale(quad.size, quad.size, 1.0);
   m4 = GLKMatrix4Multiply(GLKMatrix4MakeRotation(quad.angle, 0.0, 0.0, 1.0), m4);
   m4 = GLKMatrix4Multiply(GLKMatrix4MakeTranslation(quad.position.x, quad.position.y, 0.0), m4);
   modelViewProjectionMatrix = GLKMatrix4Multiply(projectionMatrix, m4);
   glUniformMatrix4fv(uniformLocationMVP, 1, GL_FALSE, modelViewProjectionMatrix.m);
   glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}

Does this approach have any performance advantage over a single glDrawArrays() call?

Multiple glDrawArrays() calls vs. buffer update - performance
