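I am drawing many similar 2D quads in OpenGL ES 2.0 (iOS) using vertex arrays, and I am trying to get the best performance. I know that, for performance reasons, the usual advice is to put all of the computed geometry into a VBO and to issue as few glDrawArrays() calls as possible.
However, when there are many similar quads, each of which is transformed every frame, wouldn't it be faster to create one very small VBO once, containing just four vertices (two triangles), and then on every frame compute each quad's transform and issue a separate glDrawArrays(GL_TRIANGLE_STRIP, 0, 4) call per quad?
In that case I would expect less data transfer from the CPU to the GPU and better performance, because the VBO contents are small and static. The multiple glDrawArrays() calls would redraw the same geometry over and over, each time with a different model-view-projection matrix passed as a uniform. The following code should clarify what I am trying to do: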
/// Executed only once:
/// The quad attributes (only position to simplify the example).
GLfloat attributes[8]; // 4 vertices * 2 components (x, y)
NSInteger idx = 0;
attributes[idx++] = -0.5;
attributes[idx++] = -0.5;
attributes[idx++] = 0.5;
attributes[idx++] = -0.5;
attributes[idx++] = -0.5;
attributes[idx++] = 0.5;
attributes[idx++] = 0.5;
attributes[idx++] = 0.5;
/// The buffer data
if(NO == glIsVertexArrayOES(vertexArray)) {
    glGenVertexArraysOES(1, &vertexArray);
    glGenBuffers(1, &bufferObject);
}
glBindVertexArrayOES(vertexArray);
glBindBuffer(GL_ARRAY_BUFFER, bufferObject);
glBufferData(GL_ARRAY_BUFFER, sizeof(attributes), attributes, GL_STATIC_DRAW);
glEnableVertexAttribArray(positionAttributeLocation);
glVertexAttribPointer(positionAttributeLocation, 2, GL_FLOAT, GL_FALSE, 2*sizeof(float), (char *)NULL);
///...
/// Executed per frame:
glBindVertexArrayOES(vertexArray);
for(NSInteger i = 0; i < numQuads; i++) {
    quad = [quads objectAtIndex:i];
    /// Model matrix: scale, then rotate about z, then translate.
    m4 = GLKMatrix4MakeScale(quad.size, quad.size, 1.0);
    m4 = GLKMatrix4Multiply(GLKMatrix4MakeRotation(quad.angle, 0.0, 0.0, 1.0), m4);
    m4 = GLKMatrix4Multiply(GLKMatrix4MakeTranslation(quad.position.x, quad.position.y, 0.0), m4);
    modelViewProjectionMatrix = GLKMatrix4Multiply(projectionMatrix, m4);
    glUniformMatrix4fv(uniformLocationMVP, 1, GL_FALSE, modelViewProjectionMatrix.m);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
Does this approach have any performance advantage over a single glDrawArrays() call?