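To speed up a voxel renderer I have been writing, I precompute the scene into a display list. The scene is a simple 50x50x3 blob of culled voxels, so that is 50 * 50 * 2 + 50 * 3 * 4 = 5600 quads. Each quad is textured, so there are 22400 vertices, each with its own texture coordinate.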
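As a sanity check on that arithmetic, here is a minimal brute-force sketch that counts the faces of a solid block whose neighbour is empty space. The names `is_solid` and `count_exposed_quads` are hypothetical helpers written for this post, not part of my renderer, and the sketch assumes the blob is a completely filled box.

```c
#include <assert.h>
#include <stdbool.h>

/* True when (x, y, z) lies inside a solid nx * ny * nz block.
   Assumption: the blob is a fully filled box with nothing around it. */
bool is_solid(int x, int y, int z, int nx, int ny, int nz) {
    return x >= 0 && x < nx && y >= 0 && y < ny && z >= 0 && z < nz;
}

/* Count the faces that survive culling: a face is kept only when the
   neighbouring cell on that side is empty. */
int count_exposed_quads(int nx, int ny, int nz) {
    static const int neighbour[6][3] = {
        { 0, -1,  0}, { 0, 1, 0},   /* bottom, top */
        {-1,  0,  0}, { 1, 0, 0},   /* left, right */
        { 0,  0, -1}, { 0, 0, 1}    /* back, front */
    };
    int quads = 0;

    assert(nx > 0 && ny > 0 && nz > 0);
    for (int x = 0; x < nx; ++x)
        for (int y = 0; y < ny; ++y)
            for (int z = 0; z < nz; ++z)
                for (int f = 0; f < 6; ++f)
                    if (!is_solid(x + neighbour[f][0], y + neighbour[f][1],
                                  z + neighbour[f][2], nx, ny, nz))
                        ++quads;
    return quads;
}
```

Calling `count_exposed_quads(50, 3, 50)` gives 5600 quads, and four vertices per quad gives the 22400 vertices quoted above.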
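The speed difference is incredible, but in the wrong direction: the non-list program takes 16ms to render a frame, while the display-list program takes 50ms (and fluctuates wildly).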
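For anyone who wants to reproduce numbers like these, one possible way to time frames is sketched below. It is only an illustration under stated assumptions: `now_ms`, `average_frame_ms`, and `busy_work` are hypothetical names, and `busy_work` is a CPU-only stand-in for a real draw function. In an actual renderer the callback would issue the draw calls and then call `glFinish()` so that queued GL work is included in the measurement.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Monotonic wall-clock time in milliseconds. */
double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Average time per call of `render` over `frames` calls.
   `render` is a hypothetical callback standing in for one frame. */
double average_frame_ms(void (*render)(void), int frames) {
    assert(render != NULL && frames > 0);
    double start = now_ms();
    for (int i = 0; i < frames; ++i)
        render();
    return (now_ms() - start) / frames;
}

/* Placeholder workload so the sketch runs without an OpenGL context. */
void busy_work(void) {
    volatile double sink = 0.0;
    for (int i = 0; i < 100000; ++i)
        sink += i * 0.5;
}
```

Averaging over many frames, for example `average_frame_ms(busy_work, 100)`, smooths out some of the frame-to-frame fluctuation.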
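I read https://www.opengl.org/archives/resources/faq/technical/displaylist.htm, which suggests that performance will not improve if the program is thrashing, because display lists take up more memory. However, as far as I can tell, both versions of my program use a stable 10.0MB.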
In case this is a system-specific problem: I am currently using SDL2, freeglut3, an AMD Athlon(tm) II X4 645 Processor × 4 with its integrated graphics, Gallium 0.4 on AMD RS880, and 64-bit Debian GNU/Linux 8 (jessie).
Without the display list, I render by calling this code for each and every voxel:
void draw_cube_culled(spritesheet * sheet, rect * front, rect * back, rect * left, rect * right, rect * top, rect * bot, float x, float y, float z, float s, renderstate * state) {
    glBindTexture(GL_TEXTURE_2D, sheet->sheet_image->texture);
    side_t side;
    glBegin(GL_QUADS);
    glColor3f(1.0f, 1.0f, 1.0f);
    //BOT--Y constant
    side = BOT;
    if(state->side[side]) {
        glTexCoord2f(bot->left, bot->bot); glVertex3f(x, y, z);
        glTexCoord2f(bot->right, bot->bot); glVertex3f(x+s, y, z);
        glTexCoord2f(bot->right, bot->top); glVertex3f(x+s, y, z+s);
        glTexCoord2f(bot->left, bot->top); glVertex3f(x, y, z+s);
    }
    //TOP--Y constant
    side = TOP;
    if(state->side[side]) {
        glTexCoord2f(top->left, top->bot); glVertex3f(x, y+s, z);
        glTexCoord2f(top->right, top->bot); glVertex3f(x+s, y+s, z);
        glTexCoord2f(top->right, top->top); glVertex3f(x+s, y+s, z+s);
        glTexCoord2f(top->left, top->top); glVertex3f(x, y+s, z+s);
    }
    //LEFT--X constant
    side = LEFT;
    if(state->side[side]) {
        glTexCoord2f(left->left, left->bot); glVertex3f(x, y, z);
        glTexCoord2f(left->right, left->bot); glVertex3f(x, y, z+s);
        glTexCoord2f(left->right, left->top); glVertex3f(x, y+s, z+s);
        glTexCoord2f(left->left, left->top); glVertex3f(x, y+s, z);
    }
    //RIGHT--X constant
    side = RIGHT;
    if(state->side[side]) {
        glTexCoord2f(right->right, right->bot); glVertex3f(x+s, y, z);
        glTexCoord2f(right->left, right->bot); glVertex3f(x+s, y, z+s);
        glTexCoord2f(right->left, right->top); glVertex3f(x+s, y+s, z+s);
        glTexCoord2f(right->right, right->top); glVertex3f(x+s, y+s, z);
    }
    //BACK--Z constant
    side = BACK;
    if(state->side[side]) {
        glTexCoord2f(back->right, back->bot); glVertex3f(x, y, z);
        glTexCoord2f(back->left, back->bot); glVertex3f(x+s, y, z);
        glTexCoord2f(back->left, back->top); glVertex3f(x+s, y+s, z);
        glTexCoord2f(back->right, back->top); glVertex3f(x, y+s, z);
    }
    //FRONT--Z constant
    side = FRONT;
    if(state->side[side]) {
        glTexCoord2f(front->left, front->bot); glVertex3f(x, y, z+s);
        glTexCoord2f(front->right, front->bot); glVertex3f(x+s, y, z+s);
        glTexCoord2f(front->right, front->top); glVertex3f(x+s, y+s, z+s);
        glTexCoord2f(front->left, front->top); glVertex3f(x, y+s, z+s);
    }
    glEnd();
}
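To generate the display list, I call this code once at initialization (note that draw_diorama is the function I use to draw the scene in the normal code path, so it calls the function above 50 * 50 * 3 times):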
void diorama_compile(diorama * d) {
    glNewList(d->compiled, GL_COMPILE);
    draw_diorama(d);
    glEndList();
}
And this is how I render the display list:
void draw_compiled_diorama(diorama * d) {
    glCallList(d->compiled);
}
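This really confuses me, because one code path involves branching at least 6 * 50 * 50 * 3 times per frame, lots of reads from different kinds of memory, and floating-point math, all on the CPU, while the other does not.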
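To put rough numbers on the CPU-side work in the non-list path, here is a back-of-the-envelope sketch. The function names and the `VOXEL_COUNT` and `VISIBLE_QUADS` constants are hypothetical and exist only for this estimate; the per-voxel and per-quad call counts are read off draw_cube_culled above, and the sketch assumes that function really is called once for every one of the 50 * 50 * 3 voxels.

```c
#include <assert.h>

/* Scene size from the description above: 50 x 50 x 3 voxels,
   of which 5600 faces are visible after culling. */
enum { VOXEL_COUNT = 50 * 50 * 3, VISIBLE_QUADS = 5600 };

/* One `if (state->side[side])` test per face, six faces per voxel. */
long side_checks_per_frame(long voxels) {
    assert(voxels >= 0);
    return voxels * 6;
}

/* Every draw_cube_culled call issues glBindTexture, glBegin, glColor3f
   and glEnd (4 calls), even when all six faces are culled. Each visible
   quad adds 4 glTexCoord2f and 4 glVertex3f calls. */
long gl_calls_per_frame(long voxels, long visible_quads) {
    assert(voxels >= 0 && visible_quads >= 0);
    return voxels * 4 + visible_quads * 8;
}
```

With the figures for this scene, `side_checks_per_frame(VOXEL_COUNT)` is 45000 and `gl_calls_per_frame(VOXEL_COUNT, VISIBLE_QUADS)` is 74800, all of which the display-list path should in principle avoid repeating every frame.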