Question

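I'm currently using a library based on Apple's GLPaint example for drawing on the screen with OpenGL. Whenever the canvas saves and restores a session, the lines are redrawn one by one (you can watch the progress), and it takes quite a while if there are many points to render. Is there any way to make it render in parallel, or simply faster?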

This is the drawing code I'm using:

CGPoint start = step.start;
CGPoint end = step.end;

// Convert touch point from UIView referential to OpenGL one (upside-down flip)
CGRect bounds = [self bounds];
start.y = bounds.size.height - start.y;
end.y = bounds.size.height - end.y;

static GLfloat*     vertexBuffer = NULL;
static NSUInteger   vertexMax = 64;
NSUInteger          vertexCount = 0,
count,
i;

[EAGLContext setCurrentContext:context];
glBindFramebufferOES(GL_FRAMEBUFFER_OES, viewFramebuffer);

// Convert locations from Points to Pixels
CGFloat scale = self.contentScaleFactor;
start.x *= scale;
start.y *= scale;
end.x *= scale;
end.y *= scale;

// Allocate vertex array buffer
if(vertexBuffer == NULL)
    vertexBuffer = malloc(vertexMax * 2 * sizeof(GLfloat));

// Add points to the buffer so there are drawing points every X pixels
count = MAX(ceilf(sqrtf((end.x - start.x) * (end.x - start.x) + (end.y - start.y) * (end.y - start.y)) / kBrushPixelStep), 1);
for(i = 0; i < count; ++i) {
    if(vertexCount == vertexMax) {
        vertexMax = 2 * vertexMax;
        vertexBuffer = realloc(vertexBuffer, vertexMax * 2 * sizeof(GLfloat));
    }

    vertexBuffer[2 * vertexCount + 0] = start.x + (end.x - start.x) * ((GLfloat)i / (GLfloat)count);
    vertexBuffer[2 * vertexCount + 1] = start.y + (end.y - start.y) * ((GLfloat)i / (GLfloat)count);
    vertexCount += 1;
}

// Render the vertex array
glVertexPointer(2, GL_FLOAT, 0, vertexBuffer);
glDrawArrays(GL_POINTS, 0, (int)vertexCount);

// Display the buffer
glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
[context presentRenderbuffer:GL_RENDERBUFFER_OES];

Answer 1

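OpenGL is not multi-threaded. You have to submit the OpenGL commands for a context from a single thread.

You have a couple of options:

You can refactor your code to use concurrency to build the data you are going to send to OpenGL, and then submit it to the OpenGL API once it is all available.
You might be able to refactor it to use shaders to do the calculations. That pushes the calculations off the CPU and onto the GPU, which is highly optimized for parallel operations.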

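Your code above calls realloc inside the for loop to grow the buffer whenever it fills up. That is inefficient, since memory allocation is one of the slowest RAM-based operations on a modern OS. You should refactor the code to calculate the final size of the buffer up front, allocate it once at that size, and not use realloc at all. This should give you a speed improvement for very little effort.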

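Looking at your code, it shouldn't be very hard to refactor the for loop to break the vertex calculations into chunks and submit those chunks to GCD for concurrent processing. The trick is to break the task into units of work that are large enough to benefit from parallel processing. Setting up a task to run on a background queue carries a certain amount of overhead, so you want each unit of work to do enough to make that overhead worthwhile.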
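
One way the chunking could look, sketched in portable C with one chunk per recorded step. The Step struct, fill_step and build_all_vertices are hypothetical names, and the point count per step is assumed to have been computed when the step was recorded. The sketch runs the chunks in a plain serial loop; because each chunk writes only to its own slice of the buffer, that loop is the part you could hand to GCD (for example with dispatch_apply on a concurrent queue) on Apple platforms.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one stroke segment. `count` is the number of
   brush points, assumed to be computed when the step is recorded. */
typedef struct {
    float sx, sy, ex, ey;
    size_t count;
} Step;

/* One unit of work: reads a single step and writes only its own slice,
   so calls for different steps do not touch shared data. */
static void fill_step(const Step *step, float *slice)
{
    for (size_t i = 0; i < step->count; ++i) {
        float t = (float)i / (float)step->count;
        slice[2 * i + 0] = step->sx + (step->ex - step->sx) * t;
        slice[2 * i + 1] = step->sy + (step->ey - step->sy) * t;
    }
}

/* Builds one interleaved x, y buffer covering all steps. Returns NULL
   if there is nothing to draw or allocation fails; caller frees. */
static float *build_all_vertices(const Step *steps, size_t n,
                                 size_t *out_total)
{
    *out_total = 0;
    if (n == 0)
        return NULL;

    size_t *offsets = malloc(n * sizeof *offsets);
    if (offsets == NULL)
        return NULL;

    /* Prefix sum: where each step's points start in the shared buffer. */
    size_t total = 0;
    for (size_t k = 0; k < n; ++k) {
        offsets[k] = total;
        total += steps[k].count;
    }

    float *vertices = total > 0 ? malloc(total * 2 * sizeof *vertices) : NULL;
    if (vertices != NULL) {
        /* Serial here; the iterations are independent, so this loop is
           the candidate for concurrent execution. */
        for (size_t k = 0; k < n; ++k)
            fill_step(&steps[k], vertices + 2 * offsets[k]);
        *out_total = total;
    }
    free(offsets);
    return vertices;
}
```

Once the buffer is complete, it is submitted to OpenGL from the single thread that owns the context, as described above. If individual steps are short, grouping several steps into one chunk keeps the per-task overhead from dominating.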

Answer 2

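I believe the discussion in the comments above revealed a major part of your performance problem. Unless I completely misunderstood it, the high-level structure of your code currently looks like this: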

loop over steps
    calculate list of points from start/end points
    render list of points
    present the renderbuffer
end loop

It should be much faster to present the renderbuffer only once, after you have finished rendering all the steps:

loop over steps
    generate list of points from start/end points
    draw list of points
end loop
present the renderbuffer
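
The restructured loop can be sketched in C with the drawing and presenting passed in as callbacks, which makes the "present exactly once" property easy to check without a GL context. All names here (replay_steps, DrawStepFn, PresentFn, CallCounters and the two counting stubs) are made up for illustration. In the real code, the draw callback would hold the glVertexPointer and glDrawArrays calls from the question, and the present callback would hold glBindRenderbufferOES and presentRenderbuffer.

```c
#include <stddef.h>

/* Hypothetical callback types for drawing one step and for presenting. */
typedef void (*DrawStepFn)(size_t step_index, void *user);
typedef void (*PresentFn)(void *user);

/* Draws every step, then presents a single time at the end. */
static void replay_steps(size_t step_count, DrawStepFn draw_step,
                         PresentFn present, void *user)
{
    for (size_t i = 0; i < step_count; ++i)
        draw_step(i, user);
    present(user);
}

/* Stand-in callbacks that only count calls, for illustration. */
typedef struct {
    size_t draws;
    size_t presents;
} CallCounters;

static void count_draw(size_t step_index, void *user)
{
    (void)step_index;
    ((CallCounters *)user)->draws += 1;
}

static void count_present(void *user)
{
    ((CallCounters *)user)->presents += 1;
}
```

Calling replay_steps(5, count_draw, count_present, &counters) on zeroed counters leaves draws at 5 and presents at 1, compared with 5 presents in the original structure.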

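Even better, generate a vertex buffer object (VBO) for each step when the step is created, and store the step's coordinates in that buffer. Then your draw logic becomes: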

loop over steps
    bind VBO for step
    draw content of VBO
end loop
present the renderbuffer

OpenGL GLPaint threaded rendering
