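I am trying to migrate a project from OpenGL to Metal on iOS, but I seem to have hit a wall. The task is simple...
I have a large texture (more than 3000x3000 pixels). On every touchesMoved event I need to draw several (a few hundred) small textures (e.g. 124x124) onto it, with a specific blend function enabled. It is basically a paint brush. The large texture is then displayed. That is roughly the task.
In OpenGL this runs very fast, at about 60fps. When I port the same code to Metal, I only manage to get 15fps.
I have created two sample projects with the bare minimum needed to demonstrate the problem. Here are the projects (both OpenGL and Metal)...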
https://drive.google.com/file/d/12MPt1nMzE2UL_s4oXEUoTCXYiTz42r4b/view?usp=sharing
This is roughly what I do in OpenGL. I run this code in a loop (about 200-500 iterations) on every touch event, and it runs very fast:
- (void) renderBrush:(GLuint)brush on:(GLuint)fbo ofSize:(CGSize)size at:(CGPoint)point {
    // Texture coordinates covering the whole brush texture
    GLfloat brushCoordinates[] = {
        0.0f, 0.0f,
        1.0f, 0.0f,
        0.0f, 1.0f,
        1.0f, 1.0f,
    };
    // Quad vertices; adjusted below to the rectangle around the touch point
    GLfloat imageVertices[] = {
        -1.0f, -1.0f,
        1.0f, -1.0f,
        -1.0f, 1.0f,
        1.0f, 1.0f,
    };

    // Brush rectangle around the point, normalized to the target size
    int brushSize = 124;
    CGRect rect = CGRectMake(point.x - brushSize/2, point.y - brushSize/2, brushSize, brushSize);
    rect.origin.x /= size.width;
    rect.origin.y /= size.height;
    rect.size.width /= size.width;
    rect.size.height /= size.height;
    [self convertImageVertices:imageVertices toProjectionRect:rect onImageOfSize:size];

    // Remember the currently bound framebuffer so it can be restored afterwards
    GLint currentFBO;
    glGetIntegerv(GL_FRAMEBUFFER_BINDING, &currentFBO);

    [_Program use];
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glViewport(0, 0, (int)size.width, (int)size.height);

    glActiveTexture(GL_TEXTURE2);
    glBindTexture(GL_TEXTURE_2D, brush);
    glUniform1i(brushTextureLocation, 2);

    glVertexAttribPointer(positionLocation, 2, GL_FLOAT, 0, 0, imageVertices);
    glVertexAttribPointer(brushCoordinateLocation, 2, GL_FLOAT, 0, 0, brushCoordinates);

    glEnable(GL_BLEND);
    glBlendEquation(GL_FUNC_ADD);
    glBlendFuncSeparate(GL_ONE, GL_ZERO, GL_ONE, GL_ONE);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    glDisable(GL_BLEND);

    // Unbind the brush texture and restore the previous framebuffer
    glActiveTexture(GL_TEXTURE2);
    glBindTexture(GL_TEXTURE_2D, 0);
    glBindFramebuffer(GL_FRAMEBUFFER, currentFBO);
}
And this is how I ported the code to Metal. On every touch event it runs in a loop, using a single MTLCommandBuffer:
- (void) renderBrush:(id<MTLTexture>)brush onTarget:(id<MTLTexture>)target at:(CGPoint)point withCommandBuffer:(id<MTLCommandBuffer>)commandBuffer {
    int brushSize = 124;
    CGRect rect = CGRectMake(point.x - brushSize/2, point.y - brushSize/2, brushSize, brushSize);
    rect.origin.x /= target.width;
    rect.origin.y /= target.height;
    rect.size.width /= target.width;
    rect.size.height /= target.height;

    Float32 imageVertices[8];
    // Calculate the vertices (basically the rectangle that we need to draw) on the target texture.
    // We are not drawing on the entire target texture, only on a square around the point.
    [self composeImageVertices:imageVertices toProjectionRect:rect onImageOfSize:CGSizeMake(target.width, target.height)];

    // We use a different vertex buffer per pass, because this runs in a loop and subsequent calls
    // would otherwise overwrite the values. Other buffers also get overwritten, but that is OK for now;
    // we only need to demonstrate the performance.
    id<MTLBuffer> vertexBuffer = [_vertexArray lastObject];
    memcpy([vertexBuffer contents], imageVertices, 8 * sizeof(Float32));

    id<MTLRenderCommandEncoder> commandEncoder = [commandBuffer renderCommandEncoderWithDescriptor:mRenderPassDescriptor];
    commandEncoder.label = @"DrawCE";
    [commandEncoder setRenderPipelineState:mPipelineState];
    [commandEncoder setVertexBuffer:vertexBuffer offset:0 atIndex:0];
    [commandEncoder setVertexBuffer:mBrushTextureBuffer offset:0 atIndex:1];
    [commandEncoder setFragmentTexture:brush atIndex:0];
    [commandEncoder setFragmentSamplerState:mSampleState atIndex:0];
    [commandEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4];
    [commandEncoder endEncoding];
}
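In the sample code I attached, I replaced the touch events with a timer loop to keep things simple.
On an iPhone 7 Plus I get 60fps with OpenGL and 15fps with Metal. Could I be doing something wrong here?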
Answer 0 (score: 3)
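Eliminate all redundancy:
- Use -setVertexBufferOffset:atIndex: to set only the offset when needed, without changing the buffer.
- composeImageVertices:... can write directly into the vertex buffer with an appropriate cast, avoiding the memcpy.
- Depending on what composeImageVertices:... actually does, and if deltaX and deltaY are constants, you may be able to set up the vertex buffer just once. The vertex shader can transform the vertices as needed. You would pass in the appropriate data as uniforms (either the destination point and the render target size, or even a transform matrix).
- Avoid redundantly setting mPipelineState, mBrushTextureBuffer, and mSampleState.
- Use instanced drawing to draw stroke instances. In the vertex shader, transform the position based on the instance ID. You will have to pass in deltaX and deltaY as uniform data. The brush indexes can also live in a single buffer that is passed in, and the shader can look up the brush index in it by instance ID.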
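As a rough illustration of the first few suggestions, here is a minimal sketch, not the answerer's code. The function names WriteStampVertices and EncodeBrushStamps are hypothetical, and WriteStampVertices is only a stand-in for composeImageVertices:..., whose real behavior is not shown in the question (it may, for example, flip the y axis). Encoding every stamp in one render pass is an extra assumption on top of the answer. The sketch also assumes that the vertex buffer uses shared storage, is large enough for count quads, and is not still being read by the GPU, and that the pass descriptor targets the large texture with MTLLoadActionLoad and MTLStoreActionStore.

```objc
#import <Metal/Metal.h>
#import <CoreGraphics/CoreGraphics.h>

// Hypothetical stand-in for composeImageVertices:...: writes the four
// triangle-strip vertices (x/y pairs in clip space) of a square stamp
// centered at `point` (in target pixels) straight into `out`.
static void WriteStampVertices(float *out, CGPoint point, float brushSize,
                               float targetWidth, float targetHeight) {
    float half = brushSize * 0.5f;
    float x0 = ((float)point.x - half) / targetWidth * 2.0f - 1.0f;
    float x1 = ((float)point.x + half) / targetWidth * 2.0f - 1.0f;
    float y0 = ((float)point.y - half) / targetHeight * 2.0f - 1.0f;
    float y1 = ((float)point.y + half) / targetHeight * 2.0f - 1.0f;
    out[0] = x0; out[1] = y0;
    out[2] = x1; out[3] = y0;
    out[4] = x0; out[5] = y1;
    out[6] = x1; out[7] = y1;
}

// Hypothetical helper: encodes `count` stamps with one encoder.
static void EncodeBrushStamps(id<MTLCommandBuffer> commandBuffer,
                              MTLRenderPassDescriptor *passDescriptor,
                              id<MTLRenderPipelineState> pipelineState,
                              id<MTLSamplerState> samplerState,
                              id<MTLBuffer> brushCoordBuffer,
                              id<MTLBuffer> vertexBuffer,
                              id<MTLTexture> brush,
                              id<MTLTexture> target,
                              const CGPoint *points,
                              NSUInteger count) {
    const NSUInteger floatsPerStamp = 8;
    const NSUInteger stride = floatsPerStamp * sizeof(float); // 32 bytes per quad
    NSCAssert(vertexBuffer.length >= count * stride, @"vertex buffer too small");

    // Write all quads directly into the buffer contents: no temporary array, no memcpy.
    float *vertices = (float *)vertexBuffer.contents;
    for (NSUInteger i = 0; i < count; i++) {
        WriteStampVertices(vertices + i * floatsPerStamp, points[i], 124.0f,
                           (float)target.width, (float)target.height);
    }

    id<MTLRenderCommandEncoder> encoder =
        [commandBuffer renderCommandEncoderWithDescriptor:passDescriptor];
    encoder.label = @"DrawCE";

    // State that is identical for every stamp is set once.
    [encoder setRenderPipelineState:pipelineState];
    [encoder setVertexBuffer:vertexBuffer offset:0 atIndex:0];
    [encoder setVertexBuffer:brushCoordBuffer offset:0 atIndex:1];
    [encoder setFragmentTexture:brush atIndex:0];
    [encoder setFragmentSamplerState:samplerState atIndex:0];

    // Per stamp, only the offset into the already-bound vertex buffer changes.
    for (NSUInteger i = 0; i < count; i++) {
        [encoder setVertexBufferOffset:i * stride atIndex:0];
        [encoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4];
    }
    [encoder endEncoding];
}
```

With instanced drawing, the inner loop would collapse into a single drawPrimitives:vertexStart:vertexCount:instanceCount: call, and the vertex shader would pick the per-stamp data by instance ID. That variant needs shader changes and is not shown here.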