Question

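I am trying to implement my own custom glBlendFunc in a fragment shader. However, my solution is a lot slower than the native glBlendFunc, even when both perform exactly the same blend function.

I was wondering if anyone has a suggestion for how to do this more efficiently.

My solution works something like this: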

void draw(fbo fbos[2], render_item item)
{
   // fbos[0] is the render target
   // fbos[1] is the previous render target used to read "background" to blend against in shader
   // Both fbos have exactly the same content, however they need to be different since we can't both read and write to the same texture. The texture we render to needs to have the entire content since we might not draw geometry everywhere.

   fbos[0]->attach(); // Attach fbo
   fbos[1]->bind(1); // Bind as texture 1

   render(item);

   glCopyTexSubImage2D(...); // copy from fbos[0] to fbos[1], fbos[1] == fbos[0]
}

fragment.glsl

vec4 blend_color(vec4 fore) 
{   
    vec4 back = texture2D(background, gl_TexCoord[1].st); // background is read from texture "1"
    return vec4(mix(back.rgb, fore.rgb, fore.a), back.a + fore.a);  
}
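
For reference, the shader above appears to compute the same result as fixed-function blending set up with `glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_ONE, GL_ONE)` and the default `GL_FUNC_ADD` equation. The following is a minimal CPU-side sketch in C++ that checks this for sample values, assuming unclamped float colors. `Color`, `shader_blend`, `fixed_function_blend`, and `check_example` are illustrative names and are not part of the original code.

```cpp
#include <cassert>
#include <cmath>

struct Color { float r, g, b, a; };

// Same definition as GLSL mix(): x * (1 - t) + y * t.
static float mix(float x, float y, float t) { return x * (1.0f - t) + y * t; }

// Mirrors blend_color() from fragment.glsl.
Color shader_blend(Color back, Color fore) {
    return { mix(back.r, fore.r, fore.a),
             mix(back.g, fore.g, fore.a),
             mix(back.b, fore.b, fore.a),
             back.a + fore.a };
}

// Fixed-function blend: result = src * srcFactor + dst * dstFactor.
// RGB factors: GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA. Alpha factors: GL_ONE, GL_ONE.
Color fixed_function_blend(Color dst, Color src) {
    const float sf = src.a;
    const float df = 1.0f - src.a;
    return { src.r * sf + dst.r * df,
             src.g * sf + dst.g * df,
             src.b * sf + dst.b * df,
             src.a * 1.0f + dst.a * 1.0f };
}

bool nearly_equal(Color x, Color y, float eps = 1e-6f) {
    return std::fabs(x.r - y.r) < eps && std::fabs(x.g - y.g) < eps &&
           std::fabs(x.b - y.b) < eps && std::fabs(x.a - y.a) < eps;
}

// Compares both paths for one arbitrary background/foreground pair.
void check_example() {
    const Color back{0.2f, 0.4f, 0.6f, 0.3f};
    const Color fore{0.8f, 0.1f, 0.5f, 0.5f};
    assert(nearly_equal(shader_blend(back, fore), fixed_function_blend(back, fore)));
}
```

If the two paths agree, the shader version buys nothing over native blending for this particular function, which matters when deciding whether the extra copy per draw is worth it.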

Answer 1

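Your best bet for improving the performance of FBO-based blending is NV_texture_barrier. Despite the name, AMD has implemented it as well, so if you stick with Radeon HD-class cards, it should be available to you.

Basically, it allows you to ping-pong without heavyweight operations such as FBO binding or texture attachment changes. The specification has a section towards the bottom that shows the general algorithm.

Another alternative is EXT_shader_image_load_store. This requires DX11/GL 4.x-class hardware. OpenGL 4.2 recently promoted it to core as ARB_shader_image_load_store.

Even so, as Darcy said, you are never going to beat regular blending. It uses special hardware structures that shaders cannot access (because they operate after the shader has run). You should only do programmatic blending if there is some effect that you absolutely cannot accomplish any other way.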
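
The pattern can be sketched roughly as follows. This is a C++ outline of the control flow only and has not been run against a driver. The `GlCalls` struct and its members are hypothetical stand-ins so the loop compiles without an OpenGL context. In a real program, `texture_barrier` would call `glTextureBarrierNV()`, and `draw` would issue the actual draw call while the render target's color texture is also bound as the `background` sampler.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical wrapper around the two GL operations the loop needs.
struct GlCalls {
    std::function<void()> texture_barrier;  // real code: glTextureBarrierNV()
    std::function<void(int)> draw;          // real code: render(item)
};

// Draw every item into a single FBO whose color texture is also sampled.
// The barrier before each draw makes texels written by earlier draws
// visible to texture fetches, replacing the per-draw glCopyTexSubImage2D.
void draw_all(const GlCalls& gl, const std::vector<int>& items) {
    assert(gl.texture_barrier && gl.draw);
    for (int item : items) {
        gl.texture_barrier();
        gl.draw(item);
    }
}
```

As far as I understand the extension, this is only safe when each fragment reads just the texel it is about to write and each texel is written at most once between barriers, so geometry that overlaps itself within one draw would still need to be split up.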

Answer 2

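Native blending is much more efficient because the blend operation is built directly into the GPU hardware, so you probably won't be able to beat it on speed. That said, make sure you have depth testing, back-face culling, hardware blending, and any other unneeded operations turned off. I can't say it will make a huge difference, but it might make some.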

Custom glBlendFunc a lot slower than native