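I'm working on some tricky Metal logic, and I'd like to know if anyone knows the fastest way to read from multiple input textures in a Metal fragment shader. There's a catch: this has to work on A7 processors, so no Metal 2 features, please.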
```
fragment half4
fragmentShader(RasterizerData in [[stage_in]],
               texture2d<half, access::read> c0Texture [[ texture(0) ]],
               texture2d<half, access::read> c1Texture [[ texture(1) ]],
               texture2d<half, access::read> c2Texture [[ texture(2) ]],
               texture2d<half, access::read> c3Texture [[ texture(3) ]],
               constant RenderTargetDimensionsUniform &rtd [[ buffer(0) ]])
{
    ushort offsetGroup = ...;
    half4 inHalf4;
    // Choose from 1 of N textures
    switch (offsetGroup) {
        case 0: {
            inHalf4 = c0Texture.read(blockRoot);
            break;
        }
        case 1: {
            inHalf4 = c1Texture.read(blockRoot);
            break;
        }
        case 2: {
            inHalf4 = c2Texture.read(blockRoot);
            break;
        }
        case 3: {
            inHalf4 = c3Texture.read(blockRoot);
            break;
        }
    }
    ...
```
I tried to optimize this logic with code like the following, but it turned out to be slightly slower:
```
texture2d<half, access::read> arr[16];
arr[0] = c0Texture;
arr[1] = c1Texture;
arr[2] = c2Texture;
arr[3] = c3Texture;
inHalf4 = arr[offsetGroup].read(blockRoot);
```
When a large number of input textures have to be read like this, is there another approach that works correctly and still performs well?
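Update: After reading Ken's comment, I tried a simpler approach: a blit command encoder copies each texture's data into a single texture, with each input stored as a slice offset along the Y axis. This change cut shader execution time by 35 ms, a big win!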
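To make the packed layout concrete, here is a CPU-side C++ sketch of what the blit pass produces: N equally sized input textures stacked vertically into one atlas, with slice `s` occupying rows `s * sliceHeight` through `(s + 1) * sliceHeight - 1`. It assumes every input shares the same width and height (so `sliceHeight` is simply the input height). The `Image` type and `packSlices` function are hypothetical stand-ins for `MTLTexture` and the GPU copy, not Metal APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical CPU stand-in for a single-channel 2D texture, stored row-major.
struct Image {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<std::uint16_t> texels;  // width * height values

    Image(std::size_t w, std::size_t h) : width(w), height(h), texels(w * h, 0) {}

    std::uint16_t read(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return texels[y * width + x];
    }
    void write(std::size_t x, std::size_t y, std::uint16_t v) {
        assert(x < width && y < height);
        texels[y * width + x] = v;
    }
};

// Emulates one copy per input: input s lands at destination origin
// (0, s * sliceHeight) inside a width x (N * sliceHeight) atlas.
Image packSlices(const std::vector<Image>& slices) {
    if (slices.empty()) throw std::invalid_argument("no slices");
    const std::size_t w = slices[0].width;
    const std::size_t sliceHeight = slices[0].height;
    Image atlas(w, sliceHeight * slices.size());
    for (std::size_t s = 0; s < slices.size(); ++s) {
        if (slices[s].width != w || slices[s].height != sliceHeight)
            throw std::invalid_argument("all slices must share one size");
        for (std::size_t y = 0; y < sliceHeight; ++y)
            for (std::size_t x = 0; x < w; ++x)
                atlas.write(x, y + s * sliceHeight, slices[s].read(x, y));
    }
    return atlas;
}
```

On the GPU, each iteration of the outer loop would correspond to one `MTLBlitCommandEncoder` texture-to-texture copy per input, with its destination origin set to (0, s * sliceHeight, 0). The total atlas height still has to fit within the device's maximum 2D texture size.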
```
fragment half4
fragmentShader(RasterizerData in [[stage_in]],
               texture2d<half, access::read> inTexture [[ texture(0) ]],
               constant RenderTargetDimensionsUniform &rtd [[ buffer(0) ]])
{
    ushort slice = ...;
    half4 inHalf4 = inTexture.read(blockRoot + ushort2(0, slice * sliceHeight));
```
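The single-texture version relies on just one piece of arithmetic, the row offset. The C++ sketch below shows that mapping and its inverse, which makes the invariant explicit: as long as `blockRoot.y < sliceHeight`, the read lands inside slice `slice` and never bleeds into a neighbouring one. `UShort2` is a hypothetical stand-in for Metal's `ushort2`, and `sliceHeight` is assumed to be the common height of the original input textures.

```cpp
#include <cstdint>

// Hypothetical stand-in for Metal's ushort2.
struct UShort2 {
    std::uint16_t x;
    std::uint16_t y;
};

// Mirrors the shader expression: blockRoot + ushort2(0, slice * sliceHeight).
UShort2 atlasCoord(UShort2 blockRoot, std::uint16_t slice, std::uint16_t sliceHeight) {
    return {blockRoot.x,
            static_cast<std::uint16_t>(blockRoot.y + slice * sliceHeight)};
}

// Inverse mapping: which slice an atlas row belongs to...
std::uint16_t sliceOf(std::uint16_t atlasY, std::uint16_t sliceHeight) {
    return static_cast<std::uint16_t>(atlasY / sliceHeight);
}

// ...and the row within that slice.
std::uint16_t localY(std::uint16_t atlasY, std::uint16_t sliceHeight) {
    return static_cast<std::uint16_t>(atlasY % sliceHeight);
}
```

For example, with `sliceHeight = 100`, a read for slice 3 at `blockRoot = (5, 7)` lands on atlas row 307. Then 307 / 100 = 3 and 307 % 100 = 7 recover the original slice and row.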