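I have this code snippet (used for cube map PCF filtering) and I want to optimize it for Shader Model 2. I tried to eliminate the branches with permutation matrices stored in uniforms, but that takes too many of them (2x24).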
```hlsl
float3 l = normalize(ldir);
float3 al = abs(l);
float3 off2, off3, off4;

if( al.x < al.y )
{
    if( al.y < al.z )
    {
        // z is dominant
        off2 = CubeOffset(l.zxy, float2(0, 1), texelsize).yzx;
        off3 = CubeOffset(l.zxy, float2(1, 0), texelsize).yzx;
        off4 = CubeOffset(l.zxy, float2(1, 1), texelsize).yzx;
    }
    else
    {
        // y is dominant
        off2 = CubeOffset(l.yxz, float2(0, 1), texelsize).yxz;
        off3 = CubeOffset(l.yxz, float2(1, 0), texelsize).yxz;
        off4 = CubeOffset(l.yxz, float2(1, 1), texelsize).yxz;
    }
}
else
{
    if( al.x < al.z )
    {
        // z is dominant
        off2 = CubeOffset(l.zxy, float2(0, 1), texelsize).yzx;
        off3 = CubeOffset(l.zxy, float2(1, 0), texelsize).yzx;
        off4 = CubeOffset(l.zxy, float2(1, 1), texelsize).yzx;
    }
    else
    {
        // x is dominant
        off2 = CubeOffset(l, float2(0, 1), texelsize);
        off3 = CubeOffset(l, float2(1, 0), texelsize);
        off4 = CubeOffset(l, float2(1, 1), texelsize);
    }
}
```
Maybe there is a mathematical relationship between the comparison (al.xyy < al.yzz) and the swizzles.
Update: the definition of CubeOffset:
```hlsl
float3 CubeOffset(float3 swiz, float2 off, float2 texelsize)
{
    float3 ret;

    ret.yz = swiz.yz + 2.0f * off * texelsize;
    ret.x = sqrt(1.0f - dot(ret.yz, ret.yz));

    if( swiz.x < 0 )
        ret.x *= -1.0f;

    return ret;
}
```
HLSL errors when compiling for SM 2.0:
```
error X5608: Compiled shader code uses too many arithmetic instruction slots (107). Max. allowed by the target (ps_2_0) is 64.
error X5609: Compiled shader code uses too many instruction slots (111). Max. allowed by the target (ps_2_0) is 96.
```
GLSL handles it fine. The goal is backward compatibility.

(By the way, the algorithm has problems, but that's not the issue right now.)
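The hunch that the comparisons (al.xyy < al.yzz) map directly onto the swizzles can be made concrete. Below is a minimal sketch, written as CPU-side C++ rather than HLSL so it can be compiled and checked. The two comparisons become 0/1 weights through a `step` helper that mirrors HLSL's `step`. Those weights blend `l`, `l.yxz` and `l.zxy` into a single "dominant axis first" vector, `CubeOffset` runs once, and the inverse swizzles are blended back the same way. `float2`/`float3`, `step`, `OffsetBranchy`, `OffsetBranchless` and `SameResult` are illustrative helpers, not part of the original code. Whether the HLSL equivalent actually fits in ps_2_0's 64 arithmetic slots is untested.

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-ins for the HLSL vector types (hypothetical helpers).
struct float2 { float x, y; };
struct float3 { float x, y, z; };
static float3 operator*(float s, float3 v) { return {s * v.x, s * v.y, s * v.z}; }
static float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// Same semantics as HLSL step(a, b): 1 if b >= a, else 0.
static float step(float a, float b) { return b >= a ? 1.0f : 0.0f; }

// Port of CubeOffset from the question: swiz.x is the dominant axis.
static float3 CubeOffset(float3 swiz, float2 off, float2 texelsize) {
    float3 ret;
    ret.y = swiz.y + 2.0f * off.x * texelsize.x;
    ret.z = swiz.z + 2.0f * off.y * texelsize.y;
    ret.x = std::sqrt(1.0f - (ret.y * ret.y + ret.z * ret.z));
    if (swiz.x < 0) ret.x *= -1.0f;
    return ret;
}

// The original nested-if selection, for a single offset.
static float3 OffsetBranchy(float3 l, float2 off, float2 ts) {
    float ax = std::fabs(l.x), ay = std::fabs(l.y), az = std::fabs(l.z);
    bool zDominant = (ax < ay) ? (ay < az) : (ax < az);
    if (zDominant) {                                   // l.zxy ... .yzx
        float3 r = CubeOffset({l.z, l.x, l.y}, off, ts);
        return {r.y, r.z, r.x};
    }
    if (ax < ay) {                                     // l.yxz ... .yxz
        float3 r = CubeOffset({l.y, l.x, l.z}, off, ts);
        return {r.y, r.x, r.z};
    }
    return CubeOffset(l, off, ts);                     // x dominant
}

// Branch-free version: the comparisons become 0/1 weights that blend swizzles.
static float3 OffsetBranchless(float3 l, float2 off, float2 ts) {
    float ax = std::fabs(l.x), ay = std::fabs(l.y), az = std::fabs(l.z);
    float c   = 1.0f - step(ay, ax);        // 1 if ax < ay
    float isY = c * step(az, ay);           // ax < ay and ay >= az
    float isX = (1.0f - c) * step(az, ax);  // ax >= ay and ax >= az
    float isZ = 1.0f - isX - isY;           // all remaining cases
    assert(isZ == 0.0f || isZ == 1.0f);     // exactly one weight is set
    // Move the dominant component into .x: l, l.yxz or l.zxy.
    float3 swiz = isX * l + isY * float3{l.y, l.x, l.z} + isZ * float3{l.z, l.x, l.y};
    float3 r = CubeOffset(swiz, off, ts);
    // Undo the swizzle: identity, .yxz or .yzx.
    return isX * r + isY * float3{r.y, r.x, r.z} + isZ * float3{r.y, r.z, r.x};
}

// True if both versions produce identical offsets for direction l.
static bool SameResult(float3 l, float2 off, float2 ts) {
    float3 a = OffsetBranchy(l, off, ts), b = OffsetBranchless(l, off, ts);
    return a.x == b.x && a.y == b.y && a.z == b.z;
}
```

Because ps_2_0 has no dynamic flow control, the compiler has to flatten the nested ifs and evaluate every branch. The blended form evaluates `CubeOffset` only once per offset and pays for it with a few multiply-adds. The weights are exactly 0 or 1, so the blended result matches the branched one exactly, which is what `SameResult` checks.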
Answer 0 (score: 0)
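Not really an optimization, but worth testing.

The obvious solution is to move the extra code that doesn't fit (e.g. because of the instruction count) to the CPU. It is not always needed, and in this case it is rarely the best option. In the case of branches, you can:

That is the simplest thing you can do, and it requires no tweaking in assembly. The problem arises when the condition is computed inside the shader.

Hope it helps.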
Answer 1 (score: 0)
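I don't know whether this can be solved with SM 2.0. Given how far GPU power has advanced, though, I'm offering an SM 3.0 solution.

Note that this code is a snippet in my own shader language (but it is similar to HLSL):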
```
template <int samples>
float PCFIrregularCUBE(sampler shadowmap, sampler noisetex, float3 ldir, float2 sloc, float2 texelsize)
{
    const float kernelradius = 2.0f;

    float3 l = normalize(ldir);
    float3 al = abs(l);
    float2 noise;
    float2 rotated;
    float sd, t, s;
    float d = length(ldir);

    noise = tex2D(noisetex, sloc);
    noise = normalize(noise * 2.0f - 1.0f);

    float2 rotmat0 = float2(noise.x, noise.y);
    float2 rotmat1 = float2(-noise.y, noise.x);
    float3 off;

    s = 0;

    for( int i = 0; i < samples; ++i ) {
        // irreg_kernel: global array of float2 sample offsets (not shown)
        rotated.x = dot(irreg_kernel[i], rotmat0) * kernelradius;
        rotated.y = dot(irreg_kernel[i], rotmat1) * kernelradius;

        if( al.x < al.y ) {
            if( al.y < al.z )
                off = CubeOffsetZXY(l, rotated, texelsize);
            else
                off = CubeOffsetYXZ(l, rotated, texelsize);
        } else {
            if( al.x < al.z )
                off = CubeOffsetZXY(l, rotated, texelsize);
            else
                off = CubeOffsetXYZ(l, rotated, texelsize);
        }

        sd = texCUBE(shadowmap, off).r;
        t = ((d > sd) ? 0.0f : 1.0f);
        s += ((sd < 0.001f) ? 1.0f : t);
    }

    return s * (1.0f / samples);
}
```
The CubeOffsetXXX functions look like this one:
```hlsl
float3 CubeOffsetZXY(float3 swiz, float2 off, float2 texelsize)
{
    float3 ret;

    ret.xy = swiz.xy + 2.0f * off * texelsize * swiz.z;
    ret.z = sqrt(1.0f - dot(ret.xy, ret.xy));

    if( swiz.z < 0 )
        ret.z *= -1.0f;

    return ret;
}
```
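Only `CubeOffsetZXY` is shown. `CubeOffsetYXZ` and `CubeOffsetXYZ` presumably differ only in which component is treated as dominant. Under that assumption, the C++ sketch below folds all three into one function indexed by the dominant axis, with the minor axes taken from the naming pattern. Unlike the question's `CubeOffset`, this version scales the offset by the dominant component (`swiz.z` in `CubeOffsetZXY`). `CubeOffsetMajor` and the `float2`/`float3` structs are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cmath>

struct float2 { float x, y; };
struct float3 { float x, y, z; };

// Hypothetical generalisation of CubeOffsetZXY to any dominant axis.
// major: 0 = x, 1 = y, 2 = z. Minor axes are assumed from the naming pattern
// (XYZ -> y, z; YXZ -> x, z; ZXY -> x, y); only ZXY appears in the answer.
static float3 CubeOffsetMajor(float3 swiz, int major, float2 off, float2 texelsize) {
    assert(major >= 0 && major <= 2);
    float v[3] = {swiz.x, swiz.y, swiz.z};
    const float m = v[major];                // dominant component (signed)
    const int a = (major == 0) ? 1 : 0;      // first minor axis
    const int b = (major == 2) ? 1 : 2;      // second minor axis
    // Offset along the face, scaled by the dominant component as in CubeOffsetZXY.
    v[a] += 2.0f * off.x * texelsize.x * m;
    v[b] += 2.0f * off.y * texelsize.y * m;
    // Rebuild the dominant component from the unit-length constraint, keeping its sign.
    float len = std::sqrt(1.0f - (v[a] * v[a] + v[b] * v[b]));
    v[major] = (m < 0) ? -len : len;
    return {v[0], v[1], v[2]};
}
```

Calling `CubeOffsetMajor(l, 2, rotated, texelsize)` should then correspond to `CubeOffsetZXY(l, rotated, texelsize)`. Folding the variants together this way only helps if the dominant-axis index is computed without branching.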
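For details, google "irregular PCF". The worst-case result (e.g. with the camera close up) is:

Note the "salt and pepper" noise caused by irregular PCF. From a distance it is perfectly acceptable (the Crysis 1 approach).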
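The C++ sketch below models the accumulation loop in `PCFIrregularCUBE` for a single pixel, under one simplifying assumption: the cube-map fetch is replaced by a hypothetical `fetchDepth` callback that takes the rotated 2D offset, so face selection and `CubeOffsetXXX` are left out. It shows the two things the shader relies on. First, each kernel offset is rotated by a per-pixel unit noise vector, since the pair `rotmat0`/`rotmat1` forms a 2D rotation matrix. Second, a stored depth below 0.001 is counted as lit. `RotateKernel` and `IrregularPCF` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

struct float2 { float x, y; };

// Rotate kernel offset k by the unit noise vector n and scale it, matching
// rotated.x = dot(k, rotmat0) * radius and rotated.y = dot(k, rotmat1) * radius
// with rotmat0 = (n.x, n.y) and rotmat1 = (-n.y, n.x).
static float2 RotateKernel(float2 k, float2 n, float radius) {
    return {(k.x * n.x + k.y * n.y) * radius, (-k.x * n.y + k.y * n.x) * radius};
}

// CPU model of the loop in PCFIrregularCUBE. fetchDepth is a hypothetical
// stand-in for texCUBE(shadowmap, CubeOffsetXXX(l, rotated, texelsize)).r.
static float IrregularPCF(const std::vector<float2>& kernel, float2 noiseTexel,
                          float d, float radius,
                          const std::function<float(float2)>& fetchDepth) {
    assert(!kernel.empty());
    // noise = normalize(noise * 2 - 1): remap [0,1] to [-1,1], then unit length.
    float2 n = {noiseTexel.x * 2.0f - 1.0f, noiseTexel.y * 2.0f - 1.0f};
    float len = std::sqrt(n.x * n.x + n.y * n.y);
    n = {n.x / len, n.y / len};
    float s = 0.0f;
    for (const float2& k : kernel) {
        float sd = fetchDepth(RotateKernel(k, n, radius));
        float t = (d > sd) ? 0.0f : 1.0f;  // receiver farther than stored depth -> shadowed
        s += (sd < 0.001f) ? 1.0f : t;     // stored depth below 0.001 counts as lit
    }
    return s * (1.0f / static_cast<float>(kernel.size()));
}
```

Because the rotation changes from pixel to pixel, neighbouring pixels sample different points. This trades regular banding for the "salt and pepper" noise described above.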