Optimizing BVH traversal on the GPU

Asked: 2014-09-06 17:57:10

Tags: algorithm optimization parallel-processing gpu-programming bounding-volume

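I have built a bounding volume hierarchy (BVH) that is regenerated every frame. Because of how it is used, every node must have exactly two children, no more and no fewer.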
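For readers following the traversal code further down, the index arithmetic of such a strict binary tree can be sketched on the CPU. This is only an inference from the question's code, which appears to store two vec4 slots per internal node and to treat indices at or past `(nodeSize-1)*2` as leaves; the names `BvhLayout`, `layoutFor` and `isLeafSlot` are hypothetical and do not come from the original program.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, inferred from the question's indexing; not the author's code.
struct BvhLayout {
    uint32_t internalNodes;  // N - 1 internal nodes for N leaves in a strict binary tree
    uint32_t firstLeafSlot;  // first slot index treated as a leaf by the traversal
    uint32_t totalNodes;     // 2N - 1 nodes in total
};

inline BvhLayout layoutFor(uint32_t leafCount) {
    assert(leafCount >= 1);
    BvhLayout layout;
    layout.internalNodes = leafCount - 1;
    // Assumption: each internal node occupies two vec4 slots (AABB min and max).
    layout.firstLeafSlot = 2 * layout.internalNodes;
    layout.totalNodes = 2 * leafCount - 1;
    return layout;
}

// Mirrors the test `tryChild >= (nodeSize-1)*2` from the traversal code.
inline bool isLeafSlot(uint32_t slot, uint32_t leafCount) {
    return slot >= layoutFor(leafCount).firstLeafSlot;
}
```

Under these assumptions, a 16-triangle mesh has 15 internal nodes and its first leaf slot is 30.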

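Traversal is currently the most expensive computation in my program, and it prevents large scenes (> 2k triangles) from running at an acceptable frame rate. I am at a loss as to how to make it faster. Even a simple square made of 16 triangles causes noticeable frame drops when many rays pass through it at once.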

To traverse it, I use the concept presented in this paper (link), where each thread runs the following code.

What can be done to improve its speed?

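At the moment, this function is called from inside a larger kernel. Would isolating it in its own dispatch call be worthwhile, given that it would cost extra memory to store its output?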
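One way to put a number on that memory cost is to count one hit record per ray. The sketch below assumes a hypothetical 16-byte `HitRecord` (hit distance, triangle index, two barycentric coordinates); neither the struct nor `hitBufferBytes` comes from the original program, and a real layout may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-ray output of a standalone traversal dispatch.
struct HitRecord {
    float    t;         // hit distance along the ray
    uint32_t triangle;  // index of the triangle that was hit
    float    u;         // barycentric coordinate
    float    v;         // barycentric coordinate
};
static_assert(sizeof(HitRecord) == 16, "HitRecord is expected to be tightly packed");

// Size in bytes of the buffer the traversal pass would write
// and the shading pass would read back.
constexpr std::size_t hitBufferBytes(std::size_t width,
                                     std::size_t height,
                                     std::size_t raysPerPixel) {
    return width * height * raysPerPixel * sizeof(HitRecord);
}
```

For a 1920x1080 target with one ray per pixel this comes to 33,177,600 bytes, roughly 31.6 MiB. Whether that trade is worth it depends on how much register pressure and divergence the larger kernel currently suffers.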

// o = ray origin, d = ray direction, bestT = nearest intersection distance found so far
bool FindBVHIntersect(vec3 o, vec3 d, inout float bestT) {

    // Starting at the root of the BVH (0), the locations of the two children are found (near, far).
    uint current = 0;
    uint last = 0;
    uint near = uint(bvh[current].w);
    uint far = uint(bvh[current + 1].w);

    // Max distance to test intersection.
    float distance = 4294967295.0f;
    bool success = false;
    bool run = true;

    while (run) {

        // Update the children's location.
        near = uint(bvh[current].w);
        far = uint(bvh[current + 1].w);

        // Traversing up from the last child.

        // Ending: we are back at the root after visiting its far child.
        if (last == far && current == 0) {
            run = false;
            continue;
        }
        // Go to the parent of the node.
        else if (last == far) {
            last = current;
            current = bvhAtomics[current / 2].y;
            continue;
        }

        // Depending on the last position, pick a child.
        uint tryChild = (last == bvhAtomics[current / 2].y) ? near : far;

        bool delve;
        // Test the AABB (only when arriving from the parent).
        if (tryChild == near) {
            delve = FindAABBIntersect(bvh[current].xyz, bvh[current + 1].xyz, o, d, 0.0f, distance);
        }
        else {
            delve = true;
        }

        // If the child needs to be delved and its index is at or past (nodeSize-1)*2,
        // run a triangle intersection test.
        if (delve && tryChild >= (nodeSize - 1) * 2) {

            float pt;
            float ob1;
            float ob2;

            uint bvhPos = uint(bvh[tryChild + 2].x);
            uvec3 indPos = indices[bvhPos].xyz;

            bool tr = FindTriangleIntersect(vertices[indPos.x].xyz, vertices[indPos.y].xyz,
                                            vertices[indPos.z].xyz, o, d, pt, ob1, ob2);

            // Only tighten the search interval when the hit is accepted; the original
            // assigned distance = pt on every hit, including farther or negative ones.
            if (tr && pt > 0.0 && pt < bestT) {
                bestT = pt;
                distance = pt;
                success = true;
            }

            last = tryChild;
        }
        // Switching children or setting up for climbing up the tree.
        else if (delve) {
            last = current;
            current = tryChild;
        }
        else {
            last = far;
        }
    }

    return success;
}
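`FindAABBIntersect` is not shown in the question, so its cost is unknown. A common choice for this step is the slab test with a reciprocal direction computed once per ray instead of once per node. The following is a minimal CPU-side C++ sketch of that technique, not the author's function; `Vec3`, `reciprocal` and `slabTest` are hypothetical names, and it relies on IEEE float behaviour for direction components that are exactly zero.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

// Computed once per ray, outside the traversal loop.
// A zero component yields +infinity under IEEE arithmetic.
inline Vec3 reciprocal(Vec3 d) {
    return Vec3{1.0f / d.x, 1.0f / d.y, 1.0f / d.z};
}

// Clips the interval [tMin, tMax] against one pair of parallel planes.
inline void clipSlab(float lo, float hi, float origin, float invDir,
                     float& tMin, float& tMax) {
    float t0 = (lo - origin) * invDir;
    float t1 = (hi - origin) * invDir;
    tMin = std::max(tMin, std::min(t0, t1));
    tMax = std::min(tMax, std::max(t0, t1));
}

// Returns true when the ray o + t * d overlaps the box for some t in [tMin, tMax].
inline bool slabTest(Vec3 boxMin, Vec3 boxMax, Vec3 o, Vec3 invD,
                     float tMin, float tMax) {
    clipSlab(boxMin.x, boxMax.x, o.x, invD.x, tMin, tMax);
    clipSlab(boxMin.y, boxMax.y, o.y, invD.y, tMin, tMax);
    clipSlab(boxMin.z, boxMax.z, o.z, invD.z, tMin, tMax);
    return tMin <= tMax;
}
```

Passing the current best hit distance as `tMax` is what lets a near hit cull whole subtrees, so the tighter that bound, the fewer nodes each thread visits.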

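I have tried to minimize memory accesses within the algorithm, but divergence, which I believe is the real problem, seems to be an insurmountable obstacle.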
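One commonly suggested way to soften divergence is to reorder rays so that neighbouring threads tend to walk the tree in a similar order, for example by grouping rays by the sign octant of their direction. The sketch below shows the idea on the CPU in C++; `Ray`, `octant` and `orderByOctant` are hypothetical names, and a GPU version would need a parallel sort. Whether the reordering pays for itself depends on the scene and on how incoherent the rays are.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };

// 3-bit key: one bit per negative direction component.
inline uint32_t octant(const Ray& r) {
    return (r.dx < 0.0f ? 1u : 0u) |
           (r.dy < 0.0f ? 2u : 0u) |
           (r.dz < 0.0f ? 4u : 0u);
}

// Returns ray indices grouped by octant; rays within a group keep their original order.
inline std::vector<uint32_t> orderByOctant(const std::vector<Ray>& rays) {
    std::vector<uint32_t> order(rays.size());
    std::iota(order.begin(), order.end(), 0u);
    std::stable_sort(order.begin(), order.end(),
                     [&rays](uint32_t a, uint32_t b) {
                         return octant(rays[a]) < octant(rays[b]);
                     });
    return order;
}
```

Each thread would then process `rays[order[threadIndex]]` rather than `rays[threadIndex]`, so that threads scheduled together are more likely to share a traversal path.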

0 answers:

No answers yet.