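I'm developing a voxel ray-casting engine in OpenCL. I'm trying to build something similar to Crassin's GigaVoxels. In that paper, an octree is used to store the voxel data. Right now I'm trying to descend the octree until I reach the leaf that contains the rendering data.
I made two implementations: one in OpenCL on the GPU and one in C++ on the CPU. The problem is that on the GPU the algorithm goes through the wrong levels before it reaches a leaf of the octree, while the CPU version gives correct results. The algorithm is the same in both versions, and the code is nearly identical.
Do you have any idea what the problem might be? Is it a hardware problem, an OpenCL problem, or am I doing something wrong? I get the same results on three different nVidia GPUs.
Here is the C++ code: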
// Calculate actual ray stepping position
glm::vec4 pos = eyeRay_o + eyeRay_d * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = GetLeafBit(octreeNodes[0]);
//get children address of root
uint childrenAddress = GetChildAddress(octreeNodes[0]);
while (iterations < 30) {
iterations++;
// Calculate subdivision offset
offset = (uint)(pos.x * 2) + (uint)(pos.y * 2) * 2 + (uint)(pos.z * 2) * 4;
if (leafFlag == 1) {
//return some colour and exit the loop
break;
}
else
{
glm::uvec4 off = glm::uvec4(pos.x * 2, pos.y * 2, pos.z * 2, pos.w * 2);
pos.x = 2 * pos.x - off.x;
pos.y = 2 * pos.y - off.y;
pos.z = 2 * pos.z - off.z;
pos.w = 2 * pos.w - off.w;
}
// Extract node data from the children
finalAddress = childrenAddress + offset;
leafFlag = GetLeafBit(nodes[finalAddress]);
childrenAddress = GetChildAddress(nodes[finalAddress]);
}
Here is the OpenCL code:
// Calculate actual ray stepping position
float4 position = rayOrigin + rayDirection * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = extractOctreeNodeLeaf(octreeNodes[0]);
//get children address of root
uint childrenAddress = extractOctreeNodeAddress(octreeNodes[0]);
//position will be in the [0, 1] interval
//size of octree is 1
while (iterations < 30) {
iterations++;
//calculate the index of the next child based on the position in the current subdivision
offset = (uint)(position.x * 2) + (uint)(position.y * 2) * 2 + (uint)(position.z * 2) * 4;
if (leafFlag == 1) {
//return some colour and exit the loop
break;
}
else
{
//transform the position inside the parent
//to the position inside the child subdivision
//size of child will be considered to be 1
uint4 off;
off.x = floor(position.x * 2);
off.y = floor(position.y * 2);
off.z = floor(position.z * 2);
off.w = floor(position.w * 2);
position = 2 * position - off;
}
// Extract node data from the children
finalAddress = childrenAddress + offset;
leafFlag = extractOctreeNodeLeaf(octreeNodes[finalAddress]);
//each node has an index to an array of 8 children - the index points to the first child
childrenAddress = extractOctreeNodeAddress(octreeNodes[finalAddress]);
}
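For comparison, here is a minimal, self-contained C++ sketch of the same descent in which every component is handled as a scalar and each integer cell offset is converted back to float explicitly, so no vector-type conversion is involved. The names `descend`, `makeNode` and `Vec3` are hypothetical, the node layout (bit 1 = leaf flag, bits 2 and up = index of the first of 8 children) is assumed from the helper functions shown further down, positions are assumed to lie in [0, 1), and the w component is dropped because it does not affect the child index.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed node layout (from the helpers in the question):
// bit 1 = leaf flag, bits 2..31 = index of the first of 8 consecutive children.
inline uint32_t leafBit(uint32_t v) { return (v >> 1) & 1u; }
inline uint32_t childAddress(uint32_t v) { return v >> 2; }
inline uint32_t makeNode(uint32_t firstChild, bool isLeaf) {
    return (firstChild << 2) | (isLeaf ? 2u : 0u);
}

struct Vec3 { float x, y, z; };

// Walk from the root (index 0) to the leaf that contains p.
// Each component of p is assumed to lie in [0, 1). Returns the leaf's index.
uint32_t descend(const std::vector<uint32_t>& nodes, Vec3 p, int maxIterations = 30) {
    uint32_t address = 0;
    for (int i = 0; i < maxIterations && leafBit(nodes[address]) == 0; ++i) {
        // Which half of the current cell p falls in, per axis (0 or 1).
        const uint32_t ox = static_cast<uint32_t>(std::floor(p.x * 2.0f));
        const uint32_t oy = static_cast<uint32_t>(std::floor(p.y * 2.0f));
        const uint32_t oz = static_cast<uint32_t>(std::floor(p.z * 2.0f));
        const uint32_t offset = ox + oy * 2 + oz * 4;

        // Rescale p into the child's local [0, 1) frame. Each integer offset is
        // converted back to float explicitly, one component at a time.
        p.x = 2.0f * p.x - static_cast<float>(ox);
        p.y = 2.0f * p.y - static_cast<float>(oy);
        p.z = 2.0f * p.z - static_cast<float>(oz);

        address = childAddress(nodes[address]) + offset;
    }
    return address;
}
```

With a root at index 0 whose children occupy indices 1 to 8, the point (0.7, 0.2, 0.9) lands in child offset 1 + 0 * 2 + 1 * 4 = 5, that is, node 6.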
Here is extractOctreeNodeAddress, as requested.
Both functions only do some bit shifting and masking:
OpenCL version:
inline char extractOctreeNodeLeaf(uint value) {
value = value >> 1;
return value & 1;
}
inline uint extractOctreeNodeAddress(uint value) {
return value >> 2;
}
C++ version:
inline byte GetLeafBit(uint value)
{
value = value >> 0x1;
return value & 0x1;
}
inline uint GetChildAddress(uint value)
{
return value >> 0x2;
}
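Both helpers read the same layout, so having the inverse can make it easier to build small test trees on the CPU. A minimal sketch, assuming bit 0 is simply not read by either helper (its meaning is not stated above) and that child addresses fit in 30 bits; `packNode` is a hypothetical name:

```cpp
#include <cstdint>

// Same bit layout as GetLeafBit / GetChildAddress above.
inline uint32_t leafBit(uint32_t v) { return (v >> 1) & 1u; }
inline uint32_t childAddress(uint32_t v) { return v >> 2; }

// Hypothetical inverse: pack a first-child index and a leaf flag into one node.
// Bit 0 is not read by either helper, so it is passed through unchanged.
// firstChild must be < 2^30, otherwise its top bits are lost in the shift.
inline uint32_t packNode(uint32_t firstChild, bool isLeaf, bool bit0 = false) {
    return (firstChild << 2) | (static_cast<uint32_t>(isLeaf) << 1) |
           static_cast<uint32_t>(bit0);
}
```

Since both leaf helpers return only 0 or 1, the `leafFlag == 1` test behaves the same whether the result is a `char` or a `byte`, so these helpers are unlikely to explain the CPU/GPU difference.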
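Hi, I found something interesting. I tried testing different variables manually, comparing their CPU and GPU values for one specific pixel with the same camera position and orientation. With the code below as it is now, the pixel is printed white, meaning the value is > 5.5 (completely wrong compared to the CPU implementation). But if I comment out the last if block and uncomment the first one, the result is red... which I can't really explain. Any ideas?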
if ((x == 265) && (y == 209)) {
/*float epsilon = 0.01f;
float4 stuff = (float4)(0.7604471f, 0.9088342f, 0.9999924f, 0);
if(fabs(pos.x - stuff.x) < epsilon)
temp = (float4)(1, 0, 0, 1);
else
temp = (float4)(1, 1, 1, 1);
break;*/
if(pos.x > 5.5)
{
temp = (float4)(1, 1, 1, 1);
break;
}
}
Answer:
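The main problem was the implicit conversion between the float4 and uint4 vectors.
Performing the cast element by element (still implicitly), as the C++ version does with pos.x = 2 * pos.x - off.x; and so on for y, z and w, solved the problem.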
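The answer does not say what code the compiler generated for `2 * position - off`. As a hedged illustration only, the C++ sketch below contrasts a value conversion of the integer offset with a bit-level reinterpretation of the same 32 bits. Under the assumption (not confirmed above) that the mixed float4/uint4 expression ended up behaving like the latter, the offset would become a tiny denormal, almost nothing would be subtracted, and the position would roughly double at every level instead of staying in [0, 1). The function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Value conversion: the integer 1 becomes 1.0f.
inline float convertValue(uint32_t u) { return static_cast<float>(u); }

// Bit reinterpretation: the same 32 bits read as an IEEE-754 float
// (the integer 1 becomes the smallest positive denormal, about 1.4e-45f).
inline float reinterpretBits(uint32_t u) {
    float f;
    static_assert(sizeof f == sizeof u, "float must be 32 bits");
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// One per-component remap step of the descent, p' = 2p - floor(2p) for p >= 0,
// with the integer offset turned back into a float in one of the two ways.
inline float remapStep(float p, bool reinterpret) {
    const uint32_t off = static_cast<uint32_t>(p * 2.0f);
    const float offF = reinterpret ? reinterpretBits(off) : convertValue(off);
    return 2.0f * p - offF;
}

// Apply `steps` remap steps starting from p.
inline float remapSteps(float p, int steps, bool reinterpret) {
    for (int i = 0; i < steps; ++i) p = remapStep(p, reinterpret);
    return p;
}
```

Starting from 0.75, three converting steps give 0.0 while three reinterpreting steps give 6.0. In OpenCL C itself, the specification disallows implicit conversions between vector types; the explicit value conversion is the built-in `convert_float4()` (bit reinterpretation is `as_float4()`), so `position = 2 * position - convert_float4(off);` is the explicit form of the line in question.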