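To save on global memory transfers, and because every step of the code works on its own, I tried to combine all of the kernels into a single kernel, with the first 2 (of 3) steps done as __device__ calls rather than __global__ calls. This fails in the second half of the first step.
I need to call a function twice to compute the two halves of the image. Whichever half is computed first, the program crashes on the second iteration.
After checking the code and running it many times with return statements at different points, I found what causes the crash.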
    __device__
    void IntersectCone( float* ModDistance,
                        float* ModIntensity,
                        float3 ray,
                        int threadID,
                        modParam param )
    {
        bool ignore = false;
        float3 normal = make_float3(0.0f,0.0f,0.0f);
        float3 result = make_float3(0.0f,0.0f,0.0f);
        float normDist = 0.0f;
        float intensity = 0.0f;

        float check = abs( Dot(param.position, Cross(param.direction,ray) ) );
        if(check > param.r1 && check > param.r2)
            ignore = true;

        float tran = param.length / (param.r2/param.r1 - 1);
        float length = tran + param.length;
        float Lsq = length * length;
        float cosSqr = Lsq / (Lsq + param.r2 * param.r2);

        //Changes the centre position?
        float3 position = param.position - tran * param.direction;

        float aDd = Dot(param.direction, ray);
        float3 e = position * -1.0f;
        float aDe = Dot(param.direction, e);
        float dDe = Dot(ray, e);
        float eDe = Dot(e, e);
        float c2 = aDd * aDd - cosSqr;
        float c1 = aDd * aDe - cosSqr * dDe;
        float c0 = aDe * aDe - cosSqr * eDe;

        float discr = c1 * c1 - c0 * c2;
        if(discr <= 0.0f)
            ignore = true;

        if(!ignore)
        {
            float root = sqrt(discr);
            float sign;
            if(c1 > 0.0f)
                sign = 1.0f;
            else
                sign = -1.0f;

            //Try opposite sign....?
            float3 result = (-c1 + sign * root) * ray / c2;
            e = result - position;
            float dot = Dot(e, param.direction);
            float3 s1 = Cross(e, param.direction);
            float3 normal = Cross(e, s1);

            if( (dot > tran) || (dot < length) )
            {
                if(Dot(normal,ray) <= 0)
                {
                    normal = Norm(normal); //This stuff (1)
                    normDist = Magnitude(result);
                    intensity = -IntensAt1m * Dot(ray, normal) / (normDist * normDist);
                }
            }
        }

        ModDistance[threadID] = normDist; //and this stuff (2)
        ModIntensity[threadID] = intensity;
    }
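There are two things I can do to stop the crash, and both defeat the point of the function: not writing to ModDistance[] and ModIntensity[] (block 2), or not assigning normDist and intensity (block 1).
The code above throws a first-chance exception, but not if either of those blocks is commented out. Also, the program only crashes the second time this routine is called.
I have been trying to figure this out for a while, so any help would be great.
The code that calls it is: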
    int subrow = threadIdx.y + Mod_Height/2;
    int threadID = subrow * (Mod_Width+1) + threadIdx.x;
    int obsY = windowY + subrow;
    float3 ray = CalculateRay(obsX,obsY);
    if( !IntersectSphere(ModDistance, ModIntensity, ray, threadID, param) )
    {
        IntersectCone(ModDistance, ModIntensity, ray, threadID, param);
    }

    subrow = threadIdx.y;
    threadID = subrow * (Mod_Width+1) + threadIdx.x;
    obsY = windowY + subrow;
    ray = CalculateRay(obsX,obsY);
    if( !IntersectSphere(ModDistance, ModIntensity, ray, threadID, param) )
    {
        IntersectCone(ModDistance, ModIntensity, ray, threadID, param);
    }
Answer 0 (score: 2)
The kernel is running out of resources. As mentioned in the comments, the launch returns the error cudaErrorLaunchOutOfResources.
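For readers who want to see where such an error surfaces, here is a minimal sketch of checking a launch. The kernel `combinedKernel` and the helper `launchAndCheck` are hypothetical stand-ins, not the code from the question. Launch-time failures such as cudaErrorLaunchOutOfResources are reported by `cudaGetLastError()` right after the launch, while faults that happen during execution show up at the next synchronization.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the combined kernel in the question.
__global__ void combinedKernel(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * i;
}

// Hypothetical helper: launches the kernel and returns the first error seen.
cudaError_t launchAndCheck(float* d_out, int n, dim3 grid, dim3 block)
{
    combinedKernel<<<grid, block>>>(d_out, n);

    // Launch-time errors (bad configuration, too many resources requested).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;

    // Errors raised while the kernel was running.
    return cudaDeviceSynchronize();
}

int main()
{
    const int n = 1024;
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess)
        return 1;

    cudaError_t err = launchAndCheck(d_out, n, dim3(n / 256), dim3(256));
    if (err != cudaSuccess)
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Without a check like this, a failed launch is silent: the kernel never runs, and the problem only shows up later as stale output or a crash somewhere else.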
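To avoid this, use the __launch_bounds__ qualifier to declare the maximum block size the kernel will be launched with. This makes the compiler limit per-thread resource usage (registers in particular) so that a block of that size can actually launch. See the CUDA Programming Guide for details on __launch_bounds__.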
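As a rough sketch of what this looks like in practice: the 16x16 block shape, the kernel `boundedKernel`, and the helper `runBoundedKernel` below are assumptions for illustration, since the question does not show its launch configuration. `cudaFuncGetAttributes` is used only to print what the compiler settled on.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed block shape for illustration; substitute the real launch dimensions.
#define BLOCK_W 16
#define BLOCK_H 16

// __launch_bounds__ states the largest block this kernel will be launched with,
// so the compiler keeps per-thread register usage low enough for that size.
__global__ void
__launch_bounds__(BLOCK_W * BLOCK_H)
boundedKernel(float* out)
{
    int idx = threadIdx.y * BLOCK_W + threadIdx.x;
    out[idx] = 1.0f;
}

// Hypothetical helper: launches one block of the declared size and reports errors.
cudaError_t runBoundedKernel(float* d_out)
{
    boundedKernel<<<1, dim3(BLOCK_W, BLOCK_H)>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();
}

int main()
{
    // Inspect the compiled kernel's resource usage.
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, boundedKernel) == cudaSuccess)
        std::printf("registers per thread: %d, max threads per block: %d\n",
                    attr.numRegs, attr.maxThreadsPerBlock);

    float* d_out = nullptr;
    if (cudaMalloc(&d_out, BLOCK_W * BLOCK_H * sizeof(float)) != cudaSuccess)
        return 1;

    cudaError_t err = runBoundedKernel(d_out);
    if (err != cudaSuccess)
        std::printf("launch failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

The trade-off is that a tight bound can force the compiler to spill registers to local memory, so it is worth comparing the reported register count and run time with and without the qualifier.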