Question

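I am dealing with a "Stack size for entry function cannot be statically determined" warning that appears to be caused by arrays, and I need help.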

According to "CUDA ptxas warnings (Stack size for entry)" and https://devtalk.nvidia.com/default/topic/524712/a-meaning-of-nvlink-warning-stack-size-for-entry-function-cannot-be-statically-determined/, this warning is caused by recursion.

However, I could not find any recursion in my code. Instead, I found that an array of structs also triggers this warning.

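The problem can be reproduced with a simple example. (Edit: I can eliminate the warning by using a union, but I still do not know why. All of the code below is in the same .cu file.)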

#include <iostream>
#include <fstream>
#include <string>
#include <stack>
#include <cstdarg>

#include <limits.h>
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <stdarg.h>
#include <math.h>
#include <malloc.h>
#include <stdlib.h>

#include "cuda_runtime.h"
#include "vector_types.h"

#include "cuComplex.h"

#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#define checkCudaErrors(val) check((val), #val, __FILE__, __LINE__)

#ifdef __DRIVER_TYPES_H__
#ifndef DEVICE_RESET
#define DEVICE_RESET cudaDeviceReset();
#endif
#else
#ifndef DEVICE_RESET
#define DEVICE_RESET
#endif
#endif

#ifdef __DRIVER_TYPES_H__
static const char *_cudaGetErrorEnum(cudaError_t error) {
    return cudaGetErrorName(error);
}
#endif

template <typename T> void check(T result, char const *const func, const char *const file,
    int const line) {
    if (result) {
        fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\" \n", file, line,
            static_cast<unsigned int>(result), _cudaGetErrorEnum(result), func);
        // Make sure we call CUDA Device Reset before exiting
        DEVICE_RESET
        exit(EXIT_FAILURE);
    }
}

class ClassABC
{
public:
    __host__ __device__ ClassABC() { ;  }
    int m_iValue;
};

class ClassDEF
{
public:
    __host__ __device__ ClassDEF() { ; }

    //Without warning
    //union 
    //{
    //    ClassABC m_abc[1];
    //    int m_values[1];
    //};

    //With warning
    ClassABC m_abc[1];
};

__global__ void TestFunc()
{
    ClassDEF def[1];
}

int main()
{
    TestFunc<<<1, 1>>>();
    return 0;
}

Warning:

CUDALINK : nvlink warning : Stack size for entry function '_Z8TestFuncv' cannot be statically determined (target: sm_(35-75))

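So my questions are: why does an array cause the warning? Is it because I am doing something wrong? If I need to use arrays, can I get rid of the warning? Is the warning harmful?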

I am using CUDA 10.0.130 on Windows 10 with Visual Studio 2017. The warning appears for every target from sm_35 to sm_75.

Any help is appreciated, thanks!

Answer 1

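To me this looks like a bug (abnormal behavior), which may be why the question went unanswered. I may be wrong, but here is an imperfect workaround for anyone who runs into the same problem.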

Why does an array cause the warning? Is it because I am doing something wrong?

I do not know. I would like to think I am doing something wrong, but I suspect this may be a bug in CUDA 10.0.130.

If I need to use arrays, can I get rid of the warning?

Use a union; see the example below.

Is the warning harmful?

Yes; see the example below.

Here is the example:

class ClassABC
{
public:
    __host__ __device__ ClassABC():m_iValue(0){ ;  }
    __device__ void Add(int v)
    {
        m_iValue += v;
    }
    __device__ void DebugPrint() const
    {
        printf("v=%d;", m_iValue);
    }
    int m_iValue;
};

class ClassDEF
{
public:
    __host__ __device__ ClassDEF() { ; }

    __device__ void Add(int v)
    {
        m_abc[10].Add(v);
        //m_values[10] += v; also works
    }

    __device__ void DebugPrint() const
    {
        m_abc[10].DebugPrint();
    }
    //Without warning
    union 
    {
        ClassABC m_abc[20];
        int m_values[20];
    };

    //With warning
    //Output:
    //ClassABC m_abc[20];
};

__global__ void TestFunc()
{
    ClassDEF def[100];

    for (int i = 0; i < 100; ++i)
    {
        def[i].Add(i);
        def[i].DebugPrint();
    }
}

int main()
{

    //With the version that produces the warning, the stack size must be set manually, or the kernel will overflow its stack.
    //checkCudaErrors(cudaDeviceSetLimit(cudaLimitStackSize, 1 << 16));
    TestFunc<<<1, 1>>>();
    checkCudaErrors(cudaDeviceSynchronize());
    return 0;
}

First, the warning is harmful: unless the stack size is increased manually, the kernel can overflow its stack. The union fixes this.
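One difference between the two variants can be observed in plain C++, without any CUDA involved: when the array sits inside an anonymous union and the enclosing constructor has no member initializer for it, the elements are never constructed. The host-only sketch below counts constructor calls to show this. The counter `g_constructed`, the class names `PlainDEF` and `UnionDEF`, and the helper `CountConstructions` are hypothetical names introduced for illustration, and the `__host__ __device__` qualifiers are dropped so that it builds with an ordinary C++11 compiler. Whether the missing constructor calls are what makes the nvlink warning disappear is only a guess; the thread does not establish the cause.

```cpp
// Host-only illustration; not the original .cu file.
static int g_constructed = 0;  // hypothetical counter, not in the original code

class ClassABC
{
public:
    ClassABC() : m_iValue(0) { ++g_constructed; }
    int m_iValue;
};

// Variant with a plain array member: the ClassDEF-style constructor
// default-constructs every element of m_abc.
class PlainDEF
{
public:
    PlainDEF() { ; }
    ClassABC m_abc[20];
};

// Variant with an anonymous union: a user-provided constructor without a
// member initializer performs no initialization of the union members.
class UnionDEF
{
public:
    UnionDEF() { ; }
    union
    {
        ClassABC m_abc[20];
        int m_values[20];
    };
};

// Returns how many ClassABC constructors run while one T is created.
template <typename T>
int CountConstructions()
{
    g_constructed = 0;
    T object;
    (void)object;  // the object is only created, never read
    return g_constructed;
}
```

Calling `CountConstructions<PlainDEF>()` and `CountConstructions<UnionDEF>()` from a small `main` shows the difference. A practical consequence for the union variant is that `m_iValue` is not set to 0 by `ClassABC`'s constructor, so code that does `m_iValue += v` should initialize the values explicitly first.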

However, the union is not a good workaround:

If you use the union, be careful about the alignment of ClassABC.
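The alignment concern can be turned into compile-time checks. The sketch below assumes the same `ClassABC` and `ClassDEF` layout as in the example above, again as host-only C++11 without the CUDA qualifiers. The helper `ElementsOverlap` is a hypothetical name added for illustration. If `ClassABC` later gains another member or a stricter alignment, the `static_assert` lines fail at compile time instead of `m_abc[i]` and `m_values[i]` silently drifting apart.

```cpp
// Host-only illustration of the layout assumptions behind the union workaround.
class ClassABC
{
public:
    ClassABC() : m_iValue(0) { ; }
    int m_iValue;
};

class ClassDEF
{
public:
    ClassDEF() { ; }
    union
    {
        ClassABC m_abc[20];
        int m_values[20];
    };
};

// The union only lines up element by element if ClassABC has exactly the
// size and alignment of the int it is overlaid with.
static_assert(sizeof(ClassABC) == sizeof(int), "ClassABC must be the size of int");
static_assert(alignof(ClassABC) == alignof(int), "ClassABC must be aligned like int");
static_assert(sizeof(ClassDEF) == 20 * sizeof(int), "ClassDEF must not contain padding");

// Hypothetical helper: checks that m_abc[i].m_iValue and m_values[i] occupy
// the same address for every index. Only addresses are compared; no value is read.
inline bool ElementsOverlap(const ClassDEF& def)
{
    for (int i = 0; i < 20; ++i)
    {
        const void* asClass = &def.m_abc[i].m_iValue;
        const void* asInt = &def.m_values[i];
        if (asClass != asInt)
        {
            return false;
        }
    }
    return true;
}
```

The commented-out `m_values[10] += v;` alternative in the example relies on exactly this overlap.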

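I hope this workaround helps others who hit this problem. I still suspect that I am doing something wrong; if anyone knows what it is, please answer this question. Thank you very much!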

Answer 2

With NVCC 10.1.243 (and the simplified example program), I do not get the warning. You also don't get it on GodBolt.

So this may be an issue with something in version 10.0, or with your particular setup.

Why does an array cause the nvlink warning "Stack size for entry function cannot be statically determined"?
