
Question

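I recently installed CUDA on my Arch Linux machine through the system package manager, and I have been trying to test whether it works by running a simple vector-addition program.

I simply copy-pasted the code from this tutorial (both the version using one kernel and the one using many) into a file named cuda_test.cu and ran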

> nvcc cuda_test.cu -o cuda_test

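In either case the program runs and I get no errors (neither version crashes, and the output reports that there is no error). However, when I try to run the CUDA profiler on the program: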

> sudo nvprof ./cuda_test

I get the result:

==3201== NVPROF is profiling process 3201, command: ./cuda_test
Max error: 0
==3201== Profiling application: ./cuda_test
==3201== Profiling result:
No kernels were profiled.
No API activities were profiled.
==3201== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.

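The final warning is not my main problem, nor the topic of my question. My problem is the message stating that no kernels and no API activities were profiled.

Does this mean that the program ran entirely on my CPU? Or is it a bug in nvprof?

I found a discussion of the same error here, but the answer there was that the wrong version of CUDA had been installed. In my case, the installed version is the latest one available through the system package manager (version 10.1.243-1).

Is there any way I can get nvprof to display the expected output?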

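Edit

Trying to follow the final warning does not solve the problem:

Adding a call to cudaProfilerStop() (or cuProfilerStop()), adding cudaDeviceReset(); at the end as suggested, including the appropriate header (cuda_profiler_api.h or cudaProfiler.h), and compiling with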

> nvcc cuda_test.cu -o cuda_test -lcuda

produces a program that still runs, but running nvprof on it now returns:

==12558== NVPROF is profiling process 12558, command: ./cuda_test
Max error: 0
==12558== Profiling application: ./cuda_test
==12558== Profiling result:
No kernels were profiled.
No API activities were profiled.
==12558== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
======== Error: Application received signal 139

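This does not solve the original problem, and has in fact created a new error. The same thing happens when cudaProfilerStop() is used on its own or together with cuProfilerStop() and cudaDeviceReset();

Code

As noted above, the code was copied from the tutorial to test whether CUDA is working, although I have also added the calls to cudaProfilerStop() and cudaDeviceReset(). For clarity, it is included here: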

#include <iostream>
#include <math.h>
#include <cuda_profiler_api.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  int index = threadIdx.x;
  int stride = blockDim.x;
  for (int i = index; i < n; i += stride)
      y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  cudaProfilerStart();

  // Allocate Unified Memory – accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);

  cudaDeviceReset();
  cudaProfilerStop();

  return 0;
}

Answer 1

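After some searching, I found this thread about the error code in the edited version; the problem is apparently well known. The solution discussed there is to invoke nvprof with the flag --unified-memory-profiling off: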

> sudo nvprof --unified-memory-profiling off ./cuda_test

This makes nvprof work as expected, even without a call to cudaProfilerStop().
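On the question of whether the program ran entirely on the CPU: the host code in the question never performs the addition itself, so "Max error: 0" already suggests that the kernel executed on the GPU. Checking the return codes of the CUDA runtime calls makes this explicit without relying on nvprof. The following is a minimal sketch of that idea, not a definitive implementation. The CHECK macro is a hypothetical helper introduced here for illustration (it is not part of the CUDA toolkit), and the launch configuration of 256 threads per block is an arbitrary choice. The sketch also calls cudaProfilerStop() before cudaDeviceReset(), on the assumption that profile data should be flushed while the context still exists.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// Hypothetical helper macro (not part of the CUDA toolkit): print the
// runtime's error string and abort if a CUDA runtime call fails.
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Grid-stride loop: each thread handles every stride-th element.
__global__ void add(int n, const float *x, float *y)
{
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = gridDim.x * blockDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

int main()
{
  // Fails with an error if the runtime cannot find a usable device.
  int deviceCount = 0;
  CHECK(cudaGetDeviceCount(&deviceCount));
  std::printf("CUDA devices found: %d\n", deviceCount);

  const int N = 1 << 20;
  float *x = nullptr, *y = nullptr;
  CHECK(cudaMallocManaged(&x, N * sizeof(float)));
  CHECK(cudaMallocManaged(&y, N * sizeof(float)));

  // Initialize both arrays on the host.
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Assumed launch configuration: 256 threads per block.
  add<<<(N + 255) / 256, 256>>>(N, x, y);
  CHECK(cudaGetLastError());       // reports errors from the launch itself
  CHECK(cudaDeviceSynchronize());  // reports errors raised during execution

  CHECK(cudaFree(x));
  CHECK(cudaFree(y));

  CHECK(cudaProfilerStop());  // stop profiling before the context is destroyed
  CHECK(cudaDeviceReset());
  return 0;
}
```

If every checked call succeeds, the kernel was launched and completed on a CUDA device. The program can be built and profiled with the same nvcc and nvprof commands shown above.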

Answer 2

You can work around the problem by using

sudo nvprof --unified-memory-profiling per-process-device  <your program>
