Nsight Compute cannot profile Waveglow (a PyTorch application)

Time: 2019-07-02 08:01:43

Tags: pytorch nsight

I tried to profile https://github.com/NVIDIA/waveglow with the following command:

nv-nsight-cu-cli --export ./nsight_output ~/.virtualenvs/waveglow/bin/python3 inference.py -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6

The Python command comes from the instructions at https://github.com/NVIDIA/waveglow#generate-audio-with-our-pre-existing-model. It works with Nsight Systems but not with Nsight Compute.

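Profiling never finishes and keeps printing the log below, so I pressed Ctrl + C. Also, it profiles only one kernel, although the application launches more kernels (checked with Nsight Systems).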

...
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 286: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 287: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 288: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 289: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 290: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 291: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 292: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 293: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 294: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 295: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 296: 0%....50%...^C
==PROF== Received signal, trying to shutdown target application
 - 43 passes
==ERROR== Failed to profile kernel "weight_norm_fwd_first_dim_ker..." in process
==ERROR== An error occurred while trying to profile.
==ERROR== An error occurred while trying to profile
==PROF== Report: nsight_compute_result.nsight-cuprof-report

OS: CentOS Linux 7, Nsight Compute: 2019.3.1 (build 26317742), GPU: Tesla V100-PCIE-32GB

How can I solve this?

1 answer:

Answer 0: (score: 2)

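I don't think anything is going wrong here; the tool is behaving as expected. It did not profile just one kernel: the log output shows that it had already profiled 296 kernel launches (apparently all of the same kernel function). According to the log, each of those launches takes 48 passes, which makes the profiled run far slower than a normal run.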

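You can control the number or kind of kernels that get profiled with the --launch-count or --kernel-regex options. You can also use --metrics and --section to control which metrics are collected for each kernel, since collecting fewer metrics reduces the tool's overhead.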
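
As a rough sketch of how these options might be combined with the command from the question: the regex "weight_norm", the launch count of 10, the SpeedOfLight section, and the file name mel_files.txt are all example values I picked for illustration, not something taken from the question. Check --list-sections for the section names your version actually provides.

```shell
# Illustrative values only: narrow profiling to kernels matching "weight_norm",
# stop after 10 matching launches, and collect one section instead of all.
NCU_OPTS="--kernel-regex weight_norm --launch-count 10 --section SpeedOfLight"

if command -v nv-nsight-cu-cli >/dev/null 2>&1; then
    # File list written to disk (hypothetical file name); it stands in for
    # the <(ls mel_spectrograms/*.pt) process substitution in the question.
    ls mel_spectrograms/*.pt > mel_files.txt
    # $NCU_OPTS is left unquoted on purpose so it splits into separate options.
    nv-nsight-cu-cli $NCU_OPTS --export ./nsight_output \
        ~/.virtualenvs/waveglow/bin/python3 inference.py \
        -f mel_files.txt -w waveglow_256channels.pt -o . --is_fp16 -s 0.6
else
    # Tool not installed: just show which options would have been used.
    echo "nv-nsight-cu-cli not found; would run with: $NCU_OPTS"
fi
```

With a setup like this the run should stop profiling after the tenth matching launch instead of replaying every launch in the application. Changing or dropping the --kernel-regex value lets other kernels be profiled as well.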

For more of the available command-line options, see https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options