Why does my tf.profiler.Profiler complain "Found accelerator operation but misses accelerator stream stats"?

Asked: 2018-08-04 04:42:28

Tags: python tensorflow profiler

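I am using tf.profiler.Profiler to try to understand my model, in the hope of improving its performance. However, the output of both profiler.profile_operations(options=opts) and profiler.profile_name_scope(options) is missing GPU timing statistics.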

Here is an excerpt from the operation profile output:

Found accelerator operation but misses accelerator stream stats!

It's likely a gpu tracing issue rather than tf-profiler issue.
If you found your operation missing accelerator time, consider filing a bug to xprof-dev@!


Doc:
op: The nodes are operation kernel type, such as MatMul, Conv2D. Graph nodes belonging to the same type are aggregated together.
requested bytes: The memory requested by the operation, accumulatively.
total execution time: Sum of accelerator execution time and cpu execution time.
cpu execution time: The time from the start to the end of the operation. It's the sum of actual cpu run time plus the time that it spends waiting if part of computation is launched asynchronously.
accelerator execution time: Time spent executing on the accelerator. This is normally measured by the actual hardware library.

Profile:
node name | requested bytes | total execution time | accelerator execution time | cpu execution time
TensorArrayScatterV3           50.66MB (100.00%, 6.21%),      3.49sec (100.00%, 39.64%),             0us (0.00%, 0.00%),      3.49sec (100.00%, 39.64%)
LogicalAnd                            0B (0.00%, 0.00%),       1.86sec (60.36%, 21.11%),             0us (0.00%, 0.00%),       1.86sec (60.36%, 21.11%)
Merge                          690.95KB (93.79%, 0.08%),       675.01ms (39.25%, 7.66%),             0us (0.00%, 0.00%),       675.01ms (39.25%, 7.66%)
TensorArrayReadV3                     0B (0.00%, 0.00%),       487.78ms (31.58%, 5.54%),             0us (0.00%, 0.00%),       487.78ms (31.58%, 5.54%)
Less                            22.08MB (93.70%, 2.71%),       397.04ms (26.05%, 4.51%),             0us (0.00%, 0.00%),       397.04ms (26.05%, 4.51%)
TensorArrayWriteV3                    0B (0.00%, 0.00%),       336.87ms (21.54%, 3.82%),             0us (0.00%, 0.00%),       336.87ms (21.54%, 3.82%)
Switch                                0B (0.00%, 0.00%),       333.95ms (17.71%, 3.79%),             0us (0.00%, 0.00%),       333.95ms (17.71%, 3.79%)
Add                              1.11KB (90.99%, 0.00%),       302.02ms (13.92%, 3.43%),             0us (0.00%, 0.00%),       302.02ms (13.92%, 3.43%)
NextIteration                         0B (0.00%, 0.00%),       294.50ms (10.49%, 3.34%),             0us (0.00%, 0.00%),       294.50ms (10.49%, 3.34%)
...
...

Every value in the accelerator execution time column is zero.
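To check this across the full output rather than by eye, the rows can be parsed with a small script. This is only a sketch: it assumes the row layout shown in the excerpt above (op name, requested bytes, then total, accelerator, and cpu times, each followed by a percentage pair), and the helper names parse_profile_row and ops_missing_accelerator_time are made up for illustration, not part of TensorFlow.

```python
import re

# A time value directly followed by its "(x%, y%)" pair, e.g. "3.49sec (100.00%, 39.64%)".
# Byte fields such as "50.66MB (...)" do not match because of the unit suffix.
_TIME_FIELD = re.compile(r"(\d+(?:\.\d+)?)(sec|ms|us)\s*\(")

_UNIT_TO_US = {"sec": 1_000_000.0, "ms": 1_000.0, "us": 1.0}


def parse_profile_row(line):
    """Return (op_name, total_us, accelerator_us, cpu_us), or None for non-data lines.

    Assumes the column order of the excerpt: total, accelerator, cpu.
    """
    parts = line.split()
    if not parts:
        return None
    times = [float(value) * _UNIT_TO_US[unit]
             for value, unit in _TIME_FIELD.findall(line)]
    if len(times) != 3:
        # Header, doc text, or "..." lines do not carry three time fields.
        return None
    return parts[0], times[0], times[1], times[2]


def ops_missing_accelerator_time(profile_text):
    """List op names that report cpu time but zero accelerator time."""
    missing = []
    for line in profile_text.splitlines():
        row = parse_profile_row(line)
        if row is None:
            continue
        name, _total_us, accel_us, cpu_us = row
        if accel_us == 0.0 and cpu_us > 0.0:
            missing.append(name)
    return missing
```

Passing the captured profiler output as a string to ops_missing_accelerator_time returns the op types that ran but have no accelerator time recorded. Note that ops which legitimately run only on the CPU will also show up in that list, so the result is a starting point for inspection, not proof of a tracing problem.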

0 answers:

No answers yet