I am using tf.profiler.Profiler to try to understand my model and improve its performance. However, the output of both profiler.profile_operations(options=opts) and profiler.profile_name_scope(options) is missing GPU timing statistics.
Here is an excerpt from the profile_operations output:
Found accelerator operation but misses accelerator stream stats!
It's likely a gpu tracing issue rather than tf-profiler issue.
If you found your operation missing accelerator time, consider filing a bug to xprof-dev@!
Doc:
op: The nodes are operation kernel type, such as MatMul, Conv2D. Graph nodes belonging to the same type are aggregated together.
requested bytes: The memory requested by the operation, accumulatively.
total execution time: Sum of accelerator execution time and cpu execution time.
cpu execution time: The time from the start to the end of the operation. It's the sum of actual cpu run time plus the time that it spends waiting if part of computation is launched asynchronously.
accelerator execution time: Time spent executing on the accelerator. This is normally measured by the actual hardware library.
Profile:
node name | requested bytes | total execution time | accelerator execution time | cpu execution time
TensorArrayScatterV3 50.66MB (100.00%, 6.21%), 3.49sec (100.00%, 39.64%), 0us (0.00%, 0.00%), 3.49sec (100.00%, 39.64%)
LogicalAnd 0B (0.00%, 0.00%), 1.86sec (60.36%, 21.11%), 0us (0.00%, 0.00%), 1.86sec (60.36%, 21.11%)
Merge 690.95KB (93.79%, 0.08%), 675.01ms (39.25%, 7.66%), 0us (0.00%, 0.00%), 675.01ms (39.25%, 7.66%)
TensorArrayReadV3 0B (0.00%, 0.00%), 487.78ms (31.58%, 5.54%), 0us (0.00%, 0.00%), 487.78ms (31.58%, 5.54%)
Less 22.08MB (93.70%, 2.71%), 397.04ms (26.05%, 4.51%), 0us (0.00%, 0.00%), 397.04ms (26.05%, 4.51%)
TensorArrayWriteV3 0B (0.00%, 0.00%), 336.87ms (21.54%, 3.82%), 0us (0.00%, 0.00%), 336.87ms (21.54%, 3.82%)
Switch 0B (0.00%, 0.00%), 333.95ms (17.71%, 3.79%), 0us (0.00%, 0.00%), 333.95ms (17.71%, 3.79%)
Add 1.11KB (90.99%, 0.00%), 302.02ms (13.92%, 3.43%), 0us (0.00%, 0.00%), 302.02ms (13.92%, 3.43%)
NextIteration 0B (0.00%, 0.00%), 294.50ms (10.49%, 3.34%), 0us (0.00%, 0.00%), 294.50ms (10.49%, 3.34%)
...
...
Every value in the "accelerator execution time" column is zero.
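To check this across the whole report rather than by eye, the text output can be scanned for rows whose accelerator time is zero. The following is only a minimal sketch: it assumes the row layout shown in the excerpt above (a node name followed by four "value (pct, pct)" columns) and only the time units that appear there (us, ms, sec). The helper names `to_microseconds`, `parse_rows` and `ops_missing_accelerator_time` are hypothetical and are not part of tf.profiler.

```python
import re

# Assumed row layout, taken from the excerpt above:
#   <name> <bytes> (..), <total time> (..), <accelerator time> (..), <cpu time> (..)
ROW_RE = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?P<bytes>\S+) \([^)]*\),\s*"
    r"(?P<total>\S+) \([^)]*\),\s*"
    r"(?P<accel>\S+) \([^)]*\),\s*"
    r"(?P<cpu>\S+) \([^)]*\)\s*$"
)
# Only the units seen in the excerpt are handled (assumption).
TIME_RE = re.compile(r"^(?P<value>\d+(?:\.\d+)?)(?P<unit>us|ms|sec)$")
UNIT_TO_US = {"us": 1.0, "ms": 1e3, "sec": 1e6}


def to_microseconds(text):
    """Convert a time string such as '675.01ms' to microseconds."""
    match = TIME_RE.match(text)
    if match is None:
        raise ValueError("unrecognized time value: %r" % text)
    return float(match.group("value")) * UNIT_TO_US[match.group("unit")]


def parse_rows(report):
    """Return one dict per profile row; header, doc and '...' lines are skipped."""
    rows = []
    for line in report.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue
        rows.append({
            "name": match.group("name"),
            "total_us": to_microseconds(match.group("total")),
            "accel_us": to_microseconds(match.group("accel")),
            "cpu_us": to_microseconds(match.group("cpu")),
        })
    return rows


def ops_missing_accelerator_time(report):
    """List op names whose accelerator execution time is reported as zero."""
    return [row["name"] for row in parse_rows(report) if row["accel_us"] == 0.0]


# Two rows copied from the excerpt above.
sample = """\
node name | requested bytes | total execution time | accelerator execution time | cpu execution time
TensorArrayScatterV3 50.66MB (100.00%, 6.21%), 3.49sec (100.00%, 39.64%), 0us (0.00%, 0.00%), 3.49sec (100.00%, 39.64%)
LogicalAnd 0B (0.00%, 0.00%), 1.86sec (60.36%, 21.11%), 0us (0.00%, 0.00%), 1.86sec (60.36%, 21.11%)
"""
print(ops_missing_accelerator_time(sample))
```

In the rows shown, the total execution time equals the cpu execution time, which is what the Doc section's definition (total = accelerator + cpu) predicts when the accelerator column is zero.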