Question

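I am using a p3.2xlarge instance to train a model.

On my test machine with a GTX 1070, a single iteration takes 0.45 seconds. On the p3 instance, it takes 0.22 seconds.

Although that is clearly lower, I expected a bigger performance improvement.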
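
To put a number on the gap, here is a minimal sketch that turns the two per-iteration timings above into a speedup ratio and iterations per second. The function and variable names are hypothetical, chosen only for illustration; the timings are the ones quoted in this question.

```python
def speedup(baseline_s: float, new_s: float) -> float:
    """Return how many times faster the new timing is than the baseline."""
    return baseline_s / new_s


# Per-iteration timings quoted in the question (seconds).
gtx1070_s = 0.45
v100_s = 0.22

ratio = speedup(gtx1070_s, v100_s)

print(f"speedup: {ratio:.2f}x")
print(f"iterations/s: {1 / gtx1070_s:.2f} -> {1 / v100_s:.2f}")
```

So the V100 run is roughly 2x faster per iteration, which is the figure I am asking about.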

GPU load is around 84%.

Am I missing something? Or is this really the performance gain to expect from a single V100?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   63C    P0   107W / 300W |  15593MiB / 16130MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     25135      C   python                                     15583MiB |
+-----------------------------------------------------------------------------+

Is training performance on AWS P3 instances low?

0 answers: