PyTorch DataLoader performance degradation on HDD

Time: 2018-08-14 09:37:33

Tags: performance io pytorch

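I have just built a new PC for DL and am testing it on PyTorch's official ImageNet example. When the dataset resides on my SSD (GoodRam IRDM Pro 240GB SATA3 (IRP-SSDPR-S25B-240)) I see reasonable performance, but on my HDD (Toshiba P300 (HDWD120UZSVA)) it becomes abysmally slow. This seems to be related to the DataLoader. Of course an SSD is expected to outperform an HDD, but I would not expect this workload to be bottlenecked by disk reads at all (on other machines I have used it was GPU-bound, or CPU-bound due to preprocessing), let alone to this degree. To investigate, I wrote a quick iterator wrapper that times the calls to the data loader: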

import time

def time_iter(iterator):
    # Wrap an iterator and print how long each next() call blocks.
    while True:
        start = time.time()
        try:
            item = next(iterator)
        except StopIteration:
            break
        print('Yielding in', time.time() - start)
        yield item
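To summarise the waits instead of reading them off the console, the same idea can collect the timings in a list. This is only a minimal sketch: `timed` and `slow_source` are hypothetical names, and the sleeping generator merely stands in for the real data loader.

```python
import time

def timed(iterable, waits):
    """Yield items from iterable, appending each next() wait (in seconds) to waits."""
    iterator = iter(iterable)
    while True:
        start = time.perf_counter()
        try:
            item = next(iterator)
        except StopIteration:
            return
        waits.append(time.perf_counter() - start)
        yield item

def slow_source(n, delay):
    """Hypothetical stand-in for a data loader: each item takes `delay` seconds."""
    for i in range(n):
        time.sleep(delay)
        yield i

waits = []
items = list(timed(slow_source(3, 0.01), waits))
print(items, ['%.3f' % w for w in waits])
```

After the loop finishes, `max(waits)` and `sum(waits)` give the longest stall and the total time spent blocked on the loader.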

I then checked the results by evaluating a pretrained network on ImageNet. Results on the HDD:

jatentaki@Dzik:~/Programs/pytorch-examples/imagenet$ python main.py --pretrained -e -b 768 -j 10 ~/2tb/Datasets/ILSVRC2012/
=> using pre-trained model 'resnet18'
Yielding in 94.4384298324585
Test: [0/66]    Time 97.014 (97.014)    Loss 0.6302 (0.6302)    Prec@1     82.943 (82.943)  Prec@5 95.573 (95.573)
Yielding in 0.00038623809814453125
Yielding in 0.00019431114196777344
Yielding in 0.0001766681671142578
Yielding in 0.0002028942108154297
Yielding in 0.00017595291137695312
Yielding in 0.00017023086547851562
Yielding in 0.000179290771484375
Yielding in 0.00019288063049316406
Yielding in 0.00017714500427246094
Yielding in 85.04550909996033
Test: [10/66]   Time 85.408 (16.858)    Loss 1.1352 (0.8930)    Prec@1 69.661 (77.190)  Prec@5 91.927 (92.779)
Yielding in 0.15804052352905273
Yielding in 0.00020647048950195312
Yielding in 2.0329136848449707
Yielding in 0.00020360946655273438

Spikes of roughly 90 s recur after every 9 near-instant yields. On the SSD:

jatentaki@Dzik:~/Programs/pytorch-examples/imagenet$ python main.py --pretrained -e -b 768 -j 10 ~/FastDatasets/
=> using pre-trained model 'resnet18'
Yielding in 11.228104829788208
Test: [0/66]    Time 14.272 (14.272)    Loss 0.6302 (0.6302)    Prec@1 82.943 (82.943)  Prec@5 95.573 (95.573)
Yielding in 0.00038361549377441406
Yielding in 0.00030112266540527344
Yielding in 0.0002224445343017578
Yielding in 0.0002486705780029297
Yielding in 0.00018787384033203125
Yielding in 0.0002593994140625
Yielding in 0.00020194053649902344
Yielding in 0.0003197193145751953
Yielding in 0.00019288063049316406
Yielding in 3.4066810607910156
Test: [10/66]   Time 4.013 (1.946)  Loss 1.1352 (0.8930)    Prec@1 69.661     (77.190)  Prec@5 91.927 (92.779)
Yielding in 1.5148968696594238
Yielding in 0.0003371238708496094
Yielding in 0.0002467632293701172
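As a rough sanity check, the two logs can be turned into images per second. The sketch below assumes that each long wait delivers about 10 batches of 768 images (one spike per 10 yields in both runs) and ignores the time spent outside the loader, so the figures are estimates only. It uses the second spike of each log, since the first one includes start-up.

```python
BATCH_SIZE = 768         # from -b 768
BATCHES_PER_SPIKE = 10   # one long wait per 10 yields in both logs

def images_per_second(spike_seconds):
    """Approximate loader throughput if one spike covers 10 batches."""
    return BATCH_SIZE * BATCHES_PER_SPIKE / spike_seconds

hdd_rate = images_per_second(85.04550909996033)   # second spike, HDD log
ssd_rate = images_per_second(3.4066810607910156)  # second spike, SSD log

print('HDD: %.0f images/s' % hdd_rate)
print('SSD: %.0f images/s' % ssd_rate)
print('SSD/HDD ratio: %.1f' % (ssd_rate / hdd_rate))
```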

Other diagnostics: iotop reports a peak total disk read of 12-14 M/s for the HDD and 100-120 M/s for the SSD. hdparm results:

jatentaki@Dzik:~$ sudo hdparm -Tt /dev/sda1
/dev/sda1: # SSD
Timing cached reads:   20976 MB in  2.00 seconds = 10503.39 MB/sec
Timing buffered disk reads: 512 MB in  1.02 seconds = 501.77 MB/sec

jatentaki@Dzik:~$ sudo hdparm -Tt /dev/sdb1
/dev/sdb1: # HDD
Timing cached reads:   19484 MB in  2.00 seconds = 9755.75 MB/sec
Timing buffered disk reads: 586 MB in  3.01 seconds = 194.69 MB/sec
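hdparm -t times sequential reads from the raw device, whereas the data loader opens many separate image files, so a file-level measurement may be a fairer comparison with the iotop numbers. The sketch below is one way to take it. `iter_files` and `read_throughput` are hypothetical helpers, and the temporary directory is only a stand-in so the snippet runs anywhere.

```python
import itertools
import os
import tempfile
import time

def iter_files(root):
    """Yield the path of every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

def read_throughput(root, limit=1000):
    """Read up to `limit` files fully; return (files, bytes, seconds)."""
    n_files = n_bytes = 0
    start = time.perf_counter()
    for path in itertools.islice(iter_files(root), limit):
        with open(path, 'rb') as f:
            n_bytes += len(f.read())
        n_files += 1
    return n_files, n_bytes, time.perf_counter() - start

# Demo on a throwaway directory; replace demo_root with the dataset path.
with tempfile.TemporaryDirectory() as demo_root:
    for i in range(5):
        with open(os.path.join(demo_root, 'img_%d.bin' % i), 'wb') as f:
            f.write(b'\0' * 1024)
    n_files, n_bytes, seconds = read_throughput(demo_root)

rate = n_bytes / max(seconds, 1e-9) / 1e6
print('%d files, %d bytes, %.1f MB/s' % (n_files, n_bytes, rate))
```

Pointing `read_throughput` at `~/2tb/Datasets/ILSVRC2012/` and at `~/FastDatasets/` would give per-disk figures. A repeat run over the same files will mostly measure the page cache rather than the disk.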

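What could be causing this? I suspect either an incorrect installation of, or poor interplay between, the hardware components (I'm a total hardware noob), or some issue with how DataLoader works. Another possibility is a misconfigured OS (Xubuntu 18.04 LTS). There is clearly a lot of caching going on (all the 0.000... s yields); could that be what is going wrong?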

System: Xubuntu 18.04 LTS. PyTorch: 0.4.1, from Anaconda, with CUDA 9.2 and the 396 NVIDIA driver. GPU: GTX 1080 Ti.

0 answers:

No answers