Error when using torch.load during distributed training

Time: 2020-06-26 04:39:42

Tags: python pytorch

When I load a dataset with torch.load, the following error occurs (I am running distributed training on a single machine with 4 GPUs):

Traceback (most recent call last):
  File "Run.py", line 437, in <module>
    ready_train()
  File "Run.py", line 288, in ready_train
    train_ae(aemodel)
  File "Run.py", line 125, in train_ae
    'nlp_chinese_corpus'])
  File "Run.py", line 51, in read_json
    samples = torch.load(slpath + dir + ".pkl")  # load the data for one dataset
  File "/home_ex/tianhongtao/SW/anaconda3/envs/Hisense/lib/python3.7/site-packages/torch/serialization.py", line 526, in load
    if _is_zipfile(opened_file):
  File "/home_ex/tianhongtao/SW/anaconda3/envs/Hisense/lib/python3.7/site-packages/torch/serialization.py", line 76, in _is_zipfile
    if ord(magic_byte) != ord(read_byte):
TypeError: ord() expected a character, but string of length 0 found

What should I do?

Hi, Guillaume. Thanks for your suggestion. I slightly changed the way I load the data, and the error has not occurred since. Thanks again.
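The thread does not record what was actually changed. The traceback does show that `_is_zipfile` read an empty byte string, so the `.pkl` file held fewer bytes than the zip magic number when it was opened. One possible cause (an assumption, not confirmed by the poster) is that the file was empty, or was still being written by another process while this one read it. Below is a minimal sketch of two defensive helpers under that assumption. The names `atomic_save` and `checked_load` are hypothetical and are not part of PyTorch; the save and load functions are passed in as arguments.

```python
import os
import tempfile


def atomic_save(obj, path, save_fn):
    """Write obj to a temp file in the target directory, then rename it.

    Readers then see either the old complete file or the new complete
    file, never a partially written one. save_fn(obj, file_path) does
    the actual serialization (hypothetical helper, not a PyTorch API).
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # save_fn reopens the file by path
    try:
        save_fn(obj, tmp_path)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the temp file if serialization or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def checked_load(path, load_fn):
    """Raise a descriptive error for a missing or empty file, else load it.

    load_fn(file_path) does the actual deserialization.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"dataset file not found: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(
            f"dataset file is empty (possibly still being written): {path}"
        )
    return load_fn(path)
```

With PyTorch, these would be called as `atomic_save(samples, path, torch.save)` and `samples = checked_load(path, torch.load)`. If each of the 4 processes regenerates the same cache file, having a single process write it while the others wait would also avoid concurrent writes; whether that applies here depends on code the question does not show.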

0 answers:

No answers