"Cannot allocate memory" error when running a Docker image

Time: 2019-10-21 00:05:12

Tags: python docker nlp pytorch

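I am training an NLP model with an adapted version of the code at https://github.com/huggingface/transformers/blob/master/examples/run_glue.py. I am working in Docker Toolbox on Windows 10, CPU only. The code runs fine locally, and I built the Docker image successfully. However, when I run "docker run $IMAGE_URI", I get the following error during the training step: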

  File "xlnet/train_config.py", line 318, in <module>
    global_step, tr_loss = train(train_dataset, model, tokenizer)

  File "xlnet/train_config.py", line 214, in train
    outputs = model(**inputs) 

...

  File "/usr/local/lib/python3.7/site-packages/pytorch_transformers/modeling_xlnet.py", line 383, in rel_shift
    x = torch.index_select(x, 1, torch.arange(klen, device=x.device, dtype=torch.long))

RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 182452224 bytes. Error code 12 (Cannot allocate memory)

When I run "docker info", it shows "CPUs: 8, Total Memory: 7.793GiB". That should be enough...
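The total reported by "docker info" describes the Docker host (with Docker Toolbox, the VirtualBox VM), not how much memory is still free when the allocation fails. The failed request in the traceback is 182452224 bytes, which is exactly 174 MiB, so a failure at that size suggests most of the memory was already in use by then. A minimal sketch for checking this from inside the container, assuming a Linux container where /proc/meminfo is readable and the usual cgroup v1 or v2 limit files are present (the helper names are hypothetical and not part of run_glue.py):

```python
from pathlib import Path

MIB = 1024 ** 2
GIB = 1024 ** 3

# Size of the allocation that failed, copied from the traceback.
FAILED_ALLOCATION_BYTES = 182_452_224

# Common locations of the container memory limit (cgroup v1, then cgroup v2).
# Assumption: one of these exists in the container; if not, no limit is reported.
CGROUP_LIMIT_PATHS = (
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
    "/sys/fs/cgroup/memory.max",
)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: bytes}, keeping only kB fields."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB":
            info[name.strip()] = int(parts[0]) * 1024
    return info


def read_cgroup_limit(paths=CGROUP_LIMIT_PATHS):
    """Return the first numeric cgroup memory limit in bytes, or None if none is found."""
    for path in paths:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue
        if raw.isdigit():  # cgroup v2 writes "max" when there is no limit
            return int(raw)
    return None


def report():
    """Print what the process can see; call this right before the training loop."""
    print(f"failed request: {FAILED_ALLOCATION_BYTES / MIB:.0f} MiB")
    try:
        info = parse_meminfo(Path("/proc/meminfo").read_text())
    except OSError:
        print("/proc/meminfo is not readable here")
        return
    for field in ("MemTotal", "MemAvailable", "SwapFree"):
        if field in info:
            print(f"{field}: {info[field] / GIB:.2f} GiB")
    limit = read_cgroup_limit()
    print("cgroup limit:", "none found" if limit is None else f"{limit / GIB:.2f} GiB")


if __name__ == "__main__":
    report()
```

If MemAvailable is already close to zero before the forward pass, the model, the cached dataset, and the batch together probably do not fit in the VM, and a smaller batch size or sequence length would be the first thing to try.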

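Then I tried allocating 10 GB of memory. There were no more error messages, but the run simply exits at the same place without continuing training.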
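A process that stops without any Python traceback may have been killed from outside, for example by the kernel's out-of-memory killer, which is only a guess here. The container's exit status can help tell the cases apart: by convention a status above 128 means "terminated by signal (status - 128)", so 137 corresponds to SIGKILL. The status can be read with "docker inspect" (the State.ExitCode and State.OOMKilled fields). A small sketch for decoding such a status, where describe_exit_code is a hypothetical helper:

```python
import signal


def describe_exit_code(code):
    """Translate a container or process exit status into a short description."""
    if code == 0:
        return "exited normally"
    if code > 128:
        # Convention: 128 + N means the process was terminated by signal N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with error status {code}"


if __name__ == "__main__":
    for status in (0, 1, 137):
        print(status, "->", describe_exit_code(status))
```

A SIGKILL result together with OOMKilled set to true would point to memory exhaustion rather than a bug in the training script.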

0 answers:

No answers yet