PyTorch: dynamically moving weights between GPU and CPU

Posted: 2019-01-17 17:45:25

Tags: deep-learning gpu pytorch tensor

I have a large architecture that does not fit in GPU memory, but it has the nice property that only a subset of the architecture runs at any given time. I would therefore like to dynamically load/unload the weights of unused layers between the CPU and GPU. How can this be done?

The first thing to try is calling `.cpu()` / `.cuda()` on the parameters I want to move. Unfortunately, that would cause training problems with the optimizer, as stated in the docs:

cuda(device=None)
Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
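One possible workaround (my own sketch, not from the documentation): instead of calling `.cpu()`/`.cuda()` on the module, mutate each parameter's `.data` in place. The `Parameter` objects an already-constructed optimizer holds then stay the same; only their storage moves. The optimizer's own per-parameter state (e.g. SGD momentum buffers, Adam moments) must be moved alongside. `move_module_` is a hypothetical helper name.

```python
import torch
import torch.nn as nn

def move_module_(module, device, optimizer=None):
    """Hypothetical helper: move a module's weights to `device` without
    replacing the Parameter objects, so an optimizer constructed earlier
    keeps valid references."""
    for p in module.parameters():
        p.data = p.data.to(device)               # storage moves, object identity kept
        if p.grad is not None:
            p.grad.data = p.grad.data.to(device)
        if optimizer is not None:
            # optimizer state (momentum buffers, etc.) is keyed per parameter
            state = optimizer.state.get(p, {})
            for k, v in state.items():
                if torch.is_tensor(v):
                    state[k] = v.to(device)

# Usage sketch: after a training step, an idle layer could be offloaded
# with move_module_(layer, torch.device('cpu'), opt) and brought back
# with torch.device('cuda') before its next forward pass.
layer = nn.Linear(4, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.9)
layer(torch.randn(2, 4)).sum().backward()
opt.step()                                       # creates momentum buffers
params_before = list(layer.parameters())
move_module_(layer, torch.device('cpu'), opt)
assert params_before[0] is next(layer.parameters())  # same objects survive
```

Whether this is safe across all optimizers is an assumption on my part; it relies on the optimizer referencing parameters only through the `Parameter` objects themselves.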

An example use case is implementing ProxylessNAS, but only the final trained models are available at the time of writing; the architecture-search implementation is not available.
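To make the load/unload happen automatically rather than by hand, forward hooks could migrate a layer to the compute device just before it runs and back to the CPU just after (a hook-based offloading sketch of my own, using the same in-place `.data` trick so optimizer references survive; `attach_offload` is a hypothetical helper):

```python
import torch
import torch.nn as nn

COMPUTE = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
STORAGE = torch.device('cpu')

def _move_params_(mod, device):
    # in-place move keeps Parameter identity, so optimizer refs stay valid
    for p in mod.parameters():
        p.data = p.data.to(device)
        if p.grad is not None:
            p.grad.data = p.grad.data.to(device)

def attach_offload(mod):
    """Hypothetical helper: keep `mod` on CPU except while it executes."""
    pre = mod.register_forward_pre_hook(lambda m, inp: _move_params_(m, COMPUTE))
    post = mod.register_forward_hook(lambda m, inp, out: _move_params_(m, STORAGE))
    return pre, post

# usage: only the layer currently executing occupies GPU memory
net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
for m in net:
    attach_offload(m)
y = net(torch.randn(1, 8).to(COMPUTE))
```

Caveat: this sketch only covers the forward pass. For training, moving a layer back to the CPU must wait until its gradients have been accumulated (e.g. in a backward/tensor hook), otherwise autograd tries to accumulate a GPU gradient into a CPU parameter.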

0 Answers:

No answers yet.