PyTorch GPU out of memory

Asked: 2020-09-03 14:34:20

Tags: python pytorch gpu

I am running an evaluation script in PyTorch. I have a number of trained models (*.pt files), which are loaded and moved to the GPU, taking a total of 270MB of GPU memory. I use a batch size of 1. For every sample, I load a single image and move it to the GPU. Then, depending on the sample, I need to run a sequence of these trained models. Some models take a tensor as input and produce a tensor as output. Others take a tensor as input but produce a string as output; the final model in a sequence always outputs a string. The intermediary tensors are stored temporarily in a dictionary, and once a model has consumed a tensor input, that entry is removed with del. Still, I notice that after every sample the GPU memory keeps increasing until the entire memory is full.

Below is some pseudocode to give you a better idea of what is going on:

with torch.no_grad():
    trained_models = load_models_from_pt() # Loaded and moved to GPU, taking 270MB
    model = Model(trained_models) # Keeps the trained_models in a dictionary by name
    for sample in data_loader:
        # A sample contains a single image and is moved to the GPU
        # A sample also has some other information, but no other tensors
        model.forward(sample)

class Model(nn.Module):
    def __init__(self, trained_models):
        super().__init__()
        self.trained_models = trained_models
        self.intermediary = {}

    def forward(self, sample):
        for i, elem in enumerate(sample['sequence']):
            name = elem['name']
            key = elem['input']  # renamed from `in`, which is a reserved keyword
            if name == 'a':
                model = self.trained_models['a']
                out = model(self.intermediary[key])
                del self.intermediary[key]
                self.intermediary[i] = out
            elif name == 'b':
                model = self.trained_models['b']
                out = model(self.intermediary[key])
                del self.intermediary[key]
                self.intermediary[i] = out
            elif ...

I have no idea why the GPU is running out of memory. Any ideas?

1 Answer:

Answer 0 (score: 1)

Try adding torch.cuda.empty_cache() after each del.
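As a minimal sketch of how that could look inside the loop (the helper name run_step and its arguments are hypothetical, not from the question): detach each output before caching it so no autograd graph keeps upstream tensors alive, delete the consumed entry, and then call torch.cuda.empty_cache() to return PyTorch's cached blocks to the driver.

```python
import torch

def run_step(model, intermediary, key, step_index):
    # Detach so the cached output holds no references to the autograd graph.
    out = model(intermediary[key]).detach()
    # Free the consumed input tensor's last reference.
    del intermediary[key]
    # Release cached, unreferenced blocks back to the CUDA driver (no-op on CPU).
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    intermediary[step_index] = out
    return out
```

Note that empty_cache() only releases memory that no live tensor references; if the usage still grows, something (often the autograd graph, or outputs accumulated in a dict or list) is still holding references to old tensors.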