I am running an evaluation script in PyTorch. I have a number of trained models (*.pt files), which are loaded and moved to the GPU, taking a total of 270MB of GPU memory. I use a batch size of 1. For every sample, I load a single image and move it to the GPU. Depending on the sample, I then need to run a sequence of these trained models. Some models take a tensor as input and a tensor as output. Other models take a tensor as input but a string as output. The final model in a sequence always has a string as output. The intermediary tensors are temporarily stored in a dictionary. When a model has consumed its tensor input, it is removed with `del`. Still, I notice that after every sample the GPU memory keeps increasing until the entire memory is full.
Below is some pseudocode to give you a better idea of what is going on:
with torch.no_grad():
    trained_models = load_models_from_pt()  # Loaded and moved to GPU, taking 270MB
    model = Model(trained_models)  # Keeps the trained_models in a dictionary by name
    for sample in data_loader:
        # A sample contains a single image and is moved to the GPU
        # A sample also has some other information, but no other tensors
        model.forward(sample)
class Model(nn.Module):
    def __init__(self, trained_models):
        super().__init__()
        self.trained_models = trained_models
        self.intermediary = {}

    def forward(self, sample):
        for i, elem in enumerate(sample['sequence']):
            name = elem['name']
            inp = elem['input']  # 'in' is a reserved keyword in Python
            if name == 'a':
                model = self.trained_models['a']
                out = model(self.intermediary[inp])
                del self.intermediary[inp]
                self.intermediary[i] = out
            elif name == 'b':
                model = self.trained_models['b']
                out = model(self.intermediary[inp])
                del self.intermediary[inp]
                self.intermediary[i] = out
            elif ...
I have no idea why the GPU is running out of memory. Any ideas?
Answer 0 (score: 1)
Try adding `torch.cuda.empty_cache()` after the `del`.
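A minimal sketch of what that would look like in the question's `forward` loop. The helper `run_step` and its parameter names are hypothetical, chosen only to mirror the `intermediary` dictionary from the question; the answer itself only suggests adding the `empty_cache()` call. Note that `del` drops the Python reference, while `torch.cuda.empty_cache()` asks PyTorch's caching allocator to hand freed blocks back to the GPU driver (it is a no-op when CUDA is not in use):

```python
import torch

def run_step(model, intermediary, key, out_key):
    """Run one model in the sequence, then free its input tensor.

    Hypothetical helper illustrating the suggested fix: delete the
    consumed intermediate tensor, then release cached GPU blocks.
    """
    out = model(intermediary[key])
    del intermediary[key]          # drop the Python reference to the input tensor
    torch.cuda.empty_cache()       # return freed blocks to the GPU (no-op on CPU)
    intermediary[out_key] = out
    return out
```

Whether this actually stops the growth depends on the rest of the loop: if some reference (e.g. an entry left in `self.intermediary`, or a stored output) still points at a tensor, `empty_cache()` cannot reclaim it.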