GPU out-of-memory error raised when training a small self-attention model, and I don't know why

Asked: 2020-07-07 13:48:06

Tags: python pytorch out-of-memory

When I train a small model (whose goal is to run self-attention over a number of inputs), a GPU out-of-memory error is raised, as shown below.

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 11.93 GiB total capacity; 11.45 GiB already allocated; 1.06 MiB free; 15.75 MiB cached)

Here is my code:

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    ## the same as the self-attention code in the Transformer
    ...

def self_attention_model(input):
    self_attention = SelfAttention()
    distance = nn.MSELoss()
    optimizer = torch.optim.Adam(self_attention.parameters(), lr=0.01, weight_decay=0.00005)
    num_epochs = 10
    for epoch in range(num_epochs):
        output = self_attention(input)
        loss = distance(output, input)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.cuda.empty_cache()
    return output

# A dictionary mapping each user to that user's items.
# Items are embeddings and their shape is dynamic:
# different users have different items, and different numbers of them.
user_items = {}

user_output = {}  # a dict to collect each user's output
for user in users:
    this_user_items = user_items[user]  # this_user_items.shape = torch.Size([1, n, 100]); n differs per user and is in the range (1, 10)
    output = self_attention_model(this_user_items)  # self-attention code is the same as in the Transformer
    user_output[user] = output

There are 1450 users here. Every time I train this model, the GPU out-of-memory error is raised somewhere after user 700 or 900. I don't understand why, since I empty the CUDA cache for every user.
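One thing I wonder about is whether the outputs I store in `user_output` keep their autograd graphs alive, so that memory grows with each user no matter how often I empty the cache. A minimal sketch of what I mean (toy tensors, not my real model; the shapes are just for illustration):

```python
import torch

# A tensor produced inside a training step carries its autograd graph,
# which references all intermediate activations.
x = torch.randn(1, 5, 100, requires_grad=True)
output = (x * 2).sum(dim=1)  # stands in for the self-attention output

# Storing `output` directly keeps the graph (and its memory) reachable:
print(output.grad_fn is not None)   # True: graph is still attached

# A detached copy drops the graph reference, so it can be freed:
print(output.detach().grad_fn)      # None: no graph attached
```

If this is the cause, storing `output.detach()` in the loop over users would let PyTorch free each user's intermediate activations, but I'm not sure whether that is what is happening here.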

0 Answers

There are no answers yet.