When I train a small model that applies self-attention to multiple inputs, it raises a GPU out-of-memory error, shown below.

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 11.93 GiB total capacity; 11.45 GiB already allocated; 1.06 MiB free; 15.75 MiB cached)

Here is my code:
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    # the same as the self-attention code in the Transformer
    ...

def self_attention_model(input):
    self_attention = SelfAttention()
    distance = nn.MSELoss()
    optimizer = torch.optim.Adam(self_attention.parameters(), lr=0.01, weight_decay=0.00005)
    num_epochs = 10
    for epoch in range(num_epochs):
        output = self_attention(input)
        loss = distance(output, input)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.cuda.empty_cache()
    return output
# user_items is a dictionary mapping each user to that user's items.
# Items are embeddings with a dynamic shape: different users have
# different items, and different numbers of them.
user_items = {}
user_output = {}  # a dict to collect each user's output
for user in users:
    this_user_items = user_items[user]  # shape torch.Size([1, n, 100]); n differs per user, with n in the range (1, 10)
    output = self_attention_model(this_user_items)  # the self-attention code is the same as in the Transformer
    user_output[user] = output
There are 1450 users. Every time I train this model, the GPU out-of-memory error is raised after around user 700 or 900. I don't understand why, since I empty the CUDA cache after every user.
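To see where the memory goes, I tried logging GPU usage inside the user loop. This is a minimal sketch; `log_gpu_memory` is a helper I wrote for this question (not part of my model), and the commented loop assumes the `users`, `user_items`, and `self_attention_model` names from the code above:

```python
import torch

def log_gpu_memory(tag):
    # Print allocated/reserved GPU memory for the current device;
    # a no-op on machines without CUDA.
    if torch.cuda.is_available():
        alloc = torch.cuda.memory_allocated() / 1024**2
        reserved = torch.cuda.memory_reserved() / 1024**2
        print(f"{tag}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")

# Hypothetical placement inside the loop over users:
# for i, user in enumerate(users):
#     output = self_attention_model(user_items[user])
#     user_output[user] = output
#     if i % 100 == 0:
#         log_gpu_memory(f"after user {i}")
```

With this logging, the allocated memory grows steadily from user to user even though `torch.cuda.empty_cache()` is called in every `self_attention_model` call.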