When I train a small model that applies self-attention to multiple inputs, it raises a GPU out-of-memory error, shown below.

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 11.93 GiB total capacity; 11.45 GiB already allocated; 1.06 MiB free; 15.75 MiB cached)

Here is my code:
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    # the same as the self-attention code in the Transformer
    ...

def self_attention_model(input):
    self_attention = SelfAttention()
    distance = nn.MSELoss()
    optimizer = torch.optim.Adam(self_attention.parameters(), lr=0.01, weight_decay=0.00005)
    num_epochs = 10
    for epoch in range(num_epochs):
        output = self_attention(input)
        loss = distance(output, input)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.cuda.empty_cache()
    return output
# user_items is a dictionary mapping each user to that user's items.
# Items are embeddings with a dynamic shape: different users have
# different items, and different numbers of them.
user_items = {}
user_output = {}  # a dict to collect each user's output
for user in users:
    this_user_items = user_items[user]  # shape torch.Size([1, n, 100]); n differs per user, with n in the range (1, 10)
    output = self_attention_model(this_user_items)  # the self-attention code is the same as in the Transformer
    user_output[user] = output
There are 1450 users. Every time I train this model, the GPU out-of-memory error is raised after around user 700 or 900. I don't understand why, since I empty the CUDA cache after every user.
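To see where the memory goes, I tried logging GPU usage inside the user loop. This is a minimal sketch; `log_gpu_memory` is a helper I wrote for this question (not part of my model), and the commented loop assumes the `users`, `user_items`, and `self_attention_model` names from the code above:

```python
import torch

def log_gpu_memory(tag):
    # Print allocated/reserved GPU memory for the current device;
    # a no-op on machines without CUDA.
    if torch.cuda.is_available():
        alloc = torch.cuda.memory_allocated() / 1024**2
        reserved = torch.cuda.memory_reserved() / 1024**2
        print(f"{tag}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")

# Hypothetical placement inside the loop over users:
# for i, user in enumerate(users):
#     output = self_attention_model(user_items[user])
#     user_output[user] = output
#     if i % 100 == 0:
#         log_gpu_memory(f"after user {i}")
```

With this logging, the allocated memory grows steadily from user to user even though `torch.cuda.empty_cache()` is called in every `self_attention_model` call.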