How do I release the GPU memory temporarily consumed after each forward pass?

Asked: 2018-08-06 11:46:27

Tags: python memory-leaks out-of-memory gpu pytorch

I have a class like this:

import torch.nn as nn

class Stem(nn.Module):

    def __init__(self):
        super(Stem, self).__init__()
        # BasicConv2D (defined below) is a Conv2d + BatchNorm2d + ReLU block
        self.out_1 = BasicConv2D(3, 32, kernelSize = 3, stride = 2)
        self.out_2 = BasicConv2D(32, 32, kernelSize = 3, stride = 1)
        self.out_3 = BasicConv2D(32, 64, kernelSize = 3, stride = 1, padding = 1)

    def forward(self, x):
        # three stacked convolutional blocks
        x = self.out_1(x)
        x = self.out_2(x)
        x = self.out_3(x)

        return x

The out_1, out_2, and out_3 attributes of Stem are instances of the following class:

class BasicConv2D(nn.Module):

    def __init__(self, inChannels, outChannels, kernelSize, stride, padding = 0):
        super(BasicConv2D, self).__init__()
        self.conv = nn.Conv2d(inChannels, outChannels,
                            kernel_size = kernelSize,
                            stride = stride,
                            padding = padding, bias = False)
        self.bn = nn.BatchNorm2d(outChannels,
                                    eps = 0.001,
                                    momentum = 0.1,
                                    affine = True)
        self.relu = nn.ReLU(inplace = False)

    def forward(self, x):
        # convolution -> batch norm -> ReLU
        x = self.conv(x)
        x = self.bn(x)
        y = self.relu(x)
        return y

While training, nvidia-smi shows that each line inside Stem.forward() consumes x MB of GPU memory, but this memory is not released after Stem.forward() completes. As a result, training quickly crashes with a GPU out-of-memory error.

So the question is: how do I release this temporarily consumed GPU memory?

1 Answer:

Answer 0 (score: 1)

Your model actually looks fine, so the real issue is probably a general one about how PyTorch manages memory allocation. I suspect you are simply keeping a reference to the returned value (y), for example by accumulating the loss across iterations. Since PyTorch then keeps the entire attached computation graph alive, that memory is never freed.
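A common way this happens is accumulating the raw loss tensor inside the training loop. Below is a minimal, self-contained sketch of the leak and the fix; the nn.Linear model and the random data are stand-ins invented for illustration and do not come from the question:

import torch
import torch.nn as nn

model = nn.Linear(10, 1).cuda()                      # stand-in model
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)

total_loss = 0.0
for step in range(100):
    inputs = torch.randn(32, 10, device = "cuda")    # random stand-in batch
    targets = torch.randn(32, 1, device = "cuda")

    outputs = model(inputs)                          # forward pass builds a graph
    loss = criterion(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Leak: total_loss += loss
    # keeps each iteration's loss tensor, and with it its autograd history,
    # reachable, so GPU memory grows every step.

    # Fix: accumulate a plain Python float, which carries no graph:
    total_loss += loss.item()

Note that even once the graph is freed, nvidia-smi will keep reporting the memory as allocated: PyTorch's caching allocator holds on to freed blocks for reuse, so the sign that the leak is gone is a stable rather than growing number, not a drop to zero.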

For a more detailed discussion, see this question, and especially this answer.