Define a function to find the batch size for training a model

Question

Sometimes I run into this problem:

OOM when allocating tensor with shape

e.g.

OOM when allocating tensor with shape (1024, 100, 160)

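Here 1024 is my batch size, and I don't know what the other dimensions are. If I reduce the batch size or the number of neurons in the model, it runs fine.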
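For scale, the size of that single allocation can be worked out by hand. A minimal sketch, assuming float32 elements (4 bytes each); the helper name `tensor_bytes` is hypothetical, and what the 100 and 160 dimensions stand for is not stated in the question.

```python
import math


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor of `shape` (float32 by default)."""
    return math.prod(shape) * bytes_per_element


size = tensor_bytes((1024, 100, 160))
print(size, "bytes =", size / 2**20, "MiB")  # 65536000 bytes = 62.5 MiB
```

Every activation tensor of this kind is proportional to its leading (batch) dimension, so halving 1024 to 512 halves its size, which is consistent with the observation that a smaller batch size runs fine.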

Is there a generic way to calculate the optimal batch size based on the model and the GPU memory, so that the program doesn't crash?

EDIT

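Since my question might seem unclear, let me put it this way: I want the largest batch size possible for my model that will fit into my GPU memory and won't crash the program.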

EDIT 2

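To whoever voted to close this question as too broad: how on earth is it too broad? There is some algorithm that selects a portion of the data to put into GPU memory. It is clearly imperfect, since the data sometimes exceeds the GPU memory. Asking how that algorithm works, in order to prevent random crashes, seems quite reasonable to me.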

Answer 1

From the recent Deep Learning book by Goodfellow et al., chapter 8:

Minibatch sizes are generally driven by the following factors:


Larger batches provide a more accurate estimate of the gradient, but with less than linear returns.

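Multicore architectures are usually underutilized by extremely small batches. This motivates using some absolute minimum batch size, below which there is no reduction in the time to process a minibatch.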

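If all examples in the batch are to be processed in parallel (as is typically the case), then the amount of memory scales with the batch size. For many hardware setups this is the limiting factor in batch size.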

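Some kinds of hardware achieve better runtime with specific sizes of arrays. Especially when using GPUs, it is common for power-of-2 batch sizes to offer better runtime. Typical power-of-2 batch sizes range from 32 to 256, with 16 sometimes being attempted for large models.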

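Small batches can offer a regularizing effect (Wilson and Martinez, 2003), perhaps due to the noise they add to the learning process. Generalization error is often best for a batch size of 1. Training with such a small batch size might require a small learning rate to maintain stability because of the high variance in the estimate of the gradient. The total runtime can be very high as a result of the need to make more steps, both because of the reduced learning rate and because it takes more steps to observe the entire training set.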
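The "less than linear returns" in the first factor come from the standard error of a mean: if individual per-example gradients have standard deviation σ, a minibatch of n examples estimates the gradient with standard error σ/√n. A small sketch with an assumed σ = 1.0; the function name is hypothetical.

```python
import math


def gradient_standard_error(sigma, batch_size):
    """Standard error of the minibatch-mean gradient: sigma / sqrt(batch_size)."""
    return sigma / math.sqrt(batch_size)


sigma = 1.0  # assumed per-example gradient standard deviation, for illustration only
for n in (16, 64, 256, 1024):
    print(n, gradient_standard_error(sigma, n))  # 0.25, 0.125, 0.0625, 0.03125
```

Each 4x increase in batch size, and therefore in memory per step, only halves the error.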

In practice, this usually means "in powers of 2, and the larger the better, provided that the batch fits into your (GPU) memory".
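One empirical way to apply that rule is to keep doubling the batch size until a training step runs out of memory. A minimal framework-agnostic sketch; `try_batch` is a hypothetical callback that runs one forward/backward step at a given batch size and raises `oom_error` when memory is exhausted, and the memory budget in the fake step is made up for illustration.

```python
def largest_fitting_power_of_two(try_batch, start=1, max_batch=4096, oom_error=MemoryError):
    """Double the batch size until try_batch raises oom_error; return the last size that worked."""
    best = None
    batch = start
    while batch <= max_batch:
        try:
            try_batch(batch)
        except oom_error:
            break  # this size did not fit, so the previous one is the answer
        best = batch
        batch *= 2
    return best


# Hypothetical stand-in for a real training step: 8000 MB budget,
# 100 MB of parameters, 10 MB of activations per sample.
def fake_step(batch):
    if 100 + 10 * batch > 8000:
        raise MemoryError(f"OOM at batch size {batch}")


print(largest_fitting_power_of_two(fake_step))  # 512
```

In real code the callback would catch the framework's own OOM exception, for example `tf.errors.ResourceExhaustedError` in TensorFlow or `torch.cuda.OutOfMemoryError` in recent PyTorch versions. A size that fits for one step may still fail later (e.g. due to fragmentation), so leaving some headroom is prudent.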

You might also want to consult several good posts here on Stack Exchange:

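Just keep in mind that the paper by Keskar et al., "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima", quoted by several of the posts above, has received some objections from other respected researchers in the deep learning community.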

Hope this helps...

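UPDATE (Dec 2017): There is a new paper by Yoshua Bengio & team, Three Factors Influencing Minima in SGD (Nov 2017); it is worth reading in the sense that it reports new theoretical and experimental results on the interplay between learning rate and batch size.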

Answer 2

You can estimate the largest batch size using:

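Max batch size = available GPU memory in bytes / 4 / (size of tensors + trainable parameters), where the 4 presumably accounts for 4-byte float32 values.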
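A minimal sketch of that rule of thumb, assuming float32 values and reading "size of tensors" as the number of activation values stored per sample; both readings, the function name, and the example figures are illustrative assumptions rather than part of the answer.

```python
def max_batch_size(free_gpu_bytes, activations_per_sample, trainable_params, bytes_per_value=4):
    """free bytes / bytes per value / (tensor size + trainable parameters), floored."""
    values_that_fit = free_gpu_bytes // bytes_per_value
    return int(values_that_fit // (activations_per_sample + trainable_params))


# Hypothetical figures: 4 GB free, 400k activations per sample, 600k parameters.
print(max_batch_size(4_000_000_000, 400_000, 600_000))  # 1000
```

As written, the formula charges the trainable parameters once per sample rather than once per model, so it errs on the conservative side.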

Answer 3

Use the summary provided by pytorchsummary (pip install) or Keras (built in).

For example:

from torchsummary import summary
summary(model, input_size)  # input_size: shape of one sample without the batch dimension, e.g. (3, 224, 224)
.....
.....
================================================================
Total params: 1,127,495
Trainable params: 1,127,495
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.02
Forward/backward pass size (MB): 13.93
Params size (MB): 4.30
Estimated Total Size (MB): 18.25
----------------------------------------------------------------

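Every instance you put into a batch requires its own full forward/backward pass in memory, whereas the model itself is stored only once. People seem to prefer batch sizes that are powers of 2, probably because of automatic layout optimization on the GPU.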

When increasing the batch size, don't forget to increase the learning rate linearly as well.
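A minimal sketch of the linear scaling mentioned above, assuming a base learning rate that was tuned at a known base batch size; the function name and numbers are hypothetical.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: multiply the learning rate by the batch-size ratio."""
    return base_lr * new_batch_size / base_batch_size


print(scale_learning_rate(0.1, 256, 1024))  # 0.4
```

This is a heuristic rather than a guarantee, so the scaled value is a starting point for tuning.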

Suppose we have a 16 GB Tesla P100 at hand.

(16000 - model_size) / forward_backward_size
(16000 - 4.3) / 13.93 = 1148.29
rounded down to a power of 2, this gives a batch size of 1024
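The same arithmetic as a small Python sketch, using the figures from the summary above and approximating 16 GB as 16000 MB as the answer does; the function name is hypothetical. The summary figures may not include optimizer state or framework overhead, so the result is best treated as an upper bound.

```python
def estimate_batch_size(gpu_mb, params_mb, per_sample_mb):
    """Return the raw estimate and that estimate rounded down to a power of 2."""
    raw = (gpu_mb - params_mb) / per_sample_mb
    if raw < 1:
        return raw, 0  # not even one sample fits
    # Largest power of 2 not exceeding floor(raw), computed exactly with integers.
    return raw, 2 ** (int(raw).bit_length() - 1)


raw, batch = estimate_batch_size(16000, 4.3, 13.93)
print(round(raw, 2), batch)  # 1148.29 1024
```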

Answer 4

I ran into a similar GPU memory error, which was solved by configuring the TensorFlow session with the following:

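import tensorflow as tf
# See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
config = tf.ConfigProto()  # TensorFlow 1.x API; in 2.x use tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
sess = tf.Session(config=config)  # in 2.x: tf.compat.v1.Session(config=config)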

See also: google colaboratory `ResourceExhaustedError` with GPU

Answer 5

Define a function to find the batch size for training the model:

import gc
import os

import psutil
from tensorflow.python.client import device_lib


def FindBatchSize(model):
    """Heuristically choose a batch size for `model`, a Keras model that is yet to be trained."""
    BatchFound = 16
    total_params = 0

    try:
        total_params = int(model.count_params())

        # Find whether a GPU is available.
        gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
        GCPU = "GPU" if gpus else "CPU"
        cpus = os.cpu_count() or 1

        # Decide the batch size on the basis of GPU availability and model complexity;
        # later rules override earlier ones.
        if GCPU == "GPU" and cpus > 15 and total_params < 1000000:
            BatchFound = 64
        if cpus < 16 and total_params < 500000:
            BatchFound = 64
        if GCPU == "GPU" and cpus > 15 and 1000000 <= total_params < 2000000:
            BatchFound = 32
        if GCPU == "GPU" and cpus > 15 and 2000000 <= total_params < 10000000:
            BatchFound = 16
        if GCPU == "GPU" and cpus > 15 and total_params >= 10000000:
            BatchFound = 8
        if cpus < 16 and total_params > 5000000:
            BatchFound = 8
    except Exception:
        pass  # keep the default if the model or device query fails

    # Shrink the batch when system RAM (not GPU memory) is already heavily used.
    memoryused = psutil.virtual_memory().percent
    if memoryused > 75.0:
        BatchFound = 8
    if memoryused > 85.0:
        BatchFound = 4
    if memoryused > 90.0:
        BatchFound = 2
    if total_params > 100000000:
        BatchFound = 1

    print("Batch Size:  " + str(BatchFound))
    gc.collect()
    return BatchFound



How to calculate optimal batch size
