Question

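I created the model shown below (Keras with the Theano backend). When I run it on my CPU, it gives me a memory error. I have 8 GB of DDR3 RAM, and memory consumption is 2.3 GB before model1.fit is called. Usage then climbs to about 7.5 GB and the program crashes. I also tried running it on the GPU (NVIDIA GeForce GTX 860M, 4 GB) but still get a memory error.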


  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\demo_suite>deviceQuery.exe
  deviceQuery.exe Starting...
   CUDA Device Query (Runtime API) version (CUDART static linking)
  Detected 1 CUDA Capable device(s)
  Device 0: "GeForce GTX 970"
    CUDA Driver Version / Runtime Version          8.0 / 8.0
    CUDA Capability Major/Minor version number:    5.2
    Total amount of global memory:                 4096 MBytes (4294967296 bytes)
    (13) Multiprocessors, (128) CUDA Cores/MP:     1664 CUDA Cores
    GPU Max Clock rate:                            1266 MHz (1.27 GHz)
    Memory Clock rate:                             3505 Mhz
    Memory Bus Width:                              256-bit
    L2 Cache Size:                                 1835008 bytes
    Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
    Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
    Total amount of constant memory:               65536 bytes
    Total amount of shared memory per block:       49152 bytes
    Total number of registers available per block: 65536
    Warp size:                                     32
    Maximum number of threads per multiprocessor:  2048
    Maximum number of threads per block:           1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch:                          2147483647 bytes
    Texture alignment:                             512 bytes
    Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
    Run time limit on kernels:                     Yes
    Integrated GPU sharing Host Memory:            No
    Support host page-locked memory mapping:       Yes
    Alignment requirement for Surfaces:            Yes
    Device has ECC support:                        Disabled
    CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
    Device supports Unified Addressing (UVA):      Yes
    Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
    Compute Mode:
       < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
  deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 970
  Result = PASS

I also tried printing model.summary(). The model and training code are:

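import numpy as np
import keras
from keras.layers import Conv2D
from keras.optimizers import SGD

def get_model_convolutional():
    model = keras.models.Sequential()
    # Two 3x3 convolutions without padding: 1208 -> 1206 -> 1204 per spatial dimension,
    # so input_shape must be 1208 (not 1028) to match train_x and train_y below.
    model.add(Conv2D(128, (3, 3), activation='relu', strides=(1, 1), input_shape=(1208, 1208, 3)))
    model.add(Conv2D(3, (3, 3), strides=(1, 1), activation=None))
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd)
    return model

if __name__ == "__main__":
    model1 = get_model_convolutional()
    train_x = np.ones((108, 1208, 1208, 3), dtype=np.uint8)
    train_y = np.ones((108, 1204, 1204, 3), dtype=np.uint8)
    # Pass the arrays defined above (the original call used undefined x_train / y_train).
    model1.fit(train_x, train_y, verbose=2, epochs=20, batch_size=4)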

Why does it need so much memory? I tried to calculate it, and I think it should need about 1.5 GB. This is my first model.
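
For reference, a rough back-of-the-envelope estimate of the tensor sizes involved can be sketched as follows. It assumes float32 activations (4 bytes per element), unpadded 3x3 convolutions, and an input side of 1208 to match the training arrays. The helper names `conv_out` and `tensor_bytes` are made up for this sketch and are not part of Keras. It counts only the forward activations and the raw training arrays, so the real footprint (gradients, optimizer state, any copies made by the backend) is likely higher.

```python
def conv_out(size, kernel=3, stride=1):
    """Output side length of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def tensor_bytes(shape, bytes_per_element):
    """Number of bytes needed to store a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


BATCH = 4          # batch_size passed to fit
SIDE = 1208        # spatial side of the training images
SAMPLES = 108      # number of training images
FLOAT32 = 4        # assumed bytes per activation element
UINT8 = 1          # bytes per element of the training arrays

h1 = conv_out(SIDE)   # side after the first 3x3 convolution
h2 = conv_out(h1)     # side after the second 3x3 convolution

# Forward activations for one batch (assumption: stored as float32).
conv1_activation = tensor_bytes((BATCH, h1, h1, 128), FLOAT32)
conv2_activation = tensor_bytes((BATCH, h2, h2, 3), FLOAT32)

# Training arrays as created with dtype=np.uint8.
train_x_bytes = tensor_bytes((SAMPLES, SIDE, SIDE, 3), UINT8)
train_y_bytes = tensor_bytes((SAMPLES, h2, h2, 3), UINT8)

GIB = 1024 ** 3
for name, size in [
    ("conv1 activations, one batch", conv1_activation),
    ("conv2 activations, one batch", conv2_activation),
    ("train_x (uint8)", train_x_bytes),
    ("train_y (uint8)", train_y_bytes),
]:
    print(f"{name}: {size / GIB:.2f} GiB")
```

Under these assumptions, the output of the first layer alone (128 feature maps of 1206 x 1206 for 4 samples) takes close to 3 GB per batch, which is already well above the 1.5 GB estimate before gradients or input conversion are counted.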

Memory error in Keras when implementing a CNN

0 answers: