Out-of-memory error when implementing VGG16 from scratch

Date: 2020-08-20 05:48:45

Tags: python tensorflow keras deep-learning conv-neural-network

I am currently experimenting with ConvNets and have two different systems to train on. The first is a GPU server I have access to at work; the second is the PC I use to tinker at home (a Terra workstation). The system components are as follows:

[image: system specifications]

Now, I know the GPU server has more capacity, but I want to confirm that the problem really is due to the system's capacity and not because I accidentally set something up wrong.

To that end, I implemented a VGG16 network from scratch with the following code:

    # Imports needed for this snippet (standalone Keras with a TensorFlow backend).
    # train_path, self.img_height, self.img_width, self.batch_size and
    # self.classes_num come from the surrounding class.
    import keras
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
    from keras.optimizers import Adam
    from keras.preprocessing.image import ImageDataGenerator

    train_data_path = train_path
    train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.25)

    train_generator = train_datagen.flow_from_directory(
                train_data_path,
                target_size=(self.img_height, self.img_width),
                batch_size=self.batch_size,
                class_mode='categorical',
                subset='training')

    validation_generator = train_datagen.flow_from_directory(
                train_data_path,
                target_size=(self.img_height, self.img_width),
                batch_size=self.batch_size,
                class_mode='categorical',
                subset='validation')

    # Earlier attempt, kept for reference:
    # trainingDataGenerator = ImageDataGenerator()
    # train_generator = trainingDataGenerator.flow_from_directory(
    #     "C:/Users/but/Desktop/dataScratch/Train", target_size=(384, 384))
    # testDataGenerator = ImageDataGenerator()
    # validation_generator = testDataGenerator.flow_from_directory(
    #     "C:/Users/but/Desktop/dataScratch/Valid", target_size=(384, 384))

    model = Sequential()

    # Block 1: two 64-filter convolutions, then 2x2 max pooling
    model.add(Conv2D(input_shape=(384, 384, 3), filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Block 2: two 128-filter convolutions
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Block 3: three 256-filter convolutions
    model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Block 4: three 512-filter convolutions
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Block 5: three more 512-filter convolutions
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Classifier head: two 4096-unit dense layers with dropout
    model.add(Flatten())
    model.add(Dense(units=4096, activation="relu"))
    model.add(Dropout(0.30))
    model.add(Dense(units=4096, activation="relu"))
    model.add(Dropout(0.20))
    model.add(Dense(units=self.classes_num, activation="sigmoid"))

    opt = Adam(lr=0.00001)
    #opt = RMSprop(lr=0.00001)
    model.compile(optimizer=opt, loss=keras.losses.binary_crossentropy, metrics=['acc'])

    model.summary()

This implementation works fine on the GPU server, with epochs up to 64 and batch sizes up to 128. After it worked there, I wanted to try it at home on the Terra workstation, but I immediately ran into an OOM error:

Epoch 1/64
2020-08-20 07:38:57.111158: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-08-20 07:38:57.311523: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-20 07:38:58.143037: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-08-20 07:38:58.193876: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.34GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.194023: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.34GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.371176: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.371345: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.420423: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.28GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.420580: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.28GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.453153: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 989.13MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.453308: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 989.13MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.497624: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.21MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.497789: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.21MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.498100: W tensorflow/core/kernels/gpu_utils.cc:48] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2020-08-20 07:39:08.560754: W tensorflow/core/common_runtime/bfc_allocator.cc:424] Allocator (GPU_0_bfc) ran out of memory trying to allocate 144.00MiB (rounded to 150994944).  Current allocation summary follows.
2020-08-20 07:39:08.560957: I tensorflow/core/common_runtime/bfc_allocator.cc:894] BFCAllocator dump for GPU_0_bfc
2020-08-20 07:39:08.561063: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (256):   Total Chunks: 134, Chunks in use: 134. 33.5KiB allocated for chunks. 33.5KiB in use in bin. 3.2KiB client-requested in use in bin.
2020-08-20 07:39:08.561254: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (512):   Total Chunks: 10, Chunks in use: 10. 5.0KiB allocated for chunks. 5.0KiB in use in bin. 5.0KiB client-requested in use in bin.
2020-08-20 07:39:08.561444: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1024):  Total Chunks: 16, Chunks in use: 16. 16.3KiB allocated for chunks. 16.3KiB in use in bin. 16.0KiB client-requested in use in bin.
2020-08-20 07:39:08.561641: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2048):  Total Chunks: 30, Chunks in use: 30. 61.5KiB allocated for chunks. 61.5KiB in use in bin. 60.0KiB client-requested in use in bin.
2020-08-20 07:39:08.561839: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4096):  Total Chunks: 6, Chunks in use: 5. 40.5KiB allocated for chunks. 33.8KiB in use in bin. 33.8KiB client-requested in use in bin.
2020-08-20 07:39:08.562030: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-20 07:39:08.562217: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16384):     Total Chunks: 10, Chunks in use: 10. 167.3KiB allocated for chunks. 167.3KiB in use in bin. 160.0KiB client-requested in use in bin.
2020-08-20 07:39:08.562418: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (32768):     Total Chunks: 5, Chunks in use: 5. 160.0KiB allocated for chunks. 160.0KiB in use in bin. 160.0KiB client-requested in use in bin.
2020-08-20 07:39:08.562621: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (65536):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-20 07:39:08.562810: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (131072):    Total Chunks: 5, Chunks in use: 5. 731.5KiB allocated for chunks. 731.5KiB in use in bin. 720.0KiB client-requested in use in bin.
2020-08-20 07:39:08.563009: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (262144):    Total Chunks: 5, Chunks in use: 5. 1.51MiB allocated for chunks. 1.51MiB in use in bin. 1.41MiB client-requested in use in bin.
2020-08-20 07:39:08.563214: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (524288):    Total Chunks: 5, Chunks in use: 5. 2.81MiB allocated for chunks. 2.81MiB in use in bin. 2.81MiB client-requested in use in bin.
2020-08-20 07:39:08.563406: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1048576):   Total Chunks: 5, Chunks in use: 5. 5.63MiB allocated for chunks. 5.63MiB in use in bin. 5.63MiB client-requested in use in bin.
2020-08-20 07:39:08.563599: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2097152):   Total Chunks: 10, Chunks in use: 10. 22.50MiB allocated for chunks. 22.50MiB in use in bin. 22.50MiB client-requested in use in bin.
2020-08-20 07:39:08.564610: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4194304):   Total Chunks: 5, Chunks in use: 5. 22.50MiB allocated for chunks. 22.50MiB in use in bin. 22.50MiB client-requested in use in bin.
2020-08-20 07:39:08.564904: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8388608):   Total Chunks: 25, Chunks in use: 25. 229.04MiB allocated for chunks. 229.04MiB in use in bin. 225.00MiB client-requested in use in bin.
2020-08-20 07:39:08.565228: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16777216):  Total Chunks: 1, Chunks in use: 1. 27.00MiB allocated for chunks. 27.00MiB in use in bin. 27.00MiB client-requested in use in bin.
2020-08-20 07:39:08.565421: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (33554432):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-20 07:39:08.565594: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (67108864):  Total Chunks: 6, Chunks in use: 6. 451.75MiB allocated for chunks. 451.75MiB in use in bin. 392.00MiB client-requested in use in bin.
2020-08-20 07:39:08.565962: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (134217728):     Total Chunks: 3, Chunks in use: 2. 431.18MiB allocated for chunks. 288.00MiB in use in bin. 288.00MiB client-requested in use in bin.
2020-08-20 07:39:08.566260: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (268435456):     Total Chunks: 9, Chunks in use: 9. 7.31GiB allocated for chunks. 7.31GiB in use in bin. 7.31GiB client-requested in use in bin.
2020-08-20 07:39:08.566552: I tensorflow/core/common_runtime/bfc_allocator.cc:917] Bin for 144.00MiB was 128.00MiB, Chunk State: 
2020-08-20 07:39:08.566667: I tensorflow/core/common_runtime/bfc_allocator.cc:923]   Size: 143.18MiB | Requested Size: 1.13MiB | in_use: 0 | bin_num: 19, prev:   Size: 144.00MiB | Requested Size: 144.00MiB | in_use: 1 | bin_num: -1
2020-08-20 07:39:08.566975: I tensorflow/core/common_runtime/bfc_allocator.cc:930] Next region of size 9104897280
2020-08-20 07:39:08.567135: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00000 of size 1280 next 1
2020-08-20 07:39:08.567298: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00500 of size 256 next 2
2020-08-20 07:39:08.567387: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00600 of size 256 next 5
2020-08-20 07:39:08.567468: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00700 of size 256 next 4
2020-08-20 07:39:08.567549: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00800 of size 512 next 10
2020-08-20 07:39:08.567632: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00a00 of size 512 next 12
2020-08-20 07:39:08.567715: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00c00 of size 1024 next 16
2020-08-20 07:39:08.567784: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe01000 of size 1024 next 18
2020-08-20 07:39:08.567869: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe01400 of size 1024 next 20
2020-08-20 07:39:08.567943: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe01800 of size 2048 next 24
2020-08-20 07:39:08.567993: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe02000 of size 2048 next 26
2020-08-20 07:39:08.568086: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe02800 of size 2048 next 28
2020-08-20 07:39:08.568166: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe03000 of size 3584 next 3
2020-08-20 07:39:08.568245: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe03e00 of size 6912 next 6
2020-08-20 07:39:08.568325: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe05900 of size 2048 next 31
2020-08-20 07:39:08.568410: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe06100 of size 2048 next 34
2020-08-20 07:39:08.568492: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe06900 of size 16384 next 36
2020-08-20 07:39:08.568573: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0a900 of size 16384 next 38
2020-08-20 07:39:08.568656: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0e900 of size 256 next 41
2020-08-20 07:39:08.568734: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ea00 of size 256 next 44
2020-08-20 07:39:08.568817: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0eb00 of size 256 next 46
2020-08-20 07:39:08.568899: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ec00 of size 256 next 47
2020-08-20 07:39:08.568979: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ed00 of size 256 next 48
2020-08-20 07:39:08.569059: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ee00 of size 256 next 49
2020-08-20 07:39:08.569143: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ef00 of size 256 next 50
2020-08-20 07:39:08.569222: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0f000 of size 256 next 51
2020-08-20 07:39:08.569298: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0f100 of size 6912 next 52
2020-08-20 07:39:08.569383: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe10c00 of size 256 next 53
2020-08-20 07:39:08.569465: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe10d00 of size 256 next 54
2020-08-20 07:39:08.569544: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe10e00 of size 512 next 55
2020-08-20 07:39:08.569622: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe11000 of size 512 next 57
2020-08-20 07:39:08.569700: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe11200 of size 1024 next 58
.
.
.
.
.
.
.
2020-08-20 07:39:08.592846: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 2 Chunks of size 603979776 totalling 1.13GiB
2020-08-20 07:39:08.592924: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 5 Chunks of size 1207959552 totalling 5.63GiB
2020-08-20 07:39:08.593012: I tensorflow/core/common_runtime/bfc_allocator.cc:962] Sum Total of in-use chunks: 8.34GiB
2020-08-20 07:39:08.593092: I tensorflow/core/common_runtime/bfc_allocator.cc:964] total_region_allocated_bytes_: 9104897280 memory_limit_: 9104897474 available bytes: 194 curr_region_allocation_bytes_: 18209795072
2020-08-20 07:39:08.593233: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats: 
Limit:                  9104897474
InUse:                  8954752000
MaxInUse:               9104890368
NumAllocs:                     778
MaxAllocSize:           1207959552

2020-08-20 07:39:08.593416: W tensorflow/core/common_runtime/bfc_allocator.cc:429] ***************************************************************************************************_
2020-08-20 07:39:08.593504: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at conv_ops.cc:539 : Resource exhausted: OOM when allocating tensor with shape[16,256,96,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-08-20 07:39:08.593626: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[16,256,96,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node conv2d_6/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "C:/Users/but/PycharmProjects/ScratchClassifier/ScratchNet.py", line 147, in <module>
    x.model_create("C:/Users/but/Desktop/dataScratch/Train")
  File "C:/Users/but/PycharmProjects/ScratchClassifier/ScratchNet.py", line 108, in model_create
    hist = model.fit_generator(steps_per_epoch=self.batch_size, generator=train_generator, validation_data=validation_generator, validation_steps=8, epochs=self.epochs, callbacks=[checkpoint,early])
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\engine\training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\engine\training_generator.py", line 220, in fit_generator
    reset_metrics=False)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\engine\training.py", line 1514, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3727, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\function.py", line 1551, in __call__
    return self._call_impl(args, kwargs)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\function.py", line 1591, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError:  OOM when allocating tensor with shape[16,256,96,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node conv2d_6/convolution (defined at \Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\backend\tensorflow_backend.py:3009) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_keras_scratch_graph_3860]

I can't even train with a batch size of 1, which other posts suggested trying just to see whether it runs at all. I then tried reducing the number of neurons in the dense layers, setting them to 2024 instead of 4096, and then it worked.
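
For what it's worth, the 144 MiB allocation that finally fails in the log above matches the reported tensor shape [16, 256, 96, 96] in float32 exactly, so the numbers are at least internally consistent (quick check):

    # Quick check: the failing OOM tensor from the log, shape [16, 256, 96, 96], float32
    batch, channels, height, width = 16, 256, 96, 96
    bytes_needed = batch * channels * height * width * 4   # 4 bytes per float32
    print(bytes_needed)                  # 150994944 -- the "rounded to" value in the log
    print(bytes_needed / 2**20, "MiB")   # 144.0 -- the "144.00MiB" the allocator asked for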

My question is: is the number of neurons in the dense layers simply too much for my hardware, or do you think I have something misconfigured?

2 answers:

Answer 0 (score: 1):

Your model will be larger than the Keras implementation because your input shape is larger: VGG16 normally takes inputs of shape (224, 224, 3).
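
A rough back-of-the-envelope of where the extra weight goes (assuming the five 2x2 poolings and 512 final filters from your code): virtually all of it lands in the first Dense layer after Flatten.

    # Rough size of the first Dense layer as a function of input side length,
    # assuming five 2x2 poolings and 512 filters before Flatten (as in the posted model).
    def first_dense_params(input_side, dense_units=4096, final_filters=512, poolings=5):
        side = input_side // 2**poolings         # spatial size after the pooling stack
        flat = side * side * final_filters       # length of the Flatten output
        return flat * dense_units + dense_units  # weights + biases

    print(first_dense_params(384))  # ~302,000,000 parameters at 384x384 input
    print(first_dense_params(224))  # ~103,000,000 parameters at 224x224 input

(Incidentally, the repeated 1207959552-byte chunks in your allocator dump are exactly that 73728 x 4096 weight matrix in float32.)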

As for an actual hardware fault, I'm afraid that's not something I can realistically diagnose from here, so I won't go there.

Answer 1 (score: 1):

You can try different input sizes and different numbers of neurons in each layer; that in itself is not a problem, and it's a good way to see how the model behaves under different parameters. But there is a limit, and it depends on your hardware. If you look at the model summary, your model has 333 million parameters! Can you imagine how heavy that model is?
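
You can verify that number yourself straight from the model object:

    # Confirm the parameter count directly (standalone Keras API):
    total = model.count_params()
    print("{:,} parameters, ~{:.0f} MiB of float32 weights".format(total, total * 4 / 2**20))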

The original VGG16 has 138 million parameters.
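
And the parameter count understates the real footprint: training with Adam keeps two extra slots per parameter, plus a gradient buffer, so a rough lower bound on weight-related GPU memory (before any activations or cuDNN workspace) is already several GiB:

    # Rough weight-related memory for ~333M float32 parameters trained with Adam:
    # weights + gradients + Adam's m and v slots = ~4 copies per parameter (rough estimate).
    params = 333_000_000
    print(params * 4 * 4 / 2**30, "GiB")  # ~5 GiB before a single activation is stored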

The problem is the limit of your hardware. Your GPU does not have enough memory to allocate and load the required buffers, even for a single image. You have to reduce the input image size, drop some conv blocks (not a good idea), or reduce the number of neurons in some of the layers.
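
A minimal sketch of the two easiest reductions, with purely illustrative values (adjust to taste, and remember that target_size in flow_from_directory has to match the new input shape):

    # Illustrative only: shrink the input and/or the dense layers.
    img_side = 224      # down from 384: Flatten output drops from 12*12*512 to 7*7*512
    dense_units = 2048  # down from 4096: halves the first Dense weight matrix

    model = Sequential()
    model.add(Conv2D(64, (3, 3), padding="same", activation="relu",
                     input_shape=(img_side, img_side, 3)))
    # ... keep the rest of the conv/pool stack from the question unchanged ...
    model.add(Flatten())
    model.add(Dense(dense_units, activation="relu"))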

However, if you really want to train this model on your current PC, you can use tensorflow-cpu instead of tensorflow-gpu (provided you have enough RAM). But be aware that training will be much slower than on the GPU.
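
If you want to test the CPU route before reinstalling anything, one common way with a tensorflow-gpu build is to hide the GPU (assuming TF 2.x):

    # Force CPU execution with an existing tensorflow-gpu install:
    # hide the GPU via the environment variable *before* TensorFlow is imported.
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # TensorFlow now sees no GPUs

    import tensorflow as tf
    print(tf.config.experimental.list_physical_devices("GPU"))  # [] -> running on CPU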