I'm trying to train a model that uses a TimeDistributed VGG16 as the input to an RNN, but although the GPUs allocate all of their memory, utilization stays near 0%. Occasionally utilization climbs, then falls back to 0% (monitored with watch refreshing every 0.1 s). What can I do to keep the GPUs fully utilized?
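For context, TensorFlow 1.x reserves almost all GPU memory up front by default, so the full Memory-Usage column below does not by itself mean the GPUs are busy. A minimal sketch of the standard allow_growth setting, which makes nvidia-smi report on-demand allocations instead (this changes only the memory reporting, not the utilization problem):

import tensorflow as tf
from keras import backend as K

# Allocate GPU memory on demand instead of grabbing it all at start-up,
# so nvidia-smi's Memory-Usage tracks what the model actually uses.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))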
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:17.0 Off |                    0 |
| N/A   53C    P0    62W / 300W |  15882MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:00:18.0 Off |                    0 |
| N/A   46C    P0    61W / 300W |  15882MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:00:19.0 Off |                    0 |
| N/A   47C    P0    63W / 300W |  15882MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:00:1A.0 Off |                    0 |
| N/A   53C    P0    63W / 300W |  15882MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:00:1B.0 Off |                    0 |
| N/A   56C    P0    67W / 300W |  15882MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                    0 |
| N/A   50C    P0    63W / 300W |  15882MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:00:1D.0 Off |                    0 |
| N/A   47C    P0    63W / 300W |  15882MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   52C    P0    67W / 300W |  15882MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     30696      C   python                                     15871MiB |
|    1     30696      C   python                                     15871MiB |
|    2     30696      C   python                                     15871MiB |
|    3     30696      C   python                                     15871MiB |
|    4     30696      C   python                                     15871MiB |
|    5     30696      C   python                                     15871MiB |
|    6     30696      C   python                                     15871MiB |
|    7     30696      C   python                                     15871MiB |
+-----------------------------------------------------------------------------+
Edit: model code
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, GRU, MaxPool2D, TimeDistributed
from keras.utils import multi_gpu_model

# VGG16-style convolutional stack, applied frame-by-frame via TimeDistributed
model = Sequential()
model.add(TimeDistributed(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"), input_shape=(3, 224, 224, 3), name="Conv2D_1"))
model.add(TimeDistributed(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_2"))
model.add(TimeDistributed(MaxPool2D(pool_size=(2, 2)), name="MaxPool2D_1"))
model.add(TimeDistributed(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_3"))
model.add(TimeDistributed(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_4"))
model.add(TimeDistributed(MaxPool2D(pool_size=(2, 2)), name="MaxPool2D_2"))
model.add(TimeDistributed(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_5"))
model.add(TimeDistributed(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_6"))
model.add(TimeDistributed(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_7"))
model.add(TimeDistributed(MaxPool2D(pool_size=(2, 2)), name="MaxPool2D_3"))
model.add(TimeDistributed(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_8"))
model.add(TimeDistributed(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_9"))
model.add(TimeDistributed(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_10"))
model.add(TimeDistributed(MaxPool2D(pool_size=(2, 2)), name="MaxPool2D_4"))
model.add(TimeDistributed(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_11"))
model.add(TimeDistributed(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_12"))
model.add(TimeDistributed(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"), name="Conv2D_13"))
model.add(TimeDistributed(MaxPool2D(pool_size=(2, 2)), name="MaxPool2D_5"))
model.add(TimeDistributed(Flatten(), name="Flatten"))
model.add(GRU(6, name="GRU"))  # was name="Flatten", duplicating the layer above
model.add(Dense(1, activation="sigmoid", name="Dense"))

# Replicate the model across all 8 GPUs, hosting the weights on the CPU
model = multi_gpu_model(model, gpus=8, cpu_relocation=True)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])  # opt: optimizer instance defined elsewhere
Edit: Keras 2.3.1, TensorFlow 1.14.0
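Since utilization that repeatedly falls back to 0% usually means the GPUs are waiting on the input pipeline, here is a hedged sketch of the generator-based feeding I could switch to; ClipSequence, load_clip, train_paths, and train_labels are illustrative placeholders, not code from my project:

import numpy as np
from keras.utils import Sequence

class ClipSequence(Sequence):
    """Illustrative Sequence yielding batches of (3, 224, 224, 3) clips."""
    def __init__(self, clip_paths, labels, batch_size=8):
        self.clip_paths, self.labels, self.batch_size = clip_paths, labels, batch_size

    def __len__(self):
        return int(np.ceil(len(self.clip_paths) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # load_clip is a placeholder for whatever decodes 3 frames
        # into a (3, 224, 224, 3) array
        x = np.stack([load_clip(p) for p in self.clip_paths[sl]])
        y = np.asarray(self.labels[sl])
        return x, y

# Multiple worker processes prepare batches while the GPUs compute,
# which is the usual remedy when GPU-Util sits at 0%.
model.fit_generator(ClipSequence(train_paths, train_labels),
                    workers=8, use_multiprocessing=True, epochs=10)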