Question

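I'm having a problem with Keras. When I try to fit a model that contains Conv2D layers, it crashes with the error "Segmentation fault (core dumped)".

My code works on the CPU. It also works without any Conv2D layers (although that is no use for my case). I have installed CUDA, cuDNN and TensorFlow, and I have tried reinstalling Keras and TensorFlow.

Code: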

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten

# env(), env_size() and swisher are defined elsewhere in the asker's code (not shown).

def model_build():
    model = Sequential()
    model.add(Conv2D(input_shape=(env_size()[0], env_size()[1], 1), filters=4, kernel_size=(3,3), strides=1, activation=swisher))
    model.add(Conv2D(filters=4, kernel_size=(5,5), strides=1, activation=swisher))
    model.add(Conv2D(filters=4, kernel_size=(5,5), strides=1, activation=swisher))
    model.add(Conv2D(filters=4, kernel_size=(5,5), strides=1, activation=swisher))
    model.add(Flatten())
    model.add(Dense(128, activation='softmax'))
    model.add(Dense(4, activation='softmax'))
    return model

if __name__ == '__main__':
    y = model_build()
    y.compile(loss="mean_squared_error", optimizer='adam')
    y.fit(x=env(), y=np.array([[0,0,0,0]]))

Error:

Using TensorFlow backend.
Epoch 1/1
2019-03-27 05:52:27.687323: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-27 05:52:27.789975: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-27 05:52:27.790819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:01:00.0
totalMemory: 5.73GiB freeMemory: 5.40GiB
2019-03-27 05:52:27.790834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-03-27 05:52:28.068080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-27 05:52:28.068115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0
2019-03-27 05:52:28.068121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N
2019-03-27 05:52:28.068487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5147 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-03-27 05:52:28.177752: W tensorflow/core/framework/allocator.cc:113] Allocation of 518619136 exceeds 10% of system memory.
2019-03-27 05:52:28.337277: W tensorflow/core/framework/allocator.cc:113] Allocation of 518619136 exceeds 10% of system memory.
2019-03-27 05:52:28.500486: W tensorflow/core/framework/allocator.cc:113] Allocation of 518619136 exceeds 10% of system memory.
2019-03-27 05:52:28.586280: W tensorflow/core/framework/allocator.cc:113] Allocation of 518619136 exceeds 10% of system memory.
2019-03-27 05:52:28.675738: W tensorflow/core/framework/allocator.cc:113] Allocation of 518619136 exceeds 10% of system memory.
Segmentation fault (core dumped)

Edit:

A self-contained example:

import numpy as np
import keras

model = keras.models.Sequential() #Sequential model type.
model.add(keras.layers.Conv2D(filters=1, kernel_size=(3,3), strides = 1, activation="sigmoid")) #Convolutional layer.
model.add(keras.layers.Flatten()) #Flatten layer.
model.add(keras.layers.Dense(4)) #Dense layer of 4 units.
model.compile(loss='mean_squared_error', optimizer='adam') #compile model.
y = np.random.rand(1,4) #Random expected output
x = np.random.rand(1, 38, 21, 1) # Random input.
model.fit(x, y) #And fit...

Edit 2:

Keras version: "v2.1.6-tf"

TensorFlow-GPU version: "v1.12"

Python version: "v3.5.2"

CUDA version: "v9.0.176"

cuDNN version: "v7.2.1.38-1+cuda9.0"

Ubuntu version: "v16.04"

Answer 1

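Your GPU does not seem to have enough memory. Your model doesn't look very large, so I suspect the problem lies in this line:

y.fit(x=env(), y=np.array([[0,0,0,0]]))

The output of env() may be too large to fit in your GPU's memory.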
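One way to test this hypothesis is to estimate how much memory the input and the first Dense layer need for a given input size. The sketch below is not part of the original answer. It assumes the layer stack from the question (one 3x3 and three 5x5 convolutions with 4 filters, stride 1 and Keras's default 'valid' padding, followed by Dense(128)) and float32 values. The function names are made up for illustration.

```python
def flatten_features(height, width, kernels=(3, 5, 5, 5), filters=4):
    """Number of values leaving Flatten for stride-1, 'valid'-padded convolutions."""
    for k in kernels:
        height -= k - 1  # each 'valid' convolution trims k - 1 rows
        width -= k - 1   # and k - 1 columns
    if height <= 0 or width <= 0:
        raise ValueError("input is too small for this stack of convolutions")
    return height * width * filters


def dense_kernel_bytes(height, width, units=128, bytes_per_value=4):
    """Size in bytes of the Dense(units) weight matrix that follows Flatten."""
    return flatten_features(height, width) * units * bytes_per_value


def input_bytes(shape, bytes_per_value=4):
    """Size in bytes of one input batch with the given shape (float32 assumed)."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total


# Shape taken from the asker's standalone example; env() itself is not shown.
print("input bytes:", input_bytes((1, 38, 21, 1)))
print("Dense(128) kernel bytes:", dense_kernel_bytes(38, 21))
```

For comparison, each 518619136-byte allocation in the log equals 1012928 x 128 float32 values. That would be consistent with a Dense(128) kernel fed by an input much larger than 38x21, but this is only a guess because env_size() is not shown.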

Answer 2

Your MWE works fine for me (if I add , input_shape=(38, 21, 1) to the first convolutional layer):

import numpy as np
import keras

model = keras.models.Sequential() #Sequential model type.
model.add(keras.layers.Conv2D(filters=1, kernel_size=(3,3), strides = 1, activation="sigmoid", input_shape=(38, 21, 1))) #Convolutional layer.
model.add(keras.layers.Flatten()) #Flatten layer.
model.add(keras.layers.Dense(4)) #Dense layer of 4 units.
model.compile(loss='mean_squared_error', optimizer='adam') #compile model.
y = np.random.rand(2, 4) #Random expected output
x = np.random.rand(2, 38, 21, 1) # Random input.
model.fit(x, y)

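This means your problem must come from your system or your installation.

According to TensorFlow's compatibility chart, your Python, TensorFlow and CUDA versions should be compatible.

For your configuration, cuDNN version 7.0.x is recommended. The cuDNN 7.2 you are using may not be compatible. Try installing and using cuDNN 7.0.x.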
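Before swapping libraries, it can help to confirm which Python package versions are actually being imported. The following is a minimal sketch, not part of the original answer; the helper name installed_version is hypothetical. It only reports Python packages, so the CUDA and cuDNN versions still have to be checked through the system's own tools.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Some modules do not define __version__; report that explicitly.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("tensorflow", "keras", "numpy"):
        print(name, installed_version(name))
```

If the printed versions differ from the ones you expect, the interpreter is probably picking up a different installation than the one you reinstalled.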

How to fix the "Segmentation fault (core dumped)" error in Keras
