As per the title, I'm hitting this common error while trying to do some image-classification training with Keras. Unlike almost every other example, I'm not trying to customize anything; I'm just using bog-standard Keras functionality! This question is similar, but it doesn't appear to have been followed up.
I previously had an issue with this project, but after upgrading cudnn and cudatoolkit (and the related NVIDIA backend), this new error appeared.
Conda list:
# packages in environment at /home/me/Programs/anaconda3/envs/hand-gesture:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_tflow_select 2.1.0 gpu
absl-py 0.9.0 py37_0
alabaster 0.7.12 py37_0
argh 0.26.2 py37_0
asn1crypto 1.3.0 py37_0
astor 0.8.0 py37_0
astroid 2.3.3 py37_0
atomicwrites 1.3.0 py37_1
attrs 19.3.0 py_0
autopep8 1.4.4 py_0
babel 2.8.0 py_0
backcall 0.1.0 py37_0
blas 1.0 mkl
bleach 3.1.4 py_0
blinker 1.4 py37_0
bzip2 1.0.8 h7b6447c_0
c-ares 1.15.0 h7b6447c_1001
ca-certificates 2020.1.1 0
cachetools 3.1.1 py_0
cairo 1.14.12 h8948797_3
certifi 2020.4.5.1 py37_0
cffi 1.14.0 py37h2e261b9_0
chardet 3.0.4 py37_1003
click 7.1.1 py_0
cloudpickle 1.4.0 py_0
cryptography 2.8 py37h1ba5d50_0
cudatoolkit 10.1.243 h6bb024c_0
cudnn 7.6.5 cuda10.1_0
cupti 10.1.168 0
cycler 0.10.0 py37_0
dbus 1.13.12 h746ee38_0
decorator 4.4.2 py_0
defusedxml 0.6.0 py_0
diff-match-patch 20181111 py_0
docutils 0.16 py37_0
entrypoints 0.3 py37_0
expat 2.2.6 he6710b0_0
ffmpeg 4.0 hcdf2ecd_0
flake8 3.7.9 py37_0
fontconfig 2.13.0 h9420a91_0
freeglut 3.0.0 hf484d3e_5
freetype 2.9.1 h8a8886c_1
future 0.18.2 py37_0
gast 0.2.2 py37_0
glib 2.63.1 h5a9c865_0
gmp 6.1.2 h6c8ec71_1
google-auth 1.13.1 py_0
google-auth-oauthlib 0.4.1 py_2
google-pasta 0.2.0 py_0
graphite2 1.3.13 h23475e2_0
grpcio 1.27.2 py37hf8bcb03_0
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
h5py 2.8.0 py37h3010b51_1003 conda-forge
harfbuzz 1.8.8 hffaf4a1_0
hdf5 1.10.2 hba1933b_1
icu 58.2 he6710b0_3
idna 2.9 py_1
imagesize 1.2.0 py_0
importlib_metadata 1.5.0 py37_0
intel-openmp 2020.0 166
intervaltree 3.0.2 py_0
ipykernel 5.1.4 py37h39e3cac_0
ipython 7.13.0 py37h5ca1d4c_0
ipython_genutils 0.2.0 py37_0
isort 4.3.21 py37_0
jasper 2.0.14 h07fcdf6_1
jedi 0.15.2 py37_0
jeepney 0.4.3 py_0
jinja2 2.11.2 py_0
jpeg 9b h024ee3a_2
jsonschema 3.2.0 py37_0
jupyter_client 6.1.3 py_0
jupyter_core 4.6.3 py37_0
keras 2.3.1 0
keras-applications 1.0.8 py_0
keras-base 2.3.1 py37_0
keras-gpu 2.3.1 0
keras-preprocessing 1.1.0 py_1
keyring 21.1.1 py37_2
kiwisolver 1.2.0 py37hfd86e86_0
lazy-object-proxy 1.4.3 py37h7b6447c_0
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libglu 9.0.0 hf484d3e_1
libopencv 3.4.2 hb342d67_1
libopus 1.3.1 h7b6447c_0
libpng 1.6.37 hbc83047_0
libprotobuf 3.11.4 hd408876_0
libsodium 1.0.16 h1bed415_0
libspatialindex 1.9.3 he6710b0_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_0
libuuid 1.0.3 h1bed415_2
libvpx 1.7.0 h439df22_0
libxcb 1.13 h1bed415_1
libxml2 2.9.9 hea5a465_1
markdown 3.1.1 py37_0
markupsafe 1.1.1 py37h7b6447c_0
matplotlib 3.1.3 py37_0
matplotlib-base 3.1.3 py37hef1b27d_0
mccabe 0.6.1 py37_1
mistune 0.8.4 py37h7b6447c_0
mkl 2020.0 166
mkl-service 2.3.0 py37he904b0f_0
mkl_fft 1.0.15 py37ha843d7b_0
mkl_random 1.1.0 py37hd6b4f25_0
nbconvert 5.6.1 py37_0
nbformat 5.0.4 py_0
ncurses 6.2 he6710b0_1
numpy 1.18.1 py37h4f9e942_0
numpy-base 1.18.1 py37hde5b4d6_1
numpydoc 0.9.2 py_0
oauthlib 3.1.0 py_0
olefile 0.46 py37_0
opencv 3.4.2 py37h6fd60c2_1
openssl 1.1.1g h7b6447c_0
opt_einsum 3.1.0 py_0
packaging 20.3 py_0
pandoc 2.2.3.2 0
pandocfilters 1.4.2 py37_1
parso 0.5.2 py_0
pathtools 0.1.2 py_1
pcre 8.43 he6710b0_0
pexpect 4.8.0 py37_0
pickleshare 0.7.5 py37_0
pillow 7.0.0 py37hb39fc2d_0
pip 20.0.2 py37_1
pixman 0.38.0 h7b6447c_0
pluggy 0.13.1 py37_0
prompt-toolkit 3.0.4 py_0
prompt_toolkit 3.0.4 0
protobuf 3.11.4 py37he6710b0_0
psutil 5.7.0 py37h7b6447c_0
ptyprocess 0.6.0 py37_0
py-opencv 3.4.2 py37hb342d67_1
pyasn1 0.4.8 py_0
pyasn1-modules 0.2.7 py_0
pycodestyle 2.5.0 py37_0
pycparser 2.20 py_0
pydocstyle 4.0.1 py_0
pyflakes 2.1.1 py37_0
pygments 2.6.1 py_0
pyjwt 1.7.1 py37_0
pylint 2.5.0 py37_0
pyopenssl 19.1.0 py37_0
pyparsing 2.4.7 py_0
pyqt 5.9.2 py37h05f1152_2
pyrsistent 0.16.0 py37h7b6447c_0
pysocks 1.7.1 py37_0
python 3.7.7 hcf32534_0_cpython
python-dateutil 2.8.1 py_0
python-jsonrpc-server 0.3.4 py_0
python-language-server 0.31.10 py37_0
pytz 2019.3 py_0
pyxdg 0.26 py_0
pyyaml 5.3.1 py37h7b6447c_0
pyzmq 18.1.1 py37he6710b0_0
qdarkstyle 2.8.1 py_0
qt 5.9.7 h5867ecd_1
qtawesome 0.7.0 py_0
qtconsole 4.7.3 py_0
qtpy 1.9.0 py_0
readline 8.0 h7b6447c_0
requests 2.23.0 py37_0
requests-oauthlib 1.3.0 py_0
rope 0.16.0 py_0
rsa 4.0 py_0
rtree 0.9.4 py37_1
scipy 1.4.1 py37h0b6359f_0
secretstorage 3.1.2 py37_0
setuptools 46.1.3 py37_0
sip 4.19.8 py37hf484d3e_0
six 1.14.0 py37_0
snowballstemmer 2.0.0 py_0
sortedcontainers 2.1.0 py37_0
sphinx 3.0.3 py_0
sphinxcontrib-applehelp 1.0.2 py_0
sphinxcontrib-devhelp 1.0.2 py_0
sphinxcontrib-htmlhelp 1.0.3 py_0
sphinxcontrib-jsmath 1.0.1 py_0
sphinxcontrib-qthelp 1.0.3 py_0
sphinxcontrib-serializinghtml 1.1.4 py_0
spyder 4.1.2 py37_0
spyder-kernels 1.9.0 py37_0
sqlite 3.31.1 h62c20be_1
tensorboard 2.1.0 py3_0
tensorflow 2.1.0 gpu_py37h7a4bb67_0
tensorflow-base 2.1.0 gpu_py37h6c5654b_0
tensorflow-estimator 2.1.0 pyhd54b08b_0
tensorflow-gpu 2.1.0 h0d30ee6_0
termcolor 1.1.0 py37_1
testpath 0.4.4 py_0
tk 8.6.8 hbc83047_0
toml 0.10.0 py37h28b3542_0
tornado 6.0.4 py37h7b6447c_1
traitlets 4.3.3 py37_0
ujson 1.35 py37h14c3975_0
urllib3 1.25.8 py37_0
watchdog 0.10.2 py37_0
wcwidth 0.1.9 py_0
webencodings 0.5.1 py37_1
werkzeug 1.0.1 py_0
wheel 0.34.2 py37_0
wrapt 1.12.1 py37h7b6447c_1
wurlitzer 2.0.0 py37_0
xz 5.2.5 h7b6447c_0
yaml 0.1.7 had09818_2
yapf 0.28.0 py_0
zeromq 4.3.1 he6710b0_3
zipp 3.1.0 py_0
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0
Code
import os
import glob
import shutil
import pickle
import cv2
import numpy as np
import matplotlib.pyplot as plt
import random
from IPython.display import display
from PIL import Image
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, BatchNormalization, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers.convolutional import Conv3D, MaxPooling3D
from keras.constraints import maxnorm
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
import tensorflow as tf
os.environ["CUDA_VISIBLE_DEVICES"]="1"
# read in the training and validation labels
trainPairs = np.genfromtxt('/home/me/Videos/sign_language/jester-v1-train.csv', delimiter=';', skip_header=0, dtype=[('class', 'S12'),('sign','S50')])
trainLabels = [v for k,v in trainPairs]
validPairs = np.genfromtxt('/home/me/Videos/sign_language/jester-v1-validation.csv', delimiter=';', skip_header=0, dtype=[('class', 'S12'),('sign','S50')])
validLabels = [v for k,v in validPairs]
def copyDirectory(src, dest):
    try:
        shutil.copytree(src, dest)
    # Directories are the same
    except shutil.Error as e:
        print('Directory not copied. Error: %s' % e)
    # Any error saying that the directory doesn't exist
    except OSError as e:
        print('Directory not copied. Error: %s' % e)
source = '/media/me/other/20bn-jester-v1/'
dest = '/media/me/other/jester/validation/'
# counter = 0
# for k,v in validPairs:
#     counter = counter + 1
#     source_folder = source + k.decode("utf-8")
#     dest_folder = dest + v.decode("utf-8") + "/" + k.decode("utf-8")
#     if counter%100 == 0:
#         print(k)
#         print(v)
#         print(counter)
#         print(source_folder)
#         print(dest_folder)
#     if os.path.isdir(source_folder):
#         if os.path.isdir(dest + v.decode("utf-8")):
#             copyDirectory(source_folder, dest_folder)
#     if counter%1000 == 0:
#         print(counter)
datagen = ImageDataGenerator(rescale=1./255)
train_it = datagen.flow_from_directory('/media/me/other/jester/train/',
class_mode='categorical',
batch_size=16
)
valid_it = datagen.flow_from_directory('/media/me/other/jester/validation/', class_mode='categorical', batch_size=16)
# test_it = datagen.flow_from_directory('/media/me/other/jester/test/', class_mode='binary', batch_size=64)
seed = 21
epochs = 5
optimizer = 'Adamax'
with tf.device("/cpu:0"):
model = Sequential()
#model = Sequential()
#model.add(Conv2D(32,(3,3), input_shape=(X_train.shape[1:]), padding='same'))
#TODO is this the right shape??
model.add(Conv2D(32,(16,16), strides=(8,8), input_shape=(256, 256, 3), padding='same'))
model.add(Activation('relu'))
#model.add(MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None))
model.add(Conv2D(64, (3,3), input_shape=(3,16,16), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
#model.add(Conv2D(64, (3,3), padding='same'))
#model.add(Activation('relu'))
#model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Conv2D(128, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dropout(0.2))
#model.add(Dense(256, kernel_constraint=maxnorm(3)))
#model.add(Activation('relu'))
#model.add(Dropout(0.2))
#model.add(BatchNormalization())
model.add(Dense(128, kernel_constraint=maxnorm(3)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
#TODO make this a variable
model.add(Dense(27))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
for layer in model.layers:
    print(layer.output_shape)
print(model.summary())
np.random.seed(seed)
image_batch_train, label_batch_train = next(iter(train_it))
print("Image batch shape: ", image_batch_train.shape)
print("Label batch shape: ", label_batch_train.shape)
dataset_labels = sorted(train_it.class_indices.items(), key=lambda pair:pair[1])
dataset_labels = np.array([key.title() for key, value in dataset_labels])
print(dataset_labels)
from keras import backend as K
K.clear_session()
import keras
keras.backend.clear_session()
model.fit_generator(train_it, steps_per_epoch=16, validation_data=valid_it, validation_steps=8)
Log
from keras import backend as K
K.clear_session()
import keras
keras.backend.clear_session()
model.fit_generator(train_it, steps_per_epoch=16, validation_data=valid_it, validation_steps=8)
Traceback (most recent call last):
  File "<ipython-input-19-ba2ec4f0a2a8>", line 8, in <module>
    model.fit_generator(train_it, steps_per_epoch=16, validation_data=valid_it, validation_steps=8)
  File "/home/me/Programs/anaconda3/envs/hand-gesture/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/me/Programs/anaconda3/envs/hand-gesture/lib/python3.7/site-packages/keras/engine/training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/me/Programs/anaconda3/envs/hand-gesture/lib/python3.7/site-packages/keras/engine/training_generator.py", line 42, in fit_generator
    model._make_train_function()
  File "/home/me/Programs/anaconda3/envs/hand-gesture/lib/python3.7/site-packages/keras/engine/training.py", line 316, in _make_train_function
    loss=self.total_loss)
  File "/home/me/Programs/anaconda3/envs/hand-gesture/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/me/Programs/anaconda3/envs/hand-gesture/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 75, in symbolic_fn_wrapper
    return func(*args, **kwargs)
  File "/home/me/Programs/anaconda3/envs/hand-gesture/lib/python3.7/site-packages/keras/optimizers.py", line 598, in get_updates
    grads = self.get_gradients(loss, params)
  File "/home/me/Programs/anaconda3/envs/hand-gesture/lib/python3.7/site-packages/keras/optimizers.py", line 93, in get_gradients
    raise ValueError('An operation has `None` for gradient. '
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Edit 1: Following Matias's suggestion and removing
from keras import backend as K
K.clear_session()
import keras
keras.backend.clear_session()
allows me to run an epoch, but now I get:
Epoch 1/1
16/16 [==============================] - 6s 370ms/step - loss: 4.0208 - accuracy: 0.0391 - val_loss: 7.3795 - val_accuracy: 0.0469
Out[3]: <keras.callbacks.callbacks.History at 0x7f817e63c2d0>
Edit 2:
As Matias pointed out, my code was only set up to run 1 epoch, so removing clear_session() solved my problem.
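For reference, a minimal sketch of how training could be run for more than one epoch by passing the epochs argument explicitly (it defaults to 1 when omitted). This assumes the model, train_it, valid_it and epochs variables defined in the code above; the Keras 2.3.1 fit_generator signature accepts epochs directly.
# Sketch: run for `epochs` epochs instead of the default single epoch.
history = model.fit_generator(train_it,
                              steps_per_epoch=16,
                              epochs=epochs,            # defaults to 1 if not given
                              validation_data=valid_it,
                              validation_steps=8)
print(history.history['loss'])  # one training-loss value per epoch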
Answer (score: 1)
I think the problem is that you are clearing the session right before training the model. That makes no sense: clearing the session wipes the model's structure from memory, so on the TensorFlow side there is no representation of the model left, and training fails.
In this case, don't use K.clear_session(). It doesn't seem to be needed.
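To illustrate the ordering the answer describes, here is a minimal, self-contained sketch using a hypothetical toy model and dummy data (not the question's generators): if the backend session is reset at all, it happens before the model is built, never between compile() and the training call.
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

# If the session needs resetting at all, do it BEFORE building the model,
# so the graph that gets trained is the one constructed below.
K.clear_session()

model = Sequential()
model.add(Dense(27, activation='softmax', input_shape=(64,)))  # hypothetical toy model
model.compile(loss='categorical_crossentropy', optimizer='Adamax', metrics=['accuracy'])

# Calling K.clear_session() at this point instead would discard the compiled graph,
# which is what leads to errors like "An operation has `None` for gradient".
x = np.random.rand(32, 64)
y = np.eye(27)[np.random.randint(0, 27, 32)]  # dummy one-hot labels
model.fit(x, y, epochs=1, batch_size=16)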