I build and train my model with tf.keras. Since the actual code is quite large, I will paste the relevant parts here and hope I can explain the problem. It feels like it should be a simple issue, but I just can't see what it is (even after 3 hours).
This is the checkpoint definition I use to save the model during training:
saved_model_path = "tff_model.hdf5"
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=saved_model_path,
                                                monitor='val_loss',
                                                verbose=0,
                                                save_best_only=True,
                                                save_weights_only=False,
                                                mode='auto',
                                                period=1)
For training, I run the fit_generator function as follows:
history = merged_model.fit_generator(generator=train_generator,
                                     steps_per_epoch=100,
                                     epochs=3,
                                     verbose=1,
                                     callbacks=[checkpoint],
                                     use_multiprocessing=False,
                                     workers=3,
                                     max_queue_size=4)
train_generator is a thread-safe generator I wrote by hand, and it returns batches in the following format:
yield ([a, b, c, d], label)
As the format returned by the generator suggests, the model has 4 inputs, each an array of shape (batch_size, m, n).
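For reference, here is a minimal sketch of what such a thread-safe generator could look like. Everything in it (the lock wrapper, the dummy random data, the concrete values of m, n, and the number of classes) is an assumption for illustration; the real generator reads from an actual data pipeline.

```python
import threading
import numpy as np

def make_thread_safe(gen):
    # Serialize next() calls so multiple fit_generator workers
    # can pull from the same underlying generator safely.
    lock = threading.Lock()
    def locked():
        while True:
            with lock:
                batch = next(gen)
            yield batch
    return locked()

def batch_generator(batch_size=2, m=5, n=3, num_classes=4):
    # Hypothetical stand-in for the real pipeline: four input arrays
    # of shape (batch_size, m, n) plus a one-hot label batch.
    while True:
        a, b, c, d = (np.random.rand(batch_size, m, n).astype("float32")
                      for _ in range(4))
        label = np.eye(num_classes, dtype="float32")[
            np.random.randint(0, num_classes, size=batch_size)]
        yield ([a, b, c, d], label)

train_generator = make_thread_safe(batch_generator())
```

The lock matters because with use_multiprocessing=False and workers=3, fit_generator pulls batches from the same generator object on several threads at once.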
Everything works fine up to this point. Later, when I want to load the model and make predictions, I do the following...
First I load the model:
my_model = tf.keras.models.load_model(saved_model_path)
Then I run the generator to get 1 batch of data, like this:
([a, b, c, d], label) = next(prediction_generator)
prediction_result = my_model.predict([a, b, c, d],
                                     batch_size=1,
                                     verbose=1)
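As an aside, a quick way to rule out a plain shape mismatch before calling predict is a check along these lines. The helper name and the (5, 3) shapes are placeholders for the real m and n, which come from the data:

```python
import numpy as np

def check_predict_inputs(batch_inputs, expected_shapes):
    # The model has 4 inputs; make sure we pass 4 arrays and that
    # every per-sample shape (ignoring the batch dimension) matches.
    assert len(batch_inputs) == len(expected_shapes), "wrong number of inputs"
    for x, shape in zip(batch_inputs, expected_shapes):
        assert x.shape[1:] == shape, (x.shape[1:], shape)
    return True

# Placeholder batch of one sample per input, with m=5, n=3:
batch = [np.zeros((1, 5, 3), dtype="float32") for _ in range(4)]
check_predict_inputs(batch, [(5, 3)] * 4)
```

In my case the shapes are fine, so the problem is not the input format itself.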
Then I get the following error, which tells me nothing:
File "<ipython-input-1-c3f6e179d65a>", line 1, in <module>
runfile('C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps/tff_make_predictions.py', wdir='C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps')
File "C:\Users\sinthes\Anaconda3\envs\tensorflow-gpu-env\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "C:\Users\sinthes\Anaconda3\envs\tensorflow-gpu-env\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps/tff_make_predictions.py", line 80, in <module>
verbose = 1)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py", line 1878, in predict
self, x, batch_size=batch_size, verbose=verbose, steps=steps)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 326, in predict_loop
batch_outs = f(ins_batch)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\backend.py", line 2986, in __call__
run_metadata=self.run_metadata)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
FailedPreconditionError: Error while reading resource variable cu_dnnlstm/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/cu_dnnlstm/kernel)
[[{{node cu_dnnlstm/ReadVariableOp}} = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnnlstm/kernel)]]
[[{{node dense_2/Softmax/_9}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_594_dense_2/Softmax", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
I tried the predict_generator function from the API instead, and that works. It fails only when I use the predict function. Does anyone out there know what might be going wrong here?
Addendum: One thing that bothers me is that I trained the model on the GPU while the data generator runs on the CPU. Is there anything special I need to do when reloading the model and running predictions later? The thing is, a separate TensorFlow session is running inside the data generator (so that I can use the tf.data API), and I wonder whether that is what's causing all of this. (I'm really looking forward to the next TensorFlow release, which runs in eager mode by default!)