tf.keras.models.predict-FailedPreconditionError

Date: 2018-12-08 04:45:25

Tags: tensorflow machine-learning keras

I build and train my model with tf.keras. Since the actual code is quite large, I will paste only the relevant parts here and hopefully that is enough to explain the problem. It feels like it should be a simple issue, but I just cannot see what it is (even after 3 hours).

This is the checkpoint definition I use to save the model during training:

saved_model_path = "tff_model.hdf5" 
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=saved_model_path,
                                                monitor='val_loss', 
                                                verbose=0, 
                                                save_best_only=True, 
                                                save_weights_only=False, 
                                                mode='auto', 
                                                period=1)

For training, I run the fit_generator function as follows:

history = merged_model.fit_generator(generator = train_generator,
                                     steps_per_epoch = 100,
                                     epochs = 3,
                                     verbose = 1,
                                     callbacks = [checkpoint],
                                     use_multiprocessing=False,
                                     workers=3,
                                     max_queue_size=4)

train_generator is a thread-safe generator I wrote by hand, and it yields batches in the following format:

yield ([a, b, c, d], label)

As the generator output suggests, the model has 4 inputs, each an array of shape (batch_size, m, n).
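To make the setup concrete, a stripped-down version of such a generator could look like the sketch below. This is purely illustrative: the shapes, the dummy NumPy data and the ThreadSafeIterator wrapper are placeholders for what the real generator does with tf.data.

import threading
import numpy as np

class ThreadSafeIterator:
    # Wraps an iterator so that concurrent next() calls are serialized by a lock,
    # which is what fit_generator needs when workers > 1 without multiprocessing.
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.it)

def make_batches(batch_size=32, m=10, n=8, num_classes=5):
    # Yields four inputs of shape (batch_size, m, n) plus a one-hot label batch,
    # mirroring the ([a, b, c, d], label) format above. Dummy data only.
    while True:
        a, b, c, d = (np.random.rand(batch_size, m, n).astype('float32')
                      for _ in range(4))
        label = np.eye(num_classes)[np.random.randint(0, num_classes, size=batch_size)]
        yield ([a, b, c, d], label.astype('float32'))

train_generator = ThreadSafeIterator(make_batches())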

Everything works fine up to this point. Later, when I want to load the model and make predictions, I do the following...

First, load the model:

my_model = tf.keras.models.load_model(saved_model_path)

Then I run the generator to get 1 batch of data and call predict, like this:

([a, b, c, d], label) = next(prediction_generator)
prediction_result = my_model.predict([a, b, c, d],
                                     batch_size = 1,
                                     verbose = 1)

Then I get the following error, which tells me nothing:

  File "<ipython-input-1-c3f6e179d65a>", line 1, in <module>
    runfile('C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps/tff_make_predictions.py', wdir='C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps')

  File "C:\Users\sinthes\Anaconda3\envs\tensorflow-gpu-env\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)

  File "C:\Users\sinthes\Anaconda3\envs\tensorflow-gpu-env\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps/tff_make_predictions.py", line 80, in <module>
    verbose = 1)

  File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py", line 1878, in predict
    self, x, batch_size=batch_size, verbose=verbose, steps=steps)

  File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 326, in predict_loop
    batch_outs = f(ins_batch)

  File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\backend.py", line 2986, in __call__
    run_metadata=self.run_metadata)

  File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
    run_metadata_ptr)

  File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))

FailedPreconditionError: Error while reading resource variable cu_dnnlstm/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/cu_dnnlstm/kernel)
     [[{{node cu_dnnlstm/ReadVariableOp}} = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnnlstm/kernel)]]
     [[{{node dense_2/Softmax/_9}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_594_dense_2/Softmax", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I tried the predict_generator function from the API and it worked. It is only when I use the predict function that it does not work. Does anyone out there have an idea what might be going wrong here?
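For completeness, the call that does work is roughly of this form (the steps value is illustrative, and I have simplified the generator handling):

prediction_result = my_model.predict_generator(prediction_generator,
                                               steps = 1,
                                               verbose = 1)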

Addendum: One thing that bothers me is that I trained the model on the GPU, while the data generator runs on the CPU. Should I be doing something special when I reload the model later and run predictions? The thing is, there is a separate TensorFlow session running inside the data generator (so that it can use the tf.data API), and I wonder whether that is what is causing all of this. (I am really looking forward to the next TensorFlow release, where eager mode runs by default!)
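Roughly, the generator's session handling looks like the following heavily simplified sketch (parse_example is a stand-in for the actual record parser; the point is only that the generator owns its own graph and session to drive a tf.data pipeline):

import tensorflow as tf

def tfrecord_generator(tfrecord_path, batch_size):
    # Sketch only: build a private graph and session so the tf.data pipeline
    # runs independently of the Keras model. parse_example is a placeholder
    # for the user-defined parser that returns ((a, b, c, d), label).
    graph = tf.Graph()
    with graph.as_default():
        dataset = (tf.data.TFRecordDataset(tfrecord_path)
                   .map(parse_example)
                   .batch(batch_size)
                   .repeat())  # repeat so the loop below never runs dry
        next_batch = dataset.make_one_shot_iterator().get_next()
    sess = tf.Session(graph=graph)
    while True:
        (a, b, c, d), label = sess.run(next_batch)
        yield ([a, b, c, d], label)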

0 Answers:

There are no answers yet.