I build and train my model with tf.keras. Since the actual code is quite large, I will paste the relevant parts here and hope I can explain the problem. It feels like it should be a simple issue, but I just can't see what it is (even after 3 hours).
This is the checkpoint definition I use to save the model during training:
saved_model_path = "tff_model.hdf5"
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=saved_model_path,
                                                monitor='val_loss',
                                                verbose=0,
                                                save_best_only=True,
                                                save_weights_only=False,
                                                mode='auto',
                                                period=1)
For training, I run the fit_generator function as follows:
history = merged_model.fit_generator(generator=train_generator,
                                     steps_per_epoch=100,
                                     epochs=3,
                                     verbose=1,
                                     callbacks=[checkpoint],
                                     use_multiprocessing=False,
                                     workers=3,
                                     max_queue_size=4)
train_generator is a thread-safe generator I wrote by hand, and it returns batches in the following format:
yield ([a, b, c, d], label)
As the format returned by the generator suggests, the model has 4 inputs, each an array of shape (batch_size, m, n).
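For reference, here is a minimal sketch of what such a thread-safe generator could look like. Everything in it (the lock wrapper, the dummy random data, the concrete values of m, n, and the number of classes) is an assumption for illustration; the real generator reads from an actual data pipeline.

```python
import threading
import numpy as np

def make_thread_safe(gen):
    # Serialize next() calls so multiple fit_generator workers
    # can pull from the same underlying generator safely.
    lock = threading.Lock()
    def locked():
        while True:
            with lock:
                batch = next(gen)
            yield batch
    return locked()

def batch_generator(batch_size=2, m=5, n=3, num_classes=4):
    # Hypothetical stand-in for the real pipeline: four input arrays
    # of shape (batch_size, m, n) plus a one-hot label batch.
    while True:
        a, b, c, d = (np.random.rand(batch_size, m, n).astype("float32")
                      for _ in range(4))
        label = np.eye(num_classes, dtype="float32")[
            np.random.randint(0, num_classes, size=batch_size)]
        yield ([a, b, c, d], label)

train_generator = make_thread_safe(batch_generator())
```

The lock matters because with use_multiprocessing=False and workers=3, fit_generator pulls batches from the same generator object on several threads at once.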
Everything works fine up to this point. Later, when I want to load the model and make predictions, I do the following...
First I load the model:
my_model = tf.keras.models.load_model(saved_model_path)
Then I run the generator to get 1 batch of data, like this:
([a, b, c, d], label) = next(prediction_generator)
prediction_result = my_model.predict([a, b, c, d],
                                     batch_size=1,
                                     verbose=1)
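As an aside, a quick way to rule out a plain shape mismatch before calling predict is a check along these lines. The helper name and the (5, 3) shapes are placeholders for the real m and n, which come from the data:

```python
import numpy as np

def check_predict_inputs(batch_inputs, expected_shapes):
    # The model has 4 inputs; make sure we pass 4 arrays and that
    # every per-sample shape (ignoring the batch dimension) matches.
    assert len(batch_inputs) == len(expected_shapes), "wrong number of inputs"
    for x, shape in zip(batch_inputs, expected_shapes):
        assert x.shape[1:] == shape, (x.shape[1:], shape)
    return True

# Placeholder batch of one sample per input, with m=5, n=3:
batch = [np.zeros((1, 5, 3), dtype="float32") for _ in range(4)]
check_predict_inputs(batch, [(5, 3)] * 4)
```

In my case the shapes are fine, so the problem is not the input format itself.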
Then I get the following error, which tells me nothing:
File "<ipython-input-1-c3f6e179d65a>", line 1, in <module>
runfile('C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps/tff_make_predictions.py', wdir='C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps')
File "C:\Users\sinthes\Anaconda3\envs\tensorflow-gpu-env\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "C:\Users\sinthes\Anaconda3\envs\tensorflow-gpu-env\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/sinthes/Desktop/AI_Project/TFF/TFF_with_tfrecord_seq_model_2steps/tff_make_predictions.py", line 80, in <module>
verbose = 1)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py", line 1878, in predict
self, x, batch_size=batch_size, verbose=verbose, steps=steps)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 326, in predict_loop
batch_outs = f(ins_batch)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\backend.py", line 2986, in __call__
run_metadata=self.run_metadata)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "C:\Users\sinthes\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
FailedPreconditionError: Error while reading resource variable cu_dnnlstm/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/cu_dnnlstm/kernel)
[[{{node cu_dnnlstm/ReadVariableOp}} = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnnlstm/kernel)]]
[[{{node dense_2/Softmax/_9}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_594_dense_2/Softmax", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
I tried the predict_generator function from the API instead, and that works. It fails only when I use the predict function. Does anyone out there know what might be going wrong here?
Addendum: One thing that bothers me is that I trained the model on the GPU while the data generator runs on the CPU. Is there anything special I need to do when reloading the model and running predictions later? The thing is, a separate TensorFlow session is running inside the data generator (so that I can use the tf.data API), and I wonder whether that is what's causing all of this. (I'm really looking forward to the next TensorFlow release, which runs in eager mode by default!)