How do I save a Keras model trained on a TPU?

Asked: 2018-12-26 21:16:31

Tags: tensorflow keras google-colaboratory google-cloud-tpu

I am experimenting with an LSTM model in a Colab environment, but I am unable to save the trained model.

sess = tf.keras.backend.get_session()

training_model = lstm_model(seq_len=100, batch_size=128, stateful=False)

tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))


tpu_model.fit_generator(
    training_generator(seq_len=100, batch_size=1024),
    steps_per_epoch=100,
    epochs=4)

export_path = '/content/output/'
tf.saved_model.simple_save(
    sess,
    export_path,
    inputs={'input_image': tpu_model.input},
    outputs={t.name: t for t in tpu_model.outputs})

Here is the exception:

FailedPreconditionError                   Traceback (most recent call last)
<ipython-input-13-020e67d3772b> in <module>()
 29         export_path,
 30         inputs={'input_image': tpu_model.input},
---> 31         outputs={t.name: t for t in tpu_model.outputs})


...skipped....

FailedPreconditionError: Error while reading resource variable. This could mean that the variable was uninitialized. Not found: Resource worker/TFOptimizer/iterations/N10tensorflow3VarE does not exist.
 [[{{node ReadVariables_8976001795006639924/_2}} = _ReadVariablesOp[N=40, dtypes=[DT_INT64, DT_INT64, DT_INT64, DT_INT64, DT_INT64, ..., DT_FLOAT, DT_INT64, DT_INT64, DT_INT64, DT_INT64], _device="/job:worker/replica:0/task:0/device:CPU:0"](VarHandles_14315951673884632260/_0, VarHandles_14315951673884632260/_0:1, VarHandles_14315951673884632260/_0:2, VarHandles_14315951673884632260/_0:3, VarHandles_14315951673884632260/_0:4, VarHandles_14315951673884632260/_0:5, VarHandles_14315951673884632260/_0:6, VarHandles_14315951673884632260/_0:7, VarHandles_14315951673884632260/_0:8, VarHandles_14315951673884632260/_0:9, VarHandles_14315951673884632260/_0:10, VarHandles_14315951673884632260/_0:11, VarHandles_14315951673884632260/_0:12, VarHandles_14315951673884632260/_0:13, VarHandles_14315951673884632260/_0:14, VarHandles_14315951673884632260/_0:15, VarHandles_14315951673884632260/_0:16, VarHandles_14315951673884632260/_0:17, VarHandles_14315951673884632260/_0:18, VarHandles_14315951673884632260/_0:19, VarHandles_14315951673884632260/_0:20, VarHandles_14315951673884632260/_0:21, VarHandles_14315951673884632260/_0:22, VarHandles_14315951673884632260/_0:23, VarHandles_14315951673884632260/_0:24, VarHandles_14315951673884632260/_0:25, VarHandles_14315951673884632260/_0:26, VarHandles_14315951673884632260/_0:27, VarHandles_14315951673884632260/_0:28, VarHandles_14315951673884632260/_0:29, VarHandles_14315951673884632260/_0:30, VarHandles_14315951673884632260/_0:31, VarHandles_14315951...
 [[{{node ReadVariables_16894311020792346126/_3_G1412}} = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:worker/replica:0/task:0/device:CPU:0", send_device="/job:worker/replica:0/task:0/device:TPU:0", send_device_incarnation=8311516724619575166, tensor_name="edge_133_ReadVariables_16894311020792346126/_3", _device="/job:worker/replica:0/task:0/device:TPU:0"](ReadVariables_16894311020792346126/_3:8)]]

Please advise.

1 answer:

Answer 0 (score: 0)

What if you replace the tf.saved_model.simple_save() call with, for example,

tpu_model.save_weights(os.path.join(export_path, 'weights.h5'), overwrite=True)

as in https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/shakespeare_with_tpu_and_keras.ipynb#scrollTo=ExQ922tfzSGA ?

(This example and others are linked from the bottom of https://colab.research.google.com/notebooks/tpu.ipynb.)
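
Putting the pieces together, here is a minimal sketch of the full round trip, following the pattern in the linked Shakespeare notebook. It assumes lstm_model from the question rebuilds the identical architecture, that tf.contrib is still available (TF 1.x), and that the 'saved_model' subdirectory name is just an illustration:

import os
import tensorflow as tf

export_path = '/content/output/'
os.makedirs(export_path, exist_ok=True)

# Pull the trained weights off the TPU and write them to an HDF5 file.
tpu_model.save_weights(os.path.join(export_path, 'weights.h5'), overwrite=True)

# Rebuild the same architecture locally and load the weights back in.
cpu_model = lstm_model(seq_len=100, batch_size=128, stateful=False)
cpu_model.load_weights(os.path.join(export_path, 'weights.h5'))

# Export from the local session, where these variables actually live,
# instead of from the TPU worker that raised FailedPreconditionError.
sess = tf.keras.backend.get_session()
tf.saved_model.simple_save(
    sess,
    os.path.join(export_path, 'saved_model'),
    inputs={'input': cpu_model.input},
    outputs={t.name: t for t in cpu_model.outputs})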