AbortedError: Session 902faae618c6a822 is not found

Time: 2019-10-31 10:27:26

Tags: tensorflow keras tpu

This is my code. It does not work on a TPU, but it works in CPU/GPU mode in Google Colab. I don't know why. Does anyone know the reason?

My TensorFlow version is 1.15.0.
x_train.shape is (13887, 3000) and y_train.shape is (13887, 8).
Also:
num_features = 301 + 1
sequence_length = 3000
embedding_dimension = 1000
Thanks a lot!

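import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

filter_sizes = [2, 2, 3, 4]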
def convolution():
    inn = layers.Input(shape=(sequence_length, embedding_dimension, 1))
    cnns = []
    for size in filter_sizes:
        conv = layers.Conv2D(filters=64, kernel_size=(size, embedding_dimension),
                            strides=1, padding='valid', activation='elu')(inn)
        pool = layers.MaxPool2D(pool_size=(sequence_length-size+1, 1), padding='valid')(conv)
        cnns.append(pool)
    outt = layers.concatenate(cnns)

    model = keras.Model(inputs=inn, outputs=outt)
    return model
def cnn_mulfilter():
    model = keras.Sequential([
        layers.Embedding(input_dim=num_features, output_dim=embedding_dimension,
                        input_length=sequence_length),
        layers.Reshape((sequence_length, embedding_dimension, 1)),
        convolution(),
        layers.Flatten(),
        layers.Dense(20, activation='elu'),
        layers.Dropout(0.2),
        layers.Dense(8, activation='softmax')

    ])
    return model
import os

resolver = tf.contrib.cluster_resolver.TPUClusterResolver('grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy = tf.contrib.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = cnn_mulfilter()
    model.compile(optimizer=keras.optimizers.Adam(1e-4),
                  loss=keras.losses.CategoricalCrossentropy(),
                  metrics=['accuracy'])

history = model.fit(x_train.astype(np.float32), y_train.astype(np.float32),
            steps_per_epoch=batch_size,
            epochs=30,
            validation_data=(x_test.astype(np.float32), y_test.astype(np.float32)),
            validation_freq=30)

The error is as follows:

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

10 frames
AbortedError: Session e6d3a6fe721e9009 is not found.

During handling of the above exception, another exception occurred:

AbortedError                              Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1385 
   1386   def _extend_graph(self):

AbortedError: Session e6d3a6fe721e9009 is not found.

1 Answer:

Answer 0 (score: 0)

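The problem is that the file is on the local file system, which the TPU does not support. The error message is odd, though.

Moving to TFRecords fixes the problem.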
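
A minimal sketch of what moving to TFRecords could look like, using calls available in both TensorFlow 1.15 and 2.x. It rests on several assumptions that are not in the question: that x_train holds integer token ids (it feeds an Embedding layer), that the helper names write_tfrecords and make_dataset are hypothetical, and that the file ends up somewhere the TPU can read, typically a Google Cloud Storage bucket (the gs:// path in the comments is a placeholder).

```python
import numpy as np
import tensorflow as tf


def write_tfrecords(path, x, y):
    """Serialize each (x, y) row as one tf.train.Example record."""
    with tf.io.TFRecordWriter(path) as writer:
        for row_x, row_y in zip(x, y):
            feature = {
                # Assumption: x rows are integer token ids.
                "x": tf.train.Feature(int64_list=tf.train.Int64List(
                    value=np.asarray(row_x, dtype=np.int64).tolist())),
                # One-hot labels stored as floats.
                "y": tf.train.Feature(float_list=tf.train.FloatList(
                    value=np.asarray(row_y, dtype=np.float32).tolist())),
            }
            example = tf.train.Example(
                features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


def make_dataset(path, sequence_length, num_classes, batch_size):
    """Build a batched tf.data pipeline that parses the records back."""
    feature_spec = {
        "x": tf.io.FixedLenFeature([sequence_length], tf.int64),
        "y": tf.io.FixedLenFeature([num_classes], tf.float32),
    }

    def parse(serialized):
        parsed = tf.io.parse_single_example(serialized, feature_spec)
        # Cast the ids to int32 before they reach the model.
        return tf.cast(parsed["x"], tf.int32), parsed["y"]

    dataset = tf.data.TFRecordDataset(path).map(parse)
    # drop_remainder=True keeps every batch the same static shape.
    return dataset.shuffle(1024).repeat().batch(batch_size, drop_remainder=True)


# Hypothetical usage (bucket name and batch size are placeholders):
#   write_tfrecords("gs://my-bucket/train.tfrecord", x_train, y_train)
#   train_ds = make_dataset("gs://my-bucket/train.tfrecord", 3000, 8, 128)
#   model.fit(train_ds, steps_per_epoch=13887 // 128, epochs=30)
```

Because the dataset repeats indefinitely, steps_per_epoch should be the number of batches per pass over the data (number of samples divided by batch size), not the batch size itself.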