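Colab kernel restarts whenever I load a model from TensorFlow Hub

Time: 2019-08-30 07:06:54

Tags: tensorflow tensorflow-hub

I want to try the embeddings available on tensorflow-hub, specifically the Universal Sentence Encoder. I tried the provided example (https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb) and it works well. So I tried to do the same with the "multilingual" model, but every time I load the multilingual model, the Colab kernel fails and restarts. What is the problem, and how can I fix it?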

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import seaborn as sns
import tf_sentencepiece
import sentencepiece

# Import the Universal Sentence Encoder's TF Hub module
embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-multilingual/1")  # This is where the kernel dies.
print("imported model")
# Compute a representation for each message, showing various lengths supported.
word = "코끼리"
sentence = "나는 한국어로 쓰여진 문장이야."
paragraph = (
    "동해물과 백두산이 마르고 닳도록. "
    "하느님이 보우하사 우리나라 만세~")
messages = [word, sentence, paragraph]

# Reduce logging output.
tf.logging.set_verbosity(tf.logging.ERROR)

with tf.Session() as session:
  session.run([tf.global_variables_initializer(), tf.tables_initializer()])
  message_embeddings = session.run(embed(messages))

  for i, message_embedding in enumerate(np.array(message_embeddings).tolist()):
    print("Message: {}".format(messages[i]))
    print("Embedding size: {}".format(len(message_embedding)))
    message_embedding_snippet = ", ".join(
        (str(x) for x in message_embedding[:3]))
    print("Embedding: [{}, ...]\n".format(message_embedding_snippet))

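1 Answer:

Answer 0: (score: 0)

I had a similar problem with the multilingual sentence encoder. I solved it by pinning tensorflow to version 1.14.0 and tf-sentencepiece to version 0.1.83, so before running your code in Colab, try: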

!pip3 install tensorflow==1.14.0
!pip3 install tensorflow-hub
!pip3 install sentencepiece
!pip3 install tf-sentencepiece==0.1.83

I was able to reproduce your problem in Colab, and with this fix the model loaded correctly.

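This appears to be a compatibility issue between sentencepiece and TensorFlow; check here for updates on this issue. Let us know how it goes. Good luck, and I hope this helps.

Edit: If tensorflow 1.14.0 does not work, change it to 1.13.1. The problem should go away once the compatibility between TensorFlow and sentencepiece is sorted out.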
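
Since the fix depends on exact package versions, it may help to confirm which versions the runtime actually sees before loading the module. The following is only a minimal sketch: the `EXPECTED` dictionary and the `check_pins` helper are hypothetical names introduced for illustration, and the pins are the ones mentioned in this answer (swap in 1.13.1 for tensorflow if you use the fallback).

```python
# Sketch: compare installed package versions against the pins from this answer.
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # older runtimes such as Python 3.6/3.7
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        return get_distribution(name).version

# Hypothetical helper data: the versions suggested in the answer.
EXPECTED = {"tensorflow": "1.14.0", "tf-sentencepiece": "0.1.83"}


def check_pins(expected, lookup=version):
    """Return {package: (installed_or_None, wanted)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


for name, (found, wanted) in check_pins(EXPECTED).items():
    print("{}: installed {}, expected {}".format(name, found, wanted))
```

If nothing is printed, the installed versions match the pins. If a mismatch shows up right after running the pip commands, restarting the Colab runtime so that the newly installed packages are picked up is worth trying.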