I want to try the embeddings provided in tensorflow-hub, specifically the "Universal Sentence Encoder". I tried the provided example (https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb) and it works well. So I tried to do the same with the "multilingual" model, but every time the multilingual model is loaded, the Colab kernel dies and restarts. What is the problem, and how can I fix it?
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import seaborn as sns
import tf_sentencepiece
import sentencepiece
# Import the Universal Sentence Encoder's TF Hub module
embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-multilingual/1")  # This is where the kernel dies.
print("imported model")
# Compute a representation for each message, showing various lengths supported.
word = "코끼리"
sentence = "나는 한국어로 쓰여진 문장이야."
paragraph = (
    "동해물과 백두산이 마르고 닳도록. "
    "하느님이 보우하사 우리나라 만세~")
messages = [word, sentence, paragraph]
# Reduce logging output.
tf.logging.set_verbosity(tf.logging.ERROR)
with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    message_embeddings = session.run(embed(messages))
    for i, message_embedding in enumerate(np.array(message_embeddings).tolist()):
        print("Message: {}".format(messages[i]))
        print("Embedding size: {}".format(len(message_embedding)))
        message_embedding_snippet = ", ".join(
            (str(x) for x in message_embedding[:3]))
        print("Embedding: [{}, ...]\n".format(message_embedding_snippet))
Answer 0 (score: 0)
I had a similar problem with the multilingual sentence encoder. I solved it by pinning the tensorflow version to 1.14.0 and tf-sentencepiece to 0.1.83, so before running your code in Colab, try:
!pip3 install tensorflow==1.14.0
!pip3 install tensorflow-hub
!pip3 install sentencepiece
!pip3 install tf-sentencepiece==0.1.83
I was able to reproduce your issue in Colab, and with this fix the model loads correctly.
This appears to be a compatibility problem between sentencepiece and tensorflow; check here for updates on the issue. Let us know how it goes. Good luck, I hope this helps.
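As a quick sanity check (a rough sketch, assuming you restart the Colab runtime after the installs above), something like this should confirm the pinned versions are active before loading the multilingual module again:
import pkg_resources
import tensorflow as tf
import tensorflow_hub as hub
import tf_sentencepiece  # registers the SentencePiece ops with TensorFlow
print(tf.__version__)  # expect 1.14.0
print(pkg_resources.get_distribution("tf-sentencepiece").version)  # expect 0.1.83
# If the versions match, loading the module should no longer crash the kernel.
embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-multilingual/1")
print("imported model")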
Edit: if tensorflow version 1.14.0 does not work, change it to 1.13.1. Once you have a compatible pairing of tensorflow and sentencepiece, the issue should be resolved.
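If you do fall back, the corresponding install line would simply be (keeping the other packages pinned as above):
!pip3 install tensorflow==1.13.1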