Error when passing tokenized input to ELMo with the "tokens" signature

Date: 2019-05-28 07:12:10

Tags: python-3.x tensorflow elmo

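I am new to data science and am exploring ELMo embeddings. When calling the elmo module I used the signature="tokens" option. However, when the string tensor I pass to ELMo contains more than two elements, I get an error about tensor shapes.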

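I am using TensorFlow 1.13.1. I tried changing the input, but the call only works when the input has exactly 2 elements. I cannot find where the shape mismatch comes from. Can someone suggest a fix for this error and explain why the "tokens" signature behaves this way?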

import tensorflow_hub as hub
import tensorflow as tf
import numpy as np
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

tokens_input = [["the", "cat", "is", "on", "the", "mat"],["hello"],
                ["dogs", "are", "in", "the", "fog", ""],["happiness", 
               "cannot", "be", "bought", "by", "us"]]

print(tokens_input,type(tokens_input))
tokens_length = len(tokens_input)
print(tokens_length)

#get the maximum length of elements in tokens_input
strlen = len(max(tokens_input, key=len))
print(strlen)

#pad token lists shorter than strlen by inserting "null" at the front
for i in range(tokens_length):
  if len(tokens_input[i])<strlen:
    for j in range(strlen-len(tokens_input[i])):
      tokens_input[i].insert(j,"null")
np_arr = np.array(tokens_input)
print("np_arr details are ",np_arr,np_arr.shape,np_arr.dtype) 

tokens_shp = tf.strings.length(np_arr).shape
print("Tokens shape",tokens_shp)

embeddings = elmo(inputs={
        "tokens": tokens_input,
        "sequence_len": tokens_shp
        },signature="tokens",
        as_dict=True)["elmo"]

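The input to the elmo call in my code is tokens_input. I can get tokens_input into the right shape ((4, 6) in this case) by adding empty strings to the arrays, but when I run the code above I get the following error: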

InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py in import_graph_def(graph_def, input_map, return_elements, name, op_dict, producer_op_list)
    425         results = c_api.TF_GraphImportGraphDefWithResults(
--> 426             graph._c_graph, serialized, options)  # pylint: disable=protected-access
    427         results = c_api_util.ScopedTFImportGraphDefResults(results)

InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 2 and 4. Shapes are [2] and [4]. for 'module_apply_tokens/bilm/RNN_0/RNN/MultiRNNCell/Cell0/rnn/while/Select' (op: 'Select') with input shapes: [2], [4,512], [?,512].

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
7 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py in import_graph_def(graph_def, input_map, return_elements, name, op_dict, producer_op_list)
    428       except errors.InvalidArgumentError as e:
    429         # Convert to ValueError for backwards compatibility.
--> 430         raise ValueError(str(e))
    431 
    432     # Create _DefinedFunctions for any imported functions.

ValueError: Dimension 0 in both shapes must be equal, but are 2 and 4. Shapes are [2] and [4]. for 'module_apply_tokens/bilm/RNN_0/RNN/MultiRNNCell/Cell0/rnn/while/Select' (op: 'Select') with input shapes: [2], [4,512], [?,512].
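
A possible reading of this error, offered as a sketch rather than a confirmed diagnosis: the message compares shapes [2] and [4]. The batch has 4 sentences, while "sequence_len" received tokens_shp, which is the shape (4, 6) of the batch and so converts to a vector with 2 entries. That would also explain why the call happens to work with exactly 2 sentences. As far as the module's documentation describes it, the "tokens" signature expects "sequence_len" to be an int32 vector with one entry per sentence, holding each sentence's real (unpadded) length. The helper names pad_tokens and elmo_token_embeddings below are hypothetical, and padding on the right with empty strings is an assumption taken from the module's published example.

```python
def pad_tokens(sentences, pad_token=""):
    """Right-pad every token list to the longest one; also return true lengths.

    Hypothetical helper, not part of tensorflow_hub.
    """
    lengths = [len(tokens) for tokens in sentences]
    max_len = max(lengths, default=0)
    padded = [tokens + [pad_token] * (max_len - len(tokens))
              for tokens in sentences]
    return padded, lengths


def elmo_token_embeddings(sentences):
    """Sketch of the ELMo call; assumes TF 1.x and tensorflow_hub are installed."""
    # Imported lazily so pad_tokens can be tried without TensorFlow.
    import tensorflow_hub as hub

    padded, lengths = pad_tokens(sentences)
    elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
    return elmo(
        inputs={
            "tokens": padded,         # string tensor, [batch_size, max_length]
            "sequence_len": lengths,  # one length per sentence, [batch_size]
        },
        signature="tokens",
        as_dict=True,
    )["elmo"]


sentences = [
    ["the", "cat", "is", "on", "the", "mat"],
    ["hello"],
    ["dogs", "are", "in", "the", "fog"],
    ["happiness", "cannot", "be", "bought", "by", "us"],
]

padded, lengths = pad_tokens(sentences)

# What the question passed as "sequence_len": the batch shape, two numbers.
shape_passed_in_question = (len(padded), len(padded[0]))

print(lengths)                   # one entry per sentence
print(shape_passed_in_question)  # only two entries, whatever the batch size
```

Calling elmo_token_embeddings(sentences) only builds the graph; in TF 1.x the result still has to be evaluated in a session after running the global variable and table initializers.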

0 answers:

No answers