How do I preprocess strings in a Keras model Lambda layer?

Asked: 2019-04-04 14:00:25

Tags: python tensorflow keras deep-learning

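I have a problem: the value passed to the Lambda layer at compile time is a placeholder generated by Keras, so it holds no value. When the model is compiled, the .eval() call raises this error: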

  

You must feed a value for placeholder tensor 'input_1' with dtype string and shape [?,1]

def text_preprocess(x):
  strings = tf.keras.backend.eval(x)
  vectors = []
  for string in strings:
    vector = string_to_one_hot(string.decode('utf-8'))
    vectors.append(vector)
  vectorTensor = tf.constant(np.array(vectors),dtype=tf.float32)
  return vectorTensor

input_text = Input(shape=(1,), dtype=tf.string)
embedding = Lambda(text_preprocess)(input_text)
dense = Dense(256, activation='relu')(embedding)
outputs = Dense(2, activation='softmax')(dense)

model = Model(inputs=[input_text], outputs=outputs)
model.compile(loss='categorical_crossentropy',optimizer='adam', metrics=['accuracy'])
model.summary()
model.save('test.h5')

If I pass a string array to the input layer statically, the model compiles, but I get the same error when I try to convert the model to tflite.

#I replaced this line:
input_text = Input(shape=(1,), dtype=tf.string)

#with these lines:
test = tf.constant(["Hello","World"])
input_text = Input(shape=(1,), dtype=tf.string, tensor=test)

#but calling this ...
converter = TFLiteConverter.from_keras_model_file('string_test.h5')
tfmodel = converter.convert()

#... still leads to this error:
  

InvalidArgumentError: You must feed a value for placeholder tensor 'input_3' with dtype string and shape [2] [[{{node input_3}}]]

1 answer:

Answer 0: (score: 0)

OK, I finally solved the problem this way:

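# Assumed context (not shown in the original): tf is tensorflow, K is the Keras
# backend, and one_hot_size is the one-hot depth defined elsewhere.
def text_preprocess(x):
  # Decode each UTF-8 string into Unicode code points (a RaggedTensor)
  b = tf.strings.unicode_decode(x,'UTF-8')
  # Pad the ragged rows into a dense tensor, filling short rows with 0
  b = b.to_tensor(default_value=0)
  #do things with decoded string
  one_hot = K.one_hot(b,one_hot_size)
  return one_hot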

...
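The answer's version probably works where the question's did not because it uses only graph operations, which can be applied to a symbolic placeholder. K.eval, by contrast, needs concrete values that do not exist while the model is being built. To make the three steps (decode, pad, one-hot) easier to follow, here is a pure-Python sketch of what they compute. It is only a reference illustration, not the TensorFlow implementation. The function name decode_pad_one_hot and the depth of 128 are hypothetical, and it assumes a flat list of Python strings rather than a tensor of shape [?, 1].

```python
def decode_pad_one_hot(strings, one_hot_size, pad_value=0):
    """Pure-Python reference for the decode -> pad -> one-hot pipeline.

    Hypothetical helper for illustration only; the real model uses the
    TensorFlow graph ops from the answer above.
    """
    # Step 1: decode each string into its Unicode code points
    # (roughly what tf.strings.unicode_decode produces per row).
    codepoints = [[ord(ch) for ch in s] for s in strings]

    # Step 2: pad ragged rows to a common length
    # (what to_tensor(default_value=0) does).
    max_len = max((len(row) for row in codepoints), default=0)
    padded = [row + [pad_value] * (max_len - len(row)) for row in codepoints]

    # Step 3: one-hot encode every code point; values outside
    # [0, one_hot_size) become an all-zero vector, as with a one-hot op.
    return [
        [[1.0 if i == cp else 0.0 for i in range(one_hot_size)] for cp in row]
        for row in padded
    ]


encoded = decode_pad_one_hot(["Hi", "abc"], one_hot_size=128)
shape = (len(encoded), len(encoded[0]), len(encoded[0][0]))
print(shape)
```

Note that padding with 0 means padded positions are one-hot encoded at index 0, so they are indistinguishable from a NUL character. If that matters, a different padding value or a mask may be needed.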