I have custom embeddings (about 14 GB) and a vocabulary (about 3 GB). I built a TF graph that extracts the embeddings. How can I make sure both the embeddings and the vocabulary are loaded into memory when the model is served, so they are available for all subsequent API requests?
import tensorflow as tf

# Input sentences and vocabulary lookup table (vocabList is built elsewhere).
sentences = tf.placeholder(shape=[None], dtype=tf.string, name="sentences")
mapping_strings = tf.constant(vocabList)
table = tf.contrib.lookup.index_table_from_tensor(mapping=mapping_strings,
                                                  num_oov_buckets=1,
                                                  default_value=0)

# Tokenise and map words to vocabulary indices.
words = tf.string_split(sentences, " ")
emd = table.lookup(words)
emd = tf.cast(emd, dtype=tf.int32)
dense_word_indices = tf.sparse.to_dense(emd)
dense_word_indices = tf.cast(dense_word_indices, dtype=tf.int32)
hashed_word_indices = tf.map_fn(add_n_grams,
                                dense_word_indices,
                                back_prop=False,
                                dtype=tf.int32)

# Embedding matrix fed in through a placeholder and assigned to a variable.
embedding_weights = tf.Variable(tf.constant(0.0, shape=[5002889, 700]),
                                trainable=False, name="embedding_weights")
embedding_placeholder = tf.placeholder(tf.float32, [5002889, 700])
embedding_init = embedding_weights.assign(embedding_placeholder)
# Note: the lookup is wired to the assign op, not to the variable itself.
embeddings = tf.nn.embedding_lookup(embedding_init, hashed_word_indices)
with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])

    # Sanity check on a few test sentences.
    embedding = sess.run(embeddings,
                         feed_dict={
                             sentences: test_sentences,
                         })

    # Export the model for serving.
    inputs = {"sentences": sentences}
    outputs = {"sentence_embeddings": embeddings}
    export_path = "./"
    tf.saved_model.simple_save(sess,
                               export_path,
                               inputs=inputs,
                               outputs=outputs,
                               legacy_init_op=tf.tables_initializer())
I can load the model with Docker, but I get the following error:
{ "error": "You must feed a value for placeholder tensor \'Placeholder\' with dtype float and shape [5002889,700]\n\t [[{{node Placeholder}} = Placeholder[_output_shapes=[[5002889,700]], dtype=DT_FLOAT, shape=[5002889,700], _device=\"/job:localhost/replica:0/task:0/device:CPU:0\"]()]]" }
This happens because I never feed the embeddings through the placeholder. Is it possible to load them when the container starts, or to include them in the model itself?
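One direction that seems possible (a sketch only, not something I have verified): keep the lookup on the variable itself, run the assign once with the real weights before exporting, and let simple_save write the variable's current value into the SavedModel's variables/ shard, so TensorFlow Serving restores it into memory at startup without needing the placeholder. Here embedding_matrix stands for the in-memory numpy array holding the 14 GB of pretrained weights, and hashed_word_indices (and sentences) come from the same pipeline as above:

import tensorflow as tf

# ... the sentences placeholder, vocabulary table, string_split and
# add_n_grams pipeline exactly as above, ending in hashed_word_indices ...

embedding_weights = tf.Variable(tf.constant(0.0, shape=[5002889, 700]),
                                trainable=False, name="embedding_weights")
embedding_placeholder = tf.placeholder(tf.float32, [5002889, 700])
embedding_init = embedding_weights.assign(embedding_placeholder)

# Look up against the variable, NOT against embedding_init, so the serving
# graph no longer depends on embedding_placeholder.
embeddings = tf.nn.embedding_lookup(embedding_weights, hashed_word_indices)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])

    # Load the real weights exactly once, before exporting.
    sess.run(embedding_init,
             feed_dict={embedding_placeholder: embedding_matrix})

    # The variable's value (the full matrix) is saved into variables/ and
    # restored by TensorFlow Serving when the container starts.
    tf.saved_model.simple_save(sess,
                               "./",
                               inputs={"sentences": sentences},
                               outputs={"sentence_embeddings": embeddings},
                               legacy_init_op=tf.tables_initializer())

The 3 GB vocabulary could presumably be handled in a similar way by switching from index_table_from_tensor to tf.contrib.lookup.index_table_from_file, which registers the vocabulary file as a SavedModel asset instead of baking a 3 GB constant into the GraphDef.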