Question

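Goal:

Generate text in the style of an author.

Input: the author's work to train on, and a seed for prediction

Output: text generated from that seed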

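Question about the Embedding layer in Keras:

I have raw text: a flat text file containing a few thousand lines. I want to feed it into an Embedding layer so that Keras vectorizes the data. Here is my text: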

--SNIP
The Wild  West\n Ha ha, ride\n All you see is the sun reflectin\' off of the
--SNIP

and I call it input_text:

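# Keras 2.x import paths
from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence

# fit_on_texts expects a list of strings; a single str would be iterated
# character by character, so split it into lines first
if isinstance(input_text, str):
    input_text = input_text.splitlines()

num_words = 2000  # texts_to_sequences keeps only words with index < num_words
tok = Tokenizer(num_words=num_words)  # tokenize the words
tok.fit_on_texts(input_text)  # takes a list of texts to train on
# put the words into a list ordered by index (index 1 = most frequent word),
# i.e. essentially enumerate them
index_word = {index: word for word, index in tok.word_index.items()}
words = [index_word[i] for i in range(1, num_words + 1) if i in index_word]

#words[:10]
# texts_to_sequences turns each text into a list of word indexes, where the
# word of rank i in the dataset (starting at 1) has index i
X_train = tok.texts_to_sequences(input_text)
# pad_sequences makes every row exactly maxlen=100 long; by default it pads
# with 0's at the START (padding='pre') and, for longer rows, drops tokens
# from the start (truncating='pre')
X_train = sequence.pad_sequences(X_train, maxlen=100)
y_train = words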
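For reference, here is a simplified, pure-Python imitation of what Tokenizer builds (not the Keras implementation): words are ranked by frequency so the most common word gets index 1, and to_sequences keeps only indexes below num_words, which is how I understand texts_to_sequences applies its num_words limit. The regex used to split words is an assumption and does not reproduce Keras' default punctuation filters; build_word_index and to_sequences are hypothetical helper names.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z0-9']+")  # assumption: simplified word splitting


def build_word_index(lines):
    """Rank words by frequency: the most frequent word gets index 1."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    # sorted() is stable (also with reverse=True), so ties keep first-seen order
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def to_sequences(lines, word_index, num_words):
    """Map each line to word indexes, dropping words whose index is >= num_words."""
    seqs = []
    for line in lines:
        words = WORD_RE.findall(line.lower())
        seqs.append([word_index[w] for w in words
                     if w in word_index and word_index[w] < num_words])
    return seqs


lines = ["the sun the west", "the sun"]
index = build_word_index(lines)
print(index)                                    # {'the': 1, 'sun': 2, 'west': 3}
print(to_sequences(lines, index, num_words=3))  # [[1, 2, 1], [1, 2]]
```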

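Problem:

It seems my code takes the sequence, and then when I apply padding, it only gives me the first 100 of the sequence. How should I split it up?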
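As far as I know, pad_sequences never splits a sequence: each text becomes exactly one row of length maxlen. With the defaults (padding='pre', truncating='pre'), a shorter row gets zeros in front and a longer one keeps only its last maxlen tokens. The hypothetical pad_pre function below is a minimal pure-Python sketch of just those two defaults, to make the behaviour visible; it is not the Keras code.

```python
def pad_pre(seqs, maxlen, value=0):
    """Mimic pad_sequences' defaults: padding='pre', truncating='pre'."""
    out = []
    for s in seqs:
        s = list(s)
        if len(s) > maxlen:
            s = s[len(s) - maxlen:]  # truncating='pre': keep the LAST maxlen tokens
        out.append([value] * (maxlen - len(s)) + s)  # padding='pre': fill values go in front
    return out


print(pad_pre([[5, 6], [1, 2, 3, 4, 5]], maxlen=3))  # [[0, 5, 6], [3, 4, 5]]
```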

Should I take the whole sequence, use the first 100 words as X, give the next word as Y, and do some skipping along the way?
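The setup described in that question is the usual one for a next-word model. A minimal sketch, assuming all token indexes from the text have first been concatenated into one flat list (for example from the unpadded output of texts_to_sequences): slide a window of length window over it, use the window as X and the following token as y, and advance by step tokens each time (step > 1 is the "skipping"). make_windows, window and step are hypothetical names.

```python
def make_windows(seq, window=100, step=1):
    """Turn one long token sequence into (X, y) pairs: `window` tokens -> next token."""
    X, y = [], []
    # the last start position must leave room for one target token after the window
    for start in range(0, len(seq) - window, step):
        X.append(seq[start:start + window])
        y.append(seq[start + window])
    return X, y


tokens = [1, 2, 3, 4, 5, 6, 7]
X, y = make_windows(tokens, window=3, step=2)
print(X)  # [[1, 2, 3], [3, 4, 5]]
print(y)  # [4, 6]
```

The integer targets in y can be used as-is with sparse_categorical_crossentropy, or one-hot encoded (for example with keras.utils.to_categorical) for categorical_crossentropy, so that they match a softmax output over the vocabulary.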

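I want the output to be the probability of the next word appearing, so I have a softmax layer at the end. Basically, I want to generate text from a seed. Is this the right approach? Or is it just better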
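With a softmax output layer, generating from a seed is usually a loop: feed the last window tokens to the model, sample the next index from the predicted distribution, append it, and repeat. The sketch below assumes a hypothetical predict_fn that returns the softmax probabilities for one context; with a trained Keras model it could plausibly be something like lambda toks: model.predict(sequence.pad_sequences([toks], maxlen=100))[0]. The temperature re-weighting (p ** (1 / T), renormalised) is an added technique not mentioned in the question: values below 1 push the choice towards the most likely word. sample_index, generate and always_next are illustrative names.

```python
import math
import random


def sample_index(probs, temperature=1.0, rng=random):
    """Sample an index from softmax probabilities re-weighted as p ** (1 / temperature)."""
    # work in log space for numerical stability; zero-probability entries stay impossible
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    return rng.choices(range(len(probs)), weights=weights)[0]


def generate(seed, predict_fn, n_words, window=100, temperature=1.0, rng=random):
    """Repeatedly predict the next token from the last `window` tokens and append it."""
    tokens = list(seed)
    for _ in range(n_words):
        probs = predict_fn(tokens[-window:])  # hypothetical: softmax over the vocabulary
        tokens.append(sample_index(probs, temperature, rng))
    return tokens


def always_next(context):
    """Dummy stand-in for a model: all probability on (last index + 1) mod 5."""
    probs = [0.0] * 5
    probs[(context[-1] + 1) % 5] = 1.0
    return probs


print(generate([1], always_next, n_words=3, window=2))  # [1, 2, 3, 4]
```

Sampling instead of always taking the argmax tends to keep the generated text from falling into a loop that repeats the same phrase.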

Answer 1

I don't think you will find a better answer than this page here. It also points to code on GitHub; dive in, or ask more questions.

Keras - Text preprocessing
