如何使用Tflearn构建单词嵌入模型?

时间:2018-04-07 11:56:15

标签: tensorflow machine-learning nlp tflearn word-embedding

  

更新

我正在研究使用Tflearn进行答案匹配分数预测的嵌入模型。我必须使用tflearn dnn分类器使用句子向量构建模型,现在我必须在dnn模型中添加一个单词嵌入层。怎么做?提前谢谢。

  

“JVMdefines”:使计算机能够运行Java程序

被隐瞒为:

  

“JVMdefines”:[[list([0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

     

使计算机能够运行Java程序:   list([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ,0,0,0,0,0,0,0,0,0,0,0,0))]

我的问题:机器是否能够分析任何方法。

  

使“机器”能够运行Java程序

即它可以检测计算机和机器的含义相同。

1 个答案:

答案 0 :(得分:2)

我会发表澄清评论,但我没有足够的声誉这样做,所以我会尝试回答你在原问题中提供的信息......

您的问题似乎不清楚,但是您可以通过以下方式解决tflearn中的二进制分类问题。

第1步:预处理

您需要做的第一件事就是将您的句子标记化并转换为整数列表:

“你喜欢什么样的食物?” ---> [234,64,12,5224,43,96,23]

然后,大多数人将他们的序列填充到相同的长度,切断长的序列或通过用0填充来增加短序列的长度。

[234,64,12,5224,43,96,23] ---> [0,0,0,0 .... 234,64,12,5224,43,96,23]

提示:

from tflearn.data_utils import pad_sequences
padded = pad_sequences(unpadded, maxlen=max_document_length, value=0.)

第2步:模型构建

将所有文本转换为整数序列后,即可构建模型。请注意,我们的输入形状是[None,max_document_length]。 None表示可选大小(允许变量批量大小),max_document_length是我们之前填充的序列的长度。

#Create our model
network = input_data(shape=[None, max_document_length], name='input')

创建嵌入矩阵。请注意,您将嵌入矩阵推送到CPU。输入dim参数正在查找表示词汇量大小的整数。 output_dim是嵌入的大小。

with tf.device('/cpu:0'):
    network = tflearn.embedding(network, input_dim=vocabulary_size, output_dim=128)

#Pass embeddings into an lstm layer (handles sequential problems)
network = tflearn.lstm(network, 512, dropout=0.8)

#Squish data into a fully connected layer, with 2 outputs for binary classification
network = tflearn.fully_connected(network, 2, activation='softmax')

#Perform regression to get the final anaswer
network = tflearn.regression(network, optimizer='rmsprop', learning_rate=0.001,
                         loss='categorical_crossentropy')

#Wrap the graph we just created in a tflearn DNN wrapper
model = tflearn.DNN(network)

#Run model.fit to actually train your model
model.fit(x_train, y_train, n_epoch=15, shuffle=True, validation_set=(x_val, y_val), show_metric=True, batch_size=batch_size)