I'm not familiar with TensorFlow or more advanced machine learning, so I'm trying to get a better grasp of RNNs by implementing one by hand instead of using tf.contrib.rnn.RNNCell. My first problem: unrolling the network for backpropagation means iterating over the whole sequence while keeping the weights and biases consistent, so I can't re-initialize a dense layer with tf.layers.dense at every time step; but I also need each layer connected to the current time step of the sequence, and I can't find a way to change what a dense layer is connected to. To work around this I tried to implement my own version of tf.layers.dense. It worked fine until I tried to optimize my custom dense layers, at which point I got the error NotImplementedError("Trying to update a Tensor" ...).
My code:
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
import random
# -----------------
# WORD PARAMETERS
# -----------------
target_string = ['Hello ','Hello ','World ','World ', '!']
number_input_words = 1
# --------------------------
# TRAINING HYPERPARAMETERS
# --------------------------
training_steps = 4000
batch_size = 9
learning_rate = 0.01
display_step = 150
hidden_cells = 20
# ----------------------
# PREPARE DATA AS DICT
# ----------------------
# TODO AUTOMATICALLY CREATE DICT
dictionary = {'Hello ': 0, 'World ': 1, '!': 2}
reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
vocab_size = len(dictionary)
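# NOTE: a sketch for the TODO below (not in the original question) --
# one way to build the dictionary automatically from the target words:
# dictionary = {word: i for i, word in enumerate(sorted(set(target_string)))}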
# ------------
# LSTM MODEL
# ------------
class LSTM:
    def __init__(self, sequence_length, number_input_words, hidden_cells, mem_size_x, mem_size_y, learning_rate):
        self.sequence = tf.placeholder(tf.float32, (sequence_length, vocab_size), 'sequence')
        self.memory = tf.zeros([mem_size_x, mem_size_y])
        # sequence_length = self.sequence.shape[0]
        units = [vocab_size, 5,4,2,6, vocab_size]
        weights = [tf.random_uniform((units[i-1], units[i])) for i in range(len(units))[1:]]
        biases = [tf.random_uniform((1, units[i])) for i in range(len(units))[1:]]
        self.total_loss = 0
        self.outputs = []
        for word in range(sequence_length-1):
            sequence_w = tf.reshape(self.sequence[word], [1, vocab_size])
            layers = []
            for i in range(len(weights)):
                if i == 0:
                    layers.append(tf.matmul(sequence_w, weights[0]) + biases[0])
                else:
                    layers.append(tf.matmul(layers[i-1], weights[i]) + biases[i])
            percentages = tf.nn.softmax(logits=layers[-1])
            self.outputs.append(percentages)
            self.total_loss += tf.losses.absolute_difference(tf.reshape(self.sequence[word+1], (1, vocab_size)), tf.reshape(percentages, (1, vocab_size)))
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        self.train_operation = optimizer.minimize(loss=self.total_loss, var_list=weights+biases, global_step=tf.train.get_global_step())
lstm = LSTM(len(target_string), number_input_words, hidden_cells, 10, 5, learning_rate)
# ---------------
# START SESSION
# ---------------
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    sequence = []
    for i in range(len(target_string)):
        x = [0]*vocab_size
        x[dictionary[target_string[i]]] = 1
        sequence.append(x)
    print(sequence)
    for x in range(1000):
        sess.run(lstm.train_operation, feed_dict={lstm.sequence: sequence})
        prediction, loss = sess.run((lstm.outputs, lstm.total_loss), feed_dict={lstm.sequence: sequence})
        print(prediction)
        print(loss)
Any answer telling me how to connect tf.layers.dense to a different variable each time, or how to fix the NotImplementedError, would be greatly appreciated. Apologies if this question is long-winded or poorly worded; I'm still new to Stack Overflow.
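For reference, here's a minimal sketch of the weight-sharing behaviour I was after, using tf.layers.dense with a fixed name and reuse=tf.AUTO_REUSE (TF 1.x; the shapes and the layer name are just placeholders):

import tensorflow as tf

sequence = tf.placeholder(tf.float32, (5, 3), 'sequence')  # toy shape
outputs = []
for t in range(4):
    step_input = tf.reshape(sequence[t], [1, 3])
    # Same name + AUTO_REUSE => every call shares one kernel/bias pair.
    out = tf.layers.dense(step_input, 3, name='step_dense', reuse=tf.AUTO_REUSE)
    outputs.append(out)

If I understand the error correctly, this would also sidestep the NotImplementedError: tf.layers.dense creates real tf.Variables, whereas tf.random_uniform returns plain Tensors, which the optimizer cannot update.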
Edit:
I've updated the LSTM class part of my code to the following (inside def __init__):
self.sequence = [tf.placeholder(tf.float32, (batch_size, vocab_size), 'sequence') for _ in range(sequence_length-1)]
self.total_loss = 0
self.outputs = []
rnn_cell = rnn.BasicLSTMCell(hidden_cells)
h = tf.zeros((batch_size, hidden_cells))
for i in range(sequence_length-1):
    current_sequence = self.sequence[i]
    h = rnn_cell(current_sequence, h)
    self.outputs.append(h)
But I still get an error on the line h = rnn_cell(current_sequence, h) about not being able to iterate over a tensor. I'm not trying to iterate over any tensor, and if I am, it's not intentional.
Answer 0 (score: 0)
So there's a standard way of addressing this issue (the best way I know of), rather than trying to create a new list of dense layers: do the following. Before that, assume your hidden layer size is h_dim, the number of unrolling steps is num_unroll, and the batch size is batch_size.
# The LSTM cell expects an LSTMStateTuple as its state (passing a plain tensor
# is what triggers the "cannot iterate over a tensor" error), so initialize it
# with zero_state instead of tf.zeros:
state = rnn_cell.zero_state(batch_size, tf.float32)
outputs = []
for ui in range(num_unroll):
    out, state = rnn_cell(x[ui], state)
    outputs.append(out)
Now concatenate all the outputs into a single [batch_size*num_unroll, h_dim] tensor and push it through a single [h_dim, num_classes] output layer:
logits = tf.matmul(tf.concat(outputs,...), w) + b
predictions = tf.nn.softmax(logits)
You now have the logits for all the unrolled inputs. Now it's just a matter of reshaping that tensor into a [batch_size, num_unroll, num_classes] tensor.
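A sketch of that last reshape, assuming the concat above was done along axis 0 (so the rows of predictions are ordered time-major):

# predictions is [num_unroll*batch_size, num_classes] after the softmax above.
predictions = tf.reshape(predictions, [num_unroll, batch_size, num_classes])
# Swap the first two axes to get [batch_size, num_unroll, num_classes].
predictions = tf.transpose(predictions, [1, 0, 2])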
Edited (feeding in data): The data is presented as a list of num_unroll placeholders. So,
x = [tf.placeholder(shape=[batch_size,3]...) for ui in range(num_unroll)]
Now say you have data like the following,
Hello world bye
Bye hello world
Here the batch size is 2 and the sequence length is 3. Once converted to one-hot encoding, your data looks as below (shape [time_steps, batch_size, 3]).
data = [ [ [1,0,0], [0,0,1] ], [ [0,1,0], [1,0,0] ], [ [0,0,1], [0,1,0] ] ]
Now feed the data in the following format.
feed_dict = {}
for ui in range(3):
    feed_dict[x[ui]] = data[ui]
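Putting the pieces above together, a minimal self-contained sketch (TF 1.x; the loss function and the next-step targets are illustrative choices, not something fixed by the question):

import tensorflow as tf
from tensorflow.contrib import rnn

batch_size, num_unroll, num_classes, h_dim = 2, 3, 3, 20  # toy sizes

# One placeholder per unrolled time step, as described above.
x = [tf.placeholder(tf.float32, [batch_size, num_classes]) for _ in range(num_unroll)]
y = [tf.placeholder(tf.float32, [batch_size, num_classes]) for _ in range(num_unroll)]

rnn_cell = rnn.BasicLSTMCell(h_dim)
state = rnn_cell.zero_state(batch_size, tf.float32)

outputs = []
for ui in range(num_unroll):
    # Calling the same cell object each step reuses one set of weights.
    out, state = rnn_cell(x[ui], state)
    outputs.append(out)

# One shared output projection for every time step.
w = tf.get_variable('w', [h_dim, num_classes])
b = tf.get_variable('b', [num_classes])
logits = tf.matmul(tf.concat(outputs, axis=0), w) + b

loss = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.concat(y, axis=0), logits=logits)
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)

# The toy one-hot data from above; targets are the next step (wrapping around).
data = [[[1,0,0],[0,0,1]], [[0,1,0],[1,0,0]], [[0,0,1],[0,1,0]]]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed_dict = {}
    for ui in range(num_unroll):
        feed_dict[x[ui]] = data[ui]
        feed_dict[y[ui]] = data[(ui + 1) % num_unroll]
    for _ in range(100):
        _, l = sess.run([train_op, loss], feed_dict=feed_dict)
    print(l)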