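I am building an RNN model to predict how malicious a piece of malware is. Each entry in the dataset has the form [maliciousness (0-1), [features*480]]. The features are very sparse, and there are 18 months of data, ordered by month.
I am trying to feed in one data entry per month, 18 entries in chronological order. I want the RNN to output the maliciousness for the most recent month (month 18) and compute the loss from that value.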
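To make the intended shapes concrete, here is a minimal NumPy sketch of the readout described above: keep only the RNN output at the last time step, project it to one value per sequence, and compare it with the month-18 label. The array `rnn_outputs` is random stand-in data, and `W`, `b`, `batch`, and `last_step_loss` are hypothetical names used only for illustration.

```python
import numpy as np

n_steps, n_inputs, n_neurons = 18, 480, 100
batch = 4  # small illustrative batch size (assumption)

rng = np.random.default_rng(0)

# Stand-in for the RNN output: one n_neurons vector per month per sequence.
rnn_outputs = rng.normal(size=(batch, n_steps, n_neurons))

# Hypothetical readout parameters mapping n_neurons -> 1 output.
W = rng.normal(size=(n_neurons, 1))
b = np.zeros(1)

def last_step_loss(rnn_outputs, y):
    """Return month-18 predictions and their mean absolute error against y."""
    last = rnn_outputs[:, -1, :]            # (batch, n_neurons)
    pred = (last @ W + b).squeeze(axis=1)   # (batch,)
    assert pred.shape == y.shape            # guard against silent broadcasting
    return pred, np.mean(np.abs(y - pred))

y = rng.uniform(0.0, 1.0, size=batch)       # maliciousness labels in [0, 1]
pred, loss = last_step_loss(rnn_outputs, y)
print(pred.shape, float(loss))
```

In TensorFlow 1.x the same idea would probably be a slice such as `rnn_outputs[:, -1, :]` followed by a dense layer with one unit and `tf.squeeze(..., axis=1)`, so that the prediction has the same shape `[None]` as the label placeholder.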
Below is the code I am using, but I cannot get the shapes of the inputs and tensors right.
    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API

    # trainX, trainY and testY are loaded elsewhere (one row per month).
    n_steps = 18
    n_inputs = 480
    n_neurons = 100
    n_outputs = 1
    n_epochs = 20
    batch_size = 50
    learning_rate = 0.01

    # Intended: keep only the label of the 18th month of each sequence.
    test_Y = np.empty([1065])
    train_Y = np.empty([1065])
    for i in range(17, len(testY), 18):
        np.append(test_Y, testY[i])
        np.append(train_Y, trainY[i])

    with tf.name_scope("Variable"):
        X = tf.placeholder(tf.float32, [None, n_steps, n_inputs], name = "X")
        y = tf.placeholder(tf.float32, [None], name = "Y")
        weights = tf.Variable(tf.random_normal([n_steps, n_outputs]))
        bias = tf.Variable(tf.random_normal([n_outputs]))

    with tf.name_scope("RNN"):
        lstm_cell = tf.contrib.rnn.LSTMCell(num_units = n_neurons, use_peepholes = True)
        rnn_outputs, states = tf.nn.dynamic_rnn(lstm_cell, X, dtype=tf.float32)
        # Project every time step to one value, then combine the 18 values.
        stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
        stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs,name = "reshape")
        outputs = tf.reshape(stacked_outputs, [-1, n_steps])
        out = tf.matmul(outputs, weights) + bias
        out = tf.unstack(out, axis = 1)

    with tf.name_scope("cost"):
        loss = tf.reduce_mean(tf.abs(y-out))

    with tf.name_scope('train'):
        optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate,name = "optimizer")
        training_op = optimizer.minimize(loss)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        init.run()
        n = trainX.shape[0]
        for epoch in range(n_epochs):
            train_cost = 0
            # Each batch covers batch_size sequences of n_steps rows.
            for start, end in zip(range(0, n, batch_size*n_steps), range(batch_size*n_steps, n, batch_size*n_steps)):
                y_start = int(start/(batch_size*n_steps))
                y_end = int(y_start + batch_size)
                X_batch, y_batch = trainX[start:end], train_Y[y_start:y_end]
                X_batch = X_batch.reshape((-1, n_steps, n_inputs))
                _, l = sess.run([training_op,loss], feed_dict = {X: X_batch, y: y_batch})
                train_cost += l
            print(epoch, "Train cost:", train_cost/(n//batch_size))
The output of this RNN is:
    0 Train cost: nan
    1 Train cost: nan
    2 Train cost: nan
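One possible contributor to the `nan` cost is the label array. `np.empty` returns uninitialized memory, and `np.append` returns a new array rather than modifying its argument, so when the return value is discarded the target array is never filled. The sketch below illustrates this with made-up labels; `raw_labels` is a hypothetical stand-in for `trainY`, and slicing is shown as one alternative way to pick the month-18 labels.

```python
import numpy as np

n_steps = 18

# Made-up labels: 3 sequences x 18 months, values 0.00 .. 0.53.
raw_labels = np.arange(54, dtype=np.float64) / 100.0

# np.append does not work in place: buf keeps its shape and contents.
buf = np.zeros(3)
result = np.append(buf, raw_labels[17])
print(buf.shape, result.shape)

# Slicing selects the label of month 18 of every sequence directly.
last_month = raw_labels[n_steps - 1::n_steps]
print(last_month)
```

This is only one candidate explanation. Other sources of `nan`, such as the learning rate or unscaled features, would need to be checked separately.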
Clearly I am not feeding the input correctly, but I do not know how to do it properly.
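As a starting point, here is a minimal NumPy sketch of one way to keep inputs and labels aligned: reshape the data into sequences once, then batch by sequence index. It assumes `trainX` is a 2-D array with one row per month and that every 18 consecutive rows form one sample, which is inferred from the description rather than confirmed. The helpers `to_sequences` and `iter_batches` and the synthetic arrays are hypothetical.

```python
import numpy as np

n_steps, n_inputs, batch_size = 18, 480, 50

def to_sequences(flat_X, flat_y, n_steps):
    """Group consecutive rows into sequences; keep the last month's label."""
    n_seq = flat_X.shape[0] // n_steps
    usable = n_seq * n_steps                       # drop an incomplete tail
    X_seq = flat_X[:usable].reshape(n_seq, n_steps, flat_X.shape[1])
    y_seq = flat_y[:usable].reshape(n_seq, n_steps)[:, -1]
    return X_seq, y_seq

def iter_batches(X_seq, y_seq, batch_size):
    """Yield aligned (X_batch, y_batch) pairs, including a final short batch."""
    for start in range(0, X_seq.shape[0], batch_size):
        end = start + batch_size
        yield X_seq[start:end], y_seq[start:end]

# Synthetic stand-in data: 120 sequences of 18 months each.
rng = np.random.default_rng(0)
flat_X = rng.random((120 * n_steps, n_inputs))
flat_y = rng.random(120 * n_steps)

X_seq, y_seq = to_sequences(flat_X, flat_y, n_steps)
shapes = [(xb.shape, yb.shape) for xb, yb in iter_batches(X_seq, y_seq, batch_size)]
print(shapes)
```

Each pair could then be passed as `feed_dict = {X: X_batch, y: y_batch}`. Because both arrays are indexed by the same sequence range, the label slice cannot drift away from the input slice, and the epoch cost can be averaged over the number of batches actually run.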