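I have a dataset that I am trying to fit a linear regression model to. When I run it, it predicts 0 for every input, which makes me think I am training it incorrectly, rather than the outputs all genuinely being 0.
My data looks like this: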
Year Month Day DayOfWeek Hour Minute RegisteredUsers Temperature
2017 8 30 4 21 10 137539 16.8
2017 8 30 4 21 20 137539 16.8
2017 8 30 4 21 30 137539 16.8
2017 8 30 4 21 40 137539 16.8
2017 8 30 4 21 50 137539 16.8
Each row has a target:
Target
1.25
1.25
2.50
3.00
1.25
Code:
import numpy as np
import tensorflow as tf
seed = 128
rng = np.random.RandomState(seed)
# Load Training Data
train_x = get_training_data()
train_y = get_training_targets()
# Build Graph
n_features = train_x.shape[1]
x = tf.placeholder(tf.float32, [None, n_features])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.random_normal([n_features, 1]), name="w")
b = tf.Variable(tf.constant(0.1, shape=[]), name="b")
h = tf.add(tf.matmul(x, w), b)
batch_size = 1000
epochs = 5000
learning_rate = 0.2
cost = tf.reduce_mean(tf.square(tf.subtract(y, h)), name="cost")
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)

    # Train
    for epoch in range(epochs):
        avg_cost = 0.
        total_batch = int(train_x.shape[0]/batch_size)
        for i in range(total_batch):
            batch_x, batch_y = batch_creator(batch_size, train_x.shape[0], 'train')
            sess.run([cost, optimizer], feed_dict = {x: batch_x, y: batch_y})

        if (epoch + 1) % 1000 == 0:
            print "Epoch", (epoch + 1), "complete."

    print "Training complete!"

    # Load Test Data
    test_x = get_test_data()
    test_y = get_test_targets()

    # Check Accuracy
    pred_temp = tf.equal(h, y)
    accuracy = tf.reduce_mean(tf.cast(pred_temp, "float"))
    print "Validation Accuracy:", accuracy.eval({x: test_x.values, y: test_y.values})

# Prepare batches
def batch_creator(batch_size, dataset_length, dataset_name):
    batch_mask = rng.choice(dataset_length, batch_size)
    batch_x = eval(dataset_name + '_x').values[[batch_mask]]
    batch_y = eval(dataset_name + '_y').values[[batch_mask]]
    return batch_x, batch_y
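As an aside on the batching helper: it relies on `eval` to look up the dataset by name and on double-bracket indexing (`values[[batch_mask]]`), which, as far as I can tell, may be interpreted differently depending on the NumPy version. Below is a minimal sketch of the same random sampling without either, assuming the features and targets are available as plain NumPy arrays. The names `sample_batch`, `demo_x` and `demo_y` are hypothetical and the arrays are stand-ins, not my real data.

```python
import numpy as np

rng = np.random.RandomState(128)


def sample_batch(features, targets, batch_size, rng):
    """Draw batch_size rows (with replacement) from paired feature/target arrays."""
    row_ids = rng.choice(features.shape[0], batch_size)
    # Single-bracket integer-array indexing selects whole rows.
    return features[row_ids], targets[row_ids]


# Stand-in arrays (NOT the real dataset): 10 rows, 4 features, 1 target each.
demo_x = np.arange(40, dtype=np.float32).reshape(10, 4)
demo_y = np.arange(10, dtype=np.float32).reshape(10, 1)

batch_x, batch_y = sample_batch(demo_x, demo_y, 6, rng)
print(batch_x.shape, batch_y.shape)
```

The batch shapes then line up with the `[None, n_features]` and `[None, 1]` placeholders, and each sampled feature row stays paired with its own target.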
Output of the run:
Epoch 1000 complete.
Epoch 2000 complete.
Epoch 3000 complete.
Epoch 4000 complete.
Epoch 5000 complete.
Training complete!
Validation Accuracy: 0.0
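A caveat on that accuracy figure: `tf.equal(h, y)` only counts exact floating-point matches, so for a regression output even predictions that are very close to the target score as wrong. Here is a minimal NumPy sketch of that effect next to two error metrics that might be more informative. The targets are the five sample targets shown earlier; the predictions are made up for illustration and are not output from my model.

```python
import numpy as np

# Targets from the five sample rows in the question.
targets = np.array([[1.25], [1.25], [2.50], [3.00], [1.25]], dtype=np.float32)

# Hypothetical predictions, invented for this illustration only.
predictions = np.array([[1.31], [1.19], [2.42], [3.08], [1.27]], dtype=np.float32)

# Fraction of predictions that equal the target exactly (what tf.equal measures).
exact_match_accuracy = float(np.mean(predictions == targets))

# Root mean squared error and mean absolute error.
rmse = float(np.sqrt(np.mean((predictions - targets) ** 2)))
mae = float(np.mean(np.abs(predictions - targets)))

print("exact-match accuracy:", exact_match_accuracy)
print("RMSE:", rmse)
print("MAE:", mae)
```

With these made-up numbers the exact-match accuracy is 0 even though every prediction is within 0.1 of its target, so an accuracy of 0.0 on its own may not say much about how well the model fits.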
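I wondered whether the RegisteredUsers column, being so much larger than the other columns, might be skewing the data somewhat, but leaving it out for now produces the same result.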
Any suggestions for what I should try in order to get meaningful results?
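On the scale concern, rather than dropping RegisteredUsers, one option would be to standardize every column to zero mean and unit variance before training. A minimal sketch using only the five sample rows shown earlier follows; the variable names are hypothetical, and on the real dataset the mean and standard deviation would presumably be computed on the training set and reused for the test set.

```python
import numpy as np

# The five sample rows from the question, in the same column order:
# Year, Month, Day, DayOfWeek, Hour, Minute, RegisteredUsers, Temperature
sample_rows = np.array([
    [2017, 8, 30, 4, 21, 10, 137539, 16.8],
    [2017, 8, 30, 4, 21, 20, 137539, 16.8],
    [2017, 8, 30, 4, 21, 30, 137539, 16.8],
    [2017, 8, 30, 4, 21, 40, 137539, 16.8],
    [2017, 8, 30, 4, 21, 50, 137539, 16.8],
], dtype=np.float64)

col_mean = sample_rows.mean(axis=0)
col_std = sample_rows.std(axis=0)

# Columns that are (numerically) constant in this tiny sample have a standard
# deviation of about 0; divide by 1 instead to avoid blowing them up.
safe_std = np.where(col_std < 1e-12, 1.0, col_std)

standardized = (sample_rows - col_mean) / safe_std

print(standardized[:, 5])  # Minute column after scaling
print(standardized[:, 6])  # RegisteredUsers column after scaling
```

In this sample only Minute varies, so every other column collapses to roughly 0; on the full dataset each column would end up on a comparable scale.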
I have reduced the batch size to 128 and used 5 epochs. Instead of computing accuracy, I now output the cost after each epoch:
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        avg_cost = 0.
        total_batch = int(train_x.shape[0]/batch_size)
        for i in range(total_batch):
            batch_x, batch_y = batch_creator(batch_size, train_x.shape[0], 'train')
            c, _ = sess.run([cost, optimizer], feed_dict = {x: batch_x, y: batch_y})
        print "Epoch", epoch, ": cost = ", c
    print "Training complete!"
The output is:
Epoch 0 : cost = 9.60435e+08
Epoch 1 : cost = 3.88509e+08
Epoch 2 : cost = 6.77093e+07
Epoch 3 : cost = 1.84114e+07
Epoch 4 : cost = 1.18253e+07
Training complete!
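To sanity-check the training recipe independently of TensorFlow, here is a minimal NumPy sketch of the same setup: a mean-squared-error cost, random-normal initial weights, a bias starting at 0.1, and a learning rate of 0.2. It uses full-batch gradient descent rather than mini-batches, and the data is synthetic (three made-up features on very different scales, standardized before training), so it says nothing about my real dataset. All names are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(128)

# Synthetic stand-in data (NOT the real dataset): 3 features on different scales.
n_rows = 200
raw_x = np.column_stack([
    rng.uniform(0, 60, n_rows),            # Minute-like
    rng.uniform(130000, 140000, n_rows),   # RegisteredUsers-like
    rng.uniform(10, 25, n_rows),           # Temperature-like
])

# Standardize each column to zero mean and unit variance.
scaled_x = (raw_x - raw_x.mean(axis=0)) / raw_x.std(axis=0)

# Noise-free targets generated from known weights, so the best cost is 0.
true_w = np.array([[0.5], [-0.3], [0.2]])
true_b = 2.0
targets = scaled_x.dot(true_w) + true_b

# Same style of initialization and learning rate as in the question.
w = rng.normal(size=(3, 1))
b = 0.1
learning_rate = 0.2

costs = []
for epoch in range(200):
    residual = scaled_x.dot(w) + b - targets
    costs.append(float(np.mean(residual ** 2)))
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * scaled_x.T.dot(residual) / n_rows
    grad_b = 2.0 * float(np.mean(residual))
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("first cost:", costs[0])
print("last cost:", costs[-1])
```

On this toy problem the cost falls to essentially zero, which is the kind of behaviour I would expect to see if the loop itself were working; my real costs above are still in the 1e7 range after 5 epochs.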
Thoughts?