TensorFlow multivariate linear regression: always predicts 0 - what am I missing?

Date: 2017-09-05 15:24:40

Tags: python machine-learning tensorflow linear-regression

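I have a dataset that I'm trying to fit a linear regression model to. When I run it, it predicts 0 for every input, which makes me think I'm training it wrong, rather than that the outputs really should all be 0.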

My data looks like this:

Year  Month  Day  DayOfWeek  Hour  Minute  RegisteredUsers  Temperature
2017    8    30       4       21    10        137539            16.8
2017    8    30       4       21    20        137539            16.8
2017    8    30       4       21    30        137539            16.8
2017    8    30       4       21    40        137539            16.8
2017    8    30       4       21    50        137539            16.8

For each row I have one target:

Target
 1.25
 1.25
 2.50
 3.00
 1.25

Code:

import numpy as np
import tensorflow as tf

seed = 128
rng = np.random.RandomState(seed)

# Load Training Data
train_x = get_training_data()
train_y = get_training_targets()

# Prepare batches (defined before the training loop that calls it)
def batch_creator(batch_size, dataset_length, dataset_name):
    # Draw batch_size random row indices (with replacement)
    batch_mask = rng.choice(dataset_length, batch_size)
    # Look up the global <dataset_name>_x / <dataset_name>_y DataFrames by name
    batch_x = eval(dataset_name + '_x').values[batch_mask]
    batch_y = eval(dataset_name + '_y').values[batch_mask]
    return batch_x, batch_y

# Build Graph
n_features = train_x.shape[1]

x = tf.placeholder(tf.float32, [None, n_features])
y = tf.placeholder(tf.float32, [None, 1])

w = tf.Variable(tf.random_normal([n_features, 1]), name="w")
b = tf.Variable(tf.constant(0.1, shape=[]), name="b")

# Linear model: h = xw + b
h = tf.add(tf.matmul(x, w), b)

batch_size = 1000
epochs = 5000
learning_rate = 0.2

# Mean squared error between targets and predictions
cost = tf.reduce_mean(tf.square(tf.subtract(y, h)), name="cost")
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# tf.initialize_all_variables() is deprecated in favor of this
init = tf.global_variables_initializer()

with tf.Session() as sess:

    sess.run(init)

    # Train
    for epoch in range(epochs):
        avg_cost = 0.
        total_batch = int(train_x.shape[0]/batch_size)

        for i in range(total_batch):
            batch_x, batch_y = batch_creator(batch_size, train_x.shape[0], 'train')
            sess.run([cost, optimizer], feed_dict = {x: batch_x, y: batch_y})

        if (epoch + 1) % 1000 == 0:
            print "Epoch", (epoch + 1), "complete."

    print "Training complete!"

    # Load Test Data
    test_x = get_test_data()
    test_y = get_test_targets()

    # Check Accuracy: fraction of predictions exactly equal to the target
    pred_temp = tf.equal(h, y)
    accuracy = tf.reduce_mean(tf.cast(pred_temp, "float"))
    print "Validation Accuracy:", accuracy.eval({x: test_x.values, y: test_y.values})
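`batch_creator` draws indices with `rng.choice(dataset_length, batch_size)`, which samples with replacement by default, so one "epoch" of `total_batch` batches can repeat some rows and skip others. The sketch below is one possible alternative, assuming the data is available as NumPy arrays (for example via `.values`); the helper name `iterate_minibatches` is hypothetical. It shuffles once per call so every row is used exactly once, and reshapes the targets to `(batch, 1)` to match the `[None, 1]` shape of the `y` placeholder.

```python
import numpy as np


def iterate_minibatches(features, targets, batch_size, rng):
    """Yield shuffled mini-batches that cover every row exactly once.

    features: array of shape (n_samples, n_features)
    targets:  array with n_samples entries (1-D or 2-D)
    rng:      a numpy.random.RandomState used for shuffling
    """
    n_samples = features.shape[0]
    order = rng.permutation(n_samples)  # new shuffle each call, no repeated rows
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]  # the last batch may be smaller
        # Reshape targets to (batch, 1) to match a [None, 1] placeholder
        yield features[idx], targets[idx].reshape(-1, 1)


# Usage on small made-up arrays (hypothetical data)
rng = np.random.RandomState(128)
X = np.arange(20, dtype=np.float32).reshape(10, 2)  # row i is [2i, 2i + 1]
y = np.arange(10, dtype=np.float32)                  # target of row i is i
batches = list(iterate_minibatches(X, y, batch_size=4, rng=rng))
print([len(bx) for bx, by in batches])  # [4, 4, 2]
```

In the training loop, something like `for batch_x, batch_y in iterate_minibatches(train_x.values, train_y.values, batch_size, rng):` could then take the place of the `total_batch` loop.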

Output:

Epoch 1000 complete.
Epoch 2000 complete.
Epoch 3000 complete.
Epoch 4000 complete.
Epoch 5000 complete.
Training complete!
Validation Accuracy: 0.0
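The accuracy check uses `tf.equal(h, y)`, which counts only predictions that match the target exactly. For a regression model with continuous float outputs, exact matches are essentially never expected, so this metric can report 0.0 even for a reasonably good model. A minimal NumPy sketch of the difference, using the target values from the question and made-up (hypothetical) predictions:

```python
import numpy as np

# Targets from the question; predictions are made up for illustration
y_true = np.array([1.25, 1.25, 2.50, 3.00, 1.25], dtype=np.float32)
y_pred = np.array([1.24, 1.31, 2.47, 2.96, 1.20], dtype=np.float32)

# What tf.reduce_mean(tf.cast(tf.equal(h, y), "float")) computes:
# the fraction of predictions that are exactly equal to the target
exact_match_rate = np.mean((y_pred == y_true).astype(np.float32))

# Regression metrics: how far off the predictions are on average
mse = np.mean((y_pred - y_true) ** 2)    # same quantity as the graph's `cost`
mae = np.mean(np.abs(y_pred - y_true))

print(exact_match_rate)  # 0.0, although every prediction is within about 0.06
print(mse)
print(mae)
```

The `cost` tensor in the graph is already the mean squared error, so evaluating it on the test data is one way to get a comparable number in TensorFlow.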

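I wondered whether the RegisteredUsers column, whose values are much larger than those in the other columns, might be skewing the data, but leaving it out for now gives the same result.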
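On the scale question: RegisteredUsers is around 137,539 while most other columns are single or double digits, and plain gradient descent with a single learning rate tends to behave poorly when features differ in scale this much. A common remedy is to standardize each feature with the mean and standard deviation of the training set and apply the same statistics to the test set. A minimal NumPy sketch, assuming the features are in a 2-D array; the `standardize` helper is hypothetical, and the example data are the five sample rows from the question (in which every column except Minute happens to be constant, so those columns are left unscaled) plus one made-up test row.

```python
import numpy as np


def standardize(train, test):
    """Z-score each column with statistics computed on the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Columns that are constant in the training data have std 0; leave them unscaled
    std[np.ptp(train, axis=0) == 0] = 1.0
    return (train - mean) / std, (test - mean) / std, mean, std


# Columns: Year, Month, Day, DayOfWeek, Hour, Minute, RegisteredUsers, Temperature
train = np.array([
    [2017, 8, 30, 4, 21, 10, 137539, 16.8],
    [2017, 8, 30, 4, 21, 20, 137539, 16.8],
    [2017, 8, 30, 4, 21, 30, 137539, 16.8],
    [2017, 8, 30, 4, 21, 40, 137539, 16.8],
    [2017, 8, 30, 4, 21, 50, 137539, 16.8],
])
test = np.array([[2017, 8, 30, 4, 21, 0, 137539, 16.8]])  # hypothetical test row

train_s, test_s, mean, std = standardize(train, test)
print(train_s[:, 5])  # Minute column, now centered on 0 with unit variance
```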

Any suggestions on what I should try to get meaningful results?

Edit (in response to comments)

I reduced the batch size to 128 and used 5 epochs. Instead of measuring accuracy, I now print the cost after each epoch:

with tf.Session() as sess:

    sess.run(init)

    for epoch in range(epochs):
        avg_cost = 0.
        total_batch = int(train_x.shape[0]/batch_size)

        for i in range(total_batch):
            batch_x, batch_y = batch_creator(batch_size, train_x.shape[0], 'train')
            c, _ = sess.run([cost, optimizer], feed_dict = {x: batch_x, y: batch_y})

        print "Epoch", epoch, ": cost = ", c

    print "Training complete!"  

The output is:

Epoch 0 : cost =  9.60435e+08
Epoch 1 : cost =  3.88509e+08
Epoch 2 : cost =  6.77093e+07
Epoch 3 : cost =  1.84114e+07
Epoch 4 : cost =  1.18253e+07
Training complete!
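In the edited loop, `c` holds the cost of the last mini-batch of each epoch only; `avg_cost` is initialized but never updated. The sketch below reproduces the same model (h = xw + b, mean squared error, mini-batch gradient descent) in NumPy rather than TensorFlow, on synthetic, already standardized data. It averages the cost over each epoch and compares the learned parameters with a closed-form least-squares fit from `np.linalg.lstsq` as a sanity check. The function `train_linear_regression`, its hyperparameters, and the data are all hypothetical.

```python
import numpy as np


def train_linear_regression(X, y, learning_rate=0.1, epochs=50, batch_size=32, seed=128):
    """Fit h = Xw + b by mini-batch gradient descent on the mean squared error.

    Returns the weights, the bias and the average cost of each epoch.
    """
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    w = 0.01 * rng.normal(size=(n_features, 1))
    b = 0.0
    y = y.reshape(-1, 1)
    history = []
    for _ in range(epochs):
        order = rng.permutation(n_samples)
        epoch_cost, n_batches = 0.0, 0
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            err = xb.dot(w) + b - yb              # h - y, shape (batch, 1)
            epoch_cost += np.mean(err ** 2)       # cost of this batch before the update
            n_batches += 1
            # Gradients of mean((h - y)^2) with respect to w and b
            grad_w = 2.0 * xb.T.dot(err) / len(idx)
            grad_b = 2.0 * np.mean(err)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
        history.append(epoch_cost / n_batches)    # average over the epoch, not the last batch
    return w, b, history


# Synthetic, already standardized data (hypothetical)
data_rng = np.random.RandomState(0)
X = data_rng.normal(size=(500, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X.dot(true_w) + 3.0 + 0.1 * data_rng.normal(size=500)

w, b, history = train_linear_regression(X, y)
print(history[0])   # average cost of the first epoch
print(history[-1])  # average cost of the last epoch

# Closed-form least squares on [X, 1] as a sanity check for w and b
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
print(coef)
```

If gradient descent and the closed-form fit disagree on real data, that points at the optimization (learning rate, feature scale) rather than at the model itself.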

Any thoughts?
