Getting similar predictions for every input when predicting with TensorFlow

Time: 2018-03-19 07:17:05

Tags: python python-3.x tensorflow machine-learning data-science

I am a beginner in machine learning, and I am working on a simple project to predict a household's power consumption using the data available here.

The data contains the household's global minute-averaged active power, recorded once per minute over 4 years. The head of the data looks like this:

 Date      Time  Global_active_power  Global_reactive_power  Voltage  \
0  16/12/2006  17:24:00                4.216                  0.418   234.84   
1  16/12/2006  17:25:00                5.360                  0.436   233.63   
2  16/12/2006  17:26:00                5.374                  0.498   233.29   
3  16/12/2006  17:27:00                5.388                  0.502   233.74   
4  16/12/2006  17:28:00                3.666                  0.528   235.68   

   Global_intensity  Sub_metering_1  Sub_metering_2  Sub_metering_3  
0              18.4             0.0             1.0            17.0  
1              23.0             0.0             1.0            16.0  
2              23.0             0.0             2.0            17.0  
3              23.0             0.0             1.0            17.0  
4              15.8             0.0             1.0            17.0  

I then combined the Date and Time columns into a datetime object, created a new timestamp column 'ts', and saved 'ts' and 'Global_active_power' to a separate CSV file with the following code:

# Dates are day-first (16/12/2006), so give pandas an explicit format
df['Timestamp'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d/%m/%Y %H:%M:%S')
# Convert the datetimes to int64 nanosecond timestamps
df['ts'] = df.Timestamp.values.astype(np.int64)
# Keep only the label and feature columns ('ts' must exist before this slice)
df = df[['Global_active_power', 'ts']]
df.to_csv('final.csv', index=False)  # omit the row index so only the two columns are saved
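As a quick sanity check (a minimal sketch, assuming the final.csv written above), the int64 values in ts can be converted back into datetimes, since pd.to_datetime treats int64 input as nanoseconds since the epoch:

import pandas as pd

# Illustrative check: the int64 timestamps should round-trip to the original datetimes
check = pd.read_csv('final.csv')
print(pd.to_datetime(check['ts']).head())  # expect 2006-12-16 17:24:00, 17:25:00, ...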

The head of final.csv looks like this:

     Global_active_power           ts
0                4.216  1166289840000000000
1                5.360  1166289900000000000
2                5.374  1166289960000000000
3                5.388  1166290020000000000
4                3.666  1166290080000000000
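The ts values are nanoseconds since the Unix epoch, so consecutive rows should differ by exactly one minute. A quick check on the first two values above (illustrative only):

import numpy as np

ts = np.array([1166289840000000000, 1166289900000000000])
print(np.diff(ts) / 1e9)  # [60.] seconds, i.e. the one-minute sampling interval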

I then built a multilayer perceptron model with TensorFlow to predict Global_active_power from the timestamp (using ts as the feature and Global_active_power as the label), using the following code:

import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Load the two-column file written above
data = pd.read_csv('final.csv', dtype={'Global_active_power': 'float'})
data.fillna(0, inplace=True)
data = data.values
n = data.shape[0]
p = data.shape[1]

# Chronological 95/5 train/test split
train_start = 0
train_end = int(np.floor(0.95 * n))
test_start = train_end
test_end = n
data_train = data[np.arange(train_start, train_end), :]
data_test = data[np.arange(test_start, test_end), :]

# Fit the scaler on the training data only, then transform both splits
scaler = MinMaxScaler()
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)

# Feature: scaled timestamp (column 1); label: scaled Global_active_power (column 0)
X_train = data_train[:, 1].reshape(-1, 1)
X_test = data_test[:, 1].reshape(-1, 1)
Y_train = data_train[:, 0]
Y_test = data_test[:, 0]

# Placeholders for the feature batch and the target batch
X = tf.placeholder(dtype=tf.float32, shape=[None, 1])
Y = tf.placeholder(dtype=tf.float32, shape=[None])

# Weight and bias initializers
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg", distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()

# Model architecture parameters
n_features = 1
n_neurons_1 = 1024
n_neurons_2 = 512
n_neurons_3 = 256
n_neurons_4 = 128
n_target = 1
# Layer 1: Variables for hidden weights and biases
W_hidden_1 = tf.Variable(weight_initializer([n_features, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
# Layer 2: Variables for hidden weights and biases
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
# Layer 3: Variables for hidden weights and biases
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
# Layer 4: Variables for hidden weights and biases
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))

# Output layer: Variables for output weights and biases
W_out = tf.Variable(weight_initializer([n_neurons_4, n_target]))
bias_out = tf.Variable(bias_initializer([n_target]))

# Hidden layer
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))

# Output layer (must be transposed)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))

# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))

# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

epochs = 1
batch_size = 256

for epoch in range(epochs):
    # Shuffle the training data at the start of each epoch
    shuffle_indices = np.random.permutation(np.arange(len(Y_train)))
    X_train = X_train[shuffle_indices]
    Y_train = Y_train[shuffle_indices]
    epoch_loss = 0
    for i in range(0, len(Y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = Y_train[start:start + batch_size]
        # One optimization step on the mini-batch
        _, c = sess.run([opt, mse], feed_dict={X: batch_x, Y: batch_y})
        epoch_loss += c
    print('Epoch', epoch, 'completed out of', epochs, 'loss:', epoch_loss)

# Evaluate MSE on the held-out test set
mse_final = sess.run(mse, feed_dict={X: X_test, Y: Y_test})
print('Final Error', mse_final)

I ran only one epoch because a single epoch already took too long, and the decrease in the epoch loss from one epoch to the next was negligible.

After running the above code, I got this output:

Epoch 0 completed out of 1 loss: 61.614460044074804
Final Error 0.008720885
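For context, this final error can be compared against a trivial baseline that always predicts the mean of the training labels (a sketch using the arrays defined above; if the model's MSE is close to this baseline, the network has effectively learned a constant):

# Illustrative baseline: constant prediction at the training-set mean
baseline = np.full_like(Y_test, Y_train.mean())
print('Baseline MSE:', np.mean((baseline - Y_test) ** 2))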

But when I look at the predicted output and compare it with Y_test, the prediction has almost the same value for every element:

pred = sess.run(out, feed_dict={X: X_test})
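One way to inspect this is to look at the spread of the predictions and map them back into the original units (a sketch building on pred above; column 0 of the scaled data is Global_active_power, so its min/max are used to invert the scaling):

pred = pred.flatten()  # out has shape (1, n) because of the transpose
print('prediction std:', pred.std())  # near zero if the outputs collapse to a constant
print('Y_test std    :', Y_test.std())

# Invert the min-max scaling of column 0 (Global_active_power) to get kilowatts
pred_kw = pred * (scaler.data_max_[0] - scaler.data_min_[0]) + scaler.data_min_[0]
print('first predictions (kW):', pred_kw[:5])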

0 Answers:

No answers