I am currently running the following code to predict house prices from 6 parameters:
import pandas as pd
import tensorflow as tf
import numpy as np
housing = pd.read_csv('cal_housing_clean.csv')
X = housing.iloc[:,0:6]
y = housing.iloc[:,6:]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(data=scaler.transform(X_train),columns = X_train.columns,index=X_train.index)
X_test = pd.DataFrame(data=scaler.transform(X_test),columns = X_test.columns,index=X_test.index)
X_data = tf.placeholder(dtype = "float", shape=[None,6])
y_target = tf.placeholder(dtype = "float", shape=[None,1])
hidden_layer_nodes = 10
w1 = tf.Variable(tf.random_normal(shape=[6,hidden_layer_nodes]))
b1 = tf.Variable(tf.random_normal(shape=[hidden_layer_nodes]))
w2 = tf.Variable(tf.random_normal(shape=[hidden_layer_nodes,1]))
b2 = tf.Variable(tf.random_normal(shape=[1]))
hidden_output = tf.nn.relu(tf.add(tf.matmul(X_data,w1),b1))
y_output = tf.add(tf.matmul(hidden_output,w2),b2)
loss = tf.reduce_mean(tf.square(y_target-y_output))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.00001)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
steps = 100000
with tf.Session() as sess:
    sess.run(init)
    for i in range(steps):
        sess.run(train, feed_dict={X_data:X_train,y_target:y_train})
        if i%500 == 0:
            print('Currently on step {}'.format(i))
            training_cost = sess.run(loss, feed_dict={X_data:X_test,y_target:y_test})
            print("Training cost=", training_cost/6192)
    training_cost = sess.run(loss, feed_dict={X_data:X_test,y_target:y_test})
    print("Training cost=", training_cost/6192)
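I assumed that, since the test set contains 6192 rows, simply dividing the total loss or error by that number would do the trick, but unfortunately I get the following output: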
Currently on step 0
Training cost= 9190063.95866
Currently on step 500
Training cost= 9062077.85013
Currently on step 1000
Training cost= 8927415.89664
Currently on step 1500
Training cost= 8795428.38243
Currently on step 2000
Training cost= 8666037.25065
Currently on step 2500
Training cost= 8539182.30491
Currently on step 3000
Training cost= 8414841.71576
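The error eventually comes down to about 2 million, whereas I want values closer to 100 or 200 thousand.
There may be a mistake in my code that makes the approximation so poor. I also tried different learning_rates, with the same result.
I would also like to test the model by feeding it the test data in batches. I tried this: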
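One thing worth checking here, offered as a sketch and not as a full diagnosis: tf.reduce_mean already averages the squared error over the rows, so the printed figure is the mean squared error divided by 6192 a second time, and it is in squared price units. Multiplying the division back out and taking the square root gives a root-mean-square error in the same units as the price. The helper name rmse_from_printed is hypothetical; the input numbers are taken from the output above.

```python
import math

N_TEST_ROWS = 6192  # divisor used in the print statements above

def rmse_from_printed(printed_cost, n_rows=N_TEST_ROWS):
    """Undo the extra division and convert MSE to RMSE (hypothetical helper)."""
    mse = printed_cost * n_rows   # the value tf.reduce_mean actually returned
    return math.sqrt(mse)         # same units as the house price

rmse_step_0 = rmse_from_printed(9190063.95866)
rmse_step_3000 = rmse_from_printed(8414841.71576)
print(round(rmse_step_0), round(rmse_step_3000))
```

On these numbers the error at step 0 works out to a little under 240 thousand, which is the scale the question was expecting.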
if i%500 == 0:
    rand_ind = np.random.randint(len(X_test),size=8)
    feed = {X_data:X_test[rand_ind],y_target:y_test[rand_ind]}
    loss = tf.reduce_sum(tf.square(y_target-y_output)) / 8
    print(sess.run(loss,feed_dict=feed))
But unfortunately I keep getting an error saying that the indices I select with rand_ind are "not in index".
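A likely cause, assuming X_test and y_test are still pandas DataFrames as in the code above: X_test[rand_ind] is treated as a column lookup by label, not a row lookup by position, so pandas reports that the integers are not in the index. A minimal sketch of positional row selection with .iloc follows; the small DataFrames and their column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for X_test / y_test (hypothetical data; the shuffled,
# non-zero-based index mimics what train_test_split leaves behind)
rng = np.random.RandomState(0)
X_test = pd.DataFrame(rng.rand(20, 6), columns=list("abcdef"),
                      index=rng.permutation(20) + 100)
y_test = pd.DataFrame(rng.rand(20, 1), columns=["price"], index=X_test.index)

rand_ind = np.random.randint(len(X_test), size=8)  # positions in [0, len)

# .iloc selects rows by position; .values yields plain arrays for a feed_dict
X_batch = X_test.iloc[rand_ind].values
y_batch = y_test.iloc[rand_ind].values

print(X_batch.shape, y_batch.shape)
```

The existing loss tensor can then be evaluated with feed_dict={X_data: X_batch, y_target: y_batch}, which avoids building a new op on every iteration.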
Answer 0: (score: 1)
You could try tf.train.AdamOptimizer and increase the learning rate (perhaps to around 0.1). This should speed up convergence.
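To illustrate why Adam can tolerate a much larger learning rate than plain gradient descent, here is a minimal NumPy sketch of the standard Adam update rule (not TensorFlow's internal code). The function name adam_step and the toy quadratic objective are assumptions made for illustration; the hyperparameter defaults are the commonly used ones.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter w (scalar or array); t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for m
    v_hat = v / (1 - beta2 ** t)                # bias correction for v
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Gradient of (w - 200000)**2 at w = 0: large because the target is price-sized
grad = 2 * (0.0 - 200000.0)

w_adam, _, _ = adam_step(0.0, grad, m=0.0, v=0.0, t=1, lr=0.1)
w_sgd = 0.0 - 0.1 * grad   # plain gradient descent with the same rate

print(w_adam, w_sgd)
```

Because Adam divides by the running root-mean-square of the gradient, its first step has a magnitude close to the learning rate regardless of the gradient's scale, whereas the plain gradient step grows with the gradient itself.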