Linear regression prediction does not change when a single test sample is changed

Time: 2019-06-04 22:25:26

Tags: machine-learning scikit-learn

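I have built a linear model with scikit-learn and want to make one prediction at a time. But when I change the test data, the prediction does not change. What should I do?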

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn import preprocessing

X = [[1, 55, 207], [0, 0, 65], [2, 8, 67], [2, 31, 270], [0, 5, 73], [0, 2, 98], [0, 0, 65], [0, 0, 115], [2, 0, 65], [2, 0, 67], [2, 7, 64], [2, 7, 66], [2, 10, 67], [2, 7, 66], [2, 9, 67], [3, 0, 115], [1, 3, 67], [1, 0, 51], [0, 0, 17], [2, 7, 68], [2, 8, 67], [2, 7, 67], [2, 16, 0], [1, 16, 45], [2, 11, 80], [2, 9, 78], [1, 8, 67], [0, 0, 43], [0, 0, 47], [2, 0, 72], [0, 0, 41], [0, 0, 43], [0, 0, 115], [0, 0, 361], [0, 0, 50], [0, 0, 43], [1, 15, 54], [0, 0, 43], [2, 0, 63], [1, 0, 56], [0, 0, 58], [0, 0, 45], [0, 0, 165], [3, 0, 115], [0, 0, 52], [0, 0, 67]]

y = [1690000000, 360000000, 400000000, 4860000000, 460000000, 640000000, 370000000, 1000000000, 360000000, 340000000, 400000000, 390000000, 375000000, 390000000, 375000000, 977500000, 800000000, 331500000, 350000000, 370000000, 370000000, 370000000, 380000000, 185000000, 300000000, 750000000, 301500000, 117000000, 155000000, 310000000, 2170000000, 116000000, 345000000, 1700000000, 287000000, 160000000, 235000000, 217000000, 215000000, 172000000, 312000000, 277000000, 1200000000, 977500000, 240000000, 340000000]


means = list(map(lambda x: sum(x)/float(len(x)), zip(*X)))
new_y = []
for i in range(len(X)):

    new_y.append(np.log(y[i]))
    if X[i][1] == 0:
        X[i][1] = means[1]
    if X[i][2] == 0:
        X[i][2] = means[2]
    if X[i][0] == 0 and X[i][1] < 60:
        X[i][0] = 1
    elif X[i][0] == 0 and X[i][1] < 120:
        X[i][0] = 2
    elif X[i][0] == 0 and X[i][1] > 120:
        X[i][0] = 2.5


X = preprocessing.scale(X)
X_train, X_test, y_train, y_test = train_test_split(X, new_y, test_size=0.30, random_state=5)

model = linear_model.Ridge(alpha=0.1)
model.fit(X_train, y_train)
my_x = [[2, 5, 120]]
my_x = preprocessing.scale(my_x)

prediction = model.predict(my_x)
prediction = np.exp(prediction)
print(int(prediction))

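The output is 385349681, and for my_x = [[2, 5, 270]] the output is also 385349681. This is data for 46 houses: y is the price, and X contains the number of rooms, the age, and the floor area of the building.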

1 answer:

Answer 0: (score: 1)

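I figured it out: the cause is the preprocessing applied to the test data. Calling preprocessing.scale on my_x standardizes that single row against its own mean, so every feature becomes zero. The model then returns only its intercept, which is why the prediction is the same for any my_x.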
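A minimal sketch of the effect and one possible fix, assuming the usual remedy of fitting a StandardScaler on the training data and reusing it for new samples (the answer itself does not name a fix). The six training rows are the first six rows of X and y from the question, used raw and without the question's zero-imputation step; the names scaler, a_scaled, b_scaled, pred_a and pred_b are illustrative only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler, scale

# scale() centers each column on that column's own mean. With a single row,
# every value equals its column mean, so the result is all zeros.
a = scale(np.array([[2.0, 5.0, 120.0]]))
b = scale(np.array([[2.0, 5.0, 270.0]]))
print(a, b)  # both rows are zeros, so the model sees identical inputs

# Assumption: a small subset of the question's data, just for illustration.
X_train = np.array([[1, 55, 207], [0, 0, 65], [2, 8, 67],
                    [2, 31, 270], [0, 5, 73], [0, 2, 98]], dtype=float)
y_train = np.log([1690000000, 360000000, 400000000,
                  4860000000, 460000000, 640000000])

# Fit the scaler once on the training data, then reuse its mean and std
# for every new sample instead of rescaling the sample against itself.
scaler = StandardScaler().fit(X_train)
model = Ridge(alpha=0.1).fit(scaler.transform(X_train), y_train)

a_scaled = scaler.transform(np.array([[2.0, 5.0, 120.0]]))
b_scaled = scaler.transform(np.array([[2.0, 5.0, 270.0]]))
pred_a = float(np.exp(model.predict(a_scaled)[0]))
pred_b = float(np.exp(model.predict(b_scaled)[0]))
print(pred_a, pred_b)  # the two predictions now differ
```

In the question's code the same idea would mean replacing both preprocessing.scale calls with a single scaler that is fitted on X_train and then applied to X_test and my_x.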
