TypeError: unhashable type: 'slice' on load_boston data
I tried boston.iloc and boston.loc and got an AttributeError: iloc.
from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
boston = load_boston()
print(boston.data.shape)
print("Data shape: {}".format(boston.data.shape))
print('The first few lines of data: {}'.format(boston.data[0:5,:]))
m = len(boston)
X = boston[:,0]
y = boston[:,1]
print("Number of examples: {}".format(m))
print("Shape of data : {}".format(X.shape))
print("Shape of labels : {}".format(y.shape))
Answer 0 (score: 0)
If you run print(boston.keys()), you get this output:
>>> print(boston.keys())
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
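These keys also explain the error: load_boston() returns a dictionary-like Bunch, not an array, so boston[:, 0] is treated as a key lookup rather than as array slicing, and len(boston) counts keys, not rows. The sketch below reproduces this with a plain dict as a hypothetical stand-in for the Bunch (fake_boston and its values are made up for illustration). Python versions before 3.12 raise the TypeError from the question; newer versions, where slices are hashable, raise a KeyError instead, so the sketch catches both.

```python
import numpy as np

# Hypothetical stand-in for the Bunch returned by load_boston():
# a dict-like container holding a 2-D feature array and a 1-D target.
fake_boston = {
    "data": np.arange(12, dtype=float).reshape(4, 3),
    "target": np.array([24.0, 21.6, 34.7, 33.4]),
}

# Indexing the container itself is a key lookup with the key (slice, 0),
# so it fails before any array slicing can happen.
try:
    fake_boston[:, 0]
except (TypeError, KeyError) as exc:
    error_name = type(exc).__name__
    print("Indexing the container failed with", error_name)

# Slicing works once you index the array stored under "data".
first_column = fake_boston["data"][:, 0]

# The number of examples is the row count of the array,
# whereas len(fake_boston) is only the number of keys.
m = fake_boston["data"].shape[0]
print(first_column, m, len(fake_boston))
```

In the original code, the equivalent fix would be to slice boston.data and boston.target instead of boston itself.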
You should first convert the data to a DataFrame with bos = pd.DataFrame(boston.data), like this:
>>> bos = pd.DataFrame(boston.data)
>>> print(bos.head())
0 1 2 3 4 5 6 7 8 9 10 11 12
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33
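You may then wonder why the columns show only their integer index and not their names. It turns out the column names are not embedded in the data array; recall that they are available as a separate list under the feature_names key. So let's replace the integer labels with the column names: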
bos.columns = boston.feature_names
print(bos.head())
which gives the output:
>>> print(bos.head())
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33
Then I believe you want PRICE as your y:
bos['PRICE'] = boston.target
print(bos.head())
with the output:
>>> print(bos.head())
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT PRICE
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
Finally, split your dataset into X and y:
X = bos.drop('PRICE', axis = 1)
y = bos['PRICE']
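As a compact illustration of the steps so far, here is a minimal sketch that builds the DataFrame with its column names in one call and then separates features from target. It uses only the first three rows and three columns from the output above, so the arrays data, feature_names, and target are small stand-ins rather than the real dataset.

```python
import numpy as np
import pandas as pd

# First three rows and three columns of the output shown above,
# used as a small stand-in for boston.data and boston.target.
data = np.array([[0.00632, 18.0, 2.31],
                 [0.02731, 0.0, 7.07],
                 [0.02729, 0.0, 7.07]])
feature_names = ["CRIM", "ZN", "INDUS"]
target = np.array([24.0, 21.6, 34.7])

# Passing columns= names the columns at construction time,
# which replaces the separate bos.columns assignment.
bos = pd.DataFrame(data, columns=feature_names)
bos["PRICE"] = target

# Features are everything except PRICE; the label is PRICE alone.
X = bos.drop("PRICE", axis=1)
y = bos["PRICE"]

# .iloc is available here because X is a DataFrame, not a Bunch.
print(X.iloc[:, 0])
print(X.shape, y.shape)
```

This is also why boston.iloc failed in the question: .iloc and .loc are pandas indexers, so they only exist after the data has been wrapped in a DataFrame.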
The next part is the regression, but first split the data further into training and test sets:
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.3, random_state=5)
You can print the shapes like this:
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
Finally, fit the linear model:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, Y_train)
Y_pred = lm.predict(X_test)
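To check the split-and-fit steps without the dataset, the following is a rough end-to-end sketch on synthetic data. This matters because load_boston itself was removed in scikit-learn 1.2, so the loader may not be available in your install. The generated X, y, and true_coef are assumptions made up for the example; only train_test_split, LinearRegression, and the split parameters come from the answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression problem (an assumption, not the Boston data):
# 100 examples, 3 features, a known linear relationship plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=100)

# Same split parameters as in the answer: 30% held out for testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.3, random_state=5
)

lm = LinearRegression()
lm.fit(X_train, Y_train)
Y_pred = lm.predict(X_test)

print(X_train.shape, X_test.shape)  # 70 training rows, 30 test rows
print(lm.coef_)                     # should be close to true_coef
print(lm.score(X_test, Y_test))     # R^2 on the held-out set
```

Comparing lm.coef_ against the known coefficients is a quick sanity check that the pipeline is wired correctly before pointing it at real data.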