Unable to create test and training sets using sklearn

Time: 2018-07-02 20:32:55

Tags: python scikit-learn

This is the code I have been struggling with.

import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
housing_data = load_boston()

from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(housing_data, test_size = 0.2, random_state = 42)

I get this error.

/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2057 
   2058     return list(chain.from_iterable((safe_indexing(a, train),
-> 2059                                      safe_indexing(a, test)) for a in arrays))
   2060 
   2061 

/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py in <genexpr>(.0)
   2057 
   2058     return list(chain.from_iterable((safe_indexing(a, train),
-> 2059                                      safe_indexing(a, test)) for a in arrays))
   2060 
   2061 

/anaconda3/lib/python3.6/site-packages/sklearn/utils/__init__.py in safe_indexing(X, indices)
    162             return X[indices]
    163     else:
--> 164         return [X[idx] for idx in indices]
    165 
    166 

/anaconda3/lib/python3.6/site-packages/sklearn/utils/__init__.py in <listcomp>(.0)
    162             return X[indices]
    163     else:
--> 164         return [X[idx] for idx in indices]
    165 
    166 

KeyError: 3

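1 answer:

Answer 0: (score: 4)

If you look at the documentation for load_boston(), you will see that it returns a Bunch object, not an array. train_test_split expects array-like inputs that it can index by row, so passing the Bunch itself fails. If you inspect the object in Spyder's variable explorer, you can see that it contains a description, the actual data (the features that predictions are made from), the name of each feature, and a target vector holding the values you are trying to predict.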

[Screenshot: load_boston]
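To see where KeyError: 3 comes from, here is a minimal sketch. A Bunch is a dict subclass, so a plain dict stands in for it here. The key names follow the description above, the values are made-up placeholders, and index_like_traceback is a hypothetical helper that copies the expression flagged in the traceback.

```python
# A plain dict stands in for the Bunch returned by load_boston().
# Key names follow the answer's description; the values are placeholders.
fake_bunch = {
    "data": [[0.1, 0.2], [0.3, 0.4]],
    "target": [24.0, 21.6],
    "feature_names": ["CRIM", "ZN"],
    "DESCR": "description text",
}

# len() counts the dict's keys, not the rows of the data array.
n_items = len(fake_bunch)


def index_like_traceback(X, indices):
    """Hypothetical helper: same expression as the line in the traceback."""
    return [X[idx] for idx in indices]


try:
    # Integer positions are looked up as dict keys, and no key equals 3.
    index_like_traceback(fake_bunch, [3, 0])
    error_message = None
except KeyError as exc:
    error_message = f"KeyError: {exc}"

print(n_items)        # 4
print(error_message)  # KeyError: 3
```

The splitter treats the container as if it had one sample per key and then looks up integer positions, which a dict-like object interprets as keys. Passing housing_data.data (a NumPy array) avoids this because arrays support positional indexing.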

If you only want to split the data part (the features used for prediction), you can run the following:

train_set, test_set = train_test_split(housing_data.data, test_size = 0.2, random_state = 42)

Alternatively, you can create training and test sets for both X and y (features and target) in one call:

X_train, X_test, y_train, y_test = train_test_split(housing_data.data, housing_data.target, test_size = 0.2, random_state = 42)

This produces the following set of variables:

[Screenshot: boston train_test_split]
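As a rough check of what the four resulting variables look like, the sketch below prints their shapes. It is an illustration under assumptions: the arrays are random stand-ins with the Boston dataset's dimensions (506 rows, 13 features), not the real data, so only the shapes are meaningful.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-ins with the Boston dataset's shape (506 samples, 13 features).
# These are NOT the real values; only the shapes are of interest here.
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

# Passing X and y together keeps feature rows and targets aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

shapes = {
    "X_train": X_train.shape,
    "X_test": X_test.shape,
    "y_train": y_train.shape,
    "y_test": y_test.shape,
}
for name, shape in shapes.items():
    print(name, shape)
# X_train (404, 13)
# X_test (102, 13)
# y_train (404,)
# y_test (102,)
```

With test_size = 0.2 the test set size is rounded up, so 506 samples split into 102 test rows and 404 training rows.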

Edit: if you call load_boston() with the return_X_y = True argument, it returns a (data, target) tuple instead of a Bunch, which lets you do the following and may be more elegant:

X, y = load_boston(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
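One caveat for readers on newer versions: load_boston was removed in scikit-learn 1.2, so the snippet above only runs on older releases. The same return_X_y pattern can be sketched with another bundled regression dataset. Using load_diabetes here is a substitution made purely for illustration, not something the original answer used.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# load_diabetes stands in for load_boston (an assumption for illustration);
# return_X_y=True yields a (data, target) tuple instead of a Bunch.
X, y = load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Features and targets stay aligned: matching row counts in each split.
print(X.shape, y.shape)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

Because both arrays are passed in the same call, each row of X_train corresponds to the same position in y_train, and likewise for the test split.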