I am using ShuffleSplit() on the California housing dataset (source: https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html) to fit an SGD regressor.
However, I get an "n_splits" error when I apply the method.
The code is as follows:
from sklearn import cross_validation, grid_search, linear_model, metrics
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale
from sklearn.cross_validation import ShuffleSplit
housing_data = pd.read_csv('cal_housing.csv', header = 0, sep = ',')
housing_data.fillna(housing_data.mean(), inplace=True)
df=pd.get_dummies(housing_data)
y_target = housing_data['median_house_value'].values
x_features = housing_data.drop(['median_house_value'], axis = 1)
from sklearn.cross_validation import train_test_split
from sklearn import model_selection
train_x, test_x, train_y, test_y = model_selection.train_test_split(x_features, y_target, test_size=0.2, random_state=4)
reg = linear_model.SGDRegressor(random_state=0)
cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)
The error is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-8f8760b04f8c> in <module>()
----> 1 cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)
TypeError: __init__() got an unexpected keyword argument 'n_splits'
I have updated scikit-learn to version 0.18.
Anaconda version: 4.5.8
Could you give me some advice on this problem?
Answer 0 (score: 0)
You are mixing up two different modules.
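Before 0.18, ShuffleSplit lived in cross_validation, and that version has no n_splits parameter: its first argument, n, is the total number of samples, and the number of re-shuffling iterations is set with n_iter. Because your code still imports ShuffleSplit from sklearn.cross_validation, you get this old class, which rejects n_splits.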
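To make the parameter names concrete, here is a minimal pure-Python sketch of what a shuffle split does. It is an illustration, not scikit-learn's source: the function name `shuffle_split` is hypothetical, `n_samples` plays the role of the old `n` argument, and `n_splits` plays the role of the old `n_iter` (and of the new `n_splits`). Rounding the test size up with `ceil` is an assumption meant to approximate how scikit-learn handles a fractional `test_size`.

```python
import math
import random
from typing import Iterator, List, Optional, Tuple


def shuffle_split(
    n_samples: int,
    n_splits: int = 10,
    test_size: float = 0.2,
    random_state: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical sketch: yield n_splits independent (train, test) index lists.

    n_samples ~ old `n` (dataset size); n_splits ~ old `n_iter` / new `n_splits`.
    """
    rng = random.Random(random_state)
    n_test = math.ceil(test_size * n_samples)  # test set size, rounded up
    indices = list(range(n_samples))
    for _ in range(n_splits):
        perm = indices[:]
        rng.shuffle(perm)  # a fresh, independent shuffle for every split
        yield perm[n_test:], perm[:n_test]  # (train indices, test indices)
```

Because every split is drawn independently, the same sample can appear in the test set of several splits; unlike KFold, the test sets are not guaranteed to be disjoint.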
However, since you have now updated to 0.18, cross_validation and grid_search are deprecated in favor of model_selection.
As mentioned in the docs, these modules will be removed in version 0.20.
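If the same script has to run on several installations, it can help to check which module a given version provides. The sketch below is a hypothetical helper (`available_cv_modules` and `major_minor` are not part of scikit-learn) that encodes only the two facts stated here: `model_selection` is available from 0.18 on, and `cross_validation` is removed in 0.20.

```python
import re
from typing import List, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Turn '0.18.1' or '0.20rc1' into (0, 18) or (0, 20)."""
    major, minor = version.split(".")[:2]
    # Keep only the leading digits of the minor part (drops 'rc1', 'dev0', ...).
    return int(major), int(re.match(r"\d+", minor).group())


def available_cv_modules(version: str) -> List[str]:
    """Hypothetical helper: which cross-validation modules a version ships."""
    v = major_minor(version)
    modules = []
    if v >= (0, 18):  # model_selection introduced in 0.18
        modules.append("sklearn.model_selection")
    if v < (0, 20):  # cross_validation deprecated in 0.18, removed in 0.20
        modules.append("sklearn.cross_validation")
    return modules
```

Pass it the installed version, e.g. `available_cv_modules(sklearn.__version__)`. On 0.18 and 0.19 both modules are listed, which is exactly the situation in the question: the old `cross_validation.ShuffleSplit` can still be imported by mistake.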
So instead of this:
from sklearn.cross_validation import ShuffleSplit
from sklearn.cross_validation import train_test_split
Do the following:
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import train_test_split
Then you can use n_splits:
cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)
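With the model_selection version, `cv` is a splitter object that you pass to a cross-validation helper rather than iterate over directly. Below is a minimal sketch of the original goal (cross-validating an SGD regressor). It assumes synthetic data from `make_regression` in place of `cal_housing.csv`, and it adds a `StandardScaler` because SGD is sensitive to feature scale (the question imports `scale` for the same reason).

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing features/target (assumption: the real
# data would be loaded from cal_housing.csv as in the question).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# 10 random 80/20 train/test splits, same settings as above.
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# Standardize inside the pipeline so the scaler is refit on each training split only.
reg = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))

# One score per split; for regressors the default score is R^2.
scores = cross_val_score(reg, X, y, cv=cv)
print(scores.mean(), scores.std())
```

To loop over the splits manually instead, use `for train_idx, test_idx in cv.split(X): ...`. Unlike the old class, the new `ShuffleSplit` takes the data in `split()` rather than the sample count in the constructor.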