ValueError in feature selection with pandas

Posted: 2016-11-15 07:12:01

Tags: python pandas machine-learning scikit-learn

I am trying to use sklearn.feature_selection on a pandas DataFrame, but I get a ValueError. I explain it step by step below.

Code:

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

print (best_features_train_df.columns)

Result:

Index(['cont1', 'cont2', 'cont3', 'cont4', 'cont5', 'cont6', 'cont7', 'cont8',
   'cont9', 'cont10', 'cont11', 'cont12', 'cont13', 'cont14', 'loss'],
  dtype='object')


Code:

array = best_features_train_df.values
X = array[:,0:14] #Selecting all the features from cont1 to cont14
Y = array[:,14] #Selecting the 'loss' which is the class variable
print (X)
print (Y)
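Positional slicing such as array[:,0:14] only works if 'loss' is the last column. The sketch below selects the features and the target by name instead. It runs on a hypothetical random frame shaped like best_features_train_df; the values are made up and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for best_features_train_df: random values in [0, 1)
# for cont1..cont14 plus a continuous 'loss' column (not the real data).
rng = np.random.default_rng(0)
feature_cols = [f"cont{i}" for i in range(1, 15)]
df = pd.DataFrame(rng.random((100, 14)), columns=feature_cols)
df["loss"] = rng.uniform(500, 6000, size=100)

# Select by name rather than by position, so a reordered frame
# cannot silently swap a feature with the target.
X = df.drop(columns="loss").to_numpy()
Y = df["loss"].to_numpy()

print(X.shape, Y.shape)  # (100, 14) (100,)
```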

Result:

[[ 0.726  0.246  0.188 ...,  0.595  0.822  0.715]
 [ 0.331  0.737  0.593 ...,  0.366  0.611  0.304]
 [ 0.262  0.358  0.484 ...,  0.373  0.196  0.774]
 ..., 
 [ 0.484  0.786  0.792 ...,  0.443  0.339  0.504]
 [ 0.438  0.422  0.299 ...,  0.853  0.655  0.722]
 [ 0.907  0.621  0.441 ...,  0.946  0.811  0.721]]

 [ 2213.18  1283.6   3005.09 ...,  5762.64  1562.87  4751.72]

Code:

test = SelectKBest(score_func=chi2, k=4)
fit = test.fit(X, Y)

Error:

ValueError: Unknown label type: (array([ 2213.18,  1283.6 ,  3005.09,  ...,  5762.64,  1562.87,  4751.72]),)

When I searched for this error, someone suggested that Y might be a 2D list, but it isn't. Where am I going wrong? Please advise.
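The printed Y is already 1-D, so its shape is not the issue. What matters is how scikit-learn interprets it, and `sklearn.utils.multiclass.type_of_target` reports exactly that. The sketch below uses a few of the 'loss' values printed above. The random non-negative X passed to chi2 is an assumption made only to trigger the same failure.

```python
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.utils.multiclass import type_of_target

# A few of the 'loss' values printed in the question
Y = np.array([2213.18, 1283.6, 3005.09, 5762.64, 1562.87, 4751.72])

print(Y.ndim)                                 # 1 -> Y is not 2-D
print(type_of_target(Y))                      # continuous -> not class labels
print(type_of_target(np.array([0, 1, 2, 1]))) # multiclass -> what chi2 expects

# chi2 treats Y as class labels, so a continuous target is rejected.
X = np.random.default_rng(0).random((6, 3))   # made-up non-negative features
raised = False
try:
    chi2(X, Y)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```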

1 Answer:

Answer 0 (score: 0)

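The problem is the score_func you pass to SelectKBest.

Your task appears to be a regression problem: 'loss' is a continuous target, not a class label. chi2 is meant for classification. It treats Y as a set of class labels, so a continuous Y raises "Unknown label type". You cannot use chi2 for a regression problem. Use f_regression or mutual_info_regression instead.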
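Here is a minimal sketch of the suggested fix, assuming a recent scikit-learn and NumPy. The data is synthetic: a made-up continuous target that depends mainly on columns 0 and 3. It is not the asker's best_features_train_df.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression

# Synthetic stand-in for the question's data (an assumption): 14 features in
# [0, 1) and a continuous target driven mainly by columns 0 and 3, plus noise.
rng = np.random.default_rng(42)
X = rng.random((500, 14))
Y = 3000 * X[:, 0] + 2000 * X[:, 3] + rng.normal(0, 100, size=500)

# f_regression: univariate linear F-test of each feature against Y.
selector = SelectKBest(score_func=f_regression, k=4)
X_new = selector.fit_transform(X, Y)
print(X_new.shape)                         # (500, 4)
print(selector.get_support(indices=True))  # column indices of the kept features

# mutual_info_regression can also pick up non-linear dependencies; its
# estimator is randomized, so random_state is pinned for repeatability.
mi_score = partial(mutual_info_regression, random_state=0)
mi_selector = SelectKBest(score_func=mi_score, k=4)
X_mi = mi_selector.fit_transform(X, Y)
print(X_mi.shape)                          # (500, 4)
```

With the question's own arrays, only the scoring function changes: SelectKBest(score_func=f_regression, k=4).fit(X, Y).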