I am trying to use sklearn.feature_selection on a pandas DataFrame, but I get a ValueError. I explain it step by step below.
Code:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
print(best_features_train_df.columns)
Result:
Index(['cont1', 'cont2', 'cont3', 'cont4', 'cont5', 'cont6', 'cont7', 'cont8',
'cont9', 'cont10', 'cont11', 'cont12', 'cont13', 'cont14', 'loss'],
dtype='object')
Code:
array = best_features_train_df.values
X = array[:, 0:14]  # Select all the features, cont1 to cont14
Y = array[:, 14]    # Select 'loss', the target variable
print(X)
print(Y)
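As an aside, the same split can be written by column name instead of by position, which keeps working if the column order ever changes. A minimal sketch, using a small hypothetical DataFrame filled with random values in place of best_features_train_df:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for best_features_train_df: cont1..cont14 plus 'loss'
rng = np.random.default_rng(0)
feature_cols = [f"cont{i}" for i in range(1, 15)]
df = pd.DataFrame(rng.random((5, 14)), columns=feature_cols)
df["loss"] = rng.uniform(1000, 6000, size=5)

# Split by name: every column except 'loss' is a feature, 'loss' is the target
X = df.drop(columns="loss").to_numpy()
Y = df["loss"].to_numpy()

print(X.shape)  # (5, 14)
print(Y.shape)  # (5,)
```

Y comes out as a 1-D array either way, so the split itself is not the source of the error.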
Result:
[[ 0.726 0.246 0.188 ..., 0.595 0.822 0.715]
[ 0.331 0.737 0.593 ..., 0.366 0.611 0.304]
[ 0.262 0.358 0.484 ..., 0.373 0.196 0.774]
...,
[ 0.484 0.786 0.792 ..., 0.443 0.339 0.504]
[ 0.438 0.422 0.299 ..., 0.853 0.655 0.722]
[ 0.907 0.621 0.441 ..., 0.946 0.811 0.721]]
[ 2213.18 1283.6 3005.09 ..., 5762.64 1562.87 4751.72]
Code:
test = SelectKBest(score_func=chi2, k=4)
fit = test.fit(X, Y)
Error:
ValueError: Unknown label type: (array([ 2213.18, 1283.6 , 3005.09, ..., 5762.64, 1562.87, 4751.72]),)
When I searched for this error, someone suggested that Y might be a 2D list, but it is not. Where am I going wrong? Please advise.
Answer 0 (score: 0)
The problem is the score_func in your SelectKBest.
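A quick way to check this is scikit-learn's type_of_target helper. The sketch below uses the six target values visible in the question and, as an assumption, random non-negative numbers in place of the real X. It shows that Y is 1-D (so the 2D theory does not apply), that scikit-learn classifies it as "continuous", and that chi2 rejects it with a ValueError:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.utils.multiclass import type_of_target

# Target values shown in the question's output
Y = np.array([2213.18, 1283.6, 3005.09, 5762.64, 1562.87, 4751.72])
# Hypothetical features: 6 rows x 14 columns in [0, 1), like cont1..cont14
X = np.random.default_rng(0).random((6, 14))

print(Y.ndim)             # 1, so Y is not a 2-D array
print(type_of_target(Y))  # continuous, i.e. not a set of class labels

raised = False
try:
    SelectKBest(score_func=chi2, k=4).fit(X, Y)
except ValueError as exc:
    # chi2 treats Y as class labels, and a continuous target has none
    raised = True
    print("ValueError:", exc)
```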
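Your problem appears to be a regression problem: 'loss' is a continuous value, not a class label. chi2 is a test for classification; it treats Y as class labels, which is why scikit-learn rejects your continuous values with "Unknown label type". You cannot use chi2 for a regression problem. Use f_regression or mutual_info_regression instead.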
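A minimal sketch of the fix, keeping your SelectKBest(k=4) setup and only swapping the score function. The data here is synthetic (an assumption for illustration): 14 features in [0, 1) and a continuous target that depends on columns 0, 3, 7 and 11, so there is something for the selector to find:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression

# Synthetic data shaped like the question: 14 features, continuous 'loss'-like target
rng = np.random.default_rng(42)
X = rng.random((200, 14))
Y = 1000 + 2000 * X[:, [0, 3, 7, 11]].sum(axis=1) + rng.normal(0, 50, size=200)

# f_regression: univariate linear F-test between each feature and Y
test = SelectKBest(score_func=f_regression, k=4)
fit = test.fit(X, Y)
print(fit.scores_)                      # one F-score per feature
print(fit.get_support(indices=True))    # indices of the 4 selected features
X_new = fit.transform(X)
print(X_new.shape)                      # (200, 4)

# mutual_info_regression can also pick up non-linear dependence
mi = SelectKBest(score_func=mutual_info_regression, k=4).fit(X, Y)
print(mi.get_support(indices=True))
```

f_regression only measures linear correlation with the target, so it is fast but can miss non-linear relationships; mutual_info_regression is slower and its estimates vary somewhat between runs unless a random_state is fixed.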