I am trying to get sklearn to select the best k variables (for example k=1) for a linear regression. This works and I can get the R-squared, but it doesn't tell me which variable was the best. How can I find that out?
My code has the following form (the actual variable list is much longer):
X = []
for i in range(len(df)):
    X.append([averageindegree[i], indeg3_sum[i], indeg5_sum[i], indeg10_sum[i]])
training = []
actual = []
counter = 0
for fold in range(500):
    X_train, X_test, y_train, y_test = crossval.train_test_split(X, y, test_size=0.3)
    clf = LinearRegression()
    #clf = RidgeCV()
    #clf = LogisticRegression()
    #clf = ElasticNetCV()
    b = fs.SelectKBest(fs.f_regression, k=1)  # k is the number of features to select
    b.fit(X_train, y_train)
    #print(b.get_params())
    X_train = X_train[:, b.get_support()]
    X_test = X_test[:, b.get_support()]
    clf.fit(X_train, y_train)
    sc = clf.score(X_train, y_train)
    training.append(sc)
    #print("The training R-squared for fold " + str(fold) + " is " + str(round(sc*100, 1)) + "%")
    sc = clf.score(X_test, y_test)
    actual.append(sc)
    #print("The actual R-squared for fold " + str(fold) + " is " + str(round(sc*100, 1)) + "%")
Answer 0 (score: 2)
You need to use get_support:
features_columns = [.......]
fs = SelectKBest(score_func=f_regression, k=5)
fs.fit(X, y)  # get_support is only available after fitting
print(list(zip(fs.get_support(), features_columns)))
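A minimal runnable sketch of this answer's approach, using made-up data and hypothetical column names (the synthetic `X`, `y`, and the feature names are assumptions for illustration): fit `SelectKBest`, then pair its boolean support mask with the column names to read off which feature was kept.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: column 2 ("indeg5_sum") is constructed to drive y.
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = 3 * X[:, 2] + 0.1 * rng.rand(100)

features_columns = ["averageindegree", "indeg3_sum", "indeg5_sum", "indeg10_sum"]

fs = SelectKBest(score_func=f_regression, k=1)
fs.fit(X, y)  # must fit before get_support() is defined

# Keep only the names where the support mask is True.
selected = [name for name, keep in zip(features_columns, fs.get_support()) if keep]
print(selected)  # -> ['indeg5_sum']
```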
Answer 1 (score: 1)
Try using b.fit_transform()
instead of b.transform()
. The fit_transform()
function fits on the input and transforms X into a new X containing only the selected features, and returns that new X.
...
b = fs.SelectKBest(fs.f_regression, k=1)  # k is the number of features to select
X_train = b.fit_transform(X_train, y_train)
#print(b.get_params())
...
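A self-contained sketch of this answer's pattern with synthetic data (the data and random seed are assumptions for illustration): `fit_transform` on the training fold, then plain `transform` on the test fold so the selector is only fitted on training data.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: column 1 is constructed to drive y.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = 2 * X[:, 1] + 0.05 * rng.rand(200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

b = SelectKBest(f_regression, k=1)
X_train_sel = b.fit_transform(X_train, y_train)  # fit on training fold only
X_test_sel = b.transform(X_test)                 # reuse the fitted mask

clf = LinearRegression().fit(X_train_sel, y_train)
print(X_train_sel.shape[1])                # one selected feature
print(round(clf.score(X_test_sel, y_test), 3))
```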
Answer 2 (score: 0)
The way to do this is to configure SelectKBest with your favourite scoring function (in your case, regression), and then get the support mask from it.
My code assumes you have a list features_list
containing the names of all the columns of X.
kb = SelectKBest(score_func=f_regression, k=5) # configure SelectKBest
kb.fit(X, Y) # fit it to your data
# get_support gives a boolean mask [False, False, True, False, ...]
print(features_list[kb.get_support()])  # features_list must be a numpy array for mask indexing
Of course you can write this more pythonically than I did :-)
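A runnable version of this answer on made-up data (the synthetic `X`, `Y`, and the feature names are assumptions for illustration). Note the one gotcha: boolean-mask indexing like `features_list[kb.get_support()]` only works if `features_list` is a numpy array, not a plain Python list.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: column 0 ("a") is constructed to drive Y.
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
Y = 4 * X[:, 0] + 0.1 * rng.rand(50)

# Must be an ndarray so the boolean mask from get_support() can index it.
features_list = np.array(["a", "b", "c"])

kb = SelectKBest(score_func=f_regression, k=1)  # configure SelectKBest
kb.fit(X, Y)                                    # fit it to your data
print(features_list[kb.get_support()])          # -> ['a']
```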