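I'm new to scikit-learn, but it did what I was hoping for. Now, maddeningly, the only remaining problem is that I can't figure out how to print (or, even better, write to a small text file) all the coefficients it estimated and all the features it selected. How do I do this?
The same goes for SGDClassifier, but I assume the approach is identical for every base object that can be fit, with or without cross-validation. Full script below.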
import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp
from sklearn import grid_search
from sklearn import cross_validation
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

def main():
    print("Started.")
    # n = 10**6
    # notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
    # X = notreatadapter[1:][0:n]
    # y = notreatadapter[0][0:n]
    notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
    notreatdata = notreatdata.iloc[:10000,:]
    X = notreatdata.iloc[:,1:]
    y = notreatdata.iloc[:,0]
    n = y.shape[0]
    print("Data loaded.")
    X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)
    print("Data split.")
    scaler = StandardScaler()
    scaler.fit(X_train) # Don't cheat - fit only on training data
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test) # apply same transformation to test data
    print("Data scaled.")
    # build a model
    model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
    #model.fit(X,y)
    print("CV starts.")
    # run grid search
    param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
    gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
    gs.fit(X_train, y_train)
    print("Scores for alphas:")
    print(gs.grid_scores_)
    print("Best estimator:")
    print(gs.best_estimator_)
    print("Best score:")
    print(gs.best_score_)
    print("Best parameters:")
    print(gs.best_params_)

if __name__=='__main__':
    mp.freeze_support()
    main()
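Answer 0 (score: 12)
The SGDClassifier instance fitted with the best hyperparameters is stored in gs.best_estimator_. Its coef_ and intercept_ attributes are the fitted parameters of that best model.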
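As a minimal sketch of how those attributes could be printed or written to a text file: the example below uses synthetic data, made-up feature names (x0, x1, ...) and an arbitrary output file name (coefficients.txt), none of which come from the question. It also assumes a recent scikit-learn release, where GridSearchCV lives in sklearn.model_selection and SGDClassifier takes max_iter, rather than the grid_search module and n_iter argument used in the question's script.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the question's data; feature names are hypothetical.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
feature_names = ["x%d" % i for i in range(X.shape[1])]

model = SGDClassifier(penalty="elasticnet", max_iter=1000, tol=1e-3,
                      random_state=0)
param_grid = {"alpha": [1e-3, 1e-2], "l1_ratio": [0.15, 0.5]}
gs = GridSearchCV(model, param_grid, cv=3)
gs.fit(X, y)

best = gs.best_estimator_       # refit on all of X, y (refit=True is the default)
coef = best.coef_.ravel()       # binary case: shape (1, n_features) -> (n_features,)
intercept = best.intercept_[0]

# Features whose coefficient is non-zero are the ones the penalty kept.
selected = [name for name, w in zip(feature_names, coef) if w != 0]
print("Selected features:", selected)

# Write one "name<TAB>value" line per parameter (file name is an assumption).
with open("coefficients.txt", "w") as fh:
    fh.write("intercept\t%.6g\n" % intercept)
    for name, w in zip(feature_names, coef):
        fh.write("%s\t%.6g\n" % (name, w))
```

With a pandas DataFrame as input, the column names (for example list(X.columns), captured before scaling) could take the place of the made-up names. For a multiclass target, coef_ has one row per class, so the ravel() step would need to be replaced by a loop over rows.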
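Answer 1 (score: 0)
I think you may be looking for the estimated parameters of the "best" model rather than the hyperparameters determined by the grid search. You can plug the best hyperparameters from the grid search ('alpha' and 'l1_ratio' in your case) back into the model ('SGDClassifier' in your case) and train it again. You can then read the parameters from the fitted model object.
The code could look something like this: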
model2 = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True, alpha = gs.best_params_['alpha'], l1_ratio=gs.best_params_['l1_ratio'])
model2.fit(X_train, y_train) # coef_ only exists after fitting
print(model2.coef_)
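Since GridSearchCV refits the winning configuration on the full training data by default (refit=True), a manual refit like this should give the same coefficients as gs.best_estimator_, provided the random seed is fixed. The sketch below checks that on synthetic data; the data, the small grid and the random_state values are assumptions for illustration, and it uses the sklearn.model_selection module and max_iter argument of recent scikit-learn releases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for X_train, y_train.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

base = SGDClassifier(penalty="elasticnet", max_iter=1000, tol=1e-3,
                     random_state=0)
gs = GridSearchCV(base, {"alpha": [1e-3, 1e-2], "l1_ratio": [0.15, 0.5]}, cv=3)
gs.fit(X, y)

# Manual refit with the winning hyperparameters; fit() must run before
# coef_ and intercept_ exist on the model.
model2 = SGDClassifier(penalty="elasticnet", max_iter=1000, tol=1e-3,
                       random_state=0, **gs.best_params_)
model2.fit(X, y)

# Compare against the estimator that the grid search already refit.
same = np.allclose(model2.coef_, gs.best_estimator_.coef_)
print("Manual refit matches best_estimator_:", same)
```

Without a fixed random_state the two fits shuffle the data differently, so the coefficients would generally differ slightly even with identical hyperparameters.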