How do I print the estimated coefficients after fitting a model with GridSearchCV? (SGDRegressor)

Asked: 2014-06-23 23:00:23

Tags: python scikit-learn

I'm new to scikit-learn, and it does what I was hoping for. The one remaining, maddening problem is that I don't know how to print (or, better yet, write to a small text file) all the coefficients it estimated and all the features it selected. What is the way to do this?

This is with SGDClassifier, but I believe it is the same for any underlying estimator that can be fitted, with or without cross-validation. The full script is below.

import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp
from sklearn import grid_search
from sklearn import cross_validation
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier


def main():
    print("Started.")
    # n = 10**6
    # notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
    # X = notreatadapter[1:][0:n]
    # y = notreatadapter[0][0:n]
    notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
    notreatdata = notreatdata.iloc[:10000,:]
    X = notreatdata.iloc[:,1:]
    y = notreatdata.iloc[:,0]
    n = y.shape[0]

    print("Data lodaded.")
    X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)

    print("Data split.")
    scaler = StandardScaler()
    scaler.fit(X_train)  # Don't cheat - fit only on training data
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)  # apply same transformation to test data

    print("Data scaled.")
    # build a model
    model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
    #model.fit(X,y)

    print("CV starts.")
    # run grid search
    param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
    gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
    gs.fit(X_train, y_train)

    print("Scores for alphas:")
    print(gs.grid_scores_)
    print("Best estimator:")
    print(gs.best_estimator_)
    print("Best score:")
    print(gs.best_score_)
    print("Best parameters:")
    print(gs.best_params_)


if __name__=='__main__':
    mp.freeze_support()
    main()

2 Answers:

Answer 0 (score: 12)

The SGDClassifier instance fitted with the best hyperparameters is stored in gs.best_estimator_. The coef_ and intercept_ attributes of that best model hold its fitted parameters.
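For example, a minimal sketch (assuming gs has been fitted as in the question's script and numpy is imported as np; the file name is just an illustration) of printing those attributes and dumping the coefficients to a small text file:

best = gs.best_estimator_   # refit on the full training set (refit=True is the default)
print(best.coef_)           # estimated feature weights; a single row for a binary target
print(best.intercept_)      # estimated bias term(s)
np.savetxt('coefficients.txt', best.coef_, delimiter=',')  # write the weights to a text file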

Answer 1 (score: 0)

I think you may be looking for the estimated parameters of the "best" model rather than the hyperparameters determined by the grid search. You can plug the best hyperparameters from the grid search ('alpha' and 'l1_ratio' in your case) back into the model ('SGDClassifier' in your case) and train it again. You can then read the parameters from the fitted model object.

The code could look something like this:

model2 = SGDClassifier(penalty='elasticnet', n_iter=np.ceil(10**6 / n), shuffle=True,
                       alpha=gs.best_params_['alpha'], l1_ratio=gs.best_params_['l1_ratio'])
model2.fit(X_train, y_train)  # refit with the best hyperparameters before reading the parameters
print(model2.coef_)
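Since X in the question's script is still a pandas DataFrame, its column names can be lined up with the fitted weights to see which features the elastic-net penalty kept. A sketch, assuming a binary target so that coef_ has a single row:

for name, weight in zip(X.columns, model2.coef_.ravel()):
    print(name, weight)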