GridSearch param_grid

Date: 2018-02-21 12:57:37

Tags: python numpy scikit-learn

I want to use k-fold cross-validation on a linear regression model, but omitting 1 variable from the model each time. For example: if the model has 3 variables, then I want the subsets ab, ac, bc, where a, b, c are the predictor variables. I'm not sure how to do this with param_grid.

If there were 10 variables, would it just be:

param_grid = {'a': [1, 10]}

I've looked at the documentation, but it seems to assume I'm already familiar with the function...

1 Answer:

Answer 0 (score: 0)

As @Scratch'N'Purr has already made clear, this is not implemented in scikit-learn.

Option 1: implement your own -

from __future__ import print_function

import itertools

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
# sklearn.cross_validation was deprecated; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

# Split the dataset in two equal parts
X_train, X_test, y_train, y_test = train_test_split(
    df[['A', 'B', 'C']], df['D'], test_size=0.5, random_state=0)

n_features = X_train.shape[1]
subsets = []
# all feature combinations of size 2 up to all features
for i in range(2, n_features + 1):
    subsets += [list(x) for x in itertools.combinations(range(n_features), i)]
print(subsets)

best_score = -np.inf
best_subset = None
lreg = LinearRegression()
for subset in subsets:
    lreg.fit(X_train.iloc[:, subset], y_train)
    score = lreg.score(X_test.iloc[:, subset], y_test)
    print(score)
    if score > best_score:
        # best_subset ends up holding the combination with the highest R^2
        best_score, best_subset = score, subset

Output

This fits a linear regression model for each combination of features. Here is the output -

[[0, 1], [0, 2], [1, 2], [0, 1, 2]]
-0.173503748866
-0.069322067281
-0.159670591221
-0.173960013649

Note how this gives you each combination of features used along with its corresponding R-squared value.
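Since the question asked for k-fold cross-validation rather than a single train/test split, the same subset loop can be combined with `cross_val_score` from `sklearn.model_selection`. A sketch (the choice of `cv=5` folds is arbitrary here):

```python
import itertools

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
X, y = df[['A', 'B', 'C']], df['D']

# same subset enumeration as above: every combination of 2+ features
n_features = X.shape[1]
subsets = [list(c) for i in range(2, n_features + 1)
           for c in itertools.combinations(range(n_features), i)]

best_score, best_subset = -np.inf, None
for subset in subsets:
    # mean R^2 over 5 folds for this feature subset
    score = cross_val_score(LinearRegression(), X.iloc[:, subset], y, cv=5).mean()
    if score > best_score:
        best_score, best_subset = score, subset

print(best_subset, best_score)
```

This keeps the exhaustive search over feature subsets but scores each one with k-fold CV instead of a single held-out set.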

Option 2: use scikit-learn's LARS implementation, which performs stepwise regression -

>>> from sklearn import linear_model
>>> reg = linear_model.LassoLars(alpha=.1)
>>> reg.fit([[0, 0], [1, 1]], [0, 1])  
LassoLars(alpha=0.1, copy_X=True, eps=..., fit_intercept=True,
     fit_path=True, max_iter=500, normalize=True, positive=False,
     precompute='auto', verbose=False)
>>> reg.coef_    
array([ 0.717157...,  0.        ])
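The zero entries in `coef_` mark the features LARS dropped at this `alpha`, so the indices of the surviving features can be read off with `np.flatnonzero` - a small sketch reusing the toy data above:

```python
import numpy as np
from sklearn.linear_model import LassoLars

reg = LassoLars(alpha=0.1)
reg.fit([[0, 0], [1, 1]], [0, 1])

# indices of features with non-zero coefficients, i.e. those the model kept
selected = np.flatnonzero(reg.coef_)
print(selected)  # only the first feature survives at this alpha
```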

Hope that helps. Cheers!