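I want to test several models that do not share the same parameter names. How can I use a list of pipeline parameter sets with RandomizedSearchCV, the way this example does with GridSearchCV?
The example is from:
https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html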
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2

pipe = Pipeline([
    # the reduce_dim stage is populated by the param_grid
    ('reduce_dim', None),
    ('classify', LinearSVC())
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]

grid = GridSearchCV(pipe, cv=3, n_jobs=2, param_grid=param_grid)
digits = load_digits()
grid.fit(digits.data, digits.target)
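Answer 0 (score: 0)
I found a workaround that relies on duck typing and is not too intrusive.
It works by passing complete estimators as parameters to the pipeline. We first sample which kind of model to use, then sample its parameters. To do this, we define two classes that can be sampled from: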
from sklearn.base import clone
from sklearn.model_selection import ParameterSampler
from sklearn.utils import check_random_state


class EstimatorSampler:
    """
    Class that holds a model and its parameters distribution.
    When sampled, the parameters are first sampled and set on a copy of
    the model, which is returned.

    # Arguments
    ===========
    model : sklearn.base.BaseEstimator
    param_distributions : dict
        Input to ParameterSampler

    # Returns
    =========
    sampled : sklearn.base.BaseEstimator
    """
    def __init__(self, model, param_distributions):
        self.model = model
        self.param_distributions = param_distributions

    def rvs(self, random_state=None):
        sampled_params = next(iter(
            ParameterSampler(self.param_distributions,
                             n_iter=1,
                             random_state=random_state)))
        # Configure a clone rather than self.model: otherwise every draw
        # mutates the same instance and earlier candidates are overwritten.
        return clone(self.model).set_params(**sampled_params)


class ListSampler:
    """
    List container that, when sampled, returns one of its items,
    with probabilities defined by `probs`.

    # Arguments
    ===========
    items : 1-D array-like
    probs : 1-D array-like of floats
        If not None, it should have the same length as `items`
        and sum to 1.

    # Returns
    =========
    sampled item
    """
    def __init__(self, items, probs=None):
        self.items = items
        self.probs = probs

    def rvs(self, random_state=None):
        # Draw with the random state supplied by the search, so that the
        # random_state argument is honoured instead of the global numpy RNG.
        rng = check_random_state(random_state)
        item = self.items[rng.choice(len(self.items), p=self.probs)]
        if hasattr(item, 'rvs'):
            return item.rvs(random_state=rng)
        return item
The rest of the code is defined below.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2

pipe = Pipeline([
    # the reduce_dim stage is populated by the param_grid
    ('reduce_dim', None),
    ('classify', None)
])

N_FEATURES_OPTIONS = [2, 4, 8]
dim_reducers = ListSampler(
    [EstimatorSampler(est, {'n_components': N_FEATURES_OPTIONS})
     for est in [PCA(iterated_power=7), NMF()]] +
    [EstimatorSampler(SelectKBest(chi2), {'k': N_FEATURES_OPTIONS})])

C_OPTIONS = [1, 10, 100, 1000]
classifiers = EstimatorSampler(LinearSVC(), {'C': C_OPTIONS})

param_dist = {
    'reduce_dim': dim_reducers,
    'classify': classifiers
}

grid = RandomizedSearchCV(pipe, cv=3, n_jobs=2, scoring='accuracy', param_distributions=param_dist)
digits = load_digits()
grid.fit(digits.data, digits.target)
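
The workaround above depends on how RandomizedSearchCV draws candidates: it goes through ParameterSampler, which treats any object exposing an rvs method as a distribution and samples uniformly from plain lists. The sketch below illustrates that duck typing in isolation. The CoinFlip class is hypothetical, written only for this illustration, and is not part of scikit-learn or of the answer's code.

```python
from sklearn.model_selection import ParameterSampler
from sklearn.utils import check_random_state


class CoinFlip:
    """Hypothetical sampler: any object with an rvs method can act as a distribution."""

    def rvs(self, random_state=None):
        # ParameterSampler passes its own random state through this argument.
        rng = check_random_state(random_state)
        return "heads" if rng.uniform() < 0.5 else "tails"


# One custom "distribution" and one plain list of choices.
param_distributions = {"side": CoinFlip(), "k": [1, 2, 3]}
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))

for sample in samples:
    print(sample)
```

Each printed dictionary holds one value drawn from CoinFlip.rvs and one value picked from the list, which is the same mechanism EstimatorSampler and ListSampler plug into.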
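Answer 1 (score: 0)
Hyperopt supports hyperparameter tuning across multiple estimators; see this wiki for details (section 2.2, "A Search Space Example: scikit-learn").
If you want to do this with sklearn's GridSearch, take a look at this post. It suggests implementing an EstimatorSelectionHelper estimator that can run several different estimators, each with its own parameter grid.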
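
The linked post is not reproduced here, so what follows is only a rough sketch of the general idea and not the EstimatorSelectionHelper implementation itself: run one GridSearchCV per estimator, each with its own grid, then compare the best scores. The function name search_each, the two estimators, their grids, and the iris dataset are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def search_each(models, grids, X, y, cv=3):
    """Hypothetical helper: fit a separate grid search for every named estimator."""
    searches = {}
    for name, estimator in models.items():
        search = GridSearchCV(estimator, grids[name], cv=cv)
        search.fit(X, y)
        searches[name] = search
    return searches


X, y = load_iris(return_X_y=True)

# Each estimator gets a grid that uses its own parameter names.
models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
grids = {
    "tree": {"max_depth": [2, 4, 8]},
    "knn": {"n_neighbors": [1, 5, 15]},
}

searches = search_each(models, grids, X, y)
best_name = max(searches, key=lambda name: searches[name].best_score_)
print(best_name, searches[best_name].best_params_)
```

Because every search is independent, the grids never have to agree on parameter names. The cost is that the searches are not interleaved, so there is no single cv_results_ table covering all models.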
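Answer 2 (score: 0)
This is an old question, and the underlying issue was resolved a while ago (I am not sure as of which scikit-learn version).
You can now pass a list of dictionaries as the param_distributions argument of RandomizedSearchCV. Your example code becomes: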
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2

pipe = Pipeline([
    # the reduce_dim stage is populated by the param_grid
    ('reduce_dim', None),
    ('classify', LinearSVC())
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]

grid = RandomizedSearchCV(pipe, cv=3, n_jobs=2, param_distributions=param_grid)
digits = load_digits()
grid.fit(digits.data, digits.target)
I am using sklearn version 0.23.1.
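
To see what a list of dictionaries does during sampling, it can help to call ParameterSampler directly, since that is what RandomizedSearchCV uses to draw candidates. The sketch below is an illustration under a few assumptions: the strings 'pca', 'nmf' and 'kbest' are stand-ins for the real transformers, and scipy.stats.loguniform (available in recent SciPy releases) replaces the fixed C_OPTIONS list to show a continuous distribution. When every value is a plain list, as in the code above, candidates are instead drawn without replacement from the combined grid.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

N_FEATURES_OPTIONS = [2, 4, 8]

# Placeholder strings stand in for PCA(), NMF() and SelectKBest(chi2).
param_distributions = [
    {
        'reduce_dim': ['pca', 'nmf'],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': loguniform(1, 1000),
    },
    {
        'reduce_dim': ['kbest'],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': loguniform(1, 1000),
    },
]

# Each draw first picks one dictionary, then samples every key inside it.
candidates = list(ParameterSampler(param_distributions, n_iter=8, random_state=0))
for candidate in candidates:
    print(candidate)
```

Every candidate contains either reduce_dim__n_components or reduce_dim__k, never both, so parameter names that belong to one transformer are never applied to another.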