Question

I am a beginner and I have the following code below.

def find_peaks(t, neighbour_length=2000):
    peak_index = []
    for i in range(len(t)):
        is_peak = True  # assume a peak until a neighbour disproves it
        # Compare t[i] with neighbours at increasing distance on both sides
        for offset in range(1, neighbour_length + 1):
            prev_idx = i - offset
            next_idx = i + offset
            # A previous neighbour exists and is not smaller: not a peak
            if prev_idx >= 0 and t[i] <= t[prev_idx]:
                is_peak = False
                break
            # A following neighbour exists and is not smaller: not a peak
            if next_idx < len(t) and t[i] <= t[next_idx]:
                is_peak = False
                break

        if is_peak:
            peak_index.append(i)

    return peak_index

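That is the local testing. What I would like to accomplish is:

I. Perform PCA on the dataset

II. Use Gaussian Naive Bayes with only the default parameters

III. Use StratifiedShuffleSplit

So in the end I want the above steps to be transferred to another function that dumps the classifier, dataset, and feature list to test the performance.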

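from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline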

pca = PCA()
model = GaussianNB()
steps = [('pca', pca), ('model', model)]
pipeline = Pipeline(steps)

cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
modelwithpca = GridSearchCV(pipeline, param_grid= ,cv=cv)
modelwithpca.fit(X_train,y_train)

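In the param_grid part, I don't want to test any list of parameters. I just want to use the default parameters of Gaussian Naive Bayes, if that makes sense. What should I change?

Also, should the way I instantiate the classifier object be changed?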

Answer 1

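The purpose of GridSearchCV is to test at least one thing in your pipeline with different parameter values (if you don't want to test different parameters, you don't need GridSearchCV). So, in general, if you wanted to test, say, different values of n_components for PCA, the format for using a pipeline with GridSearchCV is: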

gscv = GridSearchCV(pipeline, param_grid={'{step_name}__{parameter_name}': [possible values]}, cv=cv)

For example:

# this would perform cv for the 3 different values of n_components for pca
gscv = GridSearchCV(pipeline, param_grid={'pca__n_components': [3, 6, 10]}, cv=cv)

If you use GridSearchCV to tune PCA as above, this of course means your model will keep its default values.
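To make this concrete, here is a minimal sketch that lists the parameter names a pipeline accepts, tunes only pca__n_components, and checks that the GaussianNB step still has its defaults afterwards. The synthetic data from make_classification is an assumption standing in for X_train and y_train, which the original post does not show.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption: replace with your own X_train, y_train)
X_train, y_train = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=0
)

pipeline = Pipeline([('pca', PCA()), ('model', GaussianNB())])

# Valid param_grid keys follow the '<step_name>__<parameter_name>' pattern
valid_keys = sorted(pipeline.get_params().keys())
print([k for k in valid_keys if '__' in k])

cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# Only the PCA step is tuned; the model step is not mentioned in the grid
gscv = GridSearchCV(pipeline, param_grid={'pca__n_components': [3, 6, 10]}, cv=cv)
gscv.fit(X_train, y_train)

# The refit best pipeline still carries GaussianNB's default parameters
best_nb = gscv.best_estimator_.named_steps['model']
print(gscv.best_params_)
print(best_nb.get_params())
```

Any parameter that is left out of param_grid is simply never varied, so nothing special has to be passed to keep the classifier at its defaults.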

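If you don't need parameter tuning, then GridSearchCV is not the way to go: using GridSearchCV with your model's default parameters like this only produces a parameter grid with a single combination, so it is the same as just performing the cross-validation. If I understood your question correctly, doing it like this doesn't make sense: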

from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

pca = PCA()
model = GaussianNB()
steps = [('pca', pca), ('model', model)]
pipeline = Pipeline(steps)

cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
# get the default parameters of your model and use them as a param_grid
modelwithpca = GridSearchCV(pipeline, param_grid={'model__' + k: [v] for k, v in model.get_params().items()}, cv=cv)

# will run 5 times as your cv is configured
modelwithpca.fit(X_train,y_train)
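When no tuning is needed, the cross-validation can be run on the pipeline directly. The following is a minimal sketch, not part of the original answer: it assumes scikit-learn's cross_val_score as the replacement for GridSearchCV, and it uses synthetic data from make_classification in place of X_train and y_train.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption: replace with your own X_train, y_train)
X_train, y_train = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=0
)

# Both steps are created with their default parameters
pipeline = Pipeline([('pca', PCA()), ('model', GaussianNB())])
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# One score per split, using the estimator's default scorer (accuracy here)
scores = cross_val_score(pipeline, X_train, y_train, cv=cv)
print(scores.mean(), scores.std())

# Fit once on all training data to get the object to hand to other functions
pipeline.fit(X_train, y_train)
```

The fitted pipeline can then be passed on as the classifier, since it applies PCA and GaussianNB together in predict.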

Hope this helps, good luck!

In GridSearchCV, how to pass only the default parameters in param_grid?
