GridSearchCV is slow with the parameters used in the scikit-learn tutorial

Time: 2018-11-25 20:15:53

Tags: python machine-learning scikit-learn

I'm not sure why grid.fit(X, y) is correct rather than grid.fit(X_2d, y_2d).

The scikit-learn tutorial on RBF SVM Parameters uses GridSearchCV to find the best hyperparameters for an SVM.

It has the following code:

# Dataset for decision function visualization: we only keep the first two
# features in X and sub-sample the dataset to keep only 2 classes and
# make it a binary classification problem.

X_2d = X[:, :2]
X_2d = X_2d[y > 0]
y_2d = y[y > 0]
y_2d -= 1

...

param_grid = dict(gamma=gamma_range, C=C_range)

# GridSearchCV will search the parameter space for the best parameters to use,
# i.e. those that maximize the cross-validated score
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

grid = GridSearchCV(SVC(), param_grid=param_grid, cv=cv)

# ==================== CODE I'M INTERESTED IN  ==================>
# ===== SWITCH `grid.fit(X,y)` with grid.fit(X_2d, y_2d) ========>
grid.fit(X, y)
# ==================== ^^^^^^^^^^^^^^^^^^^  =============>

print("The best parameters are %s with a score of %0.2f"
      % (grid.best_params_, grid.best_score_))

Here X_2d and y_2d are subsets of X and y.


Some information to show what X and y are:

print(X.shape)    #(150,4)
print(y.shape)    #(150,)
print(X_2d.shape) #(100,2)
print(y_2d.shape) #(100,)
print(type(X))    #<class 'numpy.ndarray'>
print(type(y))    #<class 'numpy.ndarray'>
print(type(X_2d)) #<class 'numpy.ndarray'>
print(type(y_2d)) #<class 'numpy.ndarray'>
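For reference, the shapes above can be reproduced with a short sketch. It assumes X and y come from scikit-learn's iris dataset (the question does not show how they are loaded; the 150x4 shape is consistent with iris), and it skips any preprocessing that the elided part of the tutorial may apply.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: X and y are the iris features and labels (150 samples, 4 features).
X, y = load_iris(return_X_y=True)

# Keep the first two features and drop class 0, as in the snippet above.
X_2d = X[:, :2]
X_2d = X_2d[y > 0]
y_2d = y[y > 0]   # boolean indexing returns a copy, so y itself is not modified
y_2d -= 1         # relabel the remaining classes {1, 2} as {0, 1}

print(X.shape, y.shape)        # full dataset
print(X_2d.shape, y_2d.shape)  # two-feature, two-class subset
print(np.unique(y), np.unique(y_2d))
```

So X_2d, y_2d is a binary problem on two features, while X, y is the full three-class problem on four features.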

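Why does changing the code above to grid.fit(X_2d, y_2d) not work? I can't tell whether it is just taking a very long time or is actually incorrect. My Jupyter notebook just sits there, whereas grid.fit(X, y) takes only a few seconds.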
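One way to tell "slow" from "hung" is to make the search report progress and to cap the solver's iterations. The sketch below is only a diagnostic under stated assumptions: the data is assumed to be iris, and the small 3x3 grid is a hypothetical stand-in, since the tutorial's actual gamma_range and C_range are elided in the question. A plausible cause of the stall, not confirmed by the source, is that very large C values on overlapping two-feature classes make the SVM solver converge very slowly.

```python
import time
import warnings

import numpy as np
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# Assumption: iris data, reduced to two features and two classes as in the question.
X, y = load_iris(return_X_y=True)
X_2d = X[:, :2][y > 0]
y_2d = y[y > 0] - 1

# Hypothetical small grid for illustration; NOT the tutorial's actual ranges.
param_grid = dict(gamma=np.logspace(-2, 2, 3), C=np.logspace(-2, 6, 3))
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# max_iter caps the solver so no single fit can run indefinitely;
# verbose=1 prints how many fits the search will perform.
grid = GridSearchCV(SVC(max_iter=10000), param_grid=param_grid, cv=cv, verbose=1)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    start = time.perf_counter()
    grid.fit(X_2d, y_2d)
    elapsed = time.perf_counter() - start

# Fits that hit the iteration cap emit a ConvergenceWarning.
n_capped = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("search took %.2f s, %d convergence warnings" % (elapsed, n_capped))
print("best parameters: %s, score %0.2f" % (grid.best_params_, grid.best_score_))
```

If convergence warnings appear for the large-C candidates, the uncapped search is probably slow rather than wrong. Scores from capped fits are not reliable, so the cap is useful for diagnosis only.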

My initial thought is that we want to fit the dataset we are actually working with, which is X_2d, y_2d rather than X, y.

0 answers:

No answers yet.