Question

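I have implemented probability estimates for binary classes (at least I think so). Now I want to extend this approach to regression and try it on the Boston dataset. Unfortunately, my algorithm seems to get stuck. The code I am currently running looks like this: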

from sklearn import decomposition
from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston

boston = load_boston()

X = boston.data
y = boston.target
inputs_train, inputs_test, targets_train, targets_test = train_test_split(X, y, test_size=0.33, random_state=42)

def plotting():
    param_C = [0.01, 0.1]
    param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
    clf = GridSearchCV(svm.SVR(), cv = 5, param_grid= param_grid)
    clf.fit(inputs_train, targets_train)
    clf = SVR(C=clf.best_params_['C'], cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree=5, gamma=clf.best_params_['gamma'],
              kernel=clf.best_params_['kernel'],
              max_iter=-1, probability=True, random_state=None, shrinking=True,
              tol=0.001, verbose=False)
    clf.fit(inputs_train, targets_train)
    a = clf.predict(inputs_test[0])
    print(a)


plotting()

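Can someone tell me what is wrong with this approach? The problem is not that I get error messages (I know I have suppressed warnings above), but that the code never stops running. Any suggestions would be greatly appreciated.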

Answer 1

There are several issues with your code.

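First, what takes forever is the first clf.fit (i.e. the grid search one), which is why you saw no change when you set max_iter and tol in your second clf.fit.
Second, the clf=SVR() part will not work, because:
- You have to import it; SVR is not recognized as written
- You have a bunch of illegal arguments in there (decision_function_shape, probability, random_state, etc.); check the docs for the admissible SVR arguments.
Third, you don't need to fit again explicitly with the best parameters. You should simply ask for refit=True in your GridSearchCV definition and then use clf.best_estimator_ for your predictions (edit after a comment: plain clf.predict will also work).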
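
As a quick way to check the second point without opening the docs, the sketch below compares the argument names used in the question against the parameters that SVR reports through get_params(), a method every scikit-learn estimator provides. The variable names requested, accepted and illegal are only illustrative.

```python
from sklearn.svm import SVR

# Argument names passed to SVR(...) in the question's code
requested = ["C", "cache_size", "class_weight", "coef0", "decision_function_shape",
             "degree", "gamma", "kernel", "max_iter", "probability",
             "random_state", "shrinking", "tol", "verbose"]

# get_params() returns a dict keyed by the constructor parameter names
accepted = set(SVR().get_params())

# Anything requested but not accepted would raise a TypeError in SVR(...)
illegal = [name for name in requested if name not in accepted]
print(illegal)
```

Anything that ends up in illegal has to be dropped from the SVR call.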

So, moving everything outside of any function definition, here is a working version of your code:

from sklearn.svm import SVR
# other imports as-is

# data loading & splitting as-is

param_C = [0.01, 0.1]
param_grid = {'C': param_C, 'kernel': ['poly', 'rbf'], 'gamma': [0.1, 0.01]}
clf = GridSearchCV(SVR(degree=5, max_iter=10000), cv = 5, param_grid= param_grid, refit=True,)
clf.fit(inputs_train, targets_train)
a = clf.best_estimator_.predict(inputs_test[0])
# a = clf.predict(inputs_test[0]) will also work 
print(a)
# [ 21.89849792]

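Apart from degree, all the other admissible argument values you are using are actually the respective defaults, so the only arguments you really need in your SVR definition are degree and max_iter.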

You will get a couple of warnings (not errors). After fitting:

/databricks/python/lib/python3.5/site-packages/sklearn/svm/base.py:220: ConvergenceWarning: Solver terminated early (max_iter=10000). Consider pre-processing your data with StandardScaler or MinMaxScaler.

and after predicting:

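/databricks/python/lib/python3.5/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)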

Both of these already contain some advice on what to do next...
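
A minimal sketch of acting on both warnings, assuming a reasonably recent scikit-learn. Since load_boston has been removed from newer releases, synthetic data from make_regression stands in for the Boston set here (an assumption, not the original data), and the pipeline step names scale and svr are arbitrary choices. Scaling the features should help the solver converge far more readily, and reshaping the single test sample to 2D avoids the 1d-array complaint.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the Boston data (assumption: 13 features, as in Boston)
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=42)
inputs_train, inputs_test, targets_train, targets_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Putting the scaler in a pipeline refits it on each CV training fold,
# so the validation folds never leak into the scaling statistics
pipe = Pipeline([("scale", StandardScaler()),
                 ("svr", SVR(degree=5, max_iter=10000))])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {"svr__C": [0.01, 0.1],
              "svr__kernel": ["poly", "rbf"],
              "svr__gamma": [0.1, 0.01]}

clf = GridSearchCV(pipe, param_grid=param_grid, cv=5, refit=True)
clf.fit(inputs_train, targets_train)

# A single sample has to be passed as a 2D array of shape (1, n_features)
a = clf.predict(inputs_test[0].reshape(1, -1))
print(a)
```

Slicing with inputs_test[:1] would give the same 2D shape without an explicit reshape.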

Last but not least: a probabilistic classifier (i.e. one that produces probabilities instead of hard labels) is a valid thing, but a "probabilistic" regression model is not...
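
To make the distinction concrete, here is a small sketch on synthetic data (the data set is an assumption for illustration only). In scikit-learn, SVC exposes class probabilities through predict_proba when it is constructed with probability=True, whereas SVR has no such method at all.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, SVR

# Small synthetic binary classification problem
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# probability=True enables predict_proba on the classifier
clf = SVC(probability=True, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:3])   # one row per sample, one column per class
print(proba)

# The regressor offers only point predictions via predict()
print(hasattr(SVR(), "predict_proba"))
```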

Tested with Python 3.5 and scikit-learn 0.18.1.
