Question

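I am evaluating a DecisionTreeRegressor prediction model with the cross_val_score method. The problem is that the scores come out negative, and I really don't understand why.

Here is my code: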

import numpy as np
from scipy.stats import sem
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# df is the asker's DataFrame (not shown in the question)
all_depths = []
all_mean_scores = []
for max_depth in range(1, 11):
    all_depths.append(max_depth)
    simple_tree = DecisionTreeRegressor(max_depth=max_depth)
    cv = KFold(n_splits=2, shuffle=True, random_state=13)
    scores = cross_val_score(simple_tree, df.loc[:,'system':'gwno'], df['gdp_growth'], cv=cv)
    mean_score = np.mean(scores)
    all_mean_scores.append(mean_score)
    print("max_depth = ", max_depth, scores, mean_score, sem(scores))

Result:

max_depth =  1 [-0.45596988 -0.10215719] -0.2790635315340 0.176906344162
max_depth =  2 [-0.5532268 -0.0186984] -0.285962600541 0.267264196259
max_depth =  3 [-0.50359311  0.31992411] -0.0918345038141 0.411758610421
max_depth =  4 [-0.57305355  0.21154193] -0.180755811466 0.392297741456
max_depth =  5 [-0.58994928  0.21180425] -0.189072515181 0.400876761509
max_depth =  6 [-0.71730634  0.22139877] -0.247953784441 0.469352551213
max_depth =  7 [-0.60118621  0.22139877] -0.189893720551 0.411292487323
max_depth =  8 [-0.69635044  0.13976584] -0.278292298411 0.418058142228
max_depth =  9 [-0.78917478  0.30970763] -0.239733577455 0.549441204178
max_depth =  10 [-0.76098227  0.34512503] -0.207928623044 0.553053649792

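My questions are as follows:

1) The score returned is the MSE, right? If so, how can it be negative?

2) I have a small sample of ~40 observations and ~70 variables. Could this be the problem?

Thanks in advance.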

Answer 1

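It can happen. This has already been answered in another post.

The actual MSE is simply the positive version of the number you're getting.

The unified scoring API always maximizes the score, so scores that need to be minimized are negated in order for the unified scoring API to work correctly. The returned score is therefore negated when it is a score that should be minimized, and left positive when it is a score that should be maximized.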
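The negation described above can be seen by requesting an error metric explicitly. The following is a minimal sketch, assuming synthetic stand-in data (the asker's DataFrame is not available) and the "neg_mean_squared_error" scoring string from scikit-learn:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; not the asker's dataset).
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=40)

cv = KFold(n_splits=2, shuffle=True, random_state=13)
tree = DecisionTreeRegressor(max_depth=3, random_state=0)

# MSE must be requested explicitly; scikit-learn returns its negation
# so that "higher is better" holds for every scorer.
neg_mse = cross_val_score(tree, X, y, cv=cv, scoring="neg_mean_squared_error")
mse = -neg_mse  # flip the sign to recover the usual non-negative MSE

print("negated MSE per fold:", neg_mse)
print("MSE per fold:        ", mse)
```

Note that this only applies when an error metric is requested through the scoring argument; the call in the question does not pass one.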

Answer 2

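TL;DR:

1) No. Unless you explicitly specify a scorer, the estimator's default .score method is used. Since you didn't, it defaults to DecisionTreeRegressor.score, which returns the coefficient of determination, i.e. R^2. That can be negative.

2) Yes, this is a problem. It explains why you get a negative coefficient of determination.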
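As a quick check of point 1, the default scores can be compared with the ones obtained by asking for R^2 explicitly. This is a sketch on synthetic stand-in data (an assumption, since the original DataFrame is not shown); random_state is fixed on the tree so that both runs fit identical models:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; not the asker's dataset).
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=40)

cv = KFold(n_splits=2, shuffle=True, random_state=13)
tree = DecisionTreeRegressor(max_depth=3, random_state=0)

# No scoring argument: falls back to the estimator's own .score method.
default_scores = cross_val_score(tree, X, y, cv=cv)

# Explicit R^2 scorer for comparison.
r2_scores = cross_val_score(tree, X, y, cv=cv, scoring="r2")

print(default_scores)
print(r2_scores)
print("identical:", np.allclose(default_scores, r2_scores))
```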

Details:

You used the function like this:

scores = cross_val_score(simple_tree, df.loc[:,'system':'gwno'], df['gdp_growth'], cv=cv)

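So you did not explicitly pass the "scoring" parameter. Let's look at the docs:

scoring : string, callable or None, optional, default: None

A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).

So the docs don't state this explicitly, but it probably means that the estimator's default .score method is used.

To confirm this hypothesis, let's dig into the source code. We see that the scorer ultimately used is the following: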

scorer = check_scoring(estimator, scoring=scoring)

So, let's look at the source for check_scoring:

has_scoring = scoring is not None
if not hasattr(estimator, 'fit'):
    raise TypeError("estimator should be an estimator implementing "
                    "'fit' method, %r was passed" % estimator)
if isinstance(scoring, six.string_types):
    return get_scorer(scoring)
elif has_scoring:
    # Heuristic to ensure user has not passed a metric
    module = getattr(scoring, '__module__', None)
    if hasattr(module, 'startswith') and \
       module.startswith('sklearn.metrics.') and \
       not module.startswith('sklearn.metrics.scorer') and \
       not module.startswith('sklearn.metrics.tests.'):
        raise ValueError('scoring value %r looks like it is a metric '
                         'function rather than a scorer. A scorer should '
                         'require an estimator as its first parameter. '
                         'Please use `make_scorer` to convert a metric '
                         'to a scorer.' % scoring)
    return get_scorer(scoring)
elif hasattr(estimator, 'score'):
    return _passthrough_scorer
elif allow_none:
    return None
else:
    raise TypeError(
        "If no scoring is specified, the estimator passed should "
        "have a 'score' method. The estimator %r does not." % estimator)

Note that scoring=None is the case here, so:

has_scoring = scoring is not None

implies has_scoring == False. Also, the estimator has a .score attribute, so we go through this branch:

elif hasattr(estimator, 'score'):
    return _passthrough_scorer

Which is simply:

def _passthrough_scorer(estimator, *args, **kwargs):
    """Function that wraps estimator.score"""
    return estimator.score(*args, **kwargs)
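The same fallback can be checked from the public API without reading the source. The sketch below assumes that check_scoring is importable from sklearn.metrics (true in recent scikit-learn releases) and uses synthetic stand-in data; it compares the scorer chosen for scoring=None with the estimator's own .score:

```python
import numpy as np
from sklearn.metrics import check_scoring
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; not the asker's dataset).
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=40)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# scoring is left at its default of None, as in the question.
scorer = check_scoring(tree)

via_scorer = scorer(tree, X, y)   # what cross_val_score would call
via_method = tree.score(X, y)     # the estimator's default score

print(via_scorer, via_method)
```

Both values should agree, since the scorer only forwards to estimator.score.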

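So finally, we now know that scorer is whatever the default score is for your estimator. Let's check the docs for the estimator, which clearly state:

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.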
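To make the definition concrete, here is a small worked example on made-up numbers (the arrays and the helper r2 are illustrative only). It computes 1 - u/v by hand and compares it with sklearn.metrics.r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

def r2(y_true, y_pred):
    """R^2 = 1 - u/v, following the definition quoted above."""
    u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1 - u / v

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# A constant model that always predicts the mean of y (2.5): u == v.
constant_pred = np.full_like(y_true, y_true.mean())

# A model worse than the mean: predictions in reversed order.
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])

r2_constant = r2(y_true, constant_pred)   # u = 5, v = 5
r2_bad = r2(y_true, bad_pred)             # u = 20, v = 5

print(r2_constant, r2_score(y_true, constant_pred))
print(r2_bad, r2_score(y_true, bad_pred))
```

The second case shows how R^2 drops below zero as soon as the squared error of the model exceeds that of the mean predictor.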

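So it seems that your scores are actually coefficients of determination. A negative R^2 basically means that your models perform very poorly: worse than if we simply predicted the expected value (i.e. the mean) for every input. That makes sense, since, as you state:

I have a small sample of ~40 observations and ~70 variables. Might this be the problem?

Yes, it is. It is pretty much hopeless to make meaningful predictions in a 70-dimensional problem space when you have only 40 observations.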
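The effect of having more variables than observations can be sketched with synthetic data of the same shape. The example below assumes 40 rows and 70 purely random features with a target unrelated to them, which is a more extreme situation than the asker's and only serves as an illustration. A DummyRegressor that predicts the training mean is included as a baseline:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# 40 observations, 70 features of pure noise (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 70))
y = rng.normal(size=40)  # target carries no signal from X

cv = KFold(n_splits=2, shuffle=True, random_state=13)

# Default scoring for both regressors is R^2 on the held-out fold.
tree_scores = cross_val_score(
    DecisionTreeRegressor(max_depth=3, random_state=0), X, y, cv=cv
)
mean_scores = cross_val_score(DummyRegressor(strategy="mean"), X, y, cv=cv)

print("tree R^2 per fold:    ", tree_scores)
print("baseline R^2 per fold:", mean_scores)
```

The baseline can never score above zero on a held-out fold, because the fold's own mean minimizes the squared error. A tree that fits noise in the training half will typically land well below that baseline, which matches the negative values in the question.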

Negative cross_val_score with decision_tree_regressor model
