Question

I am running some simulations with RandomForestClassifier to classify two classes, and I use the function below for debugging:

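def predict_proba(self, X):
    print(X.shape)                    # input: (n_samples, n_features)
    pred = self.clf.predict_proba(X)  # one column per class seen during fit
    print(pred.shape)
    pred = pred.T[1]                  # probabilities for the second class
    print(pred.shape)
    return pred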

This routine runs several times as I split the data, and the .shape print statements above produce the following output:

(62, 93)
(62, 2)
(62,)

(62, 93)
(62, 2)
(62,)

(62, 93)
(62, 2)
(62,)

(62, 93)
(62, 1)
IndexError: index 1 is out of bounds for axis 0 with size 1

My question is: why does predict_proba(X) return an array of shape (62, 2) most of the time (as expected), but occasionally one of shape (62, 1)?

Edit

I think I understand it now: it looks like the classifier was trained on only one class, because I was not using StratifiedKFold or anything similar.

Answer 1

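I have solved this problem. It happened because I was splitting the data into chunks, and some of those chunks contained only one of the two classes, so RandomForestClassifier was .fit() on a single class. predict_proba returns one column per class seen during fit, hence the (62, 1) shape.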
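To make the cause concrete, here is a minimal sketch that reproduces the (62, 1) shape. The data is synthetic and the helper positive_class_proba is a hypothetical name of mine, not something from the original code. It looks the column up through clf.classes_ instead of hard-coding index 1, and returning zeros for an unseen class is just one possible choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for one data chunk (assumption: 62 rows, 93 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(62, 93))

# A chunk whose labels are all 0, i.e. only one class is present
y_single = np.zeros(62, dtype=int)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y_single)

print(clf.classes_)                # only the class seen during fit
print(clf.predict_proba(X).shape)  # (62, 1): one column per entry in classes_

def positive_class_proba(clf, X, positive_label=1):
    """Return P(positive_label) per row, or zeros if fit never saw that class."""
    proba = clf.predict_proba(X)
    cols = np.flatnonzero(clf.classes_ == positive_label)
    if cols.size == 0:
        # Hypothetical fallback: the model cannot predict an unseen class
        return np.zeros(X.shape[0])
    return proba[:, cols[0]]

p = positive_class_proba(clf, X)
print(p.shape)
```

This only avoids the IndexError. A model trained on one class is still not useful, so fixing the split is the real solution.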

I fixed it by using sklearn.model_selection.StratifiedKFold, which preserves the class proportions in every fold.
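For reference, a rough sketch of what the stratified split can look like. The data, the class imbalance, and the number of splits are assumptions for illustration, not the setup from my simulations. The labels are deliberately sorted by class, so naive contiguous chunks would each contain a single class.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic data (assumption): 310 rows, 93 features, imbalanced labels
rng = np.random.default_rng(0)
X = rng.normal(size=(310, 93))
y = np.array([0] * 248 + [1] * 62)  # sorted by class on purpose

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
shapes = []
for train_idx, test_idx in skf.split(X, y):
    clf = RandomForestClassifier(n_estimators=10, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])
    shapes.append(proba.shape)
    # Every training fold holds both classes, so proba has two columns
    print(np.unique(y[train_idx]), proba.shape)
```

Note that split() needs y as well as X, because the folds are built from the label distribution.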
