Question

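I am trying to implement bagging and voting with decision trees and a for loop. I am using sklearn's resample. However, I get the error "Number of labels=97 does not match number of samples=77". I can see why it happens, but I am not sure how to fix it.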

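The dataset has 150 samples and 150 labels. With test_size=0.35 the training set keeps 65% of them, so 150 * 0.65 ≈ 97, and 97 * 0.8 ≈ 77. X is a feature matrix with 150 rows and y is a label vector of length 150.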
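The two sizes in the error message can be reproduced with plain arithmetic. This is a minimal sketch, assuming train_test_split rounds the test share up and gives the remainder to the training set, which is consistent with the 97 labels reported in the error; the variable names are illustrative only.

```python
import math

n_samples = 150
test_size = 0.35

# Assumption: the test share is rounded up, the rest goes to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Same expression as in the loop: int() truncates 77.6 down to 77.
bootstrap_size = int(0.8 * n_train)

print(n_train, bootstrap_size)
```

So the bag holds 77 rows, while y_train still holds 97 labels.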

Here is my code:

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35, random_state=3)

predictions = []

for i in range(1, 20):
    bootstrap_size = int(0.8 * len(X_train))
    # Only the features are resampled here, so bag has 77 rows
    bag = resample(X_train, n_samples=bootstrap_size, random_state=i, replace=True)
    Base_DecisionTree = DecisionTreeClassifier(random_state=3)
    # y_train still has 97 labels, so this call raises the error
    Base_DecisionTree.fit(bag, y_train)
    y_predict = Base_DecisionTree.predict(X_test)
    accuracy = accuracy_score(y_test, y_predict)
    predictions.append(accuracy)

Answer 1

You should also resample the labels, together with the features, and pass the resampled labels to fit().

x_bag, y_bag = resample(X_train, y_train, n_samples=bootstrap_size, random_state=i, replace=True)
tree = DecisionTreeClassifier(random_state=3)
tree.fit(x_bag, y_bag)
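Since the question also mentions voting, here is one way the fix could be combined with a majority vote over the trees. It is a sketch under assumptions rather than the asker's actual setup: the iris data stands in for the unnamed 150-sample dataset, and names such as all_predictions and voted are made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Assumption: iris (150 samples) stands in for the dataset in the question.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=3
)

bootstrap_size = int(0.8 * len(X_train))
all_predictions = []

for i in range(1, 20):
    # Passing X_train and y_train together keeps rows and labels aligned.
    x_bag, y_bag = resample(
        X_train, y_train, n_samples=bootstrap_size, random_state=i, replace=True
    )
    tree = DecisionTreeClassifier(random_state=3)
    tree.fit(x_bag, y_bag)
    all_predictions.append(tree.predict(X_test))

# One row per tree, one column per test sample.
all_predictions = np.array(all_predictions)

# Majority vote per test sample.
# np.bincount needs non-negative integer labels, which holds for iris.
voted = np.array([np.bincount(col).argmax() for col in all_predictions.T])

ensemble_accuracy = accuracy_score(y_test, voted)
print(ensemble_accuracy)
```

The original loop stored one accuracy per tree. Collecting the raw predictions instead lets the ensemble be scored once after voting. For string labels, a different vote counter (for example collections.Counter) would be needed in place of np.bincount.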

How do I correctly implement bagging on decision trees with a for loop?
