I am using scikit-learn's RandomForestClassifier. I am trying to predict a binary target and get the class probabilities.
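from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# df holds the features, y the binary target
X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.25, random_state=42)
# bootstrap must be the boolean False (the string "False" is truthy); "sqrt" is what "auto" meant for classifiers
rf = RandomForestClassifier(n_estimators=100, random_state=42, class_weight=None, max_features="sqrt",
                            bootstrap=False, criterion='entropy')
rf.fit(X_train, y_train)
# predict_proba returns one column per class; column 1 is the probability of the positive class
preds = rf.predict_proba(X_test)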
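For context on where these probabilities come from: scikit-learn documents the forest's `predict_proba` as the mean of the per-tree class probabilities. The sketch below checks that by averaging the trees by hand. It uses `make_classification` as a synthetic stand-in for `df` and `y`, which are not shown in the post, so the numbers will not match the real data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df / y (assumption: any binary dataset works for this check)
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42, max_features="sqrt",
                            bootstrap=False, criterion="entropy")
rf.fit(X_train, y_train)

preds = rf.predict_proba(X_test)  # shape (n_samples, 2)

# Average the probabilities of the individual trees by hand
manual = np.mean([tree.predict_proba(X_test) for tree in rf.estimators_], axis=0)

print(np.allclose(preds, manual))
```

Because the output is an average over 100 trees, it may take only a limited set of distinct values, which can matter once the probabilities are binned.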
As you can see, the bin at 0.4 looks odd. I could understand small fluctuations, but the count drops sharply in that bin, and the test set has about 10M rows.
The size of bin for 0.3 is 1.7M
The size of bin for 0.4 is 0.5M !!
The size of bin for 0.5 is 1.3M
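Bin sizes like the ones above can be computed with `np.histogram`. This is only a sketch: the post does not say how the bins were defined, so ten equal-width bins on [0, 1] are an assumption, `bin_sizes` is a hypothetical helper, and the small `probs` array stands in for `preds[:, 1]`.

```python
import numpy as np

def bin_sizes(probs, n_bins=10):
    """Count how many probabilities fall into each equal-width bin on [0, 1]."""
    counts, edges = np.histogram(probs, bins=n_bins, range=(0.0, 1.0))
    return counts, edges

# Hypothetical stand-in for preds[:, 1]
probs = np.array([0.05, 0.31, 0.35, 0.42, 0.58, 0.95, 1.0])
counts, edges = bin_sizes(probs)

for left, count in zip(edges[:-1], counts):
    print(f"bin starting at {left:.1f}: {count}")

# How many distinct probability values there are; a small number suggests
# the values are discrete and may pile up on or next to bin edges
n_distinct = len(np.unique(probs))
print(n_distinct)
```

On the real predictions it may be worth comparing the number of distinct values with the number of bins, and checking whether many values sit exactly on an edge such as 0.4.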
How can a random forest in Python produce such a hole in the probability distribution? Any hints or ideas about this?