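High OOB error with Random Forest in Python

Time: 2018-04-05 21:09:23

Tags: python machine-learning classification random-forest prediction

I am learning Python and trying to use a Random Forest classifier to predict stock movements. My dataset has 8 features and 1201 records. After fitting the model and using it for prediction, it shows 100% accuracy and 100% OOB error. I changed n_estimators from 100 to a small value, but the OOB error only dropped by a few percent. Here is my code: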

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

#File reading
df = pd.read_csv('700.csv')
df.drop(df.columns[0], axis=1, inplace=True)
target = df.iloc[:,8]
print(target)

#train test split
X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.3)

#model fit
clf = RandomForestClassifier(n_estimators=100, criterion='gini', oob_score=True)
clf.fit(X_train,y_train)

pred = clf.predict(X_test)
accuracy = accuracy_score(y_test,pred)
print(clf.oob_score_)
print(accuracy)

How can I modify the code to bring the OOB error down? Thanks.

1 answer:

Answer 0: (score: 0)

If you want to check the error, use/modify the code like this:

Column
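A hedged sketch of such a check (this is not the answerer's original code). Two things stand out in the question's code. First, `clf.oob_score_` is an out-of-bag accuracy estimate, so the OOB error is `1 - clf.oob_score_`; a printed value of 1.0 means 100% OOB accuracy, not 100% error. Second, `df` is passed to `train_test_split` with the target column still in it, so the model sees the label as a feature, which would explain the perfect scores. The synthetic DataFrame below is a hypothetical stand-in for `700.csv`, and the names `features`, `oob_error` and `test_accuracy` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for 700.csv: 8 feature columns plus a label in column 8.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1201, 8)), columns=[f"f{i}" for i in range(8)])
df["label"] = (df["f0"] + rng.normal(size=1201) > 0).astype(int)

# Separate the label from the features; in the question, df still contained it.
target = df.iloc[:, 8]
features = df.drop(df.columns[8], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=0
)

clf = RandomForestClassifier(
    n_estimators=100, criterion="gini", oob_score=True, random_state=0
)
clf.fit(X_train, y_train)

# oob_score_ is an accuracy estimate, so the OOB error is its complement.
oob_error = 1 - clf.oob_score_
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print("OOB error:", oob_error)
print("Test accuracy:", test_accuracy)
```

With the label removed from the features, the OOB error and the test accuracy should land at less suspicious values; how far they move on the real data depends on how predictive the 8 features actually are.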