我构建了一个二元决策树分类器。从混淆矩阵中我发现类0被错误分类495次而类1被错误分类134次。我想找出决策树中哪些规则实际上导致记录错误分类。
简而言之,哪个记录在哪个树节点失败
是否有机器学习方法可用于在决策树中查找导致其错误分类的规则
混淆矩阵
[[14226 495]
[ 134 3271]]
安装决策树并绘制
cv = CountVectorizer( max_features = 200,analyzer='word',ngram_range=(1, 3))
cv_addr = cv.fit_transform(data.pop('Clean_addr'))
for i, col in enumerate(cv.get_feature_names()):
data[col] = pd.SparseSeries(cv_addr[:, i].toarray().ravel(), fill_value=0)
train = data.drop(['Resi], axis=1)
Y = data['Resi']
X_train, X_test, y_train, y_test = train_test_split(train, Y, test_size=0.3,random_state =8)
rus = RandomUnderSampler(random_state=42)
X_train_res, y_train_res = rus.fit_sample(X_train, y_train)
dt=DecisionTreeClassifier(class_weight="balanced", min_samples_leaf=30)
fit_decision=dt.fit(X_train_res,y_train_res)
from sklearn.externals.six import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus
dot_data = StringIO()
export_graphviz(fit_decision, out_file=dot_data,
filled=True, rounded=True,
special_characters=True,feature_names=train.columns)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())from sklearn.externals.six import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus
dot_data = StringIO()
export_graphviz(fit_decision, out_file=dot_data,
filled=True, rounded=True,
special_characters=True,feature_names=train.columns)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
感谢任何帮助。
Dtree Plot
数据集
https://drive.google.com/open?id=1NhXfwBIB640wJ30AyPKFnbIECCdmpyi5
Resi是目标列。使用我想要预测的其他数据列,我已经计算了Clean_addr列的数据。