How can I draw a plot in Python like the one from R's varImpPlot(), showing the important variables in a random forest?

Date: 2017-09-14 04:08:54

Tags: r python-2.7 matplotlib machine-learning random-forest

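I have data with about 370 features, and I built a random forest to find the important ones. When I plot the importances, I can't tell which features to consider, because 370 features on the x-axis look very cluttered. Can anyone help me draw a plot in Python like the one produced by varImpPlot() in R?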

1 answer:

Answer 0: (score: 1)

In R's randomForest package, varImpPlot() plots the 30 most important variables by default. You can do the same in Python, starting from the example on the sklearn help page.


To make the plot, fit the forest, then put its importance scores into a pd.Series and plot the top 30:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the real set: 370 features, 16 informative
X, y = make_classification(n_samples=1000,
                           n_features=370,
                           n_informative=16,
                           n_classes=2,
                           random_state=0)

# Fit the forest; the scores are then available as forest.feature_importances_
forest = RandomForestClassifier(random_state=0)
forest.fit(X, y)
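The plotting step itself did not survive in the code above, so here is a minimal sketch of one way to do it, not necessarily the original answer's exact code. It assumes the same synthetic data, and the `feature_names` list is a hypothetical stand-in for your real column names. The scores come from `feature_importances_`, sklearn's impurity-based importance, and are drawn as a horizontal bar chart, whereas varImpPlot() draws a dot chart.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same synthetic setup as above: 370 features, 16 of them informative.
X, y = make_classification(n_samples=1000, n_features=370,
                           n_informative=16, n_classes=2, random_state=0)

forest = RandomForestClassifier(random_state=0)
forest.fit(X, y)

# Hypothetical feature names; replace with your own column names.
feature_names = ["feature_{}".format(i) for i in range(X.shape[1])]

# One impurity-based importance score per feature, indexed by name.
importances = pd.Series(forest.feature_importances_, index=feature_names)

# Keep the 30 largest scores, then sort ascending so that the most
# important feature ends up at the top of the horizontal bar chart.
top30 = importances.nlargest(30).sort_values()

fig, ax = plt.subplots(figsize=(8, 10))
top30.plot(kind="barh", ax=ax)
ax.set_xlabel("Mean decrease in impurity")
ax.set_title("Top 30 features by random forest importance")
fig.tight_layout()
plt.show()
```

With 370 features, limiting the chart to the top 30 and putting the names on the y-axis keeps the labels readable. Change the argument of `nlargest()` to show more or fewer variables.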
