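I have data with about 370 features and have built a random forest to find the important ones. When I plot the importances, though, I can't tell which features to consider, because 370 features on the x-axis look very cluttered. Can anyone help me draw a plot in Python like the one varImpPlot() produces in R?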
Answer 0 (score: 1)
In R's randomForest package, varImpPlot() plots the 30 most important variables by default. You can do the same in Python, starting from the example on the sklearn help page:
To plot, we can put the importance scores into a pd.Series and plot the top 30:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the real 370-feature dataset
X, y = make_classification(n_samples=1000,
                           n_features=370,
                           n_informative=16,
                           n_classes=2,
                           random_state=0)

# Fit the forest; importances are then available as forest.feature_importances_
forest = RandomForestClassifier(random_state=0)
forest.fit(X, y)
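The plotting step itself is not shown above, so here is a minimal sketch of one way to do it. It repeats the fit so that it runs on its own. The `feature_names` list is a hypothetical placeholder (with a real DataFrame you would use its column names), and the horizontal bar layout is only an approximation of what varImpPlot() draws, not a reproduction of it.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same synthetic setup as above: 370 features, 16 of them informative.
X, y = make_classification(n_samples=1000, n_features=370, n_informative=16,
                           n_classes=2, random_state=0)
forest = RandomForestClassifier(random_state=0)
forest.fit(X, y)

# Hypothetical feature names; with real data, use the DataFrame's columns.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
importances = pd.Series(forest.feature_importances_, index=feature_names)

# Keep the 30 largest scores, then sort ascending so the largest bar
# ends up at the top of the horizontal bar chart.
top30 = importances.nlargest(30).sort_values()

fig, ax = plt.subplots(figsize=(6, 8))
top30.plot.barh(ax=ax)
ax.set_xlabel("Mean decrease in impurity")
ax.set_title("Top 30 feature importances")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig("importances.png")` afterwards to view the figure. Because only 30 bars are drawn, the labels stay readable even with 370 features; change the argument to `nlargest` to show more or fewer.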