Using a for loop in Python to plot the top n feature importances in Bokeh without explicitly typing column names

Time: 2018-02-06 14:32:37

Tags: python pandas plot bokeh

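I want to plot the top n features from RandomForestClassifier() in Bokeh without explicitly specifying the column names in the y variable.

  1. First, instead of typing the column name into the variable y, take the column names and values directly from the top features of the random forest classifier.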

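    import pandas as pd
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure, output_file, show
    from sklearn.ensemble import RandomForestClassifier

    # 'new' is the target column; every other column is a feature
    y = df['new']
    x = df.drop('new', axis=1)
    rf = RandomForestClassifier()
    rf.fit(x, y)

    # Extract the top feature from above and plot it in Bokeh
    source = ColumnDataSource(df)

    p1 = figure(y_range=(0, 10))

    # Below I would like it to use the top feature from RandomForestClassifier
    # instead of explicitly writing the column name, horsePower,
    # from the top features column
    p1.line(
        x='x',
        y='horsePower',
        source=source,
        legend='Car Blue',
        color='Blue'
    )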
    
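  2. Instead of specifying only the first or second feature, can we build a for loop that plots the top n features in Bokeh? I imagine it would be close to this: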

    for i in range(5):
        p.line(x = 'x', y = ???? , source=source,) #top feature in randomClassifier
        p.circle(x = 'x', y = ???? , source=source, size = 10)
        row = [p]
    
    output_file('TopFeatures')
    show(p)
    
  3. I have already extracted the top 15 features from the fitted RandomForestClassifier and printed them using:

    new_rf = pd.Series(rf.feature_importances_, index=x.columns).sort_values(ascending=False)

    print(new_rf[:15])
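The ranking step can be tried without a fitted model. The sketch below uses made-up numbers: the feature names and importance values are hypothetical stand-ins for x.columns and rf.feature_importances_. It only illustrates that the sorted Series keeps the column names in its index, ordered from most to least important.

```python
import pandas as pd

# Hypothetical stand-ins for x.columns and rf.feature_importances_
columns = ["horsePower", "weight", "cylinders", "year"]
importances = [0.40, 0.30, 0.10, 0.20]

# Same construction as in the question: values are importances,
# the index holds the column names
ranked = pd.Series(importances, index=columns).sort_values(ascending=False)

# Slicing the index gives the names of the top features as plain strings
top_features = list(ranked.index[:2])
print(top_features)
```

Each entry of top_features is an ordinary column name, so it can be passed wherever a column name would otherwise be typed by hand.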
    

1 Answer:

Answer 0 (score: 0)

Simply iterate over the index of the pandas Series new_rf, since its index holds the column names:

    # Top 1 feature
    p1.line(
        x='x',
        y=new_rf.index[0],  # column name of the most important feature
        source=source,
        legend='Car Blue',
        color='Blue'
    )

    # Top 5 features: one output file and one figure per feature
    for feature in new_rf[:5].index:

        output_file("TopFeatures_{}.html".format(feature))

        p = figure(y_range=(0, 10))
        p.line(x='x', y=feature, source=source)
        p.circle(x='x', y=feature, source=source, size=10)

        show(p)
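The loop above opens a separate figure for each feature. If the goal is to draw all top n features on a single figure, as the question's loop seems to intend, something like the following sketch may work. It rests on several assumptions: the DataFrame is synthetic (columns f0 to f5 and the target rule are invented for illustration), the x column is simply the row position, and it uses legend_label and scatter, which newer Bokeh releases offer in place of the legend argument and the sized circle call used in this thread.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the asker's data (all names are hypothetical)
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.uniform(0, 10, size=(60, 6)),
    columns=["f{}".format(k) for k in range(6)],
)
target = (features["f0"] + features["f1"] > 10).astype(int)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(features, target)

# Rank features exactly as in the question: the index holds the column names
ranked = pd.Series(rf.feature_importances_, index=features.columns)
ranked = ranked.sort_values(ascending=False)

n = 3
top_features = list(ranked.index[:n])

# Bokeh needs an explicit x column; here it is just the row position
plot_df = features[top_features].copy()
plot_df["x"] = np.arange(len(plot_df))
source = ColumnDataSource(plot_df)

# One shared figure; each top feature gets its own line, markers and color
colors = ["blue", "green", "red", "orange", "purple"]
p = figure(y_range=(0, 10), title="Top {} features".format(n))
for name, color in zip(top_features, colors):
    p.line(x="x", y=name, source=source, color=color, legend_label=name)
    p.scatter(x="x", y=name, source=source, color=color, size=6)
```

To render the result, call output_file("TopFeatures.html") followed by show(p) once, after the loop. Because the color list has five entries, zip stops at five features; a longer palette would be needed for a larger n.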