Question

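I am trying to plot an X/Y bar chart of the top 'n' most frequent/influential words per location, ranked by `tf_idf`, with the Y axis showing the `wordCount` of those words rather than the frequency calculation (`tf_idf`). From here on, assume `tf_idf` identifies the most important words; some of the words are codes, but they are still important. Sample data:

SITEID  word  wordCount  wordTotal  tf     SiteID  idf    tf_idf
CAK     hpci  328        187653     0.001  1       1.098  0.001920272

Using the following function (which I did not write but fully understand), I can get nice graphs of the most important `tf_idf` words:

def pretty_plot_top_n(series, top_n=5, index_level=0):
    r = series\
    .groupby(level=index_level)\
    .nlargest(top_n)\
    .reset_index(level=index_level, drop=True)
    r.plot.bar()
    return r.to_frame()

pretty_plot_top_n(tf_idf['tf_idf'])

[Image: bar chart of the top tf_idf words per SITEID]

This image has the perfect x axis: I want `word` and `SITEID`. Now I want to "layer" it so that the y axis for each word is `wordCount` instead of `tf_idf`.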
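To see why the function calls `reset_index(..., drop=True)`, here is a small sketch on made-up toy data. It assumes the DataFrame is indexed by (`SITEID`, `word`), which `groupby(level=0)` suggests but the question does not state; `toy` and all of its values are hypothetical. `groupby(level=0).nlargest` prepends the group key to the index, so the site label appears twice until that extra level is dropped.

```python
import pandas as pd

# Hypothetical toy data (made-up values), indexed by (SITEID, word).
toy = pd.DataFrame(
    {
        "SITEID": ["CAK", "CAK", "CAK", "XYZ", "XYZ", "XYZ"],
        "word": ["hpci", "to", "valve", "of", "pump", "seal"],
        "wordCount": [328, 5000, 40, 4200, 75, 12],
        "tf_idf": [0.0019, 0.0, 0.0011, 0.0, 0.0024, 0.0009],
    }
).set_index(["SITEID", "word"])

# groupby(level=0).nlargest prepends the group key, so SITEID appears twice.
grouped = toy["tf_idf"].groupby(level=0).nlargest(2)
print(grouped.index.nlevels)  # 3: (SITEID, SITEID, word)

# reset_index(level=0, drop=True) drops that duplicated key again.
r = grouped.reset_index(level=0, drop=True)
print(r.index.nlevels)  # 2: (SITEID, word)
print(list(r.index))
```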

I have tried several different approaches, including (but not limited to):

Plotting without the function:

#plot without function 
tf_idf.plot(x=['SITEID', 'tf_idf'], y='wordCount', kind="bar")
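For comparison, one way to get the selection without the function is to sort by `tf_idf`, keep the first rows of each site, and only then pick the `wordCount` column. This is a minimal sketch, assuming the same hypothetical (`SITEID`, `word`) index and made-up `toy` values; `sort_values` plus `groupby(...).head` is a swapped-in technique, not the question's function. The plot call is left as a comment so the snippet runs without matplotlib.

```python
import pandas as pd

# Hypothetical toy data (made-up values), indexed by (SITEID, word).
toy = pd.DataFrame(
    {
        "SITEID": ["CAK", "CAK", "CAK", "XYZ", "XYZ", "XYZ"],
        "word": ["hpci", "to", "valve", "of", "pump", "seal"],
        "wordCount": [328, 5000, 40, 4200, 75, 12],
        "tf_idf": [0.0019, 0.0, 0.0011, 0.0, 0.0024, 0.0009],
    }
).set_index(["SITEID", "word"])

top_n = 2
top = (
    toy.sort_values("tf_idf", ascending=False)  # highest tf_idf first
    .groupby(level="SITEID")
    .head(top_n)  # first top_n rows of each site
    .sort_index(level="SITEID", sort_remaining=False)  # regroup by site
)
counts = top["wordCount"]  # ranked by tf_idf, but we keep wordCount
print(counts)
# counts.plot.bar() would draw wordCount for these words (requires matplotlib).
```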

Adjusting the function:

# second layer grouping by tf_idf

def pretty_plot_top_n(series, top_n=5, index_level=0, level2=7):
    r = series\
    .groupby(level=index_level)\
    .groupby(level=level2)\
    .nlargest(top_n)\
    .reset_index(level=index_level, drop=True)
    r.plot.bar()
    return r.to_frame()
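A likely reason this version fails: `series.groupby(level=...)` returns a `SeriesGroupBy` object, not a Series, and that object has no `groupby` method, so the chained call raises `AttributeError` (and with a two-level index, as assumed here, `level=7` would not exist anyway). A small check on a hypothetical Series shaped like `tf_idf['tf_idf']`, with made-up values:

```python
import pandas as pd

# Hypothetical two-level Series (made-up values).
s = pd.Series(
    [0.0019, 0.0, 0.0011],
    index=pd.MultiIndex.from_tuples(
        [("CAK", "hpci"), ("CAK", "to"), ("CAK", "valve")],
        names=["SITEID", "word"],
    ),
)

grouped = s.groupby(level=0)
print(type(grouped).__name__)  # SeriesGroupBy

try:
    grouped.groupby(level=1)  # the second .groupby in the chain
    chained_ok = True
except AttributeError as err:
    chained_ok = False
    print(err)

print(chained_ok)  # False
```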

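Running both calls separately (it's obvious why this doesn't work, but I mention it for context):

pretty_plot_top_n(tf_idf['tf_idf'])
pretty_plot_top_n(tf_idf['wordCount'])

My output either ends up blank, or I get the words with the highest `wordCount`. But the whole point of the `tf_idf` calculation is to weed out stopwords.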

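I don't want "to" and "of"; I want the words from the chart above. How can I layer this so that the plot first selects the words by `tf_idf` and then draws `wordCount` on the y axis for those words? My main preference is just to adjust my existing function.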
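A minimal sketch of that adjustment, assuming the DataFrame is indexed by (`SITEID`, `word`): keep ranking on `tf_idf` as the existing function does, then reuse the selected index labels to look up `wordCount`. The function now takes the whole DataFrame instead of one column; the parameter names `rank_col`, `plot_col`, and `plot` are hypothetical additions, and the `toy` values are made up. It drops level 0 rather than `index_level`, because the prepended group key always sits at position 0; the two agree for the default `index_level=0`.

```python
import pandas as pd


def pretty_plot_top_n(df, rank_col="tf_idf", plot_col="wordCount",
                      top_n=5, index_level=0, plot=True):
    # Rank by rank_col within each index_level group, like the original.
    top = (
        df[rank_col]
        .groupby(level=index_level)
        .nlargest(top_n)
        .reset_index(level=0, drop=True)  # the group key is prepended at level 0
    )
    # Reuse the selected (SITEID, word) labels to fetch a different column.
    r = df.loc[top.index, plot_col]
    if plot:
        r.plot.bar()  # needs matplotlib installed
    return r.to_frame()


# Hypothetical toy data (made-up values), indexed by (SITEID, word).
toy = pd.DataFrame(
    {
        "SITEID": ["CAK", "CAK", "CAK", "XYZ", "XYZ", "XYZ"],
        "word": ["hpci", "to", "valve", "of", "pump", "seal"],
        "wordCount": [328, 5000, 40, 4200, 75, 12],
        "tf_idf": [0.0019, 0.0, 0.0011, 0.0, 0.0024, 0.0009],
    }
).set_index(["SITEID", "word"])

# plot=False only to inspect the selection here; the default draws the bars.
selected = pretty_plot_top_n(toy, top_n=2, plot=False)
print(selected)
```

In the toy data, "to" and "of" have the highest `wordCount` but zero `tf_idf`, so they never reach the selection; with the real frame the call would be `pretty_plot_top_n(tf_idf)`.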

Plot DataFrame based on hierarchy of columns / values of another column
