在堆积的条形图中绘制出现百分比

时间:2019-01-14 19:53:39

标签: python matplotlib data-science

我试图创建一个堆积的条形图,以显示每个项目在给定年份中发生的百分比。

问题是当我绘制这些值时,并不是所有条形图都显示出来。似乎有些酒吧被显示的酒吧所掩盖。

这是相关代码:

barWidth = 0.85
plt.bar(list(yearly_timeline.index),yearly_timeline.values, color='#a3acff',edgecolor='white',width=barWidth)
plt.bar(list(yearly_links.index),yearly_links.values, color='#FFD700',edgecolor='white',width=barWidth)
plt.bar(list(yearly_status.index),yearly_status.values, color='#b5ffb9',edgecolor='white',width=barWidth)
plt.bar(list(yearly_posts.index),yearly_posts.values,color='#f9bc86',edgecolor='white',width=barWidth)
plt.bar(list(yearly_shared.index),yearly_shared.values,color='#f9bc86',edgecolor='white',width=barWidth)
plt.xticks(list(yearly_links.index))
fig = plt.gcf()
fig.set_size_inches(20,10)
plt.tick_params(labelsize=20)
plt.show()

这是我正在绘制的数据集的示例:

#yearly posts
year
2009     4.907975
2010    11.656442
2013    11.656442
2014    24.539877
2015     7.975460
2016    12.269939
2017    16.564417
2018    10.429448
dtype: float64

#yearly shared
year
2010     1.273885
2011     0.636943
2012     9.554140
2013    29.936306
2014    28.025478
2015    15.923567
2016     7.643312
2017     4.458599
2018     2.547771
dtype: float64

#yearly timeline
year
2010     4.059041
2011    18.450185
2012    18.819188
2013    12.915129
2014    25.830258
2015    16.236162
2016     2.214022
2017     1.107011
2018     0.369004
dtype: float64

#yearly status
year
2009     6.916192
2010     6.997559
2011    15.296989
2012    22.294548
2013    19.528072
2014    13.913751
2015    10.740439
2016     1.790073
2017     1.464605
2018     1.057771
dtype: float64

#yearly links
year
2009     0.655738
2010     0.218579
2011     8.196721
2012     8.524590
2013     1.530055
2014     7.103825
2015    26.338798
2016    17.595628
2017    25.027322
2018     4.808743
dtype: float64

1 个答案:

答案 0 :(得分:0)

对于您来说,可以通过将所有数据汇总到一个DataFrame中来简化代码(我假设它们当前是单独的Series):

创建虚拟数据

my_names = ['timeline','links','status','posts','shared']
my_series = [pd.Series(data=np.random.random(size=(9,)), index=range(2010,2019), name=n) for n in my_names]

将系列列表转换为数据框:

df = pd.DataFrame(my_series).T
display(df)

      timeline     links    status     posts    shared
2010  0.534663  0.107604  0.265774  0.849307  0.149886
2011  0.064561  0.354329  0.557265  0.297695  0.563122
2012  0.646828  0.011643  0.608695  0.493709  0.337949
2013  0.170792  0.083039  0.866962  0.278223  0.501074
2014  0.386262  0.979529  0.972009  0.333049  0.505644
2015  0.764539  0.223265  0.365314  0.712091  0.757626
2016  0.012084  0.700645  0.118666  0.118811  0.332993
2017  0.407492  0.480495  0.399464  0.613331  0.655171
2018  0.072698  0.262846  0.763811  0.783575  0.859755

使用熊猫plot命令的简单方法:

df.plot(kind='bar', stacked=True, width=0.85)

enter image description here

或直接使用matplotlib来提高灵活性:

fig, ax = plt.subplots()
for i,col in enumerate(df.columns):
    ax.bar(df.index, height=df[col], bottom=df.iloc[:,:i].sum(axis=1), edgecolor="white", width=0.85)
ax.set_xticks(df.index)

enter image description here