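I have to make a (stacked) bar chart with ~3000 positions on the x-axis. However, many of these positions contain no bar but are still labeled on the x-axis, which makes the plot hard to read. Is there a way to show x-ticks only for the existing (stacked) bars? The spacing between bars, based on the x-tick values, has to be preserved. How can I solve this in matplotlib? Is there a more suitable plot than a stacked bar chart? I am building it from a pandas crosstab (pd.crosstab()).
Link to the plot image: https://i.stack.imgur.com/qk99z.png
What my dataframe looks like (thanks to gepcel):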
import pandas as pd
import numpy as np
N = 3200
df = pd.DataFrame(np.random.randint(1, 5, size=(N, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1)>0]
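Answer 0 (score: 2)
Basically, with no example data to work from, you should select the ticks where the total (i.e. stacked) value is greater than zero, then set the xticks and xticklabels manually.
Suppose you have a dataframe like the following: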
import pandas as pd
import numpy as np
N = 3200
df = pd.DataFrame(np.random.randint(1, 5, size=(N, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0
Then the selected data should be:
df_select = df[df.sum(axis=1)>0]
Then you can plot a stacked bar chart like this:
import matplotlib.pyplot as plt

# set width=20 so the bars are not too thin to see
plt.bar(df_select.index, df_select[0], width=20, label='0')
plt.bar(df_select.index, df_select[1], width=20, label='1',
        bottom=df_select[0])
plt.bar(df_select.index, df_select[2], width=20, label='2',
        bottom=df_select[0]+df_select[1])
# Only show the selected ticks; it is a little tricky if
# you want the tick labels to differ from the tick positions,
# and it is still hard to avoid overlapping tick labels
plt.xticks(df_select.index)
plt.legend()
plt.show()
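The comment in the code above mentions that tick labels which differ from the tick positions take a little more work. A minimal sketch of one way to do it, assuming the object-oriented Axes interface: set_xticks fixes the positions and set_xticklabels supplies the text. The "pos N" label format, the seeded generator and the stacking loop are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same kind of sparse frame as above: 3200 positions, only 10 with bars.
# A seeded generator is used here only to make the sketch reproducible.
rng = np.random.default_rng(0)
N = 3200
df = pd.DataFrame(rng.integers(1, 5, size=(N, 3)))
df.loc[rng.choice(df.index, size=N - 10, replace=False), :] = 0
df_select = df[df.sum(axis=1) > 0]

fig, ax = plt.subplots()
bottom = np.zeros(len(df_select))
for col in df_select.columns:
    # Stack each column on top of the running total of the previous ones.
    ax.bar(df_select.index, df_select[col], width=20,
           bottom=bottom, label=str(col))
    bottom += df_select[col].to_numpy()

# Tick positions stay at the true x values, so the spacing is preserved;
# the labels are free-form strings ("pos N" is a made-up format).
ax.set_xticks(df_select.index)
ax.set_xticklabels([f"pos {i}" for i in df_select.index],
                   rotation=90, fontsize=8)
ax.legend()
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to view the figure. Rotating the labels reduces overlap but does not eliminate it when two bars sit very close together.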
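Update:
Putting text on top of the bars is easy with:
for n, row in df_select.iterrows():
    plt.text(n, row.sum()+0.2, n, ha='center', rotation=90, va='bottom')
This computes the position of the top of each bar and places the text there, optionally with a small offset (such as +0.2), and uses rotation=90 to control the rotation. The full code would be: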
df_select = df[df.sum(axis=1)>0]
plt.bar(df_select.index, df_select[0], width=20, label='0')
plt.bar(df_select.index, df_select[1], width=20, label='1',
        bottom=df_select[0])
plt.bar(df_select.index, df_select[2], width=20, label='2',
        bottom=df_select[0]+df_select[1])
# Here is the part to put text:
for n, row in df_select.iterrows():
    plt.text(n, row.sum()+0.2, n, ha='center', rotation=90, va='bottom')
plt.xticks(df_select.index)
plt.legend()
plt.show()
Result: [image of the resulting plot]
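As a possible alternative to the plt.text loop, newer matplotlib versions (3.4 and later, which is an assumption about your environment) provide Axes.bar_label, which annotates the bars of one container. The sketch below labels the topmost layer with each bar's x position. The seeded data and the padding value are illustrative, and it relies on every selected row having a non-zero value in the last column, which holds for this sample data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Reproducible sample data in the same shape as the answer's example.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 5, size=(3200, 3)))
df.loc[rng.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1) > 0]

fig, ax = plt.subplots()
bottom = np.zeros(len(df_select))
for col in df_select.columns:
    bars = ax.bar(df_select.index, df_select[col], width=20,
                  bottom=bottom, label=str(col))
    bottom += df_select[col].to_numpy()

# After the loop, `bars` is the topmost layer. Label each of its bars with
# the x position; padding is an offset in points above the bar's top edge.
texts = ax.bar_label(bars, labels=[str(i) for i in df_select.index],
                     rotation=90, padding=3)
ax.set_xticks(df_select.index)
ax.legend()
```

Because the labels are anchored to the bar edges, no manual offset such as +0.2 in data units is needed.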
Answer 1 (score: 0)
This is a variation on gepcel's answer that adapts to dataframes with a varying number of columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# in this case I'm creating the dataframe with 3 columns
# but the code is meant to adapt to dataframes with varying column numbers
df = pd.DataFrame(np.random.randint(1, 5, size=(3200, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1) > 0]
fig, ax = plt.subplots()
# the first column sits directly on the x-axis
ax.bar(df_select.index, df_select.iloc[:, 0], label=str(df_select.columns[0]))
if df_select.shape[1] > 1:
    for i in range(1, df_select.shape[1]):
        # each later column is stacked on the sum of the columns before it
        bottom = df_select.iloc[:, :i].sum(axis=1)
        ax.bar(df_select.index, df_select.iloc[:, i], bottom=bottom,
               label=str(df_select.columns[i]))
ax.set_xticks(df_select.index)
plt.legend(loc='best', bbox_to_anchor=(1, 0.5))
plt.xticks(rotation=90, fontsize=8)
plt.show()
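The per-column bottom in the loop above is the sum of all columns to the left. One way to compute all of the bottoms in a single step is a cumulative sum across columns. This is a sketch under the assumption that all values are non-negative; the four-column frame, the seeded generator and the width=20 setting (borrowed from the first answer) are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 5, size=(3200, 4)))  # 4 columns this time
df.loc[rng.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1) > 0]

# Running total across columns gives the top of every segment; subtracting
# the segment's own height gives its bottom (zero for the first column).
tops = df_select.cumsum(axis=1)
bottoms = tops - df_select

fig, ax = plt.subplots()
for col in df_select.columns:
    ax.bar(df_select.index, df_select[col], width=20,
           bottom=bottoms[col], label=str(col))
ax.set_xticks(df_select.index)
ax.tick_params(axis="x", labelrotation=90, labelsize=8)
ax.legend()
```

This avoids re-summing the earlier columns on every iteration and works unchanged for any number of columns.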