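I want to draw a bar chart with pandas from two categorical variables and 5 numeric columns. I'd like to group first by one categorical variable and show the sums as grouped bars. I'd also like to group by the second categorical variable and have each bar show that second category as stacked segments.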
A sample dataframe like mine can be constructed as follows:
import random
import pandas as pd

l = 100
df = pd.DataFrame({'op1': [random.randint(0, 1) for x in range(l)],
                   'op2': [random.randint(0, 1) for x in range(l)],
                   'op3': [random.randint(0, 1) for x in range(l)],
                   'op4': [random.randint(0, 1) for x in range(l)],
                   'op5': [random.randint(0, 1) for x in range(l)],
                   'cat': random.choices(list('abcde'), k=l),
                   'gender': random.choices(list('mf-'), k=l)})
df.head(10)
cat gender op1 op2 op3 op4 op5
0 d m 1 1 1 1 1
1 a m 1 1 0 0 1
2 b - 1 0 1 0 1
3 c m 0 1 0 0 0
4 b - 0 0 1 1 0
5 c f 1 1 1 1 1
6 a - 1 1 0 1 0
7 d f 1 0 1 0 1
8 d m 1 1 0 1 0
9 b - 1 0 1 0 0
I can easily produce the grouped bars: df.groupby('cat')[['op%s' % i for i in range(1,6)]].sum().plot.bar()
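For reference, the frame that this one-liner hands to plot.bar() has one row per category and one column per option, and pandas draws each row as one group of bars. A minimal sketch with a small hand-made frame (the values are invented for illustration, and only two options are used):

```python
import pandas as pd

# Small invented frame so the sums are easy to check by hand.
df = pd.DataFrame({
    "cat":    ["a", "a", "b", "b", "b"],
    "gender": ["m", "f", "m", "m", "f"],
    "op1":    [1, 0, 1, 1, 0],
    "op2":    [1, 1, 0, 1, 1],
})
ops = ["op1", "op2"]

# One row per category, one column per option: plot.bar() draws each row
# as a group of bars, one bar per option column.
totals = df.groupby("cat")[ops].sum()
print(totals)
```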
But how can I make each bar show the breakdown by gender?
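The breakdown needs one more level of aggregation: sums per (category, gender) pair, with gender moved into the columns. A sketch using groupby on both keys plus DataFrame.unstack, again on invented data in which the '-' gender occurs in only one category:

```python
import pandas as pd

# Invented data; gender "-" appears only in category "a".
df = pd.DataFrame({
    "cat":    ["a", "a", "a", "b", "b", "b"],
    "gender": ["m", "f", "-", "m", "m", "f"],
    "op1":    [1, 0, 1, 1, 1, 0],
    "op2":    [1, 1, 0, 0, 1, 1],
})
ops = ["op1", "op2"]

# Sum per (cat, gender) pair, then move gender into the columns.
# fill_value=0 covers (cat, gender) combinations that never occur.
breakdown = (
    df.groupby(["cat", "gender"])[ops]
      .sum()
      .unstack("gender", fill_value=0)
)
# Columns are now a MultiIndex of (option, gender), e.g. ("op1", "m").
print(breakdown)
```

Summing a block such as breakdown["op1"] across its gender columns gives back the plain per-category total, i.e. the height of each full bar; the individual gender columns are the stacked segments.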
Answer:
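Inspired by the thread vbox pointed me to, I implemented this with a series of subplots (one per category) and some fiddling with the colors. It's quite hacky, and anyone who wants to use it on a dataset with more variables will have a few problems to work around, but I'm posting it here in case it's useful.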
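One way to do the color fiddling without scanning ax.get_children() for rectangles, sketched here on a single invented per-category table: pandas' stacked bar plot calls ax.bar() once per column, so ax.containers holds one BarContainer per gender, and bar i inside each container belongs to option i. The per-option coloring and the 0.5 alpha mirror the answer's choices; the data is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sums for one category: rows are options, columns are genders
# (the same shape as df2 in the answer's code after .transpose()).
df2 = pd.DataFrame({"f": [3, 1, 2], "m": [2, 4, 1]}, index=["op1", "op2", "op3"])

colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

fig, ax = plt.subplots()
df2.plot.bar(stacked=True, width=1, ax=ax, legend=False)

# One container per gender (stacking layer); bar i in each layer is option i.
for layer, container in enumerate(ax.containers):
    for i, rect in enumerate(container):
        rect.set_color(colors[i % len(colors)])  # one color per option
        if layer > 0:
            rect.set_alpha(0.5)                  # lighter shade above the first layer
```

Every layer above the first gets the same 0.5 alpha, so with three genders the upper two would need different alphas (or hatching) to be told apart.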
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import random
l=100
df = pd.DataFrame({'op1': [random.randint(0,1) for x in range(l)],
'op2': [random.randint(0,1) for x in range(l)],
'op3': [random.randint(0,1) for x in range(l)],
'op4': [random.randint(0,1) for x in range(l)],
'op5': [random.randint(0,1) for x in range(l)],
'cat': random.choices(list('abcde'), k=l),
'gender': random.choices(list('mf'), k=l)})
# grab the colors in the current setup (could just use a new cycle instead)
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
values = sorted(df['cat'].unique())  # sorted so the subplots run a..e
l = len(values)
# make one subplot for every possible value
fig, axes = plt.subplots(1, l, sharey=True)
for i, value in enumerate(values):
    ax = axes[i]
    # make a dataset that includes gender and all options, then change orientation
    df2 = df[df['cat'] == value][['gender', 'op1', 'op2', 'op3', 'op4', 'op5']].groupby('gender').sum().transpose()
    # do the stacked plot.
    # Note this has all M's one color, F's another
    # but we want each bar to have its own colour scheme
    df2.plot.bar(stacked=True, width=1, ax=ax, legend=False)
    # kludge to change bar colors
    # Note: this won't work if one gender is not present
    # or if there is a 3rd option for gender, as there is in the sample data
    # for this example, I've changed gender to just be m/f
    # The first len(df2) rectangles are the lower (first gender) segments,
    # the next len(df2) are the upper (second gender) segments.
    bars = [rect for rect in ax.get_children() if isinstance(rect, mpl.patches.Rectangle)]
    for c, b in enumerate(bars[:len(df2)*2]):
        b.set_color(colors[c % len(df2)])
        if c >= len(df2):
            b.set_alpha(0.5)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_color('grey')
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.set_xticks([])
    ax.set_xlabel(value, rotation=45)

plt.show()
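An alternative that avoids both the subplots and the color kludge is to draw everything on one Axes with matplotlib's Axes.bar, offsetting each option within its category group and stacking genders through the bottom argument. This is a different technique from the answer's, sketched on invented data; the layout values (group_width, the alpha rule) are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; drop it to plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented sample data; the third gender value "-" is missing from cat "b".
df = pd.DataFrame({
    "cat":    ["a", "a", "a", "b", "b", "b"],
    "gender": ["m", "f", "-", "m", "m", "f"],
    "op1":    [1, 0, 1, 1, 1, 0],
    "op2":    [1, 1, 0, 0, 1, 0],
})
ops = ["op1", "op2"]
genders = ["m", "f", "-"]           # stacking order, bottom to top
cats = sorted(df["cat"].unique())   # one group of bars per category

colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
group_width = 0.8                   # fraction of each x slot used by a group
bar_width = group_width / len(ops)
x = np.arange(len(cats))            # x position of each category group

fig, ax = plt.subplots()
bottoms = np.zeros((len(cats), len(ops)))  # running height of every bar
for layer, g in enumerate(genders):
    # Per-category sums for this gender; categories where it never occurs get 0.
    part = df[df["gender"] == g].groupby("cat")[ops].sum().reindex(cats, fill_value=0)
    for j, op in enumerate(ops):
        heights = part[op].to_numpy()
        ax.bar(
            x - group_width / 2 + (j + 0.5) * bar_width,  # offset of bar j in its group
            heights,
            width=bar_width,
            bottom=bottoms[:, j].copy(),                  # stack on top of lower layers
            color=colors[j % len(colors)],                # one color per option
            alpha=1.0 / (layer + 1),                      # fainter for higher layers
            label=op if layer == 0 else "_nolegend_",
        )
        bottoms[:, j] += heights

ax.set_xticks(x)
ax.set_xticklabels(cats)
ax.legend(title="option")
```

Because each gender's sums are reindexed against the full category list, a missing gender or a third value such as '-' just produces zero-height or extra segments instead of throwing off the color assignment.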