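Show totals and percentages in a stacked DataFrame.plot bar chart

Time: 2020-11-07 10:07:26

Tags: python dataframe matplotlib machine-learning data-mining

I have a very small DataFrame with just two columns: "Gender", with values "Male" and "Female", and "MaritalStatus", with values "Single", "Married" and "Divorced". The distribution of the data is summarized below: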

    Gender  MaritalStatus   Tot.
    Male    Single          225
    Male    Married         296
    Male    Divorced        143
    Female  Single          137
    Female  Married         222
    Female  Divorced        94
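
To reproduce the plots, the summary above can be expanded back into a DataFrame with one row per person. This is only a sketch: the post does not show the raw rows, so the layout is an assumption, and the name `summary` is made up for illustration.

```python
import pandas as pd

# Summary table copied from the question.
summary = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "MaritalStatus": ["Single", "Married", "Divorced"] * 2,
    "Tot.": [225, 296, 143, 137, 222, 94],
})

# Repeat each (Gender, MaritalStatus) row "Tot." times: one row per person.
df = (
    summary.loc[summary.index.repeat(summary["Tot."]), ["Gender", "MaritalStatus"]]
    .reset_index(drop=True)
)

# Cross-tabulating the expanded rows recovers the summary counts.
pclass_xt = pd.crosstab(df["Gender"], df["MaritalStatus"])
print(pclass_xt)
```

With `df` built this way, the plotting code in the question runs as written.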

With the following code, I can plot a stacked bar chart:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

pclass_xt = pd.crosstab(df["Gender"], df["MaritalStatus"]) 
pclass_xt.plot(kind='bar', stacked=True)
plt.xlabel("Gender")
plt.ylabel("count")
plt.xticks(rotation=0)
plt.show()

This is my output:

[image: stacked bar chart of counts by Gender and MaritalStatus]

I would like to add the total for each bar, together with the percentage of each segment within its bar. Thanks for your help.

1 Answer:

Answer 0: (score: 1)

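You can get the figure by drawing the bars with matplotlib's bar function and then adding the text. The plotting code is below (I assume the data is stored in a data.csv file):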

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
# print() works outside notebooks too; display() needs IPython
print(df)

# Count the rows in each (Gender, MaritalStatus) pair.
# size() is used instead of count() because count() returns no columns
# when the DataFrame only has the two grouping columns.
# unstack() gives one row per Gender and one column per MaritalStatus.
pclass_xt_group = df.groupby(by=["Gender", "MaritalStatus"]).size().unstack(fill_value=0)
print(pclass_xt_group)

# Get the values for each group and category, in the order [Female, Male]
groups = ['Female', 'Male']
divorced = pclass_xt_group.loc[groups, 'Divorced'].to_numpy()
married = pclass_xt_group.loc[groups, 'Married'].to_numpy()
single = pclass_xt_group.loc[groups, 'Single'].to_numpy()

# Bar colors, one per marital status
colors = ['#FF9999', '#00BFFF', '#C1FFC1']

# The position of the bars on the x-axis
r = range(len(groups))
barWidth = 1

# Plot the bars, stacking each category on top of the previous ones
plt.figure(figsize=(10, 7))
ax1 = plt.bar(r, divorced, color=colors[0], edgecolor='white', width=barWidth, label="divorced")
ax2 = plt.bar(r, married, bottom=np.array(divorced), color=colors[1], edgecolor='white', width=barWidth, label='married')
ax3 = plt.bar(r, single, bottom=np.array(divorced) + np.array(married), color=colors[2], edgecolor='white', width=barWidth, label='single')
plt.legend()

# Custom X axis
plt.xticks(r, groups, fontweight='bold')
plt.ylabel("Count")

# Write each segment's share of its bar (as a fraction) at the segment's center
for r1, r2, r3 in zip(ax1, ax2, ax3):
    h1 = r1.get_height()
    h2 = r2.get_height()
    h3 = r3.get_height()
    total = h1 + h2 + h3
    plt.text(r1.get_x() + r1.get_width() / 2., h1 / 2., "%.2f" % (h1 / total), ha="center", va="center", color="white", fontsize=16, fontweight="bold")
    plt.text(r2.get_x() + r2.get_width() / 2., h1 + h2 / 2., "%.2f" % (h2 / total), ha="center", va="center", color="white", fontsize=16, fontweight="bold")
    plt.text(r3.get_x() + r3.get_width() / 2., h1 + h2 + h3 / 2., "%.2f" % (h3 / total), ha="center", va="center", color="white", fontsize=16, fontweight="bold")
plt.show()

The resulting figure:

[image: stacked bar chart with the fraction of each segment written inside it]

The code is essentially inspired by https://medium.com/@priteshbgohil/stacked-bar-chart-in-python-ddc0781f7d5f
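
The question also asks for the total of each bar. One possible way to get both labels while keeping the pandas plot call from the question is sketched below. It is not the answer author's method: it assumes matplotlib 3.4 or newer (for `Axes.bar_label`), hard-codes the counts from the question's summary table instead of reading data.csv, and the helper name `annotate_stacked` is made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Counts from the question's summary table (rows: Gender, columns: MaritalStatus).
counts = pd.DataFrame(
    {"Divorced": [94, 143], "Married": [222, 296], "Single": [137, 225]},
    index=pd.Index(["Female", "Male"], name="Gender"),
)


def annotate_stacked(counts):
    """Plot stacked bars with a percentage in each segment and a total on top."""
    totals = counts.sum(axis=1)                   # height of each full bar
    percents = counts.div(totals, axis=0) * 100   # share of each segment in its bar

    ax = counts.plot(kind="bar", stacked=True, rot=0, figsize=(10, 7))
    ax.set_ylabel("Count")

    # pandas draws one BarContainer per column, in column order.
    for container, column in zip(ax.containers, counts.columns):
        labels = [f"{p:.1f}%" for p in percents[column]]
        ax.bar_label(container, labels=labels, label_type="center")

    # The edge of the last (topmost) container is the top of the stack,
    # so labelling it with the totals puts them above each bar.
    ax.bar_label(ax.containers[-1], labels=[str(t) for t in totals], padding=3)
    return ax, totals, percents


ax, totals, percents = annotate_stacked(counts)
```

Call plt.show() afterwards to display the figure, or ax.figure.savefig(...) to write it to a file. On older matplotlib versions without `bar_label`, the same labels can be placed with plt.text as in the answer above.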