Hello, I have a new grouped dataset. Here is the result:
job            y
admin.         0    5227
               1    1045
blue-collar    0    5208
               1     517
entrepreneur   0     755
               1      96
housemaid      0     586
               1      82
management     0    1507
               1     255
retired        0     761
               1     331
self-employed  0     759
               1     111
services       0    2165
               1     260
student        0     364
               1     216
technician     0    3434
               1     589
unemployed     0     479
               1     109
unknown        0     166
               1      26
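In this case I want to plot the data as a bar chart ordered by the total for each job, so that the most important jobs stand out. Here is the code I used, but it raises an error: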
import matplotlib.pyplot as plt
plt.figure(figsize=(6,6))
pekerjaan = df_new.groupby(['job','y'])['y'].size().unstack()
pekerjaan.sort_values(by='y',ascending=True).plot(kind='barh',stacked=True)
plt.title('Job')
plt.ylabel('Kind of job')
plt.xlabel('Total')
plt.show()
Thanks in advance.
Answer 0: (score: 0)
Sample data and imports:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
np.random.seed(25)
n = 100
df_new = pd.DataFrame({
'job': np.random.choice(['admin', 'blue-collar', 'entrepreneur'],
p=[.4, .4, .2], size=n),
'y': np.random.choice([0, 1], size=n)
})
Then use sum across each row to get the row totals, and sort_values on those totals:
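plot_df = df_new.groupby(['job', 'y'])['y'].size().unstack()
plot_df['All'] = plot_df.sum(axis=1)  # row total per job
plot_df = plot_df.sort_values('All')  # smallest total first, so the largest bar ends up on top
# DataFrame.plot creates its own figure, so the size is passed here
# instead of calling plt.figure() beforehand.
ax = plot_df.plot(kind='barh', y=[0, 1], stacked=True,
                  title='Job', figsize=(6, 6), rot=0)
# Label the axes explicitly on the returned Axes.
ax.set_xlabel('Total')
ax.set_ylabel('Kind of Job')
plt.tight_layout()
plt.show()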
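As for why the code in the question fails: the most likely cause is the sort_values(by='y') call. After unstack, the columns are labelled 0 and 1, and 'y' is only the name of the column index, so there is no column called 'y' to sort by. The following is a minimal sketch on a tiny made-up frame (df_demo and its values are illustrative, not the asker's data):

import pandas as pd

# Tiny illustrative frame; the values are made up for this sketch.
df_demo = pd.DataFrame({
    'job': ['admin', 'admin', 'student', 'student', 'student'],
    'y': [0, 1, 0, 0, 1],
})
counts = df_demo.groupby(['job', 'y']).size().unstack(fill_value=0)

# 'y' is the name of the column index; the column labels are 0 and 1.
print(counts.columns.name)
print(list(counts.columns))

# Sorting by 'y' therefore fails with a KeyError.
try:
    counts.sort_values(by='y')
    raised = False
except KeyError:
    raised = True
print(raised)

Sorting by an existing label (0, 1, or a total column added afterwards) avoids the error.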
Summary counts:
plot_df = df_new.groupby(['job', 'y'])['y'].size().unstack()
y              0   1
job
admin         19  17
blue-collar   24  25
entrepreneur  10   5
plot_df with the All column:
plot_df['All'] = plot_df.sum(axis=1)
y              0   1  All
job
admin         19  17   36
blue-collar   24  25   49
entrepreneur  10   5   15
After sort_values:
plot_df = plot_df.sort_values('All')
y              0   1  All
job
entrepreneur  10   5   15
admin         19  17   36
blue-collar   24  25   49
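The same total-then-sort steps can be tried on the counts posted in the question. This is only a sketch: the counts dictionary below is typed in by hand from the question's output (the name counts and this way of building the frame are assumptions for illustration; with the real df_new the groupby line above would produce the table directly).

import pandas as pd

# Counts copied from the grouped output in the question: (y == 0, y == 1).
counts = {
    'admin.': (5227, 1045),
    'blue-collar': (5208, 517),
    'entrepreneur': (755, 96),
    'housemaid': (586, 82),
    'management': (1507, 255),
    'retired': (761, 331),
    'self-employed': (759, 111),
    'services': (2165, 260),
    'student': (364, 216),
    'technician': (3434, 589),
    'unemployed': (479, 109),
    'unknown': (166, 26),
}
plot_df = pd.DataFrame.from_dict(counts, orient='index', columns=[0, 1])
plot_df.index.name = 'job'

# Row total per job, then sort so the largest total comes last.
plot_df['All'] = plot_df.sum(axis=1)
plot_df = plot_df.sort_values('All')

# The last rows are the jobs with the highest totals.
print(plot_df.tail(3))

Because barh draws the first row at the bottom, sorting in ascending order puts the job with the largest total at the top of the chart.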
An alternative using crosstab with margins:
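plot_df = (
    pd.crosstab(df_new['job'], df_new['y'], margins=True)
    .iloc[:-1]           # drop the 'All' row added by margins=True
    .sort_values('All')  # sort jobs by the 'All' column
)
# DataFrame.plot creates its own figure, so the size is passed here
# instead of calling plt.figure() beforehand.
ax = plot_df.plot(kind='barh', y=[0, 1], stacked=True,
                  title='Job', figsize=(6, 6), rot=0)
# Label the axes explicitly on the returned Axes.
ax.set_xlabel('Total')
ax.set_ylabel('Kind of Job')
plt.tight_layout()
plt.show()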
Both produce the same plot.
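To check that the two approaches agree without comparing plots by eye, the underlying tables can be compared directly. This is a rough sketch on freshly generated data; df_demo, the seed, and the sample size are arbitrary choices for illustration and differ from the sample data above.

import numpy as np
import pandas as pd

# Synthetic data of the same shape as in the answer (seed and size are arbitrary).
rng = np.random.default_rng(0)
n = 200
df_demo = pd.DataFrame({
    'job': rng.choice(['admin', 'blue-collar', 'entrepreneur'], size=n),
    'y': rng.choice([0, 1], size=n),
})

# Approach 1: groupby + unstack, then add the row total.
by_groupby = df_demo.groupby(['job', 'y']).size().unstack(fill_value=0)
by_groupby['All'] = by_groupby.sum(axis=1)

# Approach 2: crosstab with margins, dropping the trailing 'All' row.
by_crosstab = pd.crosstab(df_demo['job'], df_demo['y'], margins=True).iloc[:-1]

# Compare the counts and the job order, ignoring dtype and label metadata.
same_counts = np.array_equal(by_groupby.to_numpy(), by_crosstab.to_numpy())
same_jobs = list(by_groupby.index) == list(by_crosstab.index)
print(same_counts, same_jobs)

The comparison is done before sorting by All, since jobs with equal totals could otherwise come out in a different order in the two tables.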