Sorting values by their sum after a pandas groupby

Time: 2021-07-02 14:01:46

Tags: python pandas matplotlib

Hello, I have a newly grouped dataset. Here is the result:

job            y
admin.         0    5227
               1    1045
blue-collar    0    5208
               1     517
entrepreneur   0     755
               1      96
housemaid      0     586
               1      82
management     0    1507
               1     255
retired        0     761
               1     331
self-employed  0     759
               1     111
services       0    2165
               1     260
student        0     364
               1     216
technician     0    3434
               1     589
unemployed     0     479
               1     109
unknown        0     166
               1      26

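I want to plot this as a bar chart with the jobs ordered by their total count (the sum of both y values), so that the most common jobs stand out. Here is the code I used, but it raises an error: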

import matplotlib.pyplot as plt
plt.figure(figsize=(6,6))
pekerjaan = df_new.groupby(['job','y'])['y'].size().unstack()
pekerjaan.sort_values(by='y',ascending=True).plot(kind='barh',stacked=True)
plt.title('Job')
plt.ylabel('Kind of job')
plt.xlabel('Total')
plt.show()

Thanks in advance.

1 Answer:

Answer 0 (score: 0)

Sample data and imports:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

np.random.seed(25)

n = 100
df_new = pd.DataFrame({
    'job': np.random.choice(['admin', 'blue-collar', 'entrepreneur'],
                            p=[.4, .4, .2], size=n),
    'y': np.random.choice([0, 1], size=n)
})

Then use sum across each row (axis=1) to get the row totals, and call sort_values on those totals:

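plot_df = df_new.groupby(['job', 'y'])['y'].size().unstack()
plot_df['All'] = plot_df.sum(axis=1)  # row total per job
plot_df = plot_df.sort_values('All')  # ascending, so the largest bar is drawn at the top

# Pass figsize to plot(); a separate plt.figure() call would only open an extra empty figure
ax = plot_df.plot(kind='barh', y=[0, 1], stacked=True,
                  title='Job', figsize=(6, 6), rot=0)
# Label the axes explicitly: how plot(xlabel=...) maps onto a barh chart has differed between pandas versions
ax.set_xlabel('Total')
ax.set_ylabel('Kind of Job')
plt.tight_layout()
plt.show()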
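As a side note on why the code in the question fails: after unstack(), 'y' is no longer a column. It becomes the name of the column axis, and the actual column labels are 0 and 1, so sort_values(by='y') has nothing to sort on. The sketch below reproduces this on a tiny made-up frame (the values are illustrative only, not the asker's data).

```python
import pandas as pd

# Hypothetical stand-in for df_new; the rows are made up for illustration
df_new = pd.DataFrame({
    'job': ['admin', 'admin', 'student', 'admin', 'student', 'retired'],
    'y':   [0, 1, 0, 0, 1, 0],
})

counts = df_new.groupby(['job', 'y'])['y'].size().unstack()

# 'y' now names the column axis; the column labels themselves are 0 and 1
print(counts.columns.name)
print(list(counts.columns))

sort_error = None
try:
    counts.sort_values(by='y')  # same call as in the question
except KeyError as err:
    sort_error = err
    print('KeyError:', err)
```

Sorting on a column that actually exists, such as the All total built above, avoids the error.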

Summary counts:

plot_df = df_new.groupby(['job', 'y'])['y'].size().unstack()
y              0   1
job                 
admin         19  17
blue-collar   24  25
entrepreneur  10   5

plot_df with the All column:

plot_df['All'] = plot_df.sum(axis=1)
y              0   1  All
job                      
admin         19  17   36
blue-collar   24  25   49
entrepreneur  10   5   15

After sort_values:

plot_df = plot_df.sort_values('All')
y              0   1  All
job                      
entrepreneur  10   5   15
admin         19  17   36
blue-collar   24  25   49
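The same ordering can be sketched on the counts posted in the question. This variant is an assumption about what might be convenient rather than part of the original answer: it reorders the rows by their totals without adding a helper All column, so every remaining column can be plotted directly.

```python
import pandas as pd

# Counts copied from the grouped output in the question: job -> [y == 0, y == 1]
counts = {
    'admin.':        [5227, 1045],
    'blue-collar':   [5208, 517],
    'entrepreneur':  [755, 96],
    'housemaid':     [586, 82],
    'management':    [1507, 255],
    'retired':       [761, 331],
    'self-employed': [759, 111],
    'services':      [2165, 260],
    'student':       [364, 216],
    'technician':    [3434, 589],
    'unemployed':    [479, 109],
    'unknown':       [166, 26],
}
plot_df = pd.DataFrame(list(counts.values()),
                       index=pd.Index(list(counts), name='job'),
                       columns=[0, 1])

# Sort the row totals, then reorder the rows to match that order
order = plot_df.sum(axis=1).sort_values().index
plot_df = plot_df.loc[order]

print(plot_df)
```

Because no All column is added, plot_df.plot(kind='barh', stacked=True) can then be called without y=[0, 1].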

An alternative using crosstab + margins:

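plot_df = (
    pd.crosstab(df_new['job'], df_new['y'], margins=True)
        .iloc[:-1]           # drop the 'All' grand-total row
        .sort_values('All')  # sort jobs by their row total
)

# Pass figsize to plot(); a separate plt.figure() call would only open an extra empty figure
ax = plot_df.plot(kind='barh', y=[0, 1], stacked=True,
                  title='Job', figsize=(6, 6), rot=0)
# Label the axes explicitly: how plot(xlabel=...) maps onto a barh chart has differed between pandas versions
ax.set_xlabel('Total')
ax.set_ylabel('Kind of Job')
plt.tight_layout()
plt.show()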
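To see what margins=True contributes, here is a minimal sketch on a tiny made-up frame (df_demo and its values are hypothetical). crosstab appends both an All column of per-job totals and an All row of grand totals; iloc[:-1] removes that last row so it does not show up as a bar.

```python
import pandas as pd

# Hypothetical data, small enough to check by hand
df_demo = pd.DataFrame({
    'job': ['admin', 'admin', 'admin', 'student', 'student', 'retired'],
    'y':   [0, 1, 0, 0, 1, 0],
})

table = pd.crosstab(df_demo['job'], df_demo['y'], margins=True)
print(table)

# The last row is the grand-total row labelled 'All'; drop it,
# then sort on the 'All' column of per-job totals
per_job = table.iloc[:-1].sort_values('All')
print(per_job)
```

If a different label is preferred for the totals, crosstab accepts a margins_name argument, in which case the sort key changes accordingly.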

Both produce the same plot:

[Image: plot 1]