I have a DataFrame:
name  salary  department  position
a     25000   x           normal employee
b     50000   y           normal employee
c     10000   y           experienced employee
d     20000   x           experienced employee
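To reproduce the problem, the frame above can be built roughly as follows. This is a minimal sketch: the variable name df is an assumption, chosen to match the name used in the answer below.

```python
import pandas as pd

# Sample data copied from the table above; the name `df` is an assumption.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [25000, 50000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
    "position": [
        "normal employee",
        "normal employee",
        "experienced employee",
        "experienced employee",
    ],
})

print(df)
```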
I want to get a result in the following format:
dept  total salary  salary_percentage  count_normal_employee  count_experienced_employee
x     45000         45000/105000       1                      1
y     60000         60000/105000       1                      1
Answer 0 (score: 3)
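You can use pivot_table with fillna to build df1, then groupby with sum to build df2, create the new column salary_percentage by dividing total salary by the sum of the original salary column, and finally merge the two frames: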
import pandas as pd

# pivot df: count names per department and position, fill NaN with 0
df1 = df.pivot_table(index='department', columns='position', values='name', aggfunc='count').fillna(0).reset_index()
# clear the columns name for a nicer df
df1.columns.name = None
print(df1)
department experienced employee normal employee
0 x 1 1
1 y 1 1
# sum salary per department and rename the column salary to total salary
df2 = df.groupby('department')['salary'].sum().reset_index().rename(columns={'salary':'total salary'})
# share of each department in the overall salary sum
df2['salary_percentage'] = df2['total salary'] / df['salary'].sum()
print(df2)
department total salary salary_percentage
0 x 45000 0.428571
1 y 60000 0.571429
# merge the counts and the salary totals on department
print(pd.merge(df1, df2, on='department'))
department experienced employee normal employee total salary \
0 x 1 1 45000
1 y 1 1 60000
salary_percentage
0 0.428571
1 0.571429
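The same steps can also be sketched end to end in current pandas. Note that this is a variation, not the code from the answer: it swaps pivot_table for pd.crosstab (which already fills missing combinations with 0) and joins on the index instead of calling merge. The names counts, totals and result, and the count_ column prefix, are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [25000, 50000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
    "position": [
        "normal employee",
        "normal employee",
        "experienced employee",
        "experienced employee",
    ],
})

# Count employees per department and position.
# crosstab puts 0 where a combination does not occur.
counts = pd.crosstab(df["department"], df["position"]).add_prefix("count_")

# Total salary per department, then its share of the overall salary sum.
totals = df.groupby("department")["salary"].sum().rename("total salary").to_frame()
totals["salary_percentage"] = totals["total salary"] / df["salary"].sum()

# Both frames are indexed by department, so join aligns them row by row.
result = totals.join(counts).reset_index()
print(result)
```

With the sample data this should give 45000 for department x and 60000 for y, with shares of 45000/105000 and 60000/105000, matching the output above.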