DataFrame to a user-defined format

Time: 2016-02-16 17:54:52

Tags: python pandas dataframe

I have a DataFrame:

name  salary department              position
   a   25000          x       normal employee
   b   50000          y       normal employee
   c   10000          y  experienced employee
   d   20000          x  experienced employee
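
For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: the variable name `df` is an assumption, chosen to match the name used in the answer below.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [25000, 50000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
    "position": [
        "normal employee",
        "normal employee",
        "experienced employee",
        "experienced employee",
    ],
})
print(df)
```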

I would like to get a result in the following format:

dept  total salary  salary_percentage  count_normal_employee  count_experienced_employee
x            45000       45000/105000                      1                           1
y            60000       60000/105000                      1                           1

1 answer:

Answer 0: (score: 3)

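You can use pivot_table with fillna to build df1, then groupby with sum to build df2, add the new column salary_percentage by dividing total salary by the sum of the original salary column, and finally merge the two: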

import pandas as pd

# pivot df to count names per department and position, fill NaN with 0
df1 = df.pivot_table(index='department', columns='position', values='name', aggfunc='count').fillna(0).reset_index()
# clear the name of the columns index for a cleaner display
df1.columns.name = None
print(df1)
  department  experienced employee  normal employee
0          x                     1                1
1          y                     1                1

# sum salary per department and rename the column salary to total salary
df2 = df.groupby('department')['salary'].sum().reset_index().rename(columns={'salary':'total salary'})

# share of each department's total in the overall salary sum
df2['salary_percentage'] = df2['total salary'] / df['salary'].sum()
print(df2)
  department  total salary  salary_percentage
0          x         45000           0.428571
1          y         60000           0.571429

print(pd.merge(df1, df2, on=['department']))
  department  experienced employee  normal employee  total salary  \
0          x                     1                1         45000   
1          y                     1                1         60000   

   salary_percentage  
0           0.428571  
1           0.571429
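
The steps above can also be put together as one self-contained script. The following is a minimal sketch, not the only way to do it. It assumes the sample data from the question, uses the `fill_value=0` argument of `pivot_table` in place of a separate `fillna(0)` call, and renames the count columns to `count_normal_employee` and `count_experienced_employee`. Those names and the final column order are taken from the layout requested in the question, and the variable names `counts`, `totals` and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [25000, 50000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
    "position": [
        "normal employee",
        "normal employee",
        "experienced employee",
        "experienced employee",
    ],
})

# Count employees per department and position; missing combinations become 0.
counts = df.pivot_table(
    index="department",
    columns="position",
    values="name",
    aggfunc="count",
    fill_value=0,
)
# Rename e.g. "normal employee" to "count_normal_employee" (naming is an assumption).
counts.columns = ["count_" + col.replace(" ", "_") for col in counts.columns]
counts = counts.reset_index()

# Total salary per department and its share of the overall salary sum.
totals = (
    df.groupby("department")["salary"]
    .sum()
    .reset_index()
    .rename(columns={"salary": "total salary"})
)
totals["salary_percentage"] = totals["total salary"] / df["salary"].sum()

# Combine both tables and order the columns as requested in the question.
result = pd.merge(totals, counts, on="department")
result = result[
    [
        "department",
        "total salary",
        "salary_percentage",
        "count_normal_employee",
        "count_experienced_employee",
    ]
]
print(result)
```

Here salary_percentage is kept as a fraction, as in the answer above. Multiply it by 100 if a percentage value is preferred.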