Pandas groupby with aggregation and a specific condition

Time: 2020-02-09 09:50:46

Tags: pandas pandas-groupby

I have a data frame as shown below:

Sector      Plot       Usage        Status
SE1         1          Garden       Constructed
SE1         2          School       Constructed
SE1         3          Garden       Not_Constructed
SE1         4          School       Constructed
SE1         5          Garden       Not_Constructed
SE1         6          School       Not_Constructed
SE2         1          School       Constructed
SE2         2          School       Constructed
SE2         3          Garden       Constructed
SE2         4          School       Constructed
SE2         5          School       Not_Constructed
SE2         6          School       Not_Constructed
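For anyone who wants to try the answers, the sample can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, and the variable name `df` is an assumption chosen to match the answers.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Sector": ["SE1"] * 6 + ["SE2"] * 6,
    "Plot": [1, 2, 3, 4, 5, 6] * 2,
    "Usage": ["Garden", "School", "Garden", "School", "Garden", "School",
              "School", "School", "Garden", "School", "School", "School"],
    "Status": ["Constructed", "Constructed", "Not_Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed",
               "Constructed", "Constructed", "Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed"],
})
print(df)
```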

From the above, I would like to prepare the data frame below.

Expected output:

Sector  N_of_Garden_Const  N_of_School_Const   N_of_Garden_Not_Const  N_of_School_Not_Const 
SE1     1                  2                   2                      1 
SE2     1                  3                   0                      2

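where N_of_Garden_Const = number of gardens constructed

N_of_School_Const = number of schools constructed

N_of_Garden_Not_Const = number of gardens not constructed

N_of_School_Not_Const = number of schools not constructed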

2 answers:

Answer 0: (score: 3)

Use crosstab, then flatten the MultiIndex in the columns with map:

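import pandas as pd

# Count rows per Sector for every (Status, Usage) pair
df = pd.crosstab(df['Sector'], [df['Status'], df['Usage']])
# Each column label is a (Status, Usage) tuple; flatten it into one string
df.columns = df.columns.map('N_of_{0[1]}_{0[0]}_Const'.format)
df = df.reset_index()
print(df)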

  Sector  N_of_Garden_Constructed_Const  N_of_School_Constructed_Const  \
0    SE1                              1                              2   
1    SE2                              1                              3   

   N_of_Garden_Not_Constructed_Const  N_of_School_Not_Constructed_Const  
0                                  2                                  1  
1                                  0                                  2  
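The column names produced by this approach carry the full Status value (for example `N_of_Garden_Constructed_Const`), which differs from the names in the question's expected output. One possible way to get the exact names is to map each Status value to a suffix first. The sketch below is not part of the original answer; the `suffix` dictionary and the `counts` variable are hypothetical names introduced for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sector": ["SE1"] * 6 + ["SE2"] * 6,
    "Plot": [1, 2, 3, 4, 5, 6] * 2,
    "Usage": ["Garden", "School", "Garden", "School", "Garden", "School",
              "School", "School", "Garden", "School", "School", "School"],
    "Status": ["Constructed", "Constructed", "Not_Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed",
               "Constructed", "Constructed", "Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed"],
})

# Hypothetical mapping from Status values to the suffixes used in the question
suffix = {"Constructed": "Const", "Not_Constructed": "Not_Const"}

# Count rows per Sector for every (Status, Usage) pair
counts = pd.crosstab(df["Sector"], [df["Status"], df["Usage"]])

# Each column label is a (Status, Usage) tuple; build the flat name from it
counts.columns = [f"N_of_{usage}_{suffix[status]}"
                  for status, usage in counts.columns]
counts = counts.reset_index()
print(counts)
```

A Status value that is missing from `suffix` would raise a `KeyError`, so the dictionary has to cover every status present in the data.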

Another idea is to use DataFrame.pivot_table and flatten the columns with an f-string:

df = df.pivot_table(index='Sector', 
                    columns=['Status','Usage'], 
                    aggfunc='size', 
                    fill_value=0)
df.columns = df.columns.map(lambda x: f'N_of_{x[1]}_{x[0]}_Const')
df = df.reset_index()
print (df)
  Sector  N_of_Garden_Constructed_Const  N_of_School_Constructed_Const  \
0    SE1                              1                              2   
1    SE2                              1                              3   

   N_of_Garden_Not_Constructed_Const  N_of_School_Not_Constructed_Const  
0                                  2                                  1  
1                                  0                                  2  

Answer 1: (score: 2)

I prefer an approach that uses groupby followed by unstacking. Check the code below:

dfGrouped = df.groupby(['Sector', 'Usage', 'Status'])['Plot'].count().unstack([-2, -1])
dfGrouped.columns = ['_'.join(col).strip() for col in dfGrouped.columns.values]
dfGrouped.fillna(0, inplace = True)

Output:

        Garden_Constructed  Garden_Not_Constructed  School_Constructed  School_Not_Constructed
Sector
SE1                    1.0                     2.0                 2.0                     1.0
SE2                    1.0                     0.0                 3.0                     2.0
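The counts above come out as floats because the missing SE2 / Garden / Not_Constructed combination is NaN before `fillna` runs. A possible variant, sketched here and not taken from the original answer, passes `fill_value=0` to `unstack` so that the gap is filled during reshaping, which should keep the counts as integers. The name `grouped` is an assumption used only for this illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sector": ["SE1"] * 6 + ["SE2"] * 6,
    "Plot": [1, 2, 3, 4, 5, 6] * 2,
    "Usage": ["Garden", "School", "Garden", "School", "Garden", "School",
              "School", "School", "Garden", "School", "School", "School"],
    "Status": ["Constructed", "Constructed", "Not_Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed",
               "Constructed", "Constructed", "Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed"],
})

# Count rows per (Sector, Usage, Status), then move Usage and Status
# into the columns; missing combinations are filled with 0
grouped = (
    df.groupby(["Sector", "Usage", "Status"])
      .size()
      .unstack(["Usage", "Status"], fill_value=0)
)

# Flatten the (Usage, Status) column tuples into single strings
grouped.columns = ["_".join(col) for col in grouped.columns]
print(grouped)
```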