Return a running count in a pandas df

Posted: 2018-06-25 04:23:18

Tags: python pandas dataframe group-by pandas-groupby

I'm trying to return a running count in a pandas df based on two columns.

For the df below, I'm trying to determine the count based on Column 'Event' and Column 'Who'.

import pandas as pd
import numpy as np

d = ({
    'Event' : ['A','B','E','','C','B','B','B','B','E','C','D'],
    'Space' : ['X1','X1','X2','','X3','X3','X3','X4','X3','X2','X2','X1'],
    'Who' : ['Home','Home','Even','Out','Home','Away','Home','Away','Home','Even','Away','Home']
    })

d = pd.DataFrame(data = d)

I have tried the following:

df = d.groupby(['Event', 'Who'])['Space'].count().reset_index(name="count")

Which produces this:

  Event   Who  count
0         Out      1
1     A  Home      1
2     B  Away      2
3     B  Home      3
4     C  Away      1
5     C  Home      1
6     D  Home      1
7     E  Even      2

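But I want it to be a running count rather than a total.

Can df = d.groupby(['Event', 'Who'])['Space'].count().reset_index(name="count") be modified to take these other constraints into account, or does it have to be a mask function or something similar?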
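To make the difference between a total and a running count concrete, here is a minimal sketch, assuming the same data as above. transform('count') broadcasts each (Event, Who) group's total back onto every row of that group, while cumcount() numbers the rows inside each group starting at 0, so adding 1 gives a running count. The names g, total, running and out are illustrative only.

```python
import pandas as pd

# Same data as in the question
d = pd.DataFrame({
    'Event': ['A', 'B', 'E', '', 'C', 'B', 'B', 'B', 'B', 'E', 'C', 'D'],
    'Space': ['X1', 'X1', 'X2', '', 'X3', 'X3', 'X3', 'X4', 'X3', 'X2', 'X2', 'X1'],
    'Who': ['Home', 'Home', 'Even', 'Out', 'Home', 'Away',
            'Home', 'Away', 'Home', 'Even', 'Away', 'Home'],
})

g = d.groupby(['Event', 'Who'])
# Group total, repeated on every row that belongs to the group
total = g['Space'].transform('count')
# Position of the row within its group (0, 1, 2, ...) plus 1 = running count
running = g.cumcount() + 1

out = d.assign(total=total, running=running)
print(out[['Event', 'Who', 'total', 'running']])
```

For example, row 1 (B, Home) gets total 3 but running 1, because it is the first of the three B/Home rows; row 8 is the third, so its running count is 3.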

So my expected output is:

   A_Away A_Home B_Away B_Home C_Away C_Home D_Away D_Home Event Space Who                       
0              1                                            A    X1    Home  
1                                                           B    X1    Home  
2                                                           E    X2    Even  
3                                                                      Out  
4                                          1                C    X3    Home  
5                     1                                     B    X3    Away  
6                            1                              B    X3    Home  
7                                                           B    X4    Away  
8                     2                                     B    X3    Home  
9                            2                              E    X2    Even  
10                                  1                       C    X2    Away  
11                                                1         D    X1    Home  

So the count is added row by row, not as a total over the whole dataset.
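The row-by-row rule can be spelled out without pandas. This is a minimal plain-Python sketch, assuming the Event and Who values from the question and column labels of the form Event_Who (the names seen, rows and label are illustrative): each row increments a counter for its own pair and records the new value only under that pair's label.

```python
from collections import Counter

# Event and Who columns from the question's data
events = ['A', 'B', 'E', '', 'C', 'B', 'B', 'B', 'B', 'E', 'C', 'D']
whos = ['Home', 'Home', 'Even', 'Out', 'Home', 'Away',
        'Home', 'Away', 'Home', 'Even', 'Away', 'Home']

seen = Counter()  # how many times each Event_Who label has appeared so far
rows = []         # one dict per row: {label: running count}
for event, who in zip(events, whos):
    label = f'{event}_{who}'
    seen[label] += 1
    # Only this row's own label gets a value; every other column stays empty
    rows.append({label: seen[label]})

for i, row in enumerate(rows):
    print(i, row)
```

Under this rule row 8 ends up as {'B_Home': 3}, since B/Home also occurs in rows 1 and 6, and the empty Event in row 3 produces the label '_Out'.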

1 Answer:

Answer 0 (score: 1)

Here are the steps needed to get the result:

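  1. Prepare "Who" and "Event" as the index
  2. Get the cumulative count of each group using groupby and cumcount
  3. Reshape/pivot/unstack your DataFrame into tabular format using unstack
  4. Fix the column headers
  5. Concatenate the result with the original using pd.concat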

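# `df` here is the original data (the question calls it `d`)
df = d
# set the index: append 'Who' and 'Event' to the existing row index
v = df.set_index(['Who', 'Event'], append=True)['Space']
# assign `v` the values for the cumulative count (positional, starting at 1)
v[:] = df.groupby(['Event', 'Who']).cumcount().add(1).to_numpy()
# reshape `v`: move 'Who' and 'Event' into the columns
v = v.unstack([1, 2], fill_value='')
# fix your headers: ('Home', 'A') -> 'A_Home'
v.columns = v.columns.map('{0[1]}_{0[0]}'.format)
# concatenate the result, dropping the 'Out' column (empty Event)
pd.concat([v.loc[:, ~v.columns.str.contains('Out')], df], axis=1)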

   A_Home B_Home E_Even C_Home B_Away C_Away D_Home Event Space   Who
0       1                                               A    X1  Home
1              1                                        B    X1  Home
2                     1                                 E    X2  Even
3                                                                 Out
4                            1                          C    X3  Home
5                                   1                   B    X3  Away
6              2                                        B    X3  Home
7                                   2                   B    X4  Away
8              3                                        B    X3  Home
9                     2                                 E    X2  Even
10                                         1            C    X2  Away
11                                                1     D    X1  Home
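As an alternative sketch (not part of the answer above), the same layout can be built with DataFrame.pivot on a combined label column instead of set_index plus unstack. It assumes the question's data; the names key, running, wide and result are illustrative. Unlike the answer, it keeps the gaps as <NA> in nullable Int64 columns rather than filling them with ''.

```python
import pandas as pd

# Same data as in the question
d = pd.DataFrame({
    'Event': ['A', 'B', 'E', '', 'C', 'B', 'B', 'B', 'B', 'E', 'C', 'D'],
    'Space': ['X1', 'X1', 'X2', '', 'X3', 'X3', 'X3', 'X4', 'X3', 'X2', 'X2', 'X1'],
    'Who': ['Home', 'Home', 'Even', 'Out', 'Home', 'Away',
            'Home', 'Away', 'Home', 'Even', 'Away', 'Home'],
})

# Column label for each row, e.g. 'B_Home' (the empty Event gives '_Out')
key = d['Event'] + '_' + d['Who']
# Running count within each (Event, Who) pair, starting at 1
running = d.groupby(['Event', 'Who']).cumcount() + 1

# One column per label; each row has a value only in its own label's column.
# pivot() without index= keeps the existing row index.
wide = (
    pd.DataFrame({'key': key, 'running': running})
    .pivot(columns='key', values='running')
    .astype('Int64')  # nullable integers: gaps become <NA> instead of float NaN
)
# Drop the 'Out' column, as the answer does
wide = wide.loc[:, ~wide.columns.str.contains('Out')]

result = pd.concat([wide, d], axis=1)
print(result)
```

Keeping the count columns numeric means they can still be summed or compared later; the answer's fill_value='' produces object columns mixing ints and empty strings, which displays like the expected output but is awkward for further arithmetic.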