I am trying to return a running count based on two columns in a pandas df.
For the df below, I am trying to determine the count based on Column 'Event' and Column 'Who'.
import pandas as pd
import numpy as np
d = ({
'Event' : ['A','B','E','','C','B','B','B','B','E','C','D'],
'Space' : ['X1','X1','X2','','X3','X3','X3','X4','X3','X2','X2','X1'],
'Who' : ['Home','Home','Even','Out','Home','Away','Home','Away','Home','Even','Away','Home']
})
d = pd.DataFrame(data = d)
I tried the following.
df = d.groupby(['Event', 'Who'])['Space'].count().reset_index(name="count")
Which produces this:
  Event   Who  count
0         Out      1
1     A  Home      1
2     B  Away      2
3     B  Home      3
4     C  Away      1
5     C  Home      1
6     D  Home      1
7     E  Even      2
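But I want it to be a running count rather than a total.
Can df = d.groupby(['Event', 'Who'])['Space'].count().reset_index(name="count") be altered to filter on other constraints, or does it have to be a mask function or something similar?
So my intended output is: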
A_Away A_Home B_Away B_Home C_Away C_Home D_Away D_Home Event Space Who
0 1 A X1 Home
1 B X1 Home
2 E X2 Even
3 Out
4 1 C X3 Home
5 1 B X3 Away
6 1 B X3 Home
7 B X4 Away
8 2 B X3 Home
9 2 E X2 Even
10 1 C X2 Away
11 1 D X1 Home
So the count is added to the row where it occurs. It is not the total for the whole dataset.
Answer 0 (score: 1)
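Here are the steps needed to get the result:
1. groupby and cumcount to build the running count
2. unstack to reshape the counts into one column per Event/Who pair
3. pd.concat to join the result with the original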
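As a rough illustration of why cumcount is the relevant call here, the sketch below contrasts a per-group total with a per-row running count. The `toy` frame is hypothetical and much smaller than the question's data; it only exists to make the difference visible.

```python
import pandas as pd

# Tiny hypothetical frame, not the question's data.
toy = pd.DataFrame({
    'Event': ['B', 'B', 'C', 'B'],
    'Who': ['Home', 'Away', 'Home', 'Home'],
})

# size() collapses each (Event, Who) group to a single total.
totals = toy.groupby(['Event', 'Who']).size()

# cumcount() keeps one value per original row: the 0-based position of
# the row within its group. Adding 1 turns it into a running count.
running = toy.groupby(['Event', 'Who']).cumcount() + 1

print(totals)
print(running.tolist())
```

The totals have one entry per group, while the running count has one entry per row of the original frame, which is what allows it to be attached back to that frame.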
# `df` is the original DataFrame (named `d` in the question)
# set the index
v = df.set_index(['Who', 'Event'], append=True)['Space']
# assign `v` the values for the cumulative count
v[:] = df.groupby(['Event', 'Who']).cumcount().add(1)
# reshape `v`: one column per (Who, Event) pair
v = v.unstack([1, 2], fill_value='')
# rename the (Who, Event) column tuples to 'Event_Who'
v.columns = v.columns.map('{0[1]}_{0[0]}'.format)
# concatenate the result, dropping the column for the blank 'Out' row
pd.concat([v.loc[:, ~v.columns.str.contains('Out')], df], axis=1)
   A_Home B_Home E_Even C_Home B_Away C_Away D_Home Event Space  Who
 0      1                                               A    X1 Home
 1             1                                        B    X1 Home
 2                    1                                 E    X2 Even
 3                                                               Out
 4                           1                          C    X3 Home
 5                                  1                   B    X3 Away
 6             2                                        B    X3 Home
 7                                  2                   B    X4 Away
 8             3                                        B    X3 Home
 9                    2                                 E    X2 Even
10                                         1            C    X2 Away
11                                                1     D    X1 Home
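For anyone who wants to run this end to end, here is a self-contained sketch of the same idea on the question's data. It is a variant, not the code above: it swaps the in-place assignment and unstack for DataFrame.pivot, and the names `running`, `label`, `wide` and `result` are made up for illustration. Empty cells come out as NaN rather than blank strings.

```python
import pandas as pd

# Sample data copied from the question.
d = pd.DataFrame({
    'Event': ['A', 'B', 'E', '', 'C', 'B', 'B', 'B', 'B', 'E', 'C', 'D'],
    'Space': ['X1', 'X1', 'X2', '', 'X3', 'X3', 'X3', 'X4', 'X3', 'X2', 'X2', 'X1'],
    'Who': ['Home', 'Home', 'Even', 'Out', 'Home', 'Away',
            'Home', 'Away', 'Home', 'Even', 'Away', 'Home'],
})

# Running count within each (Event, Who) pair, starting at 1.
running = d.groupby(['Event', 'Who']).cumcount().add(1)

# Hypothetical helper: the "<Event>_<Who>" label each count belongs to.
label = d['Event'] + '_' + d['Who']

# Spread the counts into one column per label, keeping the original
# row index. Cells without a count are NaN.
wide = (
    pd.DataFrame({'label': label, 'running': running})
    .pivot(columns='label', values='running')
)

# Drop the column produced by the row with a blank Event ("_Out"),
# then attach the counts to the original frame.
wide = wide.drop(columns='_Out')
result = pd.concat([wide, d], axis=1)

print(result)
```

If blank cells are preferred over NaN for display, something like `wide.astype(object).fillna('')` before the concat should give output closer to the table above, at the cost of the columns no longer being numeric.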