I have a DataFrame and I'm trying to count how many people have joined each group as of each date. So this:
individual_id group_id date
a 1 2000-01-01
a 1 2000-01-02
a 1 2000-01-03
b 1 2000-01-02
b 1 2000-01-04
c 1 2000-01-03
c 1 2000-01-04
d 2 2000-01-02
would become this:
individual_id group_id date people_in_group
a 1 2000-01-01 1
a 1 2000-01-02 2
a 1 2000-01-03 3
b 1 2000-01-02 2
b 1 2000-01-04 3
c 1 2000-01-03 3
c 1 2000-01-04 3
d 2 2000-01-02 1
Answer 0 (score: 1)
First, you can use GroupBy to count how many rows each individual has in each group on each date, i.e.:
import pandas as pd
from datetime import datetime
import numpy as np
# Note: integer literals with leading zeros (e.g. 01) are a syntax error in Python 3.
df = pd.DataFrame({'individual_id':['a','a','a','b','b','c','c','d'],
                   'group_id':[1,1,1,1,1,1,1,2],
                   'date':[datetime(2000,1,1),datetime(2000,1,2),
                           datetime(2000,1,3),datetime(2000,1,5),
                           datetime(2000,1,6),datetime(2000,1,3),
                           datetime(2000,1,4),datetime(2000,1,2)]})
#df = <dataframe of your original data (mocked up above)>
#Add a placeholder 'rowCounter' column of integer ones, so that the groups are easily counted.
df['rowCounter'] = np.ones(len(df), dtype=int)
#Sum the counter for each (individual, group, date) combination.
df1 = df.groupby(['individual_id','group_id','date'], as_index=False).sum()
Then use the cumsum() function to turn those counts into a running total in date order:
df1['people_in_group'] = df1.groupby(['individual_id','group_id'])['rowCounter'].cumsum()
(Optional) Drop the dummy row-counter column we created:
df1 = df1.drop(columns='rowCounter')
Printing df1 now shows:
individual_id group_id date people_in_group
0 a 1 2000-01-01 1
1 a 1 2000-01-02 2
2 a 1 2000-01-03 3
3 b 1 2000-01-05 1
4 b 1 2000-01-06 2
5 c 1 2000-01-03 1
6 c 1 2000-01-04 2
7 d 2 2000-01-02 1
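Note that the output above is a running count of rows per individual, while the people_in_group column in the question appears to count the distinct individuals seen in a group up to each date. A minimal sketch of one way to get that second result follows, assuming a person "joins" a group on the first date they appear in it and never leaves. The names first_seen, joined and result are illustrative only, and the data is the question's table rather than the mock-up used in this answer.

```python
import pandas as pd

# Data exactly as given in the question.
df = pd.DataFrame({
    'individual_id': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
    'group_id': [1, 1, 1, 1, 1, 1, 1, 2],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03',
                            '2000-01-02', '2000-01-04', '2000-01-03',
                            '2000-01-04', '2000-01-02']),
})

# Assumption: an individual's join date is their earliest row in the group.
first_seen = (df.groupby(['group_id', 'individual_id'], as_index=False)['date']
                .min())

# Count new members per (group, join date), then accumulate within each group.
joined = (first_seen.groupby(['group_id', 'date']).size()
                    .groupby(level='group_id').cumsum()
                    .rename('people_in_group')
                    .reset_index())

# merge_asof attaches, for each row, the running total from the latest join
# date on or before that row's date within the same group. Both frames must
# be sorted by the 'on' key.
result = pd.merge_asof(df.sort_values('date'), joined.sort_values('date'),
                       on='date', by='group_id')
result = result.sort_values(['individual_id', 'date']).reset_index(drop=True)
print(result)
```

merge_asof is used instead of a plain merge because some dates (2000-01-04 in group 1, for example) have rows but no new member, so an exact-match join would leave them empty.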