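Counting items by subcategory and date in pandas

Time: 2015-02-23 20:43:14

Tags: python pandas

I have a DataFrame, and I am trying to count the number of people who have joined each group as of each date. So this: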

individual_id   group_id     date  
   a              1       2000-01-01  
   a              1       2000-01-02  
   a              1       2000-01-03  
   b              1       2000-01-02  
   b              1       2000-01-04  
   c              1       2000-01-03  
   c              1       2000-01-04  
   d              2       2000-01-02  

would become this:

individual_id   group_id     date      people_in_group
   a              1       2000-01-01         1
   a              1       2000-01-02         2
   a              1       2000-01-03         3
   b              1       2000-01-02         2
   b              1       2000-01-04         3
   c              1       2000-01-03         3
   c              1       2000-01-04         3
   d              2       2000-01-02         1

1 answer:

Answer 0 (score: 1)

First, you can use GroupBy to count how many times each individual appears in each group on each date, that is:

import pandas as pd
from datetime import datetime
import numpy as np

# Integer literals must not have leading zeros in Python 3 (write 1, not 01).
df = pd.DataFrame({'individual_id': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
                   'group_id': [1, 1, 1, 1, 1, 1, 1, 2],
                   'date': [datetime(2000, 1, 1), datetime(2000, 1, 2),
                            datetime(2000, 1, 3), datetime(2000, 1, 5),
                            datetime(2000, 1, 6), datetime(2000, 1, 3),
                            datetime(2000, 1, 4), datetime(2000, 1, 2)]})

#df = <dataframe of your original data (mocked up above)>
# Add a placeholder 'rowCounter' column, so that the groups are easily counted.
df['rowCounter'] = np.ones(len(df), dtype=int)
# One row per (individual_id, group_id, date); rowCounter holds the number of rows sharing that key.
df1 = df.groupby(['individual_id', 'group_id', 'date'], as_index=False).sum()

Then use cumsum() to accumulate those counts in date order within each (individual_id, group_id) pair:

df1['people_in_group'] = df1.groupby(['individual_id', 'group_id'])['rowCounter'].cumsum()

(Optional) Drop the dummy row counter column we created:

df1 = df1.drop(columns='rowCounter')

Printing df1 now shows

  individual_id  group_id       date  people_in_group
0             a         1 2000-01-01                1
1             a         1 2000-01-02                2
2             a         1 2000-01-03                3
3             b         1 2000-01-05                1
4             b         1 2000-01-06                2
5             c         1 2000-01-03                1
6             c         1 2000-01-04                2
7             d         2 2000-01-02                1
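
Note that the table above is a running count of rows per individual, which differs from the people_in_group column in the question's expected table. If the goal is the running number of distinct people in each group, one possible approach is sketched below. It is not part of the original answer. It assumes that a person "joins" a group on the first date they appear in it and that each (individual_id, group_id, date) combination occurs at most once. The names first_seen, is_join, joins and running are illustrative only.

```python
import pandas as pd

# The question's data (not the answer's mock-up, which uses different dates for 'b').
df = pd.DataFrame({
    "individual_id": ["a", "a", "a", "b", "b", "c", "c", "d"],
    "group_id": [1, 1, 1, 1, 1, 1, 1, 2],
    "date": pd.to_datetime([
        "2000-01-01", "2000-01-02", "2000-01-03", "2000-01-02",
        "2000-01-04", "2000-01-03", "2000-01-04", "2000-01-02",
    ]),
})

# Assumption: the join date is the first date an individual appears in a group.
first_seen = df.groupby(["group_id", "individual_id"])["date"].transform("min")
df["is_join"] = (df["date"] == first_seen).astype(int)

# Number of new joiners per (group, date); dates with no joiners get 0.
joins = df.groupby(["group_id", "date"])["is_join"].sum()

# Running total within each group; groupby has already sorted the dates.
running = joins.groupby(level="group_id").cumsum().rename("people_in_group")

# Attach the running total to every original row, keeping the original row order.
result = df.drop(columns="is_join").merge(
    running.reset_index(), on=["group_id", "date"], how="left"
)
print(result)
```

With the question's data, this gives people_in_group values of 1, 2, 3, 2, 3, 3, 3, 1, matching the expected table. If the data can contain duplicate rows, dropping them first with drop_duplicates() would keep the join counts from being inflated.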