pandas groupby id and month

Asked: 2017-05-07 21:42:02

Tags: python pandas dataframe

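import pandas as pd

# `path` points to a tab-separated log file with no header row
df = pd.read_csv(path, sep='\t', names=['logtime', 'dt', 'uid'])
# sort whole rows by dt so that each uid stays paired with its own dt
# (sorting the dt column on its own would misalign it with uid)
df = df[['uid', 'dt']].sort_values('dt')
# collect all dt values of each uid into a list
df = df.groupby('uid', as_index=False).agg(lambda x: x.tolist())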
This is my current code, which groups by id only.

A demo sample of the DataFrame before grouping:
id   dt        
a   2012-01-01
a   2012-01-01
a   2012-01-02
b   2012-01-01
b   2012-02-01
c   2012-02-02 
...
ds  2013-03-01
zbd 2013-03-28

I want to group by month and by id, then build a new `times` column (the dates joined together) and a `count` column:

 dt     id   times                             count         
2012-01  a  2012-01-01,2012-01-01,2012-01-02   3
         b  2012-01-01                         1
2012-02  b  2012-02-01                         1 
         c  2012-02-02                         1
       ...
2013-03  ds 2013-03-01                         1
         zbd 2013-03-28                        1

1 answer:

Answer 0 (score: 1)

In [84]: (df.groupby([df['dt'].dt.strftime('%Y-%m'), 'id'])['dt']
    ...:    .agg([lambda x: ','.join(x.astype(str)), 'size'])
    ...:    .rename(columns={'<lambda>':'times', 'size':'count'})
    ...:    .reset_index()
    ...: )
    ...:
Out[84]:
        dt   id                             times  count
0  2012-01    a  2012-01-01,2012-01-01,2012-01-02      3
1  2012-01    b                        2012-01-01      1
2  2012-02    b                        2012-02-01      1
3  2012-02    c                        2012-02-02      1
4  2013-03   ds                        2013-03-01      1
5  2013-03  zbd                        2013-03-28      1
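
The session above relies on `df['dt']` already being a datetime column, since the `.dt` accessor is not available on plain strings. Below is a minimal self-contained sketch of the same idea, assuming the sample rows from the question and using named aggregation (available in pandas 0.25 and later) instead of renaming `<lambda>`. The group key name `month` is my own choice for illustration and is not from the original answer.

```python
import pandas as pd

# Sample rows copied from the question's demo table.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "c", "ds", "zbd"],
    "dt": ["2012-01-01", "2012-01-01", "2012-01-02", "2012-01-01",
           "2012-02-01", "2012-02-02", "2013-03-01", "2013-03-28"],
})

# Parse the strings so that the .dt accessor can be used.
df["dt"] = pd.to_datetime(df["dt"])

# Group key: a year-month string such as '2012-01'.
# Naming it 'month' (an assumption) avoids a clash with the 'dt' column.
month = df["dt"].dt.strftime("%Y-%m").rename("month")

result = (
    df.groupby([month, "id"])["dt"]
      .agg(
          # join every date of the group into one comma-separated string
          times=lambda s: ",".join(s.dt.strftime("%Y-%m-%d")),
          # number of rows in the group
          count="size",
      )
      .reset_index()
)
print(result)
```

If the month should behave like a date rather than a string (for example for later sorting or resampling), `df["dt"].dt.to_period("M")` could probably be used as the group key instead of `strftime`.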