I have a pandas DataFrame, df, containing date-level data:
df
MONTH_YEAR  class   id  accnt_id
2012-01     fruits  1   an
2012-01     fruits  2   abc
2012-01     fruits  1   def
2012-02     fruits  2   abc
2012-02     fruits  2   andi
2011-01     vege    1   an
and so on.
Current query:
df.groupby(['class', 'MONTH_YEAR']).agg({'id': 'nunique', 'accnt_id': 'nunique'})
Desired output:
class   MONTH_YEAR  id  accnt_id  cumsum_unique_id
fruits  2012-01     2   3         3
fruits  2012-02     1   2         4
vege    2011-01     1   1         1
How can I get cumsum_unique_id?
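Answer 0 (score: 1)
You need one more step to get cumsum_unique_id: keep only the first occurrence of each accnt_id within a class, count those rows per month, then take a cumulative sum within each class.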
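# Per-month distinct counts (the original query)
s = df.groupby(['class', 'MONTH_YEAR']).agg({'id': 'nunique', 'accnt_id': 'nunique'})
# Keep the first row for each (class, accnt_id), count those rows per month,
# then take a running total within each class
s1 = (df.drop_duplicates(['class', 'accnt_id'])
        .groupby(['class', 'MONTH_YEAR']).accnt_id.count()
        .groupby(level=0).cumsum())
s['cumsum_unique_id'] = s1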
s
Out[39]:
                   id  accnt_id  cumsum_unique_id
class  MONTH_YEAR
fruits 2012-01      2         3                 3
       2012-02      1         2                 4
vege   2011-01      1         1                 1
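
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample rows from the question. Two steps are my own assumptions and are not part of the answer above: sorting by MONTH_YEAR first, so that drop_duplicates keeps the earliest month for each account, and reindexing with fill_value=0, so that a month introducing no new accounts gets 0 instead of NaN when the column is assigned. It also assumes MONTH_YEAR is a zero-padded "YYYY-MM" string, so a plain string sort is chronological.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "MONTH_YEAR": ["2012-01", "2012-01", "2012-01", "2012-02", "2012-02", "2011-01"],
    "class": ["fruits", "fruits", "fruits", "fruits", "fruits", "vege"],
    "id": [1, 2, 1, 2, 2, 1],
    "accnt_id": ["an", "abc", "def", "abc", "andi", "an"],
})

# Assumption: "YYYY-MM" strings sort chronologically, so after this sort the
# first row seen for each (class, accnt_id) is its earliest month.
df = df.sort_values(["class", "MONTH_YEAR"])

# Per-month distinct counts, as in the original query.
s = df.groupby(["class", "MONTH_YEAR"]).agg({"id": "nunique", "accnt_id": "nunique"})

# Number of accounts first seen in each (class, month).
# Assumption: months with no new accounts should count as 0, hence the reindex.
first_seen = (
    df.drop_duplicates(["class", "accnt_id"])
      .groupby(["class", "MONTH_YEAR"])["accnt_id"]
      .count()
      .reindex(s.index, fill_value=0)
)

# Running total of newly seen accounts within each class.
s["cumsum_unique_id"] = first_seen.groupby(level="class").cumsum()
print(s)
```

On the sample data this gives 3, 4, 1 for cumsum_unique_id, matching the desired output. If MONTH_YEAR is stored in another format, converting it with pd.to_datetime or pd.PeriodIndex before sorting would probably be safer.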