I have a DataFrame that looks like the one below. Each objectId represents a different customer.
date objectId
15/07/18 "__gb5c9e15dfc004930b8ac9d5d1df1880e"
16/07/18 "__g0b2abb9da5d646eb930c1ce9bb6df5ef"
16/07/18 "__c5ff64e5448c44fabe26e88bc0e41497"
17/07/18 "__c7b0a5824a914d7198a328cdf35c95bf"
18/07/18 "__8929216e8d534569ae6fd6701c92fc4c"
19/07/18 "__gec079853a06748a79b4d101713c1e21d"
19/07/18 "__d7f24fa5909b43f4a5282877ed4eed3e"
19/07/18 "__ga523090706304454ba581d79f366816a"
19/07/18 "__d409d75e4207409b8ea030f69b70bf83"
19/07/18 "-g940dc0277b7f46c8b7d8de195a8fd975"
20/07/18 "__d7f24fa5909b43f4a5282877ed4eed3e"
20/07/18 "__ga523090706304454ba581d79f366816a"
21/07/18 "__d409d75e4207409b8ea030f69b70bf83"
21/07/18 "-g940dc0277b7f46c8b7d8de195a8fd975"
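I want to count how many days each customer visited. I tried
df.groupby(['objectId'])['date'].count()
This gives me the total number of times each customer accessed the app, which is not the same as the number of distinct days on which the customer accessed the app.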
Answer 0: (score: 1)
You can use GroupBy with nunique, which counts the distinct dates per customer:
res = df.groupby('objectId')['date'].nunique()
print(res)
objectId
-g940dc0277b7f46c8b7d8de195a8fd975 2
__8929216e8d534569ae6fd6701c92fc4c 1
__c5ff64e5448c44fabe26e88bc0e41497 1
__c7b0a5824a914d7198a328cdf35c95bf 1
__d409d75e4207409b8ea030f69b70bf83 2
__d7f24fa5909b43f4a5282877ed4eed3e 2
__g0b2abb9da5d646eb930c1ce9bb6df5ef 1
__ga523090706304454ba581d79f366816a 2
__gb5c9e15dfc004930b8ac9d5d1df1880e 1
__gec079853a06748a79b4d101713c1e21d 1
Name: date, dtype: int64
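In the sample above no customer appears twice on the same date, so count and nunique happen to return the same numbers. The following is a minimal sketch of where they diverge, using a small hypothetical frame (the ids "a" and "b" and the repeated visit are invented for illustration, not taken from the question). It also assumes the dates are day/month/two-digit-year strings, as the sample suggests, and parses them with pd.to_datetime so that the column holds real dates.

```python
import pandas as pd

# Hypothetical data: customer "a" opens the app twice on 19/07/18
# and once on 20/07/18; customer "b" opens it once.
df = pd.DataFrame({
    "date": ["19/07/18", "19/07/18", "20/07/18", "19/07/18"],
    "objectId": ["a", "a", "a", "b"],
})

# Assumption: dates are formatted as day/month/two-digit year.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")

# count() tallies rows, so repeated visits on one day are all counted.
visits = df.groupby("objectId")["date"].count()

# nunique() tallies distinct dates, i.e. the number of days visited.
days = df.groupby("objectId")["date"].nunique()

print(visits)  # "a" has 3 visits, "b" has 1
print(days)    # "a" has 2 distinct days, "b" has 1
```

If the real column carries a time of day as well, each timestamp would count as its own value; in that case applying .dt.normalize() to the parsed column before grouping should collapse them to calendar days.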