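There are questions on Stack Overflow that come close to the query I want to run on a pandas DataFrame, but none of them identify multiple rows that share the same values.
To explain my problem: I have a dataframe with information on people and the time and date they chose to use the gym. It looks like this: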
,User,Time,Date
0, User 1 ,12:00PM ,10/5/20 (Identical 3 times)
1, User 2 ,12:00PM ,10/5/20 (Identical 3 times)
2, User 3 ,12:00PM ,10/5/20
3, User 1 ,1:00PM ,10/4/20 (Identical 2 times)
4, User 2 ,1:00PM ,10/4/20 (Identical 2 times)
5, User 5 ,1:00PM ,10/4/20
6, User 6 ,1:00PM ,10/4/20
7, User 7 ,12:00PM ,10/4/20
9, User 1 ,11:00AM ,10/4/20 (Identical 1 time)
10, User 2 ,11:00AM ,10/4/20 (Identical 1 time)
11, User 3 ,10:00AM ,10/4/20
12, User 6 ,10:00AM ,10/4/20
13, User 7 ,10:00AM ,10/4/20
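My goal is to create a dataframe that matches up the "Time" and "Date" of each row across the "User" column, giving a count of how many times each user shares the same sign-up time/date with another user. Run against the dataset above, it should look like this: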
,User, User, Count of identical gym times
0, User 1 , User 2, 3
1, User 3 , User 1, 1
2, User 3 , User 2, 1
3, User 1 , User 5, 1
4, User 2 , User 5, 1
5, User 2 , User 6, 1
6, User 3 , User 6, 1
7, User 3 , User 7, 1
8, User 4 , User 6, 1
9, User 4 , User 7, 1
I followed a few guides to try to count how many times rows are similar:
df.groupby('Date').User.nunique()
which returns
Date
2020-08-20 6
2020-08-21 13
2020-08-22 15
2020-08-23 18
2020-08-24 25
2020-08-25 24
2020-08-26 24
2020-08-27 24
2020-08-28 20
2020-08-29 12
2020-08-30 8
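This doesn't help, because it only counts distinct users per date and says nothing about which users overlap. Are there any other guides for this kind of "query"?
Answer 0 (score: 0)
Does this output help?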
df.groupby(['Time', 'Date'], as_index=False).agg({'User': list})
Time Date User
0 10:00AM 10/4/20 [User3, User6, User7]
1 11:00AM 10/4/20 [User1, User2]
2 12:00PM 10/4/20 [User7]
3 12:00PM 10/5/20 [User1, User2, User3]
4 1:00PM 10/4/20 [User1, User2, User5, User6]
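The grouped lists above show who shares each slot, but the question asks for a count per pair of users. One possible way to get there, sketched here rather than taken from the original answer, is to merge the frame with itself on Time and Date and then count how often each pair appears. The sample frame below is retyped from the question with whitespace trimmed; the names pairs, pair_counts, User_a and User_b are made up for illustration. It assumes each user appears at most once per Time/Date slot (otherwise call drop_duplicates() first).

```python
import pandas as pd

# Sample data retyped from the question (whitespace trimmed).
df = pd.DataFrame({
    "User": ["User 1", "User 2", "User 3", "User 1", "User 2", "User 5",
             "User 6", "User 7", "User 1", "User 2", "User 3", "User 6",
             "User 7"],
    "Time": ["12:00PM", "12:00PM", "12:00PM", "1:00PM", "1:00PM", "1:00PM",
             "1:00PM", "12:00PM", "11:00AM", "11:00AM", "10:00AM", "10:00AM",
             "10:00AM"],
    "Date": ["10/5/20", "10/5/20", "10/5/20", "10/4/20", "10/4/20", "10/4/20",
             "10/4/20", "10/4/20", "10/4/20", "10/4/20", "10/4/20", "10/4/20",
             "10/4/20"],
})

# Self-join on the slot: one row for every two rows sharing Time and Date.
pairs = df.merge(df, on=["Time", "Date"], suffixes=("_a", "_b"))

# Drop self-matches and keep each unordered pair only once.
pairs = pairs[pairs["User_a"] < pairs["User_b"]]

# Count how many slots each pair of users has in common.
pair_counts = (
    pairs.groupby(["User_a", "User_b"])
    .size()
    .reset_index(name="Count")
    .sort_values("Count", ascending=False, ignore_index=True)
)
print(pair_counts)
```

On this sample, User 1 and User 2 come out on top with 3 shared slots. The self-join grows with the square of the number of users in a slot, so for very busy slots it may be worth filtering the frame first.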