Question

df['check'] = ((df['id'] == 123) & (df['date1'] >= date1)) | ((df['id'] == 456) & (df['date2'] >= date2))

present = df.groupby(['id', 'month', 'check'])['userid'].nunique().reset_index(name="usercount")

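This is my code. My expected output should have the number of unique users per month in the usercount column, grouped by id. I passed id, month, and check to groupby.

According to the first line of my code, the check column is of type bool, but the output in the present DataFrame counts users whose check value is True as well as users whose check value is False.

It should only count users whose check value is True.

Please help me with this.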


Answer 1

You need to filter on the check column with boolean indexing, rather than passing it to the by parameter of groupby:

# first convert datetimes to the start of their month
# (to_period/to_timestamp also handles dates that already fall on the 1st)
df['month'] = df['month'].dt.to_period('M').dt.to_timestamp()
print(df)
   check      month   id userid
0   True 2019-06-01  123      a
1  False 2019-02-01  123      b
2  False 2019-01-01  123      c
3  False 2019-02-01  123      d
4   True 2019-06-01  123      e
5   True 2020-07-01  123      f
6   True 2020-07-01  123      g
7   True 2020-06-01  123      h

print(df[df['check']])
   check      month   id userid
0   True 2019-06-01  123      a
4   True 2019-06-01  123      e
5   True 2020-07-01  123      f
6   True 2020-07-01  123      g
7   True 2020-06-01  123      h

present = (df[df['check']].groupby(['id', 'month'])['userid']
                          .nunique()
                          .reset_index(name="usercount"))
print(present)
    id      month  usercount
0  123 2019-06-01          2
1  123 2020-06-01          1
2  123 2020-07-01          2
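
To see the difference end to end, here is a minimal, self-contained sketch. The sample frame is hypothetical, rebuilt from the table printed above, and the name by_check is only illustrative. It compares grouping by check with filtering on check first:

```python
import pandas as pd

# Hypothetical sample data mirroring the table printed above.
df = pd.DataFrame({
    "check": [True, False, False, False, True, True, True, True],
    "month": pd.to_datetime([
        "2019-06-01", "2019-02-01", "2019-01-01", "2019-02-01",
        "2019-06-01", "2020-07-01", "2020-07-01", "2020-06-01",
    ]),
    "id": [123] * 8,
    "userid": list("abcdefgh"),
})

# Grouping by 'check' builds a separate group for each True/False value,
# so the False rows still appear in the result.
by_check = (df.groupby(["id", "month", "check"])["userid"]
              .nunique()
              .reset_index(name="usercount"))

# Filtering first drops the False rows before any counting happens.
present = (df[df["check"]]
           .groupby(["id", "month"])["userid"]
           .nunique()
           .reset_index(name="usercount"))

print(by_check)
print(present)
```

With this sample, by_check has five rows, two of them for check == False, while present contains only the three groups where check is True.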

Groupby with boolean values in pandas
