Question

I have this dataset as a df:

customer action    date
1049381  share    9/29/2017
1049381  level_up 10/6/2017
105460   share    9/22/2017
105460   share    9/23/2017
105668   level_up 9/8/2017
105668   share    9/8/2017
105668   level_up 9/18/2017
105668   share    9/18/2017
105668   share    9/20/2017
905669   share    9/25/2017
905669   level_up 9/25/2017

For each customer, I want to count (aggregate) the days on which that customer did both "level_up" and "share" on the same day. Like this:

customer  share_wth_level_up
1049381         0
105460          0
105668          2
905669          1

I started with pandas but could not find a solution, because my attempt does not give a summarized df with one row per unique customer.

df.groupby(['customer','date']).size().value_counts()

Result

Answer 1

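One solution is to use GroupBy + nunique and test whether the number of distinct actions per (customer, date) pair equals 2. Then use GroupBy + sum to total those instances per customer.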

df_grp = df.groupby(['customer', 'date'])['action'].nunique() == 2
res = df_grp.groupby('customer').sum().astype(int)

print(res)

customer
105460     0
105668     2
905669     1
1049381    0
Name: action, dtype: int32
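For anyone who wants to run this end to end, here is a minimal sketch. The inline DataFrame construction and the helper name `count_same_day_both` are my own assumptions and not part of the answer above. Note that testing `nunique() == 2` only means "both share and level_up" as long as the action column contains just those two values, as it does in the sample data.

```python
import pandas as pd

# Sample data copied from the question; building it inline is an assumption
# made so that the snippet runs on its own.
df = pd.DataFrame({
    "customer": [1049381, 1049381, 105460, 105460, 105668, 105668,
                 105668, 105668, 105668, 905669, 905669],
    "action": ["share", "level_up", "share", "share", "level_up", "share",
               "level_up", "share", "share", "share", "level_up"],
    "date": ["9/29/2017", "10/6/2017", "9/22/2017", "9/23/2017", "9/8/2017",
             "9/8/2017", "9/18/2017", "9/18/2017", "9/20/2017", "9/25/2017",
             "9/25/2017"],
})


def count_same_day_both(frame):
    """Hypothetical helper: per customer, count days with two distinct actions."""
    # True for every (customer, date) pair that has two distinct actions.
    both = frame.groupby(["customer", "date"])["action"].nunique() == 2
    # Summing booleans counts the True pairs for each customer.
    counts = both.groupby(level="customer").sum().astype(int)
    # Use the column name requested in the question.
    return counts.rename("share_wth_level_up")


res = count_same_day_both(df)
print(res)
```

The result is indexed by customer in sorted order. If the order of first appearance matters, `res.reindex(df["customer"].unique())` should restore it.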

Answer 2

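First filter the df with duplicated so that only (customer, date) pairs occurring more than once remain, then group by customer and date and check the number of unique actions in each group.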

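# keep rows whose (customer, date) pair appears more than once
s = df[df.duplicated(['customer', 'date'], keep=False)].groupby(['customer', 'date']).action.nunique()
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
(s[s == 2] // 2).groupby(level=0).sum().reindex(df.customer.unique(), fill_value=0)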
Out[166]: 
customer
1049381    0
105460     0
105668     2
905669     1
Name: action, dtype: int64
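The one-liner above packs several steps together, so here is a step-by-step sketch of the same idea. The inline sample data and the intermediate variable names (`dup_rows`, `per_customer`) are my own assumptions. It uses `DataFrame.duplicated` on the two columns for the filter, which should select the same rows as a per-customer check on the date.

```python
import pandas as pd

# Sample data copied from the question (assumed here so the snippet is
# self-contained).
df = pd.DataFrame({
    "customer": [1049381, 1049381, 105460, 105460, 105668, 105668,
                 105668, 105668, 105668, 905669, 905669],
    "action": ["share", "level_up", "share", "share", "level_up", "share",
               "level_up", "share", "share", "share", "level_up"],
    "date": ["9/29/2017", "10/6/2017", "9/22/2017", "9/23/2017", "9/8/2017",
             "9/8/2017", "9/18/2017", "9/18/2017", "9/20/2017", "9/25/2017",
             "9/25/2017"],
})

# Step 1: keep only rows whose (customer, date) pair occurs more than once.
dup_rows = df[df.duplicated(["customer", "date"], keep=False)]

# Step 2: count the distinct actions within each remaining pair.
s = dup_rows.groupby(["customer", "date"])["action"].nunique()

# Step 3: flag pairs with two distinct actions and count them per customer.
per_customer = (s == 2).groupby(level="customer").sum()

# Step 4: bring back customers dropped by the filter in step 1, with count 0,
# in the order they first appear in df.
res = per_customer.reindex(df["customer"].unique(), fill_value=0).astype(int)
print(res)
```

Step 4 is what keeps customers with no same-day pair in the output. Without the `reindex`, they would simply be missing rather than showing 0.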

Aggregate rows by other column value - Countif in Python / Pandas
