Question

customer_id  transaction_id    month  year 
          1    3                7     2014
          1    4                7     2014
          2    5                7     2014
          2    6                8     2014
          1    7                8     2014
          3    8                9     2015
          1    9                9     2015
          4    10               9     2015
          5    11               9     2015
          2    12               9     2015

I am familiar with the basics of R. Any help would be greatly appreciated.

The expected output should look like this:

month   year  number_unique_customers_added
 7      2014     2
 8      2014     0
 9      2015     3

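In month 7 of 2014, only customer_id 1 and 2 exist, so the number of customers added is 2. In month 8 of 2014, no new customer IDs were added, so the count for that period should be zero. Finally, in month 9 of 2015, the new customer_id values 3, 4 and 5 were added, so the number of new customers added in that period is 3.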
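The counting rule described above can be sketched in base R. This is only an illustration under assumptions: the data frame name `df` and the helper column `is_new` are hypothetical names, not from the original post, and the rows are sorted by year, month and transaction_id so that each customer's first row is their earliest transaction.

```r
# Sample data copied from the question
df <- data.frame(
  customer_id    = c(1, 1, 2, 2, 1, 3, 1, 4, 5, 2),
  transaction_id = 3:12,
  month          = c(7, 7, 7, 8, 8, 9, 9, 9, 9, 9),
  year           = c(2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015)
)

# Sort chronologically so the first row per customer is their first transaction
df <- df[order(df$year, df$month, df$transaction_id), ]

# TRUE on the first occurrence of each customer_id, FALSE afterwards
# (is_new is a hypothetical helper column)
df$is_new <- !duplicated(df$customer_id)

# Sum the flags per (month, year); periods without new customers sum to 0
result <- aggregate(is_new ~ month + year, data = df, FUN = sum)
names(result)[3] <- "number_unique_customers_added"
result
```

Because the flag is summed rather than the repeat rows being dropped, month 8 of 2014 stays in the result with a count of 0.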

Answer 1

Using data.table:

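library(data.table)

# Keep only the first row per customer, then count customers per (year, month).
# Assumes dt is a data.table whose rows are already in chronological order.
dt[, .SD[1,], by = customer_id][, uniqueN(customer_id), by = .(year, month)]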

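Explanation: we first drop all subsequent transactions for each customer (we are only interested in the first one, when she is a "new customer"), and then count the unique customers for each combination of year and month.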
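One thing to be aware of: because the later transactions are dropped before counting, a period with no new customers (month 8 of 2014 in the sample data) has no rows left and so does not appear in the result at all. If the zero row is needed, a possible variant is to flag first occurrences instead of removing rows. The sketch below is one way to do that, not the answerer's original method; the column name `is_new` and the result name `res` are hypothetical.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  customer_id    = c(1, 1, 2, 2, 1, 3, 1, 4, 5, 2),
  transaction_id = 3:12,
  month          = c(7, 7, 7, 8, 8, 9, 9, 9, 9, 9),
  year           = c(2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015)
)

# Sort so the first row per customer is the earliest transaction
setorder(dt, year, month, transaction_id)

# Flag first occurrences instead of dropping later rows,
# so that periods without new customers are kept
dt[, is_new := !duplicated(customer_id)]

# Sum the flags per (year, month)
res <- dt[, .(number_unique_customers_added = sum(is_new)), by = .(year, month)]
res
```

With the sample data this keeps all three periods, including the one where no customer is new.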

Answer 2

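Using dplyr, we first create a column that indicates whether each customer_id is appearing for the first time (that is, not a duplicate), then group_by month and year and count the new customers in each group.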

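library(dplyr)
df %>%
  # TRUE the first time a customer_id appears (relies on rows being in chronological order)
  mutate(unique_customers = !duplicated(customer_id)) %>%
  group_by(month, year) %>%
  # Count the first-time customers within each month/year group
  summarise(unique_customers = sum(unique_customers))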

#  month  year unique_customers
#  <int> <int>            <int>
#1     7  2014                2
#2     8  2014                0
#3     9  2015                3

Find the number of customers added each month
