Question

I am trying to determine the number of unique customers per store per week.

I have some code that does this, but the resulting tabulation is not what I want.

I have the following table:

store  week  customer_ID
1      1     1
1      1     1
1      1     2
1      2     1
1      2     2
1      2     3
2      1     1
2      1     1
2      1     2
2      2     2
2      2     3
2      2     3

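So for each store and week, I want to count how many unique customers there are, meaning customers visiting that store for the first time.

For example, if customer 1 visits store 1 in week 1 and then returns in week 2, the week-2 visit is not counted as unique.

If that same customer visits store 2, in week 1 or any other week, that visit is counted as unique for store 2.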
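A minimal sketch of this rule in base R, assuming the sample data from the question is rebuilt as a data.frame named df: after sorting by store and week, duplicated() on the (store, customer_ID) pair flags every visit after a customer's first one at that store. The column name new_visit is hypothetical, chosen only for illustration.

```r
# Sample data from the question
df <- data.frame(
  store       = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  week        = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
  customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# Sort so each customer's earliest visit to a store comes first
df <- df[order(df$store, df$week), ]

# TRUE marks the first row for each (store, customer_ID) pair,
# i.e. a visit that counts as "unique" under the rule above
df$new_visit <- !duplicated(df[c("store", "customer_ID")])
print(df)
```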

The result should look like this:

store  week  unique Customers
1      1     2
1      2     1
2      1     2
2      2     1

I tried the following, but it is not correct:

agg <-  aggregate(data=df, customer_ID~ week+store, function(x) length(unique(x)))
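For reference, here is a sketch of what this call returns on the sample data. It counts distinct IDs within each store-week cell, so a returning customer is counted again in every week they appear.

```r
# Sample data from the question
df <- data.frame(
  store       = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  week        = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
  customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# The question's call: distinct customers per store-week cell,
# with no memory of earlier weeks
agg <- aggregate(data = df, customer_ID ~ week + store,
                 function(x) length(unique(x)))
print(agg)
```

Store 1, week 2 comes out as 3 rather than the expected 1, because customers 1 and 2 already visited store 1 in week 1.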

df <- structure(list(store = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L), week = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L), customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 
2L, 3L, 3L)), .Names = c("store", "week", "customer_ID"), class = "data.frame", row.names = c(NA, 
-12L))

Answer 1

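Here is a base R method. The idea is to split the data into a list of data.frames, one per store. Assuming the observations are sorted by week, drop the observations with a duplicated customer ID, which keeps only each customer's first visit to that store. Then aggregate each subsetted data.frame; because the duplicates are already gone, length is enough in place of your length(unique(x)). Finally, do.call with rbind combines the results into a single data.frame: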

do.call(rbind, lapply(split(df, df$store),
                      function(i) aggregate(data=i[!duplicated(i$customer_ID),],
                                            customer_ID ~ week+store, length)))
    week store customer_ID
1.1    1     1           2
1.2    2     1           1
2.1    1     2           2
2.2    2     2           1
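The split/lapply/rbind step can also be collapsed into a single aggregate call by de-duplicating on the (store, customer_ID) pair together. This is a sketch of that alternative, not the answer's original code, and it keeps the same assumption that rows are sorted by store and week. The column name newCust is an illustrative choice.

```r
# Sample data from the question
df <- data.frame(
  store       = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  week        = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
  customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)
df <- df[order(df$store, df$week), ]

# Keep only the first visit of each customer at each store
first_visits <- df[!duplicated(df[c("store", "customer_ID")]), ]

# Every remaining row is a new customer, so length() is the count
res <- aggregate(customer_ID ~ week + store, data = first_visits, FUN = length)
names(res)[3] <- "newCust"
print(res)
```

Store-weeks with no new customers drop out of the result, since aggregate only returns groups that have at least one row.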

To make sure your data.frame is sorted correctly before trying this, you can use order:

df <- df[order(df$store, df$week), ]
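If you would rather not depend on row order at all, one alternative (a different technique from the answer's) is to take each customer's earliest week at each store and then count customers by that first week. The shuffled row order in this sketch is deliberate, to show that the result does not depend on sorting.

```r
# Sample data from the question, rows deliberately shuffled
df <- data.frame(
  store       = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  week        = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
  customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)
df <- df[c(12, 5, 1, 9, 3, 7, 2, 11, 4, 8, 6, 10), ]

# Earliest week in which each customer visited each store
first_week <- aggregate(week ~ store + customer_ID, data = df, FUN = min)

# Count how many customers had their first visit in each store-week
res <- aggregate(customer_ID ~ week + store, data = first_week, FUN = length)
names(res)[3] <- "newCust"
print(res)
```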

In case it is of interest, here is also a data.table solution.

library(data.table)
setDT(df)

df[df[, !duplicated(customer_ID), by=store]$V1, 
   .(newCust=length(customer_ID)), by=.(store, week)]
   store week newCust
1:     1    1       2
2:     1    2       1
3:     2    1       2
4:     2    2       1

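This approach uses the logical vector df[, !duplicated(customer_ID), by=store]$V1 to subset the data to the first occurrence of each ID within each store, and then counts the new customers by store-week. Like the base R version, it assumes df is sorted by store and week.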
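To see what that logical vector contains, here is a base R sketch that builds the same per-store first-occurrence mask with ave() and sums it by store and week with tapply(). One difference worth noting: ave() writes results back to each row's original position, whereas the V1 column from data.table's by=store is concatenated group by group, which is why the data.table line needs rows grouped by store.

```r
# Sample data from the question, sorted by store and week
df <- data.frame(
  store       = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  week        = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L),
  customer_ID = c(1L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L)
)
df <- df[order(df$store, df$week), ]

# Per-store first-occurrence mask, analogous to
# df[, !duplicated(customer_ID), by = store]$V1 in data.table.
# ave() stores the result in the integer type of customer_ID, hence as.logical()
first <- as.logical(ave(df$customer_ID, df$store,
                        FUN = function(x) !duplicated(x)))

# Number of new customers per store (rows) and week (columns)
new_cust <- tapply(first, list(store = df$store, week = df$week), sum)
print(new_cust)
```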
