Question

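I am trying to add a column to an existing data frame that gives, for each row, the number of distinct products bought by that row's customer. A toy example: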

Customer    Product
1           Chocolate
1           Candy
1           Soda
2           Chocolate
2           Chocolate
2           Chocolate
3           Insulin
3           Candy

The output should be:

Customer    Product     #Products
1           Chocolate   3
1           Candy       3
1           Soda        3
2           Chocolate   1
2           Chocolate   1
2           Chocolate   1
3           Insulin     2
3           Candy       2

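I would like to do this without a for loop, because I have millions of rows and a loop takes forever. I have used data.table and other methods to get the number of products per customer, but I don't know how to easily add that as a column to the existing data frame.

Thanks in advance!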

Answer 1

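In base R, I'd suggest ave:

within(mydf, {
    # ave() returns a vector of the same type as its first argument, so count
    # over integer codes to get a numeric column rather than "3", "1", ...
    count = ave(as.integer(factor(Product)), Customer, FUN = function(x) length(unique(x)))
})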
##   Customer   Product count
## 1        1 Chocolate     3
## 2        1     Candy     3
## 3        1      Soda     3
## 4        2 Chocolate     1
## 5        2 Chocolate     1
## 6        2 Chocolate     1
## 7        3   Insulin     2
## 8        3     Candy     2

You can also try the "data.table" package:

library(data.table)
as.data.table(mydf)[, count := length(unique(Product)), by = Customer][]
##    Customer   Product count
## 1:        1 Chocolate     3
## 2:        1     Candy     3
## 3:        1      Soda     3
## 4:        2 Chocolate     1
## 5:        2 Chocolate     1
## 6:        2 Chocolate     1
## 7:        3   Insulin     2
## 8:        3     Candy     2
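
As a variation on the data.table call above, here is a minimal sketch that uses `uniqueN()`, data.table's helper for counting distinct values, in place of `length(unique(...))`. The object name `dt` and the inline toy data are assumptions made for illustration; the column names come from the question.

```r
library(data.table)

# Toy data from the question; `dt` is a hypothetical name.
dt <- data.table(
  Customer = c(1, 1, 1, 2, 2, 2, 3, 3),
  Product  = c("Chocolate", "Candy", "Soda", "Chocolate", "Chocolate",
               "Chocolate", "Insulin", "Candy")
)

# := adds the column by reference, so the table is not copied.
# uniqueN(Product) is evaluated once per Customer group and recycled
# across that group's rows.
dt[, count := uniqueN(Product), by = Customer]
print(dt)
```

Because `:=` modifies `dt` in place, there is no need to assign the result back, which may matter with millions of rows.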

Answer 2

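Something like this should work (assuming df is your data):

# Number of distinct products per customer
df.agr = aggregate(Product ~ Customer, data = df, FUN = function(x) length(unique(x)))
# Look up each row's count with a vectorized match() instead of a row-wise apply()
df$Count = df.agr$Product[match(df$Customer, df.agr$Customer)]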

It won't be blazing fast, but it is definitely faster than a for loop.
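
The same aggregate-then-attach idea can also be written as a join. The following is a rough sketch, assuming the toy data from the question; the names `counts` and `result` are hypothetical, and `merge()` does not promise to keep the original row order.

```r
# Toy data from the question.
df <- data.frame(
  Customer = c(1, 1, 1, 2, 2, 2, 3, 3),
  Product  = c("Chocolate", "Candy", "Soda", "Chocolate", "Chocolate",
               "Chocolate", "Insulin", "Candy"),
  stringsAsFactors = FALSE
)

# One row per customer holding the number of distinct products.
counts <- aggregate(Product ~ Customer, data = df,
                    FUN = function(x) length(unique(x)))
names(counts)[names(counts) == "Product"] <- "Count"

# Join the per-customer counts back onto every original row.
# all.x = TRUE keeps all rows of df (a left join).
result <- merge(df, counts, by = "Customer", all.x = TRUE)
print(result)
```

If the original row order matters, the `match()` lookup is probably the safer choice, since it only adds a column and never rearranges rows.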

Answer 3

I use plyr for anything that involves split-apply-combine. In this case, we split the data by Customer, apply a length-of-unique function to Product, and then combine the results:

require(plyr)
ddply(df, .(Customer), transform, num.products = length(unique(Product)))

  Customer   Product num.products
1        1 Chocolate            3
2        1     Candy            3
3        1      Soda            3
4        2 Chocolate            1
5        2 Chocolate            1
6        2 Chocolate            1
7        3   Insulin            2
8        3     Candy            2

As a bonus, in case you want a smaller summary data frame:

ddply(df, .(Customer), summarize, num.products = length(unique(Product)))

  Customer num.products
1        1            3
2        2            1
3        3            2
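
To make the split-apply-combine steps explicit, here is a small sketch in base R that does not need plyr. It is an illustration of the pattern under the question's toy data, not what `ddply` does internally; `pieces` and `result` are hypothetical names.

```r
# Toy data from the question.
df <- data.frame(
  Customer = c(1, 1, 1, 2, 2, 2, 3, 3),
  Product  = c("Chocolate", "Candy", "Soda", "Chocolate", "Chocolate",
               "Chocolate", "Insulin", "Candy"),
  stringsAsFactors = FALSE
)

# Split: one data frame per customer.
pieces <- split(df, df$Customer)

# Apply: add the number of distinct products to each piece.
pieces <- lapply(pieces, function(d) {
  d$num.products <- length(unique(d$Product))
  d
})

# Combine: unsplit() reassembles the rows in their original order.
result <- unsplit(pieces, df$Customer)
print(result)
```

For millions of rows this is likely slower than the ave or data.table versions, since it builds one data frame per customer; it is mainly useful for seeing what each step contributes.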

Create a column in R containing the number of occurrences
