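I have the following dataset. For each customer_id, I want to identify the product with the highest amount and turn it into a new column. I also want to keep only one record per ID.
Code to generate the dataset: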
x <- data.frame(customer_id=c(1,1,1,2,2,2),
product=c("a","b","c","a","b","c"),
amount=c(50,125,100,75,110,150))
The actual dataset looks like this:
customer_id product amount
1 a 50
1 b 125
1 c 100
2 a 75
2 b 110
2 c 150
The desired output should look like this:
customer_ID product_b product_c
1 125 0
2 0 150
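Answer 0 (score: 2)
We can do this with the tidyverse. After grouping by 'customer_id', slice the row with the maximum 'amount', paste a prefix ('product_') onto the 'product' column (if needed), and spread to wide format.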
library(dplyr)
library(tidyr)
x %>%
group_by(customer_id) %>%
slice(which.max(amount)) %>%
mutate(product = paste0("product_", product)) %>%
spread(product, amount, fill = 0)
# customer_id product_b product_c
#* <dbl> <dbl> <dbl>
#1 1 125 0
#2 2 0 150
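In newer tidyverse releases, spread is superseded by pivot_wider. The following is a minimal sketch of the same steps, assuming dplyr >= 1.0.0 (for slice_max) and tidyr >= 1.0.0 (for pivot_wider). The object name res is only illustrative, and with_ties = FALSE is an assumption that keeps a single row per customer when two products share the maximum amount.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
x <- data.frame(customer_id = c(1, 1, 1, 2, 2, 2),
                product = c("a", "b", "c", "a", "b", "c"),
                amount = c(50, 125, 100, 75, 110, 150))

res <- x %>%
  group_by(customer_id) %>%
  # keep the single row with the largest amount in each group
  slice_max(amount, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # one column per remaining product, prefixed with "product_";
  # customer/product combinations that are absent are filled with 0
  pivot_wider(names_from = product, values_from = amount,
              names_prefix = "product_",
              values_fill = list(amount = 0))

res
```

Here names_prefix replaces the separate paste0 step, so the prefix is applied while reshaping.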
Another option is to arrange the dataset by 'customer_id' and by 'amount' in descending order, take the distinct rows based on 'customer_id', and spread to wide format.
arrange(x, customer_id, desc(amount)) %>%
distinct(customer_id, .keep_all = TRUE) %>%
# spread on 'product' so that each remaining product becomes a column
spread(product, amount, fill = 0)
Answer 1 (score: 1)
Using the reshape2 package:
library(reshape2)
x1 <- x[!!with(x, ave(amount, customer_id, FUN = function(i) i == max(i))),]
dcast(x1, customer_id ~ product, value.var = 'amount', fill = 0)
# customer_id b c
#1 1 125 0
#2 2 0 150
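If no extra packages are available, the same idea can probably be expressed in base R. This is only a sketch, not part of either answer: the names x_sorted, top and wide are illustrative, and ties are resolved by keeping the first row after sorting.

```r
# Same example data as in the question
x <- data.frame(customer_id = c(1, 1, 1, 2, 2, 2),
                product = c("a", "b", "c", "a", "b", "c"),
                amount = c(50, 125, 100, 75, 110, 150))

# Sort so the largest amount comes first within each customer
x_sorted <- x[order(x$customer_id, -x$amount), ]

# Keep only the first (largest) row per customer
top <- x_sorted[!duplicated(x_sorted$customer_id), ]

# Cross-tabulate amount by customer and product; empty cells become 0.
# drop.unused.levels = TRUE avoids an all-zero column for product "a"
# when 'product' is stored as a factor.
wide <- xtabs(amount ~ customer_id + product, data = top,
              drop.unused.levels = TRUE)

# Convert the contingency table to a regular data frame
as.data.frame.matrix(wide)
```

The customer IDs end up as row names rather than as a column, and the product columns are named b and c, as in the reshape2 output.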