Given a dataset similar to the following:
dat = structure(list(OpportunityId = c("006a000000zLXtZAAW", "006a000000zLXtZAAW",
"006a000000gst", "006a000000gstg", "006a000000gstg",
"006a000000zLXtZAAW"), IsWon = c(1, 1, 1, 1, 1, 1),
sequence = c("LLLML", "LHHHL", "LLLML", "HMLLL", "LLLLL", "LLLLL")),
.Names = c("OpportunityId","IsWon", "sequence"), row.names = c(NA, 6L), class = "data.frame")
dat
How can I combine all the sequences associated with a particular opportunity ID, so that the result looks like this?
oppid sequence
006... LLL, LML, MMM
007... MMM, MML, MMH, LLL, HHH
007... LML, MMM
Any ideas?
Answer 0 (score: 2)
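We can paste the 'sequence' values together after grouping by 'OpportunityId'. Here unique drops repeated sequences within each group and toString joins the rest with ", ":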
library(data.table)
setDT(dat)[, .(sequence = toString(unique(sequence))) ,
by = .(oppid = OpportunityId)]
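For readers without data.table, the same group-then-collapse idea can be sketched in base R. This is only an illustration: the small data frame below is a made-up stand-in for `dat`, and `groups`, `collapsed` and `result` are names chosen here, not part of the answer above.

```r
# Hypothetical stand-in for `dat` (same column names, fewer rows).
dat <- data.frame(
  OpportunityId = c("A", "A", "B", "B", "A"),
  sequence = c("LLLML", "LHHHL", "HMLLL", "HMLLL", "LLLML"),
  stringsAsFactors = FALSE
)

# Split the sequences into one character vector per OpportunityId.
groups <- split(dat$sequence, dat$OpportunityId)

# Within each group, drop duplicates and join what is left with ", ".
collapsed <- vapply(groups, function(s) toString(unique(s)), character(1))

# Reassemble into a two-column data frame, one row per OpportunityId.
result <- data.frame(
  oppid = names(collapsed),
  sequence = unname(collapsed),
  stringsAsFactors = FALSE
)
result
```

Note that split orders the groups by ID rather than by first appearance, so the row order may differ from the data.table result.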
Answer 1 (score: 2)
Perhaps a combination of aggregate and unique can help.
aggregate(sequence ~ OpportunityId, dat, unique)
# OpportunityId sequence
#1 006a000000gst LLLML
#2 006a000000gstg HMLLL, LLLLL
#3 006a000000zLXtZAAW LLLML, LHHHL, LLLLL
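As @akrun pointed out in the comments, the sequence column of this result is stored as a list.
If necessary, the list in the sequence column can be converted to character format (one string per row) as follows: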
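res <- aggregate(sequence ~ OpportunityId, dat, unique)  # store the aggregated result
res$sequence <- sapply(res$sequence, paste, collapse = ", ")  # list column -> one string per row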
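The list-column behaviour can be checked on a small example. This is a sketch only: the data frame is a made-up stand-in for `dat`, and `res` and `was_list` are names chosen here for illustration.

```r
# Hypothetical stand-in for `dat`: group "A" has two distinct sequences, "B" has one.
dat <- data.frame(
  OpportunityId = c("A", "A", "B"),
  sequence = c("LLLML", "LHHHL", "HMLLL"),
  stringsAsFactors = FALSE
)

# unique() returns a different number of values per group,
# so aggregate() cannot simplify the result and keeps a list.
res <- aggregate(sequence ~ OpportunityId, dat, unique)
was_list <- is.list(res$sequence)

# Collapse each list element into a single comma-separated string.
res$sequence <- sapply(res$sequence, paste, collapse = ", ")
res
```

After the conversion, `res$sequence` is an ordinary character vector, which is easier to export or print than a list column.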
Answer 2 (score: 1)
Using dplyr:
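library(dplyr)
dat_new <- dat %>%
group_by(OpportunityId) %>%
# mutate (rather than summarise) keeps IsWon and the original row order
mutate(sequence = toString(sequence)) %>%
# rows within a group are identical here, so keep one row per group
distinct(.keep_all = TRUE)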
Output:
# OpportunityId IsWon sequence
# 1 006a000000zLXtZAAW 1 LLLML, LHHHL, LLLLL
# 2 006a000000gst 1 LLLML
# 3 006a000000gstg 1 HMLLL, LLLLL