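On a project I'm currently working on, I have multiple CSV files, and I need to find every event each person attended by matching on customer ID. People have attended multiple events. I need the matched events in a new column, separated by commas. There are multiple datasets to compare. Because of the data volume, VLOOKUP in Excel freezes. How can I do this in R? I have tidyverse installed. Any suggestions?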
List 1 (one CSV file)
LastName  FirstName  CustID  Event
Robson    Jonson     23019   NP5
Robson    Jonson     23019   RTE3

List 2 (second CSV file)
LastName  FirstName  CustID  Event
Robson    Jonson     23019   GRT2

Result needed
LastName  FirstName  CustID  Matched Events
Robson    Jonson     23019   NP5, GRT2, RTE3
Answer 0 (score: 0)
Your datasets have the same structure, so you can combine them row-wise with rbind:
lst1 <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
LastName FirstName CustID Event
Robson Jonson 23019 NP5
Robson Jonson 23019 RTE3')
lst2 <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
LastName FirstName CustID Event
Robson Jonson 23019 GRT2')
lst <- rbind(lst1, lst2)
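With real files, the same row-wise combination can be applied to every CSV in a folder. The following is only a sketch: the folder `csv_dir` and the file names `list1.csv` and `list2.csv` are made up for illustration, and it assumes every file has the same four columns.

```r
# Hypothetical setup: write two small CSV files into a fresh temporary folder
# so the example is self-contained. In practice, point csv_dir at your own folder.
csv_dir <- tempfile("events_demo")
dir.create(csv_dir)
write.csv(data.frame(LastName = "Robson", FirstName = "Jonson",
                     CustID = 23019, Event = c("NP5", "RTE3")),
          file.path(csv_dir, "list1.csv"), row.names = FALSE)
write.csv(data.frame(LastName = "Robson", FirstName = "Jonson",
                     CustID = 23019, Event = "GRT2"),
          file.path(csv_dir, "list2.csv"), row.names = FALSE)

# Read every CSV in the folder into a list of data frames
files  <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(files, read.csv, stringsAsFactors = FALSE)

# Stack them row-wise; this assumes identical column names in every file
lst_all <- do.call(rbind, tables)
```

Since `rbind` requires matching column names, a file with a different header would raise an error here rather than being merged silently.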
Then you need an aggregation:
aggregate(lst$Event,
list(LastName = lst$LastName, FirstName = lst$FirstName, CustID = lst$CustID),
paste, collapse=", ")
# LastName FirstName CustID x
# 1 Robson Jonson 23019 NP5, RTE3, GRT2
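Since the question mentions tidyverse, the same aggregation could also be written with dplyr. This is a sketch rather than part of the original answer: the column name `MatchedEvents` is chosen here for illustration, and `unique()` is an added assumption that repeated events should be listed only once.

```r
library(dplyr)

# Combined data, same rows as the rbind result above
lst <- data.frame(
  LastName  = "Robson",
  FirstName = "Jonson",
  CustID    = 23019,
  Event     = c("NP5", "RTE3", "GRT2"),
  stringsAsFactors = FALSE
)

# One row per customer, with the events pasted into a comma-separated string
matched <- lst %>%
  group_by(LastName, FirstName, CustID) %>%
  summarise(MatchedEvents = paste(unique(Event), collapse = ", "),
            .groups = "drop")
```

Grouping on `CustID` alone may be safer if the names are spelled inconsistently across files, at the cost of dropping the name columns from the result.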