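I'm matching two data frames in R and run into a problem when the second data frame contains duplicate records. I also tried index matching, but it gives incorrect results. My data frames are: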
V1 <- c("AB1", "AB2", "AB3" ,"AB4" ,"AB5" ,"AB6" ,"AB7","AB8","AB9" ,"AB10")
V3 <- c("", "AB3", "", "", "", "", "AB6", "", "", "AB11")
V4 <- c("","","","","","","","","","AB12")
df1 <- data.frame(V1,V3,V4)
df1$V2 <- 0
and
V5 <- c("AB1","AB2","AB2","AB2", "AB3", "AB4", "AB5", "AB6")
V6 <- c(1,2,2,2,3,4,5,6)
df2 <- data.frame(V5,V6)
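I have two attempts: the first does not match the data correctly, and the second works but produces NA values. Also, when df2 contains duplicate records for the same key, I want to return their sum (i.e. AB2 should be 6, not 2). Any help would be greatly appreciated.
The code I used: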
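# Attempt 1: does not match the data correctly
df1$V2[match(df2$V5,df1$V1, nomatch=0)] <- df2$V6[match(df1$V1,df2$V5, nomatch = 0)]
# Attempt 2: matches, but gives NA for keys missing from df2 and ignores duplicates
df1$V2 <- df2$V6[match(df1$V1,df2$V5)]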
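A short sketch of why the second attempt behaves this way: `match()` returns only the position of the first matching element, so the duplicate AB2 rows in `df2` are never summed, and it returns `NA` for keys such as AB7 that do not appear in `df2` at all. The vectors below are a reduced, hypothetical example rather than the full data frames.

```r
# Reduced lookup table with a duplicated key, mirroring df2$V5
lookup <- c("AB1", "AB2", "AB2", "AB2")
# One key that is duplicated in lookup, one that is absent
keys <- c("AB2", "AB7")
# match() gives the index of the FIRST occurrence only, and NA when absent
pos <- match(keys, lookup)
print(pos)  # [1]  2 NA
```

Because only the first occurrence is used, the duplicates have to be aggregated (summed) before matching.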
Answer 0 (score: 2)
We can do this with data.table:
library(data.table)
# Sum V6 for each V5 in df2
dfN <- setDT(df2)[, .(V2 = sum(V6)), .(V5)]
# Update join: copy the sums into df1$V2 where V1 matches V5
setDT(df1)[dfN, V2 := i.V2, on = .(V1 = V5)]
Or combine the two steps above:
setDT(df1)[setDT(df2)[df1, .(V2 = sum(V6)),
on = .(V5= V1), by = .EACHI, nomatch = 0], V2 := i.V2, on = .(V1 = V5)]
Answer 1 (score: 1)
IIUC, here is a base R solution:
# Sum of V6 by V5
df2_sum <- aggregate(V6 ~ V5, df2, sum)
# Merge df1 and df2_sum by V1 and V5
merge(df1, df2_sum, by.x = "V1", by.y = "V5")
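Note that `merge()` as called above performs an inner join, so AB7 to AB10 (which have no counterpart in `df2`) are dropped from the result. If every row of `df1` should be kept with 0 for unmatched keys, as the question's `df1$V2 <- 0` suggests, one possible base R sketch combines the aggregated table with the asker's own `match()` approach. The data is rebuilt from the question; `idx` is a hypothetical helper name.

```r
# Rebuild the example data from the question
V1 <- c("AB1", "AB2", "AB3", "AB4", "AB5", "AB6", "AB7", "AB8", "AB9", "AB10")
V3 <- c("", "AB3", "", "", "", "", "AB6", "", "", "AB11")
V4 <- c("", "", "", "", "", "", "", "", "", "AB12")
df1 <- data.frame(V1, V3, V4, stringsAsFactors = FALSE)
V5 <- c("AB1", "AB2", "AB2", "AB2", "AB3", "AB4", "AB5", "AB6")
V6 <- c(1, 2, 2, 2, 3, 4, 5, 6)
df2 <- data.frame(V5, V6, stringsAsFactors = FALSE)

# Sum V6 for each distinct V5 (AB2 -> 2 + 2 + 2 = 6)
df2_sum <- aggregate(V6 ~ V5, data = df2, FUN = sum)

# Position of each df1 key in the aggregated table; NA where absent
idx <- match(df1$V1, df2_sum$V5)

# Keep every row of df1 and use 0 instead of NA for unmatched keys
df1$V2 <- ifelse(is.na(idx), 0, df2_sum$V6[idx])
print(df1)
```

If NA is acceptable for unmatched rows, `merge(df1, df2_sum, by.x = "V1", by.y = "V5", all.x = TRUE)` keeps all rows of `df1` as a left join instead.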