R data frames: help needed with matching and summing

Asked: 2018-03-30 12:36:05

Tags: r

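I am matching two data frames in R and run into a problem when the second data frame contains duplicate records. I also tried an index-match approach, but it gives incorrect results. My data frames are: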

V1 <- c("AB1", "AB2", "AB3" ,"AB4" ,"AB5" ,"AB6" ,"AB7","AB8","AB9" ,"AB10")
V3 <- c ("","AB3", "","","","","AB6","","","AB11")
V4 <- c("","","","","","","","","","AB12")


df1 <- data.frame(V1,V3,V4)
df1$V2 <- 0

V5 <- c("AB1","AB2","AB2","AB2", "AB3", "AB4", "AB5", "AB6")
V6 <- c(1,2,2,2,3,4,5,6)
df2 <- data.frame(V5,V6)

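I have two pieces of code: the first matches the data incorrectly, and the second works but produces NAs. Also, when df2 contains more than one record for the same key, I would like to return the sum (i.e. AB2 should be 6, not 2). Any help would be greatly appreciated.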

The code I used:

df1$V2[match(df2$V5,df1$V1, nomatch=0)] <- df2$V6[match(df1$V1,df2$V5, nomatch = 0)]

df1$V2 <- df2$V6[match(df1$V1,df2$V5)]

2 answers:

Answer 0 (score: 2)

We can do a join with data.table:

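library(data.table)
# Sum V6 for each key in df2
dfN <- setDT(df2)[, .(V2 = sum(V6)), .(V5)]
# Update df1 by reference: V2 takes the summed value wherever V1 matches V5
setDT(df1)[dfN, V2 := i.V2, on = .(V1 = V5)]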

Or combine the two steps above into a single expression:

setDT(df1)[setDT(df2)[df1, .(V2 = sum(V6)), 
    on = .(V5= V1), by = .EACHI, nomatch = 0], V2 := i.V2, on = .(V1 = V5)]
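
To make the one-liner easier to follow, here is a minimal sketch that splits it back into its inner and outer joins. It assumes data.table is installed; `dt1`, `dt2`, and `inner` are hypothetical names used only for illustration, and only the columns the join needs are rebuilt from the question's data.

```r
library(data.table)

# Data from the question, rebuilt as data.tables (hypothetical names dt1/dt2)
dt1 <- data.table(V1 = paste0("AB", 1:10), V2 = 0)
dt2 <- data.table(V5 = c("AB1", "AB2", "AB2", "AB2", "AB3", "AB4", "AB5", "AB6"),
                  V6 = c(1, 2, 2, 2, 3, 4, 5, 6))

# Inner step: by = .EACHI evaluates sum(V6) once per row of dt1,
# and nomatch = 0 drops the dt1 keys that have no row in dt2
inner <- dt2[dt1, .(V2 = sum(V6)), on = .(V5 = V1), by = .EACHI, nomatch = 0]

# Outer step: update join that copies the sums into dt1 by reference;
# rows of dt1 without a match keep their initial 0
dt1[inner, V2 := i.V2, on = .(V1 = V5)]
```

With this split, `inner` holds one row per matched key, and the three AB2 records in `dt2` collapse into a single sum before the update runs.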

Answer 1 (score: 1)

IIUC, here is a base R solution:

# Sum of V6 by V5
df2_sum <- aggregate(V6 ~ V5, df2, sum)

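# Merge df1 and df2_sum by V1 and V5. This is an inner join, so keys absent
# from df2 are dropped; add all.x = TRUE to keep every row of df1.
merge(df1, df2_sum, by.x = "V1", by.y = "V5")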
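
If the goal is to fill df1$V2 in place rather than produce a merged table, one possible variation is to aggregate first and then use `match` against the de-duplicated sums. This is a sketch under the assumption that unmatched keys should stay at 0 instead of becoming NA; `idx` is a hypothetical helper name, and only V1 and V2 of df1 are rebuilt here.

```r
# Data from the question; stringsAsFactors = FALSE keeps the keys as character
df1 <- data.frame(V1 = c("AB1", "AB2", "AB3", "AB4", "AB5",
                         "AB6", "AB7", "AB8", "AB9", "AB10"),
                  V2 = 0, stringsAsFactors = FALSE)
df2 <- data.frame(V5 = c("AB1", "AB2", "AB2", "AB2", "AB3", "AB4", "AB5", "AB6"),
                  V6 = c(1, 2, 2, 2, 3, 4, 5, 6), stringsAsFactors = FALSE)

# Sum of V6 by V5: after this step every key appears exactly once
df2_sum <- aggregate(V6 ~ V5, df2, sum)

# Position of each df1 key in df2_sum; NA where the key is absent
idx <- match(df1$V1, df2_sum$V5)

# Take the summed value where a match exists, otherwise keep 0
df1$V2 <- ifelse(is.na(idx), 0, df2_sum$V6[idx])
```

Because `match` only ever returns the first matching position, aggregating before matching is what lets the duplicate AB2 rows contribute their full sum.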