I have two data frames, say df1 and df2. I want to subset df2 based on matches across multiple columns between df1 and df2.
For example:
df1
A B   # column names; rows in df1 are unique; A1, B1, etc. are character values
A1 B1
A2 B2
......
df2
C D E F G
A1 B1 E1 F1 G1
A2 B2 E2 .......
A1 B2 E3 .......
A1 B1 E4 .......
A2 B1 E5 .......
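Here I want to match columns A and B of df1 against columns C and D of df2, and build a new data frame df3 in which each row stores the row indices of df2 where a match occurs. For my example, it should be: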
df3
c(1,4)
c(2)
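My first thought was to paste the columns together into strings and compare those, but I suspect that is not an efficient way to do it. Are there better ideas?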
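A minimal sketch that avoids pasting entirely: for each row of df1, compare the two key columns directly and let `which()` return the matching df2 row indices. It assumes the small example data above (written out as character columns; `stringsAsFactors = FALSE` is my addition so older R versions do not create factors with mismatched levels). Note this does nrow(df1) × nrow(df2) comparisons, so it is best suited to small or moderate data.

```r
# Example data from the question, kept as character columns
df1 <- data.frame(A = c("A1", "A2"), B = c("B1", "B2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(C = c("A1", "A2", "A1", "A1", "A2"),
                  D = c("B1", "B2", "B2", "B1", "B1"),
                  stringsAsFactors = FALSE)

# One list element per df1 row; each element holds the df2 row indices
# whose (C, D) pair equals that row's (A, B) pair
df3 <- lapply(seq_len(nrow(df1)), function(i) {
  which(df2$C == df1$A[i] & df2$D == df1$B[i])
})

df3
# df3[[1]] is c(1L, 4L); df3[[2]] is 2L
```

Because the result is a list rather than a data frame, each element can hold a different number of indices, which matches the c(1,4) / c(2) shape asked for.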
Answer 0 (score: 0)
Does this meet your needs?
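# Example data (stringsAsFactors = FALSE keeps the keys as character in R < 4.0)
df1 <- data.frame(A = c("A1", "A2"),
                  B = c("B1", "B2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(C = c("A1", "A2", "A1", "A1", "A2"),
                  D = c("B1", "B2", "B2", "B1", "B1"),
                  E = rnorm(5),
                  stringsAsFactors = FALSE)
# Remember each df2 row's original position, since merge() reorders rows
df2$row <- seq_len(nrow(df2))
df2
# Join df1's (A, B) against df2's (C, D); all.x = TRUE keeps every df1 row
m <- merge(df1, df2, by.x = c("A", "B"),
           by.y = c("C", "D"),
           all.x = TRUE, sort = FALSE)
# Collapse the matching row numbers for each (A, B) pair into "1,4"-style strings
res <- aggregate(row ~ A + B, data = m, FUN = paste, collapse = ",")
res
# Check the column classes: row is now character, not numeric
sapply(res, class)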
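The aggregate() step returns comma-separated strings. If you would rather have integer vectors like c(1,4), one option (my suggestion, not part of the answer) is to group the merged row numbers with base R `split()` on the (A, B) key columns. This sketch rebuilds the same merge without the random column E; `rows_by_key` is a hypothetical name.

```r
df1 <- data.frame(A = c("A1", "A2"), B = c("B1", "B2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(C = c("A1", "A2", "A1", "A1", "A2"),
                  D = c("B1", "B2", "B2", "B1", "B1"),
                  stringsAsFactors = FALSE)
df2$row <- seq_len(nrow(df2))

m <- merge(df1, df2, by.x = c("A", "B"), by.y = c("C", "D"),
           all.x = TRUE, sort = FALSE)

# Group row numbers by the (A, B) key. Passing a list of factors makes
# split() use their interaction (names like "A1.B1"); drop = TRUE removes
# key combinations that never occur in m.
rows_by_key <- split(m$row, list(m$A, m$B), drop = TRUE)
rows_by_key
```

The order of indices inside each group follows the row order merge() produced, so sort() them if order matters.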
Answer 1 (score: 0)
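If you are working with a large amount of data, I don't think my answer is the most efficient approach. But if I were just writing a prototype to get a quick answer, I would merge the two data frames.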
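df1 <- data.frame(A = c("A1", "A2"), B = c("B1", "B2"), stringsAsFactors = FALSE)
df2 <- data.frame(C = c("A1", "A2", "A1", "A1", "A2"),
                  D = c("B1", "B2", "B2", "B1", "B1"), stringsAsFactors = FALSE)
# Give df1 the same key names as df2 so merge() joins on C and D
names(df1) <- c("C", "D")
df1$is_df1 <- "Y"                       # flag rows that exist in df1
df2$rownumber <- seq_len(nrow(df2))     # remember original df2 positions
z <- merge(df2, df1, all.x = TRUE)
# Keep only df2 rows whose (C, D) pair occurs in df1
z <- z[!is.na(z$is_df1), ]
# One comma-separated string of df2 row numbers per (C, D) pair
do.call(rbind, lapply(split(z, paste(z$C, z$D)),
                      function(x) paste(x$rownumber, collapse = ",")))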
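For larger data, one hedged alternative is to build a single key per row and look the df2 keys up with `match()`, which in R uses a hash table, so the cost should grow roughly linearly with the number of rows instead of quadratically. This does paste the columns, but only once per row, followed by one hashed lookup. The `"\r"` separator is an assumption: it should be a character that never appears in your values, so that pairs like ("A1", "0B") and ("A10", "B") cannot collide.

```r
df1 <- data.frame(A = c("A1", "A2"), B = c("B1", "B2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(C = c("A1", "A2", "A1", "A1", "A2"),
                  D = c("B1", "B2", "B2", "B1", "B1"),
                  stringsAsFactors = FALSE)

# One composite key per row; "\r" is assumed not to occur in the data
key1 <- paste(df1$A, df1$B, sep = "\r")
key2 <- paste(df2$C, df2$D, sep = "\r")

# For each df2 row, the index of the matching df1 row (NA if none)
hit <- match(key2, key1)

# Group df2 row indices by the df1 row they matched. Using all df1 row
# numbers as factor levels gives one element per df1 row, in df1 order
# (integer(0) for df1 rows with no match); NA hits are dropped by split().
df3 <- split(seq_len(nrow(df2)), factor(hit, levels = seq_len(nrow(df1))))
df3
# unname(df3) is list(c(1L, 4L), 2L)
```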