I have two data.frames (only a subset of each is shown here, because they are too large):
DF1:
"G1"     "G2"
IL3RA    ABCC1
SRSF9    ADAM19
IL22RA2  BIK
UROD     ALG3
SLC35C2  GGH
OR12D3   SEC31A
OSBPL3   HIST1H2BK
DF2:
"S1"       "S2" "S3"
IL3RA      0    0
SRSF9      1    1
A1CF       0    0
A1CF1      1    1
GGH        2    0
HIST1H2BK  0    0
AAK1       0    0
I would like the following output:
"G1"   "S2" "S3"  "G2"       "S2" "S3"
IL3RA  0    0     GGH        2    0
SRSF9  1    1     HIST1H2BK  0    0
I applied a function that was suggested to me in another, similar case. The function is:
lapply(DF1, function(x) DF2[na.omit(match(DF2[[1]], x)), ])
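Surprisingly, it does not work in this case, and I really don't know why. I copied exactly what was given in the post titled "lop%in the data of data.frame", yet it does not work on my new data. Since DF1 and DF2 are very large, I tried running it on a cluster to get more memory, assuming the problem was the available memory, but that did not help. The output it gives is the following:
"S1"   "S2" "S3"
IL3RA  0    0
SRSF9  1    1

"S1"   "S2" "S3"
GGH    2    0
AAK1   0    0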
Can anyone help me?
Best,
B
Answer 0: (score: 1)
This should do it.
df1 <- structure(list(G1 = c("IL3RA", "SRSF9", "IL22RA2", "UROD", "SLC35C2",
"OR12D3", "OSBPL3"), G2 = c("ABCC1", "ADAM19", "BIK", "ALG3",
"GGH", "SEC31A", "HIST1H2BK")), .Names = c("G1", "G2"), class = "data.frame", row.names = c(NA,
-7L))
df2 <- structure(list(S1 = c("IL3RA", "SRSF9", "A1CF", "A1CF1", "GGH",
"HIST1H2BK", "AAK1"), S2 = c(0L, 1L, 0L, 1L, 2L, 0L, 0L), S3 = c(0L,
1L, 0L, 1L, 0L, 0L, 0L)), .Names = c("S1", "S2", "S3"), class = "data.frame", row.names = c(NA,
-7L))
idx1 <- match(df1$G1, df2$S1)
idx1 <- idx1[!is.na(idx1)]
idx2 <- match(df1$G2, df2$S1)
idx2 <- idx2[!is.na(idx2)]
out <- cbind(df2[idx1, ], df2[idx2, ])
> out
     S1 S2 S3        S1 S2 S3
1 IL3RA  0  0       GGH  2  0
2 SRSF9  1  1 HIST1H2BK  0  0
Edit: using lapply
out <- lapply(df1, function(x) {
  idx <- match(x, df2$S1)
  idx <- idx[!is.na(idx)]
  df2[idx, ]
})
# now `out` is a list of data.frames
out.f <- do.call(cbind, out)
# they'll be combined by columns
  G1.S1 G1.S2 G1.S3     G2.S1 G2.S2 G2.S3
1 IL3RA     0     0       GGH     2     0
2 SRSF9     1     1 HIST1H2BK     0     0
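As for why the call in the question misbehaves, a likely cause is the argument order in match(). match(DF2[[1]], x) returns positions within x (the DF1 column), and those positions are then used as row numbers of DF2. Here is a minimal sketch, assuming the sample data shown above; the names bad, good, bad_genes and good_genes are only for illustration:

```r
# Sample data from the question (small subset), built with base R only.
df1 <- data.frame(
  G1 = c("IL3RA", "SRSF9", "IL22RA2", "UROD", "SLC35C2", "OR12D3", "OSBPL3"),
  G2 = c("ABCC1", "ADAM19", "BIK", "ALG3", "GGH", "SEC31A", "HIST1H2BK"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  S1 = c("IL3RA", "SRSF9", "A1CF", "A1CF1", "GGH", "HIST1H2BK", "AAK1"),
  S2 = c(0L, 1L, 0L, 1L, 2L, 0L, 0L),
  S3 = c(0L, 1L, 0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Order used in the question: positions of df2$S1 values inside df1$G2,
# i.e. row numbers of df1, which are then wrongly applied to df2.
bad <- as.vector(na.omit(match(df2$S1, df1$G2)))

# Swapped order: positions of df1$G2 values inside df2$S1,
# i.e. genuine row numbers of df2.
good <- as.vector(na.omit(match(df1$G2, df2$S1)))

bad_genes  <- df2$S1[bad]
good_genes <- df2$S1[good]
```

With this data, bad selects rows 5 and 7 of df2 (GGH and AAK1), which is the unexpected output reported in the question, while good selects rows 5 and 6 (GGH and HIST1H2BK). For G1 both orders happen to agree, because IL3RA and SRSF9 sit in rows 1 and 2 of both data frames.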