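I have two tables with different numbers of rows. I want to merge them based on the contents of two columns. The problem is that I don't want the order of the two key columns to matter when merging. For example: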
Gene1 Gene2 p-value
ARID1A TP53 0.0007
ATM ATR 0.004
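I tried:
merge(Table1, Table2, by = c("Gene1", "Gene2"), all.x = TRUE)
But the problem is that it only merges "ATM" and "ATR", not "TP53" and "ARID1A", because the two genes appear in a different order in each table.
Is there a way to merge the two tables regardless of the order of the two columns?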
Answer 0 (score: 3)
Using sqldf:
library(sqldf)
sqldf("
  SELECT df1.*,
         df2.`p.value`
  FROM   df1, df2
  WHERE  (df1.Gene1 = df2.Gene1 AND
          df1.Gene2 = df2.Gene2) OR
         (df1.Gene1 = df2.Gene2 AND
          df1.Gene2 = df2.Gene1)")
#   Gene1  Gene2 p.value p.value
# 1  TP53 ARID1A   1e-03   7e-04
# 2   ATM    ATR   5e-04   4e-03
Answer 1 (score: 1)
We can sort the gene names within each pair and then merge:
# sort gene names
df1$GeneMin <- pmin(df1$Gene1, df1$Gene2)
df1$GeneMax <- pmax(df1$Gene1, df1$Gene2)
df2$GeneMin <- pmin(df2$Gene1, df2$Gene2)
df2$GeneMax <- pmax(df2$Gene1, df2$Gene2)
# then merge
merge(df1, df2, by = c("GeneMin", "GeneMax"))
#   GeneMin GeneMax Gene1.x Gene2.x p.value.x Gene1.y Gene2.y p.value.y
# 1  ARID1A    TP53    TP53  ARID1A     1e-03  ARID1A    TP53     7e-04
# 2     ATM     ATR     ATM     ATR     5e-04     ATM     ATR     4e-03
# tidy up columns, column names
#....
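The tidy-up step left open above could look something like the sketch below. It is only one possible way to finish the job, not necessarily what was intended here. It assumes you want to keep every row of df1 (all.x = TRUE, as in the question) and only bring over the p-value from df2. The helper add_pair_key and the column name p.value.df2 are made up for illustration.

```r
# Sample data, same as in the Data section
# (read.table turns the header "p-value" into "p.value")
df1 <- read.table(text = "
Gene1 Gene2 p-value
TP53 ARID1A 0.001
ATM ATR 0.0005", header = TRUE, as.is = TRUE)

df2 <- read.table(text = "
Gene1 Gene2 p-value
ARID1A TP53 0.0007
ATM ATR 0.004", header = TRUE, as.is = TRUE)

# Hypothetical helper: add order-independent key columns to a table
add_pair_key <- function(df) {
  df$GeneMin <- pmin(df$Gene1, df$Gene2)
  df$GeneMax <- pmax(df$Gene1, df$Gene2)
  df
}

df1k <- add_pair_key(df1)
df2k <- add_pair_key(df2)

# Keep only the keys and the p-value from df2, renamed to avoid a name clash
df2k <- df2k[, c("GeneMin", "GeneMax", "p.value")]
names(df2k)[names(df2k) == "p.value"] <- "p.value.df2"

# Left join: every row of df1 is kept, with NA where df2 has no matching pair
res <- merge(df1k, df2k, by = c("GeneMin", "GeneMax"), all.x = TRUE)

# Drop the helper key columns so only the original layout plus df2's p-value remains
res <- res[, c("Gene1", "Gene2", "p.value", "p.value.df2")]
res
```

Note that pmin and pmax compare strings according to the current locale. That is fine as long as both tables are keyed in the same session, since all that matters is that each pair is ordered consistently.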
Or we can merge twice, once for each column order, and then rbind the results:
# double merge, this might cause unexpected results
rbind(
  merge(df1, df2, by = c("Gene1", "Gene2")),
  merge(df1, df2, by.x = c("Gene1", "Gene2"), by.y = c("Gene2", "Gene1"))
)
#   Gene1  Gene2 p.value.x p.value.y
# 1   ATM    ATR     5e-04     4e-03
# 2  TP53 ARID1A     1e-03     7e-04
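One way the double merge can give unexpected results, as the code comment warns, is that the same row may be matched by both merges. The sketch below uses made-up data (not the tables from the question) in which df2 lists the same pair in both orders, so the single row of df1 comes back twice. A pair whose two genes are identical would be duplicated in the same way.

```r
# Made-up data for illustration only
df1 <- data.frame(Gene1 = "TP53", Gene2 = "ARID1A", p.value = 0.001,
                  stringsAsFactors = FALSE)
df2 <- data.frame(Gene1 = c("TP53", "ARID1A"),
                  Gene2 = c("ARID1A", "TP53"),
                  p.value = c(0.0007, 0.002),
                  stringsAsFactors = FALSE)

# Merge once in the same column order and once with the columns swapped
same_order <- merge(df1, df2, by = c("Gene1", "Gene2"))
swapped    <- merge(df1, df2, by.x = c("Gene1", "Gene2"),
                    by.y = c("Gene2", "Gene1"))
both <- rbind(same_order, swapped)

# The single df1 row now appears twice, once per matching df2 row
nrow(both)
```

Whether that duplication is a problem depends on the data. Checking df2 for repeated pairs before merging is one option.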
Data
# data
df1 <- read.table(text = "
Gene1 Gene2 p-value
TP53 ARID1A 0.001
ATM ATR 0.0005", header = TRUE, as.is = TRUE)
df2 <- read.table(text = "
Gene1 Gene2 p-value
ARID1A TP53 0.0007
ATM ATR 0.004", header = TRUE, as.is = TRUE)