我的R在DF中有问题。我有一个带基因名称的df1。
V1 V2
GENE A GENE E
GENE B GENE D
GENE C GENE A
GENE D GENE B
GENE E GENE C
和另一个名为df2的数据框如下:
Name ID Symbol
GENE A 1254 AKT
GENE B 1879 POU5F1
GENE C 5689 EGR1
GENE D 2385 JUN
GENE E 5687 MYC
我想要的输出如下:
NameSource SourceID NameTarget TargetID
AKT 1254 MYC 5687
POU5F1 1879 JUN 2385
EGR1 5689 AKT 1254
JUN 2385 POU5F1 1879
MYC 5687 EGR1 5689
我尝试使用以下语法:
genes <- df1[which(df1$V1, df2$Symbol), ]
和:
genes <- df1$V1 %in% df2$Symbol
但由于某种原因,我无法得到我期待的输出。任何人吗?
答案 0 :(得分:4)
我们可以使用match
res <- cbind(df2[-1], df2[match(df1$V2, df2$Name),-1])[c(2,1,4 ,3)]
colnames(res) <- c("NameSource", "SourceID", "NameTarget", "TargetID")
row.names(res) <- NULL
res
# NameSource SourceID NameTarget TargetID
#1 AKT 1254 MYC 5687
#2 POU5F1 1879 JUN 2385
#3 EGR1 5689 AKT 1254
#4 JUN 2385 POU5F1 1879
#5 MYC 5687 EGR1 5689
答案 1 :(得分:3)
与dplyr
和管道
df1 <- data.frame(V1=c("GENE A","GENE B","GENE C","GENE D", "GENE E"), V2=c("GENE E","GENE D","GENE A","GENE B", "GENE C"))
df2 <- data.frame(Name= c("GENE A","GENE B","GENE C","GENE D", "GENE E"),
ID= c(1254,1879,5689,2385,5687),
Symbol= c("AKT", "POU5F1", "EGR1", "JUN", "MYC")
)
library(dplyr)
df1 %>% right_join(df2, by=c("V1"="Name")) %>%
right_join(df2, by=c("V2"="Name")) %>%
arrange(V1) %>%
select(ID.x, Symbol.x, ID.y, Symbol.y) %>%
setNames(c("SourceID", "NameSource", "TargetID", "NameTarget"))
# SourceID NameSource TargetID NameTarget
#1 1254 AKT 5687 MYC
#2 1879 POU5F1 2385 JUN
#3 5689 EGR1 1254 AKT
#4 2385 JUN 1879 POU5F1
#5 5687 MYC 5689 EGR1