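I have 3 files. I need to take the first file and, for each row, match it against the first column of file 2. Then I need to take the corresponding alias from file2, match it against file3 (either the description or the Aliases column), and print the OMIM ID.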
File1:
**Symbol**
MCL1
ABCB1
BAX
IKZF1
WWOX
BCL2L1
BCL2L11
CCND1
TNFSF10
File2:
**Symbol2 Aliases**
MCL1 MCL1, BCL2 family apoptosis regulator
ABCB1 ATP binding cassette subfamily B member 1
WWOX WW domain containing oxidoreductase
BCL2L1 RB transcriptional corepressor 1
BOK peroxisome proliferator activated receptor gamma
RHOA ras homolog family member A
ABCC1 C-X-C motif chemokine ligand 12
PARP1 poly(ADP-ribose) polymerase 1
BAK1 BRCA1, DNA repair associated
file3:
**description OMIM Aliases**
MCL1, BCL2 family apoptosis regulator 159552 G protein subunit alpha 12
ATP binding cassette subfamily B member 1 171050 matrix metallopeptidase 9
BCL2 associated X, apoptosis regulator 600040 cadherin 1
IKAROS family zinc finger 1 603023 Janus kinase 2
WW domain containing oxidoreductase 605131 ataxin 3
BCL2 like 1 600039 RB transcriptional corepressor 1
BCL2 like 11 603827 transferrin receptor
cyclin D1 168461 C-C motif chemokine ligand 2
TNF superfamily member 10 603598 prostaglandin-endoperoxide synthase 2
Expected result:
**Symbol Symbol1 description/Aliases OMIM**
MCL1 MCL1 MCL1, BCL2 family apoptosis regulator 159552
ABCB1 ABCB1 ATP binding cassette subfamily B member 1 171050
BAX
IKZF1
WWOX WWOX WW domain containing oxidoreductase 605131
BCL2L1 BCL2L1 RB transcriptional corepressor 1 600039
BCL2L11
CCND1
TNFSF10
I tried merge and inner_join, but did not get the expected result. Can anyone help?
Answer 0 (score: 1)
Another possibility is to rename the relevant columns to be merged and then use dplyr::left_join with purrr::reduce (or merge with Reduce in base R):
names(df2) <- c("Symbol", "Description/Aliases")
names(df3) <- c("Description/Aliases", "OMIM", "Aliases")
purrr::reduce(list(df1, df2, df3), dplyr::left_join) %>% dplyr::select(-Aliases)
# Symbol Description/Aliases OMIM
#1 MCL1 MCL1, BCL2 family apoptosis regulator 159552
#2 ABCB1 ATP binding cassette subfamily B member 1 171050
#3 BAX <NA> NA
#4 IKZF1 <NA> NA
#5 WWOX WW domain containing oxidoreductase 605131
#6 BCL2L1 RB transcriptional corepressor 1 NA
#7 BCL2L11 <NA> NA
#8 CCND1 <NA> NA
#9 TNFSF10 <NA> NA
Or in base R:
Reduce(function(x, y) merge(x, y, all.x = TRUE), list(df1, df2, df3))
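Note that the output above leaves BCL2L1 without an OMIM ID: its alias ("RB transcriptional corepressor 1") sits in the Aliases column of file3 rather than in description, whereas the question asks to match against either column. One way to cover both columns is sketched below. It is only an illustration on a few rows copied from the question; the names `lookup`, `key`, and `out` are made up for this example, and it assumes each text value maps to a single OMIM ID.

```r
# Toy data: a few rows copied from the question
df1 <- data.frame(Symbol = c("MCL1", "BAX", "BCL2L1"), stringsAsFactors = FALSE)
df2 <- data.frame(
  Symbol2 = c("MCL1", "BCL2L1", "BOK"),
  Aliases = c("MCL1, BCL2 family apoptosis regulator",
              "RB transcriptional corepressor 1",
              "peroxisome proliferator activated receptor gamma"),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  description = c("MCL1, BCL2 family apoptosis regulator", "BCL2 like 1"),
  OMIM = c(159552, 600039),
  Aliases = c("G protein subunit alpha 12", "RB transcriptional corepressor 1"),
  stringsAsFactors = FALSE
)

# Stack both text columns of df3 into one lookup table (hypothetical helper),
# so a value found in either column resolves to that row's OMIM ID
lookup <- unique(rbind(
  data.frame(key = df3$description, OMIM = df3$OMIM, stringsAsFactors = FALSE),
  data.frame(key = df3$Aliases, OMIM = df3$OMIM, stringsAsFactors = FALSE)
))

# match() returns the first hit and keeps the row order of df1
out <- df1
out$Aliases <- df2$Aliases[match(out$Symbol, df2$Symbol2)]
out$OMIM <- lookup$OMIM[match(out$Aliases, lookup$key)]
print(out)
```

Using match() instead of merge() here is a design choice, not a requirement: it avoids the re-sorting that merge() applies by default, at the cost of silently taking the first hit if a key appears more than once.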
Answer 1 (score: 0)
There is an error in your merge statement. The syntax is merge(x, y, by.x, by.y, all). So your code would look something like:
df1 <- merge(file_1, file_2, by.x = "Symbol", by.y = "Symbol2", all.x = TRUE)
df2 <- merge(df1, file_3, by.x = "Aliases", by.y = "description", all.x = TRUE)
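For anyone who wants to try the two-step merge on its own, here is a minimal self-contained sketch using a few rows from the question. The `row_id` column and the names `step1`, `step2`, and `result` are hypothetical helpers added for this example: merge() sorts on the key by default, so an index is carried along to restore the order of file 1. The Aliases column of file 3 is dropped before the second merge to avoid a name clash with the Aliases column coming from file 2.

```r
# Toy data: a few rows copied from the question
file_1 <- data.frame(Symbol = c("MCL1", "BAX", "WWOX"), stringsAsFactors = FALSE)
file_2 <- data.frame(
  Symbol2 = c("MCL1", "WWOX"),
  Aliases = c("MCL1, BCL2 family apoptosis regulator",
              "WW domain containing oxidoreductase"),
  stringsAsFactors = FALSE
)
file_3 <- data.frame(
  description = c("MCL1, BCL2 family apoptosis regulator",
                  "WW domain containing oxidoreductase"),
  OMIM = c(159552, 605131),
  Aliases = c("G protein subunit alpha 12", "ataxin 3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: remember the original row order of file 1
file_1$row_id <- seq_len(nrow(file_1))

# Step 1: attach the alias from file 2; all.x = TRUE keeps unmatched symbols
step1 <- merge(file_1, file_2, by.x = "Symbol", by.y = "Symbol2", all.x = TRUE)

# Step 2: look the alias up in the description column of file 3
step2 <- merge(step1, file_3[, c("description", "OMIM")],
               by.x = "Aliases", by.y = "description", all.x = TRUE)

# Restore the order of file 1 and keep only the columns of interest
result <- step2[order(step2$row_id), c("Symbol", "Aliases", "OMIM")]
rownames(result) <- NULL
print(result)
```

This version only matches against description, so aliases that appear solely in the Aliases column of file 3 would still come back as NA.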