Merge multiple files by column and print the nth column in R

Posted: 2018-08-28 20:24:59

Tags: r join merge

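I have 3 files. For each row of the first file, I need to match its value against the first column of file2. Then I take the corresponding alias from file2, match it against file3 (either the description or the Aliases column), and print the OMIM ID.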

File1:

**Symbol**
MCL1
ABCB1
BAX
IKZF1
WWOX
BCL2L1
BCL2L11
CCND1
TNFSF10

File2:

**Symbol2   Aliases**
MCL1    MCL1, BCL2 family apoptosis regulator
ABCB1   ATP binding cassette subfamily B member 1
WWOX    WW domain containing oxidoreductase
BCL2L1  RB transcriptional corepressor 1
BOK     peroxisome proliferator activated receptor gamma
RHOA    ras homolog family member A
ABCC1   C-X-C motif chemokine ligand 12
PARP1   poly(ADP-ribose) polymerase 1
BAK1    BRCA1, DNA repair associated

File3:

**description   OMIM    Aliases**
MCL1, BCL2 family apoptosis regulator      159552  G protein subunit alpha 12
ATP binding cassette subfamily B member 1  171050  matrix metallopeptidase 9
BCL2 associated X, apoptosis regulator     600040  cadherin 1
IKAROS family zinc finger 1                603023  Janus kinase 2
WW domain containing oxidoreductase        605131  ataxin 3
BCL2 like 1                                600039  RB transcriptional corepressor 1
BCL2 like 11                               603827  transferrin receptor
cyclin D1                                  168461  C-C motif chemokine ligand 2
TNF superfamily member 10                  603598  prostaglandin-endoperoxide synthase 2

Expected result:

**Symbol    Symbol1 description/Aliases OMIM**
MCL1     MCL1    MCL1, BCL2 family apoptosis regulator      159552
ABCB1    ABCB1   ATP binding cassette subfamily B member 1  171050
BAX
IKZF1
WWOX     WWOX    WW domain containing oxidoreductase        605131
BCL2L1   BCL2L1  RB transcriptional corepressor 1           600039
BCL2L11
CCND1
TNFSF10

I tried merge and inner_join, but neither gave the expected result. Any help?

2 answers:

Answer 0 (score: 1)

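Another possibility is to rename the relevant columns so that the join columns have the same name in every table, and then use purrr::reduce with dplyr::left_join (or, in base R, Reduce with merge):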

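library(dplyr)  # provides %>%, left_join() and select(); purrr must also be installed

# Give the join columns identical names in all three data frames
names(df2) <- c("Symbol", "Description/Aliases")
names(df3) <- c("Description/Aliases", "OMIM", "Aliases")

# Left-join df1 -> df2 -> df3 on the shared column names, then drop file3's Aliases column
purrr::reduce(list(df1, df2, df3), dplyr::left_join) %>% dplyr::select(-Aliases)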
#   Symbol                       Description/Aliases   OMIM
#1    MCL1     MCL1, BCL2 family apoptosis regulator 159552
#2   ABCB1 ATP binding cassette subfamily B member 1 171050
#3     BAX                                      <NA>     NA
#4   IKZF1                                      <NA>     NA
#5    WWOX       WW domain containing oxidoreductase 605131
#6  BCL2L1          RB transcriptional corepressor 1     NA
#7 BCL2L11                                      <NA>     NA
#8   CCND1                                      <NA>     NA
#9 TNFSF10                                      <NA>     NA

Or, in base R:

Reduce(function(x, y) merge(x, y, all.x = TRUE), list(df1, df2, df3))
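Reduce(f, list(a, b, c)) evaluates f(f(a, b), c), so the base-R one-liner amounts to two nested merge() calls, each joining on whatever column names the two tables share after the renaming (merge's default by is the intersection of the column names). A minimal sketch on a three-row subset of the question's data; df1, df2 and df3 are built inline here as hypothetical stand-ins for the data read from the three files, with the columns already renamed:

```r
# Three-row subset of the question's data, join columns already renamed.
# check.names = FALSE keeps the "/" in "Description/Aliases".
df1 <- data.frame(Symbol = c("MCL1", "BAX", "BCL2L1"))
df2 <- data.frame(Symbol = c("MCL1", "BCL2L1"),
                  `Description/Aliases` = c("MCL1, BCL2 family apoptosis regulator",
                                            "RB transcriptional corepressor 1"),
                  check.names = FALSE)
df3 <- data.frame(`Description/Aliases` = c("MCL1, BCL2 family apoptosis regulator",
                                            "BCL2 like 1"),
                  OMIM = c(159552, 600039),
                  Aliases = c("G protein subunit alpha 12",
                              "RB transcriptional corepressor 1"),
                  check.names = FALSE)

# Reduce() folds from the left: merge(merge(df1, df2, ...), df3, ...)
via_reduce <- Reduce(function(x, y) merge(x, y, all.x = TRUE), list(df1, df2, df3))
nested <- merge(merge(df1, df2, all.x = TRUE), df3, all.x = TRUE)
stopifnot(identical(via_reduce, nested))

# merge() sorts rows by the join key, so reorder by Symbol for reading
print(via_reduce[order(via_reduce$Symbol), c("Symbol", "OMIM")])
```

The subset reproduces the NA for BCL2L1 seen in the dplyr output: merge() joins only on the shared Description/Aliases column and never looks at file3's Aliases column, where "RB transcriptional corepressor 1" actually appears.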

Answer 1 (score: 0)

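There is an error in your merge statement. The syntax is merge(x, y, by.x, by.y, all): by.x and by.y name the key column in x and in y when the names differ, and all.x = TRUE keeps every row of x even when it has no match. So your code would look something like: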

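# Step 1: left-join file 1 to file 2 on the gene symbol
df1 <- merge(file_1, file_2, by.x = "Symbol", by.y = "Symbol2", all.x = TRUE)
# Step 2: left-join the result to file 3, matching file 2's Aliases against file 3's description
df2 <- merge(df1, file_3, by.x = "Aliases", by.y = "description", all.x = TRUE)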
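Both answers match file2's Aliases only against file3's description column. In the question's data, however, BCL2L1's alias "RB transcriptional corepressor 1" sits in file3's Aliases column (next to "BCL2 like 1"), which is why the expected result lists 600039 for BCL2L1 while the first answer's output shows NA. A minimal base-R sketch that checks description first and falls back to file3's Aliases column, using match() instead of merge() so the rows stay in file1's order. The data frames are typed in from the listings in the question; the names file1, file2, file3 and the "description first, then Aliases" rule are assumptions for illustration, not part of either answer:

```r
# Sample data typed in from the listings in the question (hypothetical names)
file1 <- data.frame(Symbol = c("MCL1", "ABCB1", "BAX", "IKZF1", "WWOX",
                               "BCL2L1", "BCL2L11", "CCND1", "TNFSF10"))
file2 <- data.frame(
  Symbol2 = c("MCL1", "ABCB1", "WWOX", "BCL2L1", "BOK",
              "RHOA", "ABCC1", "PARP1", "BAK1"),
  Aliases = c("MCL1, BCL2 family apoptosis regulator",
              "ATP binding cassette subfamily B member 1",
              "WW domain containing oxidoreductase",
              "RB transcriptional corepressor 1",
              "peroxisome proliferator activated receptor gamma",
              "ras homolog family member A",
              "C-X-C motif chemokine ligand 12",
              "poly(ADP-ribose) polymerase 1",
              "BRCA1, DNA repair associated"))
file3 <- data.frame(
  description = c("MCL1, BCL2 family apoptosis regulator",
                  "ATP binding cassette subfamily B member 1",
                  "BCL2 associated X, apoptosis regulator",
                  "IKAROS family zinc finger 1",
                  "WW domain containing oxidoreductase",
                  "BCL2 like 1", "BCL2 like 11", "cyclin D1",
                  "TNF superfamily member 10"),
  OMIM = c(159552, 171050, 600040, 603023, 605131,
           600039, 603827, 168461, 603598),
  Aliases = c("G protein subunit alpha 12", "matrix metallopeptidase 9",
              "cadherin 1", "Janus kinase 2", "ataxin 3",
              "RB transcriptional corepressor 1", "transferrin receptor",
              "C-C motif chemokine ligand 2",
              "prostaglandin-endoperoxide synthase 2"))

# Step 1: row of file2 for each Symbol (NA when absent, like a left join)
i2 <- match(file1$Symbol, file2$Symbol2)
alias <- file2$Aliases[i2]

# Step 2: look the alias up in file3's description column first ...
i3 <- match(alias, file3$description)
# ... and fall back to file3's Aliases column where that found nothing
i3 <- ifelse(is.na(i3), match(alias, file3$Aliases), i3)

result <- data.frame(Symbol = file1$Symbol,
                     Symbol1 = file2$Symbol2[i2],
                     `description/Aliases` = alias,
                     OMIM = file3$OMIM[i3],
                     check.names = FALSE)
print(result)
```

match() returns only the first hit, so if an alias appeared in several rows of file3, only the first OMIM ID would be reported; a merge()-based join would instead return one row per hit.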