Match and list column terms from .xlsx files in R

Time: 2018-08-03 11:27:20

Tags: r excel

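Hi, I am new to R. I need to match the terms in the name columns of three .xlsx files and get a list of the rows that match between them. The data in the files looks like this:

From one.xlsx: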

OneID   NameOne
ACR019  Acropectoral Syndrome
ACR020  Acropectorovertebral
GNT015  Genital Dwarfism
ACR023  Acral Dysostosis Dyserythropoiesis Syndrome

From two.xlsx:

TwoID   TwoName
607907  DERMATOFIBROSARCOMA PROTUBERANS
304730  DERMOIDS OF CORNEA
605967  ACROPECTORAL SYNDROME
102510  ACROPECTOROVERTEBRAL

From three.xlsx:

ThreeID ThreeName
OM85203 Acropectoral syndrome
OM67092 Dermoids cornea
OM76580 Acardia
OM45632 Hypertryptophanemia

The final result file, also an .xlsx, must look like this:

OneID   NameOne                TwoID   TwoName                ThreeID  ThreeName
ACR019  Acropectoral Syndrome  605967  ACROPECTORAL SYNDROME  OM85203  Acropectoral syndrome
ACR020  Acropectorovertebral   102510  ACROPECTOROVERTEBRAL   -        -
-       -                      304730  DERMOIDS OF CORNEA     OM67092  Dermoids cornea

Thank you very much. Any suggestions or help with writing the code are welcome.

1 answer:

Answer 0: (score: 0)

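How about this: since the only field your datasets have in common is the name, we have to use the names as the key for joining the various .xlsx files. After a small transformation (upper-casing the names so they compare equal), the join itself is done with the merge() function. In general, IMHO, using a description as a key is not a good idea, but in this case we have no alternative.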
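For the import and export steps, which the answer does not show, here is a minimal sketch. It assumes the readxl and writexl packages are installed; neither is named in the original answer, and other packages such as openxlsx would do as well. A temporary file stands in for one.xlsx so the snippet runs on its own.

```r
# Sketch only: assumes the readxl and writexl packages are installed.
library(readxl)
library(writexl)

# Stand-in for one.xlsx so the example is self-contained; with real data,
# pass the path of your own file to read_excel() instead.
path_one <- tempfile(fileext = ".xlsx")
write_xlsx(
  data.frame(OneID = c("ACR019", "ACR020"),
             NameOne = c("Acropectoral Syndrome", "Acropectorovertebral")),
  path_one
)

# read_excel() returns a tibble; as.data.frame() converts it to a plain data frame
one <- as.data.frame(read_excel(path_one))

# Once the merged table exists, it can be exported the same way.
# Here `one` is written back out only to show the call.
out_path <- tempfile(fileext = ".xlsx")
write_xlsx(one, out_path)
```

The same read_excel() call would be repeated for two.xlsx and three.xlsx.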

After importing the three MS Excel files, you can do the following:

# first, your data (fake)
one <- data.frame(OneID = c('ACR019', 'ACR020', 'GNT015', 'ACR023'),
                  NameOne = c('Acropectoral Syndrome', 'Acropectorovertebral',
                              'Genital Dwarfism',
                              'Acral Dysostosis Dyserythropoiesis Syndrome'),
                  stringsAsFactors = FALSE)

two <- data.frame(TwoID = c('607907', '304730', '605967', '102510'),
                  NameTwo = c('DERMATOFIBROSARCOMA PROTUBERANS', 'DERMOIDS OF CORNEA',
                              'ACROPECTORAL SYNDROME', 'ACROPECTOROVERTEBRAL'),
                  stringsAsFactors = FALSE)

three <- data.frame(ThreeID = c('OM85203', 'OM67092', 'OM76580', 'OM45632'),
                    NameThree = c('Acropectoral syndrome', 'Dermoids cornea',
                                  'Acardia', 'Hypertryptophanemia'),
                    stringsAsFactors = FALSE)

# then, to get a common key, upper-case the names and store them as ID
one$ID <- toupper(one$NameOne)
two$ID <- toupper(two$NameTwo)
three$ID <- toupper(three$NameThree)

# after that, merge the data frames; all = TRUE keeps unmatched rows (full outer join)
merged <- merge(merge(one, two, by = 'ID', all = TRUE), three, by = 'ID', all = TRUE)

# lastly, set the column names you want
colnames(merged) <- c('ID', 'OneID', 'NameOne', 'TwoID', 'NameTwo', 'ThreeID', 'NameThree')

# here are the results
merged

> merged
                                           ID  OneID                                     NameOne  TwoID                         NameTwo
1                                     ACARDIA   <NA>                                        <NA>   <NA>                            <NA>
2 ACRAL DYSOSTOSIS DYSERYTHROPOIESIS SYNDROME ACR023 Acral Dysostosis Dyserythropoiesis Syndrome   <NA>                            <NA>
3                       ACROPECTORAL SYNDROME ACR019                       Acropectoral Syndrome 605967           ACROPECTORAL SYNDROME
4                        ACROPECTOROVERTEBRAL ACR020                        Acropectorovertebral 102510            ACROPECTOROVERTEBRAL
5             DERMATOFIBROSARCOMA PROTUBERANS   <NA>                                        <NA> 607907 DERMATOFIBROSARCOMA PROTUBERANS
6                             DERMOIDS CORNEA   <NA>                                        <NA>   <NA>                            <NA>
7                          DERMOIDS OF CORNEA   <NA>                                        <NA> 304730              DERMOIDS OF CORNEA
8                            GENITAL DWARFISM GNT015                            Genital Dwarfism   <NA>                            <NA>
9                         HYPERTRYPTOPHANEMIA   <NA>                                        <NA>   <NA>                            <NA>
  ThreeID             NameThree
1 OM76580               Acardia
2    <NA>                  <NA>
3 OM85203 Acropectoral syndrome
4    <NA>                  <NA>
5    <NA>                  <NA>
6 OM67092       Dermoids cornea
7    <NA>                  <NA>
8    <NA>                  <NA>
9 OM45632   Hypertryptophanemia
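
In the output above, "DERMOIDS CORNEA" and "DERMOIDS OF CORNEA" end up on separate rows, and terms found in only one file are kept, whereas the question asks for them to be paired and for unmatched terms to be left out. One possible way to get closer, sketched here in base R, is to build a more forgiving key and then keep only the rows found in at least two files. The helper make_key() and its list of filler words are hypothetical, not part of the original answer, and would need adjusting for real data.

```r
# Same sample data as in the question
one <- data.frame(OneID = c("ACR019", "ACR020", "GNT015", "ACR023"),
                  NameOne = c("Acropectoral Syndrome", "Acropectorovertebral",
                              "Genital Dwarfism",
                              "Acral Dysostosis Dyserythropoiesis Syndrome"),
                  stringsAsFactors = FALSE)
two <- data.frame(TwoID = c("607907", "304730", "605967", "102510"),
                  NameTwo = c("DERMATOFIBROSARCOMA PROTUBERANS", "DERMOIDS OF CORNEA",
                              "ACROPECTORAL SYNDROME", "ACROPECTOROVERTEBRAL"),
                  stringsAsFactors = FALSE)
three <- data.frame(ThreeID = c("OM85203", "OM67092", "OM76580", "OM45632"),
                    NameThree = c("Acropectoral syndrome", "Dermoids cornea",
                                  "Acardia", "Hypertryptophanemia"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: upper-case, replace punctuation with spaces, drop
# filler words (an assumed list), and collapse repeated whitespace.
make_key <- function(x, stop_words = c("OF", "THE", "AND")) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x)
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

one$ID <- make_key(one$NameOne)
two$ID <- make_key(two$NameTwo)
three$ID <- make_key(three$NameThree)

# Full outer join of all three tables on the normalized key
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE),
                 list(one, two, three))

# Count in how many files each term was found and keep those in at least two
n_found <- rowSums(!is.na(merged[, c("OneID", "TwoID", "ThreeID")]))
result <- merged[n_found >= 2, names(merged) != "ID"]

# Show missing cells as "-", as in the requested output
result[is.na(result)] <- "-"
rownames(result) <- NULL
result
```

With the sample data this leaves three rows: the two Acropectoral terms and the Dermoids cornea pair. Exact matching on a normalized key is still brittle; spelling variants would need approximate matching, which is beyond this sketch.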