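Hi, I'm new to R, and I need to match terms in .xlsx columns to get a list of matching data across three .xlsx files. The data in the files looks like this:
From one.xlsx: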
OneID NameOne
ACR019 Acropectoral Syndrome
ACR020 Acropectorovertebral
GNT015 Genital Dwarfism
ACR023 Acral Dysostosis Dyserythropoiesis Syndrome
From two.xlsx:
TwoID TwoName
607907 DERMATOFIBROSARCOMA PROTUBERANS
304730 DERMOIDS OF CORNEA
605967 ACROPECTORAL SYNDROME
102510 ACROPECTOROVERTEBRAL
From three.xlsx:
ThreeID ThreeName
OM85203 Acropectoral syndrome
OM67092 Dermoids cornea
OM76580 Acardia
OM45632 Hypertryptophanemia
The final result file, in .xlsx format, must look like this:
OneID   NameOne                TwoID   TwoName                ThreeID  ThreeName
ACR019  Acropectoral Syndrome  605967  ACROPECTORAL SYNDROME  OM85203  Acropectoral syndrome
ACR020  Acropectorovertebral   102510  ACROPECTOROVERTEBRAL   -        -
-       -                      304730  DERMOIDS OF CORNEA     OM67092  Dermoids cornea
Thank you very much; any suggestions or help with writing the code are welcome.
Answer 0 (score: 0)
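How about this: since the only field your datasets have in common is the name, we have to use the names as the key for joining the .xlsx files, after a few small transformations, with the merge() function. (In general, IMHO, using a description as a key is not a good idea, but in this case there is no alternative.)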
After importing the three MS Excel files, you can do the following:
# first, your data (fake); each ID column gets its own name
one <- data.frame(OneID = c('ACR019', 'ACR020', 'GNT015', 'ACR023'),
                  NameOne = c('Acropectoral Syndrome', 'Acropectorovertebral', 'Genital Dwarfism', 'Acral Dysostosis Dyserythropoiesis Syndrome'))
two <- data.frame(TwoID = c('A607907', '304730', '605967', '102510'),
                  NameTwo = c('DERMATOFIBROSARCOMA PROTUBERANS', 'DERMOIDS OF CORNEA', 'ACROPECTORAL SYNDROME', 'ACROPECTOROVERTEBRAL'))
three <- data.frame(ThreeID = c('OM85203', 'OM67092', 'OM76580', 'OM45632'),
                    NameThree = c('Acropectoral syndrome', 'Dermoids cornea', 'Acardia', 'Hypertryptophanemia'))
# then, to get comparable keys, convert every name to upper case and store it as ID:
one$ID <- toupper(one$NameOne)
two$ID <- toupper(two$NameTwo)
three$ID <- toupper(three$NameThree)
# after that, merge the data frames on ID; all = TRUE keeps unmatched rows too (full outer join):
merged <- merge(merge(one, two, by = 'ID', all = TRUE), three, by = 'ID', all = TRUE)
# lastly, give the columns the names you want
colnames(merged) <- c('ID', 'OneID', 'NameOne', 'TwoID', 'NameTwo', 'ThreeID', 'NameThree')
# here are the results
merged
> merged
                                           ID  OneID                                     NameOne   TwoID                         NameTwo
1                                     ACARDIA   <NA>                                        <NA>    <NA>                            <NA>
2 ACRAL DYSOSTOSIS DYSERYTHROPOIESIS SYNDROME ACR023 Acral Dysostosis Dyserythropoiesis Syndrome    <NA>                            <NA>
3                       ACROPECTORAL SYNDROME ACR019                       Acropectoral Syndrome  605967           ACROPECTORAL SYNDROME
4                        ACROPECTOROVERTEBRAL ACR020                        Acropectorovertebral  102510            ACROPECTOROVERTEBRAL
5             DERMATOFIBROSARCOMA PROTUBERANS   <NA>                                        <NA> A607907 DERMATOFIBROSARCOMA PROTUBERANS
6                             DERMOIDS CORNEA   <NA>                                        <NA>    <NA>                            <NA>
7                          DERMOIDS OF CORNEA   <NA>                                        <NA>  304730              DERMOIDS OF CORNEA
8                            GENITAL DWARFISM GNT015                            Genital Dwarfism    <NA>                            <NA>
9                         HYPERTRYPTOPHANEMIA   <NA>                                        <NA>    <NA>                            <NA>
  ThreeID             NameThree
1 OM76580               Acardia
2    <NA>                  <NA>
3 OM85203 Acropectoral syndrome
4    <NA>                  <NA>
5    <NA>                  <NA>
6 OM67092       Dermoids cornea
7    <NA>                  <NA>
8    <NA>                  <NA>
9 OM45632   Hypertryptophanemia
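Note that with a plain upper-case key, "DERMOIDS OF CORNEA" and "DERMOIDS CORNEA" stay on separate rows (6 and 7 above), while the expected result in the question pairs them and lists only names found in at least two files. A minimal sketch of one way to get closer to that, using base R only: make_key() is a hypothetical helper, and dropping just the filler word "OF" is an assumption that fits this sample, so real data may need more rules. The data uses the IDs and column names from the question.

```r
# Fake data with the IDs and column names from the question.
one <- data.frame(
  OneID   = c("ACR019", "ACR020", "GNT015", "ACR023"),
  NameOne = c("Acropectoral Syndrome", "Acropectorovertebral",
              "Genital Dwarfism", "Acral Dysostosis Dyserythropoiesis Syndrome"),
  stringsAsFactors = FALSE
)
two <- data.frame(
  TwoID   = c("607907", "304730", "605967", "102510"),
  TwoName = c("DERMATOFIBROSARCOMA PROTUBERANS", "DERMOIDS OF CORNEA",
              "ACROPECTORAL SYNDROME", "ACROPECTOROVERTEBRAL"),
  stringsAsFactors = FALSE
)
three <- data.frame(
  ThreeID   = c("OM85203", "OM67092", "OM76580", "OM45632"),
  ThreeName = c("Acropectoral syndrome", "Dermoids cornea",
                "Acardia", "Hypertryptophanemia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a normalized join key from a name.
# Assumption: "OF" is the only filler word that needs to be dropped.
make_key <- function(x) {
  key <- toupper(trimws(x))
  key <- gsub("\\bOF\\b", " ", key, perl = TRUE)  # remove the whole word "OF"
  gsub(" +", " ", trimws(key))                    # collapse repeated spaces
}

one$key   <- make_key(one$NameOne)
two$key   <- make_key(two$TwoName)
three$key <- make_key(three$ThreeName)

# Full outer join of the three data frames on the normalized key.
merged <- Reduce(function(x, y) merge(x, y, by = "key", all = TRUE),
                 list(one, two, three))

# Keep only the names that appear in at least two of the three files.
n_found <- rowSums(!is.na(merged[, c("OneID", "TwoID", "ThreeID")]))
result <- merged[n_found >= 2,
                 c("OneID", "NameOne", "TwoID", "TwoName", "ThreeID", "ThreeName")]

# Show missing cells as "-", as in the requested layout.
result[is.na(result)] <- "-"
rownames(result) <- NULL
print(result)
```

To work with the real files, one option (assuming the readxl and writexl packages are installed) is to load each input with read_excel() and save the result with write_xlsx().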