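I am trying to import an XML file into R and convert it to a data frame, but I am having trouble extracting the different nodes. Many of the nodes have characters in them (for example, quotation marks: ""), so I am struggling to specify which ones to pull out. I also can't work out how to pull out lower-level nodes further down the hierarchy.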
I am using xmlParse and xmlToDataFrame:
library(XML)
doc <- xmlParse("http://www.orphadata.org/data/xml/en_product6.xml")
doc2 <- xmlToDataFrame(nodes = getNodeSet(doc, "//Disorder"))[c("OrphaNumber")]
# This works, but it fails when I try to add nodes with unusual characters or lower-level nodes.
doc3 <- xmlToDataFrame(nodes = getNodeSet(doc, "//Disorder"))[c("OrphaNumber", 'Name lang="en"')]
# It also fails when I try to grab a lower-level node:
doc4 <- xmlToDataFrame(nodes = getNodeSet(doc, "//Disorder"))[c("OrphaNumber", "/DisorderGeneAssociation")]
The expected result is:
head(doc3)
OrphaNumber Name lang="en"
166024 Multiple epiphyseal dysplasia,
166035 Brachydactyly-short stature-retinitis pigmentosa syndrome
head(doc4)
OrphaNumber DisorderGeneAssociationStatus
166024 <SourceOfValidation>22587682[PMID]
166035 <SourceOfValidation>28285769[PMID]</SourceOfValidation>
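For reference, here is a minimal sketch of the kind of extraction I am after, written against a tiny hand-made XML string rather than the real file. The nesting (DisorderList > Disorder > DisorderGeneAssociationList > DisorderGeneAssociation > SourceOfValidation) is my assumption about the layout, the sample values are borrowed from the expected output above, and `first_value` is a hypothetical helper name. The idea is that `lang="en"` is an attribute and not part of the node name, so it may be reachable with an XPath predicate such as `Name[@lang='en']`, and lower levels with a relative path.

```r
library(XML)

# Hand-written stand-in for the real file; the nesting is an assumption.
xml_text <- '<JDBOR>
  <DisorderList>
    <Disorder>
      <OrphaNumber>166024</OrphaNumber>
      <Name lang="en">Multiple epiphyseal dysplasia</Name>
      <DisorderGeneAssociationList>
        <DisorderGeneAssociation>
          <SourceOfValidation>22587682[PMID]</SourceOfValidation>
        </DisorderGeneAssociation>
      </DisorderGeneAssociationList>
    </Disorder>
    <Disorder>
      <OrphaNumber>166035</OrphaNumber>
      <Name lang="en">Brachydactyly-short stature-retinitis pigmentosa syndrome</Name>
      <DisorderGeneAssociationList>
        <DisorderGeneAssociation>
          <SourceOfValidation>28285769[PMID]</SourceOfValidation>
        </DisorderGeneAssociation>
      </DisorderGeneAssociationList>
    </Disorder>
  </DisorderList>
</JDBOR>'

doc <- xmlParse(xml_text, asText = TRUE)
disorders <- getNodeSet(doc, "//Disorder")

# Hypothetical helper: text of the first node matching a relative XPath
# under `node`, or NA when nothing matches.
first_value <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Build one row per Disorder, one column per XPath expression.
result <- data.frame(
  OrphaNumber = sapply(disorders, first_value, path = "./OrphaNumber"),
  # The attribute goes in a predicate, not in the column name.
  Name = sapply(disorders, first_value, path = "./Name[@lang='en']"),
  # ".//" searches any depth below the current Disorder node.
  SourceOfValidation = sapply(
    disorders, first_value,
    path = ".//DisorderGeneAssociation/SourceOfValidation"
  ),
  stringsAsFactors = FALSE
)

print(result)
```

This sketch keeps only the first match per Disorder, so a disorder with several gene associations would lose the others; whether that is acceptable depends on the real data.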