Importing XML into R

Time: 2019-05-21 19:02:11

Tags: r xml import

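I'm trying to import an XML file into R and convert it to a data frame, but I'm having trouble getting at the different nodes. Many of the nodes have extra characters in them (e.g. the quotation marks in Name lang="en"), so I'm struggling to specify which ones to pull out. I also can't quite figure out how to pull out nodes that sit further down the hierarchy.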

I'm using xmlParse and xmlToDataFrame:

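library(XML)  # provides xmlParse, getNodeSet and xmlToDataFrame

doc <- xmlParse("http://www.orphadata.org/data/xml/en_product6.xml")
# One row per <Disorder> element; keep only the OrphaNumber column
doc2 <- xmlToDataFrame(nodes = getNodeSet(doc, "//Disorder"))[c("OrphaNumber")]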

# This works, but it fails when I try to add nodes whose names contain unusual characters, or nodes from lower levels:

doc3 <- xmlToDataFrame(nodes = getNodeSet(doc, "//Disorder"))[c("OrphaNumber", 'Name lang="en"')]

# ...or when I try to grab a lower-level node:
doc4 <- xmlToDataFrame(nodes = getNodeSet(doc, "//Disorder"))[c("OrphaNumber", "/DisorderGeneAssociation")]

The expected result is:

head(doc3)
OrphaNumber   Name lang="en"
166024        Multiple epiphyseal dysplasia,
166035        Brachydactyly-short stature-retinitis pigmentosa syndrome
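A likely reason the doc3 selector cannot work: in the file, Name lang="en" is presumably an element named Name carrying an attribute lang, and xmlToDataFrame names its columns after element names only, so no column is literally called Name lang="en". The attribute can instead go into an XPath predicate, Name[@lang='en']. The sketch below is only an illustration under assumptions: xml_txt is a hypothetical, simplified fragment whose layout (Disorder > OrphaNumber, Name) is assumed to resemble the real file, and the names "Disorder A"/"Disorder B" are placeholders.

```r
library(XML)

# Hypothetical, simplified fragment; the real file's layout is assumed, not verified
xml_txt <- '<DisorderList>
  <Disorder id="1">
    <OrphaNumber>166024</OrphaNumber>
    <Name lang="en">Disorder A</Name>
  </Disorder>
  <Disorder id="2">
    <OrphaNumber>166035</OrphaNumber>
    <Name lang="en">Disorder B</Name>
  </Disorder>
</DisorderList>'

doc <- xmlParse(xml_txt, asText = TRUE)
disorders <- getNodeSet(doc, "//Disorder")

# lang="en" is an attribute of <Name>, not part of the element name,
# so it is matched with an XPath predicate relative to each <Disorder>
df <- data.frame(
  OrphaNumber = sapply(disorders, function(d) xpathSApply(d, "./OrphaNumber", xmlValue)),
  Name        = sapply(disorders, function(d) xpathSApply(d, "./Name[@lang='en']", xmlValue)),
  stringsAsFactors = FALSE
)
print(df)
```

Building each column with a relative XPath per Disorder keeps the rows aligned, and the predicate can be swapped (e.g. a different lang value) without touching the rest.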


head(doc4)
OrphaNumber   DisorderGeneAssociationStatus
166024        <SourceOfValidation>22587682[PMID]</SourceOfValidation>
166035        <SourceOfValidation>28285769[PMID]</SourceOfValidation>
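For the lower-level nodes, one possible approach (a sketch, not a verified description of the real file) is to loop over each Disorder, collect the values nested below it with a descendant XPath, and repeat the parent's OrphaNumber for every match. The nesting used here (Disorder > DisorderGeneAssociationList > DisorderGeneAssociation > SourceOfValidation) and the fragment xml_txt are assumptions made for illustration; the PMIDs are taken from the expected output. Other children of each association, such as the DisorderGeneAssociationStatus named in the expected header, could presumably be pulled the same way by changing the relative XPath.

```r
library(XML)

# Hypothetical fragment; the nesting below is assumed, not taken from the real file
xml_txt <- '<DisorderList>
  <Disorder id="1">
    <OrphaNumber>166024</OrphaNumber>
    <DisorderGeneAssociationList count="1">
      <DisorderGeneAssociation>
        <SourceOfValidation>22587682[PMID]</SourceOfValidation>
      </DisorderGeneAssociation>
    </DisorderGeneAssociationList>
  </Disorder>
  <Disorder id="2">
    <OrphaNumber>166035</OrphaNumber>
    <DisorderGeneAssociationList count="1">
      <DisorderGeneAssociation>
        <SourceOfValidation>28285769[PMID]</SourceOfValidation>
      </DisorderGeneAssociation>
    </DisorderGeneAssociationList>
  </Disorder>
</DisorderList>'

doc <- xmlParse(xml_txt, asText = TRUE)
disorders <- getNodeSet(doc, "//Disorder")

rows <- lapply(disorders, function(d) {
  # Every SourceOfValidation nested anywhere below this <Disorder>
  src <- xpathSApply(d, ".//DisorderGeneAssociation/SourceOfValidation", xmlValue)
  if (length(src) == 0) return(NULL)  # skip disorders without an association
  # The scalar OrphaNumber is recycled, giving one row per association
  data.frame(OrphaNumber = xpathSApply(d, "./OrphaNumber", xmlValue),
             SourceOfValidation = src,
             stringsAsFactors = FALSE)
})
df <- do.call(rbind, rows)
print(df)
```

Working per Disorder rather than per association avoids needing the ancestor axis, and a disorder with several associations simply contributes several rows.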
