I have an XML file with nested parent levels that I need to convert into a data frame.
Here is a sample of the XML file:
<Header>
<Response>
<Response-type>
1
</Response-type>
<CODE>
1
</CODE>
</Response>
<Identification>
<Request>
<Request-name>
Testing
</Request-name>
<Request-time>
<Year>
2015
</Year>
<Month>
December
</Month>
<Time>
<Hour>
1
</Hour>
<Minute>
20
</Minute>
</Time>
</Request-time>
</Request>
</Identification>
</Header>
I tried converting the XML file into a list of lists, like this:
library(XML)
xml <- xmlTreeParse("myfile.xml", useInternalNodes = TRUE)
xml_list <- xmlToList(xml)
The problem appears when I try to convert that list into a data frame:
as.data.frame(t(as.data.frame(xml_list)))
I get the following error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
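One possible workaround that stays with the XML package from the question (a sketch, not taken from the original thread) is to flatten the nested list with base R's unlist() before building the data frame, so that every leaf value becomes one named column. This assumes the document contains no attributes (which xmlToList() would add as ".attrs" entries). To keep the example self-contained it parses the sample from a string with asText = TRUE instead of reading "myfile.xml", and the string is built without whitespace between tags so no blank text nodes appear.

```r
library(XML)

# Sample document from the question, parsed from a string instead of "myfile.xml"
doc <- xmlParse(paste0(
  "<Header>",
  "<Response><Response-type>1</Response-type><CODE>1</CODE></Response>",
  "<Identification><Request>",
  "<Request-name>Testing</Request-name>",
  "<Request-time><Year>2015</Year><Month>December</Month>",
  "<Time><Hour>1</Hour><Minute>20</Minute></Time></Request-time>",
  "</Request></Identification>",
  "</Header>"), asText = TRUE)

xml_list <- xmlToList(doc)

# unlist() flattens the nested list into one named character vector;
# nested names are joined with ".", e.g. "Identification.Request.Request-time.Year"
flat <- unlist(xml_list)

# One row, one column per leaf; check.names = FALSE keeps the "-" in names
df <- data.frame(as.list(trimws(flat)), check.names = FALSE,
                 stringsAsFactors = FALSE)
```

The dotted column names record each value's position in the hierarchy, which keeps columns distinct even when two leaves share the same tag name.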
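Answer 0 (score: 0)
Here is a solution using the xml2 library. I assume that each Header node has only one Response node and one Identification node. The comments in the code explain the process step by step: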
library(xml2)
page<-read_xml("<Header>
<Response>
<Response-type> 1 </Response-type>
<CODE> 1 </CODE>
</Response>
<Identification>
<Request>
<Request-name> Testing </Request-name>
<Request-time>
<Year> 2015 </Year>
<Month> December </Month>
<Time> <Hour> 1 </Hour> <Minute> 20 </Minute></Time>
</Request-time>
</Request>
</Identification> </Header>")
#Find all element nodes (assuming a single Header node in the document)
all<-xml_find_all(page, '//*')
#Goal is to find all the nodes at the bottom of hierarchy
#Find the children for each node
children<-lapply(all, xml_children)
#determine the number of children on each node
#and check for the number of children if ==0 then this is a leaf node
nochildren<-sapply(children, length)==0
#names of each leaf node
names<-xml_name(all[nochildren])
#values of each leaf
values<-trimws(xml_text(all[nochildren]))
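#build a one-row data frame (stringsAsFactors = FALSE keeps values as character in R < 4.0)
df<-data.frame(t(values), stringsAsFactors = FALSE)
#use the leaf names as column names
names(df)<-names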
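The leaf-node filtering above can also be written as a single XPath query: the predicate `[not(*)]` matches only elements that have no element children. The sketch below assumes the same sample document. Using xml_path() for unique column names is an addition not in the original answer; it may help if the same leaf name occurs more than once (for example, several Request nodes).

```r
library(xml2)

# Same sample document as above
page <- read_xml("<Header>
  <Response><Response-type> 1 </Response-type><CODE> 1 </CODE></Response>
  <Identification><Request>
    <Request-name> Testing </Request-name>
    <Request-time>
      <Year> 2015 </Year><Month> December </Month>
      <Time><Hour> 1 </Hour><Minute> 20 </Minute></Time>
    </Request-time>
  </Request></Identification>
</Header>")

# XPath: every element without element children, i.e. the leaf nodes
leaves <- xml_find_all(page, "//*[not(*)]")

# One named column per leaf; check.names = FALSE keeps "Response-type" as-is
df <- data.frame(as.list(setNames(trimws(xml_text(leaves)), xml_name(leaves))),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Full paths such as "/Header/Response/CODE" stay unique even if leaf names repeat
paths <- xml_path(leaves)
```

To use the full paths as column names instead, replace xml_name(leaves) with paths.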