Converting unequal XML nodes to a data frame with R

Time: 2018-03-08 10:20:25

Tags: r xml

I have an XML file with several nested parent levels, and I need to convert it to a data frame.

Here is a sample of the XML file:

<Header>
    <Response>
        <Response-type>
            1
        </Response-type>
        <CODE>
            1
        </CODE>
    </Response>
    <Identification>
        <Request>
            <Request-name>
                Testing
            </Request-name>
            <Request-time>
                <Year>
                    2015
                </Year>
                <Month>
                    December
                </Month>
                <Time>
                    <Hour>
                        1
                    </Hour>
                    <Minute>
                        20
                    </Minute>
                </Time>
            </Request-time>
        </Request>
    </Identification>
</Header>

I tried converting the XML file to a list of lists like this:

library(XML)
xml <- xmlTreeParse("myfile.xml", useInternalNodes = TRUE)
xml_list <- xmlToList(xml)

The problem occurs when I try to convert the list to a data frame:

as.data.frame(t(as.data.frame(xml_list)))

I get the following error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 1, 0
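The error message says that `data.frame()` received arguments of different lengths (at least one of length 0), so the nested list cannot be turned into columns directly. One way around this is to flatten the list first. The sketch below is base R only and uses a hand-built list as a stand-in; it assumes, without having checked it against the real file, that `xmlToList()` returns a list nested like this for the sample XML.

```r
# Hand-built stand-in (assumption) for what xmlToList() returns for the sample file
xml_list <- list(
  Response = list(`Response-type` = "1", CODE = "1"),
  Identification = list(
    Request = list(
      `Request-name` = "Testing",
      `Request-time` = list(
        Year = "2015",
        Month = "December",
        Time = list(Hour = "1", Minute = "20")
      )
    )
  )
)

# unlist() flattens every branch into one named character vector;
# each name is the path of the leaf joined with "."
flat <- unlist(xml_list)

# Strip whitespace around the text values while keeping the names
flat[] <- trimws(flat)

# t() turns the vector into a 1-row matrix, so each leaf becomes a column
df <- as.data.frame(t(flat), stringsAsFactors = FALSE)
```

With the real file, the stand-in list would be replaced by `xml_list <- xmlToList(xml)`. Because the column names keep the full path (for example `Response.CODE`), two leaves with the same tag name in different branches do not collide.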

1 Answer:

Answer 0 (score: 0)

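Here is a solution using the xml2 library. I am assuming there is only one Response node and one Identification node per Header node. The comments in the code explain the process step by step: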

library(xml2)
page<-read_xml("<Header>
    <Response>
      <Response-type> 1  </Response-type>
         <CODE> 1 </CODE>
      </Response>
      <Identification>
         <Request>
           <Request-name> Testing </Request-name>
           <Request-time>
             <Year>  2015 </Year>
             <Month> December </Month>
             <Time>  <Hour> 1 </Hour> <Minute> 20 </Minute></Time>
           </Request-time>
         </Request>
      </Identification> </Header>")

# Find all element nodes (assuming a single Header node in the document)
all<-xml_find_all(page, '//*')

# Goal: find all the nodes at the bottom of the hierarchy (the leaves)
# Find the children of each node
children<-lapply(all, xml_children)
# Count the children of each node;
# a node with 0 children is a leaf node
nochildren<-sapply(children, length)==0

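# Names of each leaf
names<-xml_name(all[nochildren])
# Values of each leaf, with surrounding whitespace removed
values<-trimws(xml_text(all[nochildren]))

# Build a one-row data frame with one column per leaf
# (stringsAsFactors = FALSE keeps the values as character in R < 4.0 too)
df<-data.frame(t(values), stringsAsFactors = FALSE)
names(df)<-names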
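The leaf-detection step (the `lapply`/`sapply` over `xml_children`) can also be written as a single XPath query. This is a minimal sketch assuming the same single-Header layout: the XPath 1.0 expression `//*[not(*)]` selects every element that has no element children, which should be the same set of leaf nodes. The XML string is just a compacted copy of the sample.

```r
library(xml2)

# Compacted copy of the sample document (same nesting as the question)
page <- read_xml("<Header>
  <Response><Response-type> 1 </Response-type><CODE> 1 </CODE></Response>
  <Identification><Request>
    <Request-name> Testing </Request-name>
    <Request-time><Year> 2015 </Year><Month> December </Month>
      <Time><Hour> 1 </Hour><Minute> 20 </Minute></Time></Request-time>
  </Request></Identification>
</Header>")

# XPath: elements with no element children, i.e. the leaves, in document order
leaves <- xml_find_all(page, "//*[not(*)]")

# Trimmed text of each leaf, named by the leaf's element name
values <- setNames(trimws(xml_text(leaves)), xml_name(leaves))

# One-row data frame, one column per leaf
df <- as.data.frame(t(values), stringsAsFactors = FALSE)
```

If leaf names can repeat in different branches, `xml_path(leaves)` could be used for the names instead of `xml_name(leaves)`, so that each column is labelled with the leaf's XPath rather than its bare tag name.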