Converting XML to a dataframe in R

Posted: 2019-04-01 17:22:10

Tags: r xml dataframe

I'm struggling to convert XML to a dataframe in R (see XML here).

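So far I have managed to get all of the Event attributes into a dataframe, but I also need the information from the "Q" child nodes for some "qualifier_id" values. Here is the current code: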

library(xml2)

# list.filenames: character vector of paths to the XML files
df_list <- lapply(list.filenames, function(f) {
  doc <- read_xml(f)
  events <- xml_find_all(doc, "//Event")  # look up the Event nodes once

  attrs <- c("timestamp", "id", "version", "last_modified", "y", "x",
             "outcome", "team_id", "sec", "min", "period_id",
             "type_id", "event_id")

  # one column per Event attribute; xml_attr() gives NA where it is missing
  cols <- setNames(lapply(attrs, function(a) xml_attr(events, a)), attrs)
  as.data.frame(cols, stringsAsFactors = FALSE)
})

The dataframe looks like this: Dataframe

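Ideally I would add an extra column for each of a few "qualifier_id" values. For example, a column named "213" that holds the "value" attribute of the matching Q node, and NA for events that have no such node.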

Thanks in advance.

1 answer:

Answer 0 (score: 0)

You are already using xml2, so this is fairly easy with the right XPath expression.

library(xml2)

Sample data

doc <- read_xml("<?xml version='1.0' encoding='ISO-8859-1'?>
                <root>
                <Event id='1'>
                <Q id='' qualifier_id='12'/>
                <Q id='' qualifier_id='123' value='hello'/>
                <Q id='' qualifier_id='1234'/>
                <Q id='' qualifier_id='1' value='goodbye'/>
                </Event>
                <Event id='2'>
                <Q id='' qualifier_id='2'/>
                <Q id='' qualifier_id='1234'/>
                <Q id='' qualifier_id='1' value='goodbye'/>
                </Event>
                </root>")

Code

#get list of Event-nodes
Event.nodes <- xml_find_all( doc, "//Event")

#in each node, find the first Q-node with qualifier_id attribute == 123
#from this node, extract the value of attribute 'value' 
#if no Q-node is found, return NA
xml_attr( xml_find_first( Event.nodes, "./Q[@qualifier_id='123']"), "value" )

#[1] "hello" NA
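To get one column per qualifier_id, as asked in the question, the same XPath lookup can be repeated for each id and combined with the Event attributes. The following is a minimal sketch under these assumptions: the helper name events_to_df is hypothetical, it takes an already-parsed document, and it keeps the first matching Q node per Event (the same behaviour as xml_find_first above).

```r
library(xml2)

# Hypothetical helper (not part of xml2): one row per Event, one column per
# Event attribute in `attrs`, plus one column per id in `qualifier_ids`
# holding the 'value' attribute of the first matching Q child (NA if absent).
events_to_df <- function(doc, attrs, qualifier_ids) {
  events <- xml_find_all(doc, "//Event")

  # Event attributes; xml_attr() returns NA where an attribute is missing
  base <- lapply(attrs, function(a) xml_attr(events, a))

  # Qualifier columns: xml_find_first() returns a missing node when no Q
  # child matches, and xml_attr() turns that into NA
  quals <- lapply(qualifier_ids, function(q) {
    xpath <- sprintf("./Q[@qualifier_id='%s']", q)
    xml_attr(xml_find_first(events, xpath), "value")
  })

  out <- as.data.frame(c(base, quals), stringsAsFactors = FALSE)
  # set names afterwards so numeric-looking ids such as "123" stay as-is
  names(out) <- c(attrs, qualifier_ids)
  out
}

doc <- read_xml("<root>
  <Event id='1'>
    <Q id='' qualifier_id='123' value='hello'/>
    <Q id='' qualifier_id='1' value='goodbye'/>
  </Event>
  <Event id='2'>
    <Q id='' qualifier_id='1' value='goodbye'/>
  </Event>
</root>")

res <- events_to_df(doc, attrs = "id", qualifier_ids = c("123", "1"))
print(res)
```

With the question's files this would plug into the existing loop, e.g. lapply(list.filenames, function(f) events_to_df(read_xml(f), attrs, c("213"))), where attrs is the attribute vector from the question. Numeric-looking column names such as "213" then need backticks or [[ ]] to access, e.g. res[["213"]].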