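I'm trying to convert XML to a data frame in R (see XML here).
So far I've managed to get all of the Event attributes into a data frame, but I also need the information from the "Q" child nodes for certain "qualifier_id" values. Here is the current code: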
```r
library(xml2)

# Attributes to pull from every <Event> node
event_attrs <- c("timestamp", "id", "version", "last_modified", "y", "x",
                 "outcome", "team_id", "sec", "min", "period_id", "type_id",
                 "event_id")

# list.filenames: character vector of XML file paths (defined elsewhere)
df_list <- lapply(list.filenames, function(f) {
  doc <- read_xml(f)
  # Find the <Event> nodes once instead of once per attribute
  events <- xml_find_all(doc, "//Event")
  # One column per attribute; a missing attribute becomes NA
  as.data.frame(
    setNames(lapply(event_attrs, function(a) xml_attr(events, a)), event_attrs),
    stringsAsFactors = FALSE
  )
})
```
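The resulting data frame looks like this: Dataframe
Ideally, I would add extra columns for some of the "qualifier_id" values. For example, a column named "213" that holds the "value" attribute of the Q node with qualifier_id 213, and NA where no such node exists.
Thanks in advance.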
Answer
You are already using xml2, so this is fairly easy with the right XPath.
```r
library(xml2)
```

Sample data:

```r
doc <- read_xml("<?xml version='1.0' encoding='ISO-8859-1'?>
<root>
  <Event id='1'>
    <Q id='' qualifier_id='12'/>
    <Q id='' qualifier_id='123' value='hello'/>
    <Q id='' qualifier_id='1234'/>
    <Q id='' qualifier_id='1' value='goodbye'/>
  </Event>
  <Event id='2'>
    <Q id='' qualifier_id='2'/>
    <Q id='' qualifier_id='1234'/>
    <Q id='' qualifier_id='1' value='goodbye'/>
  </Event>
</root>")
```
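Before deciding which qualifier_id columns to build, it may help to list which ids occur in a file at all, and which of them actually carry a `value` attribute. A small sketch on the same sample data, using only `xml_find_all()` and `xml_attr()`; the variable names are illustrative, and it assumes every `qualifier_id` is an integer:

```r
library(xml2)

# Same sample structure as above
doc <- read_xml("<root>
  <Event id='1'>
    <Q id='' qualifier_id='12'/>
    <Q id='' qualifier_id='123' value='hello'/>
    <Q id='' qualifier_id='1234'/>
    <Q id='' qualifier_id='1' value='goodbye'/>
  </Event>
  <Event id='2'>
    <Q id='' qualifier_id='2'/>
    <Q id='' qualifier_id='1234'/>
    <Q id='' qualifier_id='1' value='goodbye'/>
  </Event>
</root>")

# Every qualifier_id used anywhere in the document
# (assumption: ids are integers, so as.integer() gives a numeric sort)
all_ids <- sort(unique(as.integer(
  xml_attr(xml_find_all(doc, "//Q"), "qualifier_id"))))

# Only the qualifier_ids whose Q node has a 'value' attribute
ids_with_value <- sort(unique(as.integer(
  xml_attr(xml_find_all(doc, "//Q[@value]"), "qualifier_id"))))

print(all_ids)
print(ids_with_value)
```

Either vector could then serve as the list of columns to create.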
Code:

```r
# Get the list of Event nodes
Event.nodes <- xml_find_all(doc, "//Event")
# In each Event node, find the first Q node whose qualifier_id attribute is '123'
# and extract the value of its 'value' attribute;
# if no such Q node is found, NA is returned
xml_attr(xml_find_first(Event.nodes, "./Q[@qualifier_id='123']"), "value")
#[1] "hello" NA
```
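To get several qualifier columns at once and attach them to the Event attributes, the same `xml_find_first()` lookup can be repeated for a vector of qualifier ids. Below is a minimal sketch, assuming the real files share the sample's Event/Q layout; `qualifier_columns()` is a hypothetical helper, not part of xml2. Because `xml_find_first()` returns exactly one result (possibly missing) per Event node, every column stays aligned with the Event rows. Note that only the first matching Q node per Event is used.

```r
library(xml2)

doc <- read_xml("<root>
  <Event id='1'>
    <Q id='' qualifier_id='12'/>
    <Q id='' qualifier_id='123' value='hello'/>
    <Q id='' qualifier_id='1' value='goodbye'/>
  </Event>
  <Event id='2'>
    <Q id='' qualifier_id='2'/>
    <Q id='' qualifier_id='1' value='goodbye'/>
  </Event>
</root>")

# Hypothetical helper: one column per qualifier_id, holding the 'value'
# attribute of the first matching Q child of each Event (NA if absent)
qualifier_columns <- function(events, qualifier_ids) {
  cols <- lapply(qualifier_ids, function(q) {
    xpath <- sprintf("./Q[@qualifier_id='%s']", q)
    xml_attr(xml_find_first(events, xpath), "value")
  })
  names(cols) <- as.character(qualifier_ids)
  # check.names = FALSE keeps names like "123" instead of "X123"
  as.data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
}

events <- xml_find_all(doc, "//Event")
event_df <- data.frame(id = xml_attr(events, "id"), stringsAsFactors = FALSE)
result <- cbind(event_df, qualifier_columns(events, c(123, 1, 12)))
print(result)
```

Inside the question's `lapply()` loop, the same call could presumably be `cbind()`-ed onto each per-file data frame, e.g. with `c(213)` as the id vector, since both are built from the same `//Event` node set in the same order.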