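I want to convert an XML file into a data frame. I found some functions that let me read the XML data, but I cannot get a data frame with the same structure as the original XML file (that is, the structure you get when opening the XML file in Excel).
This is my original XML: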
<Data>
<Frame timestamp='17/09/2014 20:55:00.902' timecode='75299902' >
<Object type='Taxi' DISTANCE='3037' VOLUME='1668' id='15593' code='0' />
<Object type='Taxi' DISTANCE='3605' VOLUME='931' id='15603' code='4' />
<Object type='Bus' DISTANCE='3563' VOLUME='488' id='15604' code='9' />
<Object type='Taxi' DISTANCE='4942' VOLUME='57' id='15624' code='1' />
<Object type='Taxi' DISTANCE='784' VOLUME='47' id='15625' code='10' />
<Object type='Taxi' DISTANCE='3301' VOLUME='2041' id='15626' code='42' />
<Object type='Bus' DISTANCE='2040' VOLUME='2945' id='15630' code='27' />
<Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
</Frame>
<Frame timestamp='17/09/2014 20:54:59.771' timecode='75299771' >
<Object type='Taxi' DISTANCE='4941' VOLUME='51' id='15624' code='1' />
<Object type='Taxi' DISTANCE='789' VOLUME='47' id='15625' code='10' />
<Object type='Taxi' DISTANCE='3300' VOLUME='2069' id='15626' code='42' />
<Object type='Bus' DISTANCE='2027' VOLUME='2947' id='15630' code='27' />
<Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
</Frame>
</Data>
This already gives me a list of the data:
library(XML)
# Convert xml data to R
data <- xmlTreeParse(file="c:/R/CL/filename.xml", useInternalNodes=TRUE)
# Create a list of the data
xl<-xmlToList(data)
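Ideally, I would like to obtain a data frame from this XML data that matches what Excel produces when it imports the file. However, when I look at the output of xl, I see that it is organized into Objects and Times. Normally, when I open the XML file in Excel, this information is linked (every object row also has columns with the time information).
This is the output of xl <- xmlToList(data):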
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "3037" "1668" "15593" "0"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "3605" "931" "15603" "4"
$Frame$Object
type DISTANCE VOLUME id code
"Bus" "3563" "488" "15604" "9"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "2161" "1592" "15615" "21"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "4942" "57" "15624" "1"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "784" "47" "15625" "10"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "3301" "2041" "15626" "42"
$Frame$Object
type DISTANCE VOLUME id code
"Bus" "2040" "2945" "15630" "27"
$Frame$Object
type DISTANCE VOLUME Z
"Airplane" "2865" "2722" "0"
$Frame$Time
timestamp timecode
"17/09/2014 20:54:59.902" "75299902"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "4941" "51" "15624" "1"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "789" "47" "15625" "10"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "3300" "2069" "15626" "42"
$Frame$Object
type DISTANCE VOLUME id code
"Bus" "2027" "2947" "15630" "27"
$Frame$Object
type DISTANCE VOLUME Z
"Airplane" "2865" "2722" "0"
$Frame$Time
timestamp timecode
"17/09/2014 20:54:59.771" "75299771"
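This list contains two table structures: Frame$Object and Frame$Time. I would like to combine them into a single table by repeating the timestamp and timecode columns for every object.
See the desired output below (the same structure you get when importing the XML file into Excel):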
type      DISTANCE  VOLUME  id     code  z  timestamp                timecode
Taxi      3037      1668    15593  0        17/09/2014 20:54:59.902  75299902
Taxi      3605      931     15603  4        17/09/2014 20:54:59.902  75299902
Bus       3563      488     15604  9        17/09/2014 20:54:59.900  75299902
Taxi      4942      57      15624  1        17/09/2014 20:54:59.900  75299902
Taxi      784       47      15625  10       17/09/2014 20:54:59.900  75299902
Taxi      3301      2041    15626  42       17/09/2014 20:54:59.900  75299902
Bus       2040      2945    15630  27       17/09/2014 20:54:59.900  75299902
Airplane  2865      2722                 0  17/09/2014 20:54:59.900  75299902
Taxi      4941      51      15624  1        17/09/2014 20:54:59.771  75299771
Taxi      789       47      15625  10       17/09/2014 20:54:59.771  75299771
Taxi      3300      2069    15626  42       17/09/2014 20:54:59.771  75299771
Bus       2027      2947    15630  27       17/09/2014 20:54:59.771  75299771
Airplane  2865      2722                 0  17/09/2014 20:54:59.771  75299771
Which functions can achieve this? Thanks in advance for your help!
Answer 0 (score: 3)
You can use xml2 and dplyr for a quick conversion:
library(xml2)
library(dplyr)
dat <- "<Data>
<Frame timestamp='17/09/2014 20:55:00.902' timecode='75299902' >
<Object type='Taxi' DISTANCE='3037' VOLUME='1668' id='15593' code='0' />
<Object type='Taxi' DISTANCE='3605' VOLUME='931' id='15603' code='4' />
<Object type='Bus' DISTANCE='3563' VOLUME='488' id='15604' code='9' />
<Object type='Taxi' DISTANCE='4942' VOLUME='57' id='15624' code='1' />
<Object type='Taxi' DISTANCE='784' VOLUME='47' id='15625' code='10' />
<Object type='Taxi' DISTANCE='3301' VOLUME='2041' id='15626' code='42' />
<Object type='Bus' DISTANCE='2040' VOLUME='2945' id='15630' code='27' />
<Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
</Frame>
<Frame timestamp='17/09/2014 20:54:59.771' timecode='75299771' >
<Object type='Taxi' DISTANCE='4941' VOLUME='51' id='15624' code='1' />
<Object type='Taxi' DISTANCE='789' VOLUME='47' id='15625' code='10' />
<Object type='Taxi' DISTANCE='3300' VOLUME='2069' id='15626' code='42' />
<Object type='Bus' DISTANCE='2027' VOLUME='2947' id='15630' code='27' />
<Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
</Frame>
</Data>"
doc <- read_xml(dat)
# bind the data.frames built in the iterator together
bind_rows(lapply(xml_find_all(doc, "//Frame"), function(x) {
  # extract the attributes from the parent tag as a data.frame
  parent <- data.frame(as.list(xml_attrs(x)), stringsAsFactors=FALSE)
  # make a data.frame out of the attributes of the kids
  kids <- bind_rows(lapply(xml_children(x), function(x) as.list(xml_attrs(x))))
  # combine them
  cbind.data.frame(parent, kids, stringsAsFactors=FALSE)
}))
## Source: local data frame [13 x 8]
##
## timestamp timecode type DISTANCE VOLUME id code Z
## (chr) (chr) (chr) (chr) (chr) (chr) (chr) (chr)
## 1 17/09/2014 20:55:00.902 75299902 Taxi 3037 1668 15593 0 NA
## 2 17/09/2014 20:55:00.902 75299902 Taxi 3605 931 15603 4 NA
## 3 17/09/2014 20:55:00.902 75299902 Bus 3563 488 15604 9 NA
## 4 17/09/2014 20:55:00.902 75299902 Taxi 4942 57 15624 1 NA
## 5 17/09/2014 20:55:00.902 75299902 Taxi 784 47 15625 10 NA
## 6 17/09/2014 20:55:00.902 75299902 Taxi 3301 2041 15626 42 NA
## 7 17/09/2014 20:55:00.902 75299902 Bus 2040 2945 15630 27 NA
## 8 17/09/2014 20:55:00.902 75299902 Airplane 2865 2722 NA NA 0
## 9 17/09/2014 20:54:59.771 75299771 Taxi 4941 51 15624 1 NA
## 10 17/09/2014 20:54:59.771 75299771 Taxi 789 47 15625 10 NA
## 11 17/09/2014 20:54:59.771 75299771 Taxi 3300 2069 15626 42 NA
## 12 17/09/2014 20:54:59.771 75299771 Bus 2027 2947 15630 27 NA
## 13 17/09/2014 20:54:59.771 75299771 Airplane 2865 2722 NA NA 0
Every column comes back as character, so you will need to convert the types as needed.
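A minimal sketch of that conversion step, assuming the combined result has been stored in a data frame. The name `df`, the two sample rows, and the choice of integer columns and UTC time zone are assumptions made for illustration and are not part of the answer above.

```r
# Hypothetical stand-in for two rows of the combined result (all character)
df <- data.frame(
  timestamp = c("17/09/2014 20:55:00.902", "17/09/2014 20:54:59.771"),
  timecode  = c("75299902", "75299771"),
  type      = c("Taxi", "Airplane"),
  DISTANCE  = c("3037", "2865"),
  VOLUME    = c("1668", "2722"),
  id        = c("15593", NA),
  code      = c("0", NA),
  Z         = c(NA, "0"),
  stringsAsFactors = FALSE
)

# Convert the numeric-looking columns to integer; missing values stay NA
num_cols <- c("timecode", "DISTANCE", "VOLUME", "id", "code", "Z")
df[num_cols] <- lapply(df[num_cols], as.integer)

# Parse day/month/year timestamps; %OS reads the fractional seconds
df$timestamp <- as.POSIXct(df$timestamp,
                           format = "%d/%m/%Y %H:%M:%OS", tz = "UTC")

str(df)
```

If the real file uses a different time zone or contains non-integer values, the `tz` argument and the conversion function would need to change accordingly.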
If you prefer to stick with the XML package, you can do something similar:
library(XML)
doc <- xmlParse(dat)
bind_rows(xpathApply(doc, "//Frame", function(x) {
  parent <- data.frame(as.list(xmlAttrs(x)), stringsAsFactors=FALSE)
  kids <- bind_rows(lapply(xmlChildren(x), function(x) as.list(xmlAttrs(x))))
  cbind.data.frame(parent, kids, stringsAsFactors=FALSE)
}))
Answer 1 (score: 0)
Try
data <- xmlParse(file="c:/R/CL/filename.xml")
and then:
# the Object elements are empty, so read the type attribute rather than the node text
sapply(getNodeSet(data, "//Frame/Object[@type]"), xmlGetAttr, "type")
This should give you a vector with the type of every Object node under Frame. More information: http://www.w3schools.com/xsl/xpath_syntax.asp
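Because the Object elements carry their data in attributes and have no text content, the same XPath selection can be paired with xmlGetAttr() to pull any attribute, with a default for nodes that lack it. The following is a small sketch using an inline two-object sample; the variable names `xml_string`, `types`, and `ids` are made up for this example.

```r
library(XML)

# Small inline stand-in for the file in the question
xml_string <- "<Data>
  <Frame timestamp='17/09/2014 20:55:00.902' timecode='75299902'>
    <Object type='Taxi' DISTANCE='3037' VOLUME='1668' id='15593' code='0' />
    <Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
  </Frame>
</Data>"
doc <- xmlParse(xml_string, asText = TRUE)

# One value per Object node, read from the named attribute
types <- xpathSApply(doc, "//Frame/Object", xmlGetAttr, "type")

# The Airplane node has no id attribute, so supply NA as the default
ids <- xpathSApply(doc, "//Frame/Object", xmlGetAttr, "id",
                   default = NA_character_)

print(types)
print(ids)
```

Supplying a default keeps the result the same length as the node set, which matters once several such vectors are combined into columns.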
Answer 2 (score: 0)
Consider the XML package route with xpathSApply(), which includes a workaround to retrieve timestamp and timecode for each child node and handles attributes that are missing on some nodes (such as id).
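The approach described in this answer can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the answerer's original code: the helper names `obj_attr` and `frame_attr` and the reduced sample data are hypothetical, while xpathSApply(), xmlGetAttr(), and xmlParent() are functions from the XML package.

```r
library(XML)

# Reduced sample with two frames (assumed data, shortened from the question)
xml_string <- "<Data>
  <Frame timestamp='17/09/2014 20:55:00.902' timecode='75299902'>
    <Object type='Taxi' DISTANCE='3037' VOLUME='1668' id='15593' code='0' />
    <Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
  </Frame>
  <Frame timestamp='17/09/2014 20:54:59.771' timecode='75299771'>
    <Object type='Taxi' DISTANCE='4941' VOLUME='51' id='15624' code='1' />
    <Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
  </Frame>
</Data>"
doc <- xmlParse(xml_string, asText = TRUE)

# Hypothetical helper: one attribute from every Object node, NA where absent
obj_attr <- function(name) {
  xpathSApply(doc, "//Frame/Object", xmlGetAttr, name,
              default = NA_character_)
}

# Hypothetical helper: the same attribute lookup on each Object's parent Frame,
# so the frame-level values are repeated once per child
frame_attr <- function(name) {
  xpathSApply(doc, "//Frame/Object", function(node) {
    xmlGetAttr(xmlParent(node), name, default = NA_character_)
  })
}

df <- data.frame(
  type      = obj_attr("type"),
  DISTANCE  = obj_attr("DISTANCE"),
  VOLUME    = obj_attr("VOLUME"),
  id        = obj_attr("id"),
  code      = obj_attr("code"),
  Z         = obj_attr("Z"),
  timestamp = frame_attr("timestamp"),
  timecode  = frame_attr("timecode"),
  stringsAsFactors = FALSE
)

print(df)
```

Each column is built from the same node set, so the rows line up by position. The columns are still character at this point and would need converting afterwards.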