I would like to use R to read the "when" values from a .kml file created by Google My Tracks (excerpt below):
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
xmlns:gx="http://www.google.com/kml/ext/2.2"
xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<open>1</open>
<visibility>1</visibility>
<name><![CDATA[2013-06-29 1:09pm]]></name>
<atom:author><atom:name><![CDATA[Created by Google My Tracks on Android.]]></atom:name> </atom:author>
...
<gx:MultiTrack>
<altitudeMode>absolute</altitudeMode>
<gx:interpolate>1</gx:interpolate>
<gx:Track>
<when>2013-06-29T17:09:04.564Z</when>
<gx:coord>-79.305048 43.710639 72.9000015258789</gx:coord>
<when>2013-06-29T17:09:06.135Z</when>
<gx:coord>-79.304971 43.710653 67.4000015258789</gx:coord>
<when>2013-06-29T17:09:08.135Z</when>
<gx:coord>-79.305193 43.710535 78.19999694824219</gx:coord>
<when>2013-06-29T17:09:09.135Z</when>
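The "when" node is "a time value that corresponds to a position (specified in a gx:coord element)". "gx:coord" is "a coordinate value consisting of three values for longitude, latitude, and altitude". (https://developers.google.com/kml/documentation/kmlreference#gxtrack)
The path to the values I want is:
kml/Document/Placemark/gx:MultiTrack/gx:Track/when
as reported by: xmlstarlet el "filename.kml"
I was able to extract the coordinates and elevation using: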
# gx:coord holds "longitude latitude altitude", in that order
coords <- xpathSApply(check, "//gx:coord", xmlValue)
lon <- sapply(strsplit(as.character(coords), " "), "[", 1)
lat <- sapply(strsplit(as.character(coords), " "), "[", 2)
ele <- sapply(strsplit(as.character(coords), " "), "[", 3)
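The splitting step can also be done in a single pass. The following is a minimal base R sketch, assuming every gx:coord string has exactly three space-separated fields; the sample values are copied from the excerpt, and the names `parts` and `track` are chosen here for illustration only.

```r
# Sample values copied from the excerpt; in practice `coords` would come
# from xpathSApply(check, "//gx:coord", xmlValue).
coords <- c("-79.305048 43.710639 72.9000015258789",
            "-79.304971 43.710653 67.4000015258789",
            "-79.305193 43.710535 78.19999694824219")

# Split each string once on spaces and stack the pieces row-wise.
parts <- do.call(rbind, strsplit(coords, " ", fixed = TRUE))

# gx:coord order is longitude, latitude, altitude.
track <- data.frame(lon = as.numeric(parts[, 1]),
                    lat = as.numeric(parts[, 2]),
                    ele = as.numeric(parts[, 3]))
print(track)
```

Keeping the three columns in one data frame makes it easier to attach the timestamps later as a fourth column.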
But I cannot get the "when" values. What I want to pull from the file is:
17:09:04.564
17:09:06.135
17:09:08.135
17:09:09.135
so that I can align them with the coordinates and elevation.
I tried:
timeStamp <- xpathSApply(check, "//gx:MultiTrack", xmlValue)
which gives me a single string that could be parsed, since each time begins with "T" and ends with "Z":
[1] "absolute12013-06-29T17:09:04.564Z-79.305048 43.710639 72.90000152587892013-06- 29T17:09:06.135Z-79.304971 43.710653 67.40000152587892013-06-29T17:09:08.135Z-79.305193 43.710535 78.199996948242192013-06-29T17:09:09.135Z-79.305164 43.710592 77.699996948242192013-06-29T17:09:10.134Z-79.305097 43.710614 67.52013-06-29T17:09:11.137Z-79.305066 43.710572
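As a rough illustration of how such a concatenated string might be picked apart with base R alone, the sketch below uses a regular expression with lookarounds. The sample string `z` is a shortened reconstruction of the output above, and `pattern` is a name chosen here; this assumes every time has the form HH:MM:SS with an optional fractional part.

```r
# A shortened version of the concatenated string shown above.
z <- paste0("absolute1",
            "2013-06-29T17:09:04.564Z-79.305048 43.710639 72.9000015258789",
            "2013-06-29T17:09:06.135Z-79.304971 43.710653 67.4000015258789")

# Match the time of day that sits between "T" and "Z" without
# capturing the delimiters themselves (needs perl = TRUE for lookbehind).
pattern <- "(?<=T)[0-9]{2}:[0-9]{2}:[0-9]{2}(\\.[0-9]+)?(?=Z)"
times <- regmatches(z, gregexpr(pattern, z, perl = TRUE))[[1]]
print(times)
```

A stricter pattern like this avoids accidentally matching any other text that happens to lie between a "T" and a "Z".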
Any good ideas? Thanks in advance.
Edit ----->
My inelegant solution:
file_name <- "2013-06-29 1-09pm.kml"
library(XML)
# parse the KML file into an internal XML tree
check <- xmlInternalTreeParse(file = file_name)
library(gsubfn)
# get the concatenated text content of the gx:MultiTrack node as one string
z <- xpathSApply(check, "//gx:MultiTrack", xmlValue)
# find text bounded by (and including) T and Z
x <- strapply(z,"T.+?Z")
# unpack the resulting list
x1 <- unlist(x)
# get rid of the initial T
x2 <- gsub("T", "", x1)
# get rid of the trailing Z
x3 <- gsub("Z", "", x2)
# convert it to a time format
time <- strptime(x3, "%H:%M:%OS")
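If the date is worth keeping as well, the full timestamp can be parsed in one step instead of stripping "T" and "Z" separately. This is a sketch in base R, assuming the values are always UTC (the trailing "Z") and always follow the format seen in the excerpt; `when` and `stamp` are names chosen here.

```r
# Sample timestamps copied from the excerpt in the question.
when <- c("2013-06-29T17:09:04.564Z",
          "2013-06-29T17:09:06.135Z",
          "2013-06-29T17:09:08.135Z",
          "2013-06-29T17:09:09.135Z")

# "T" and "Z" are matched literally; %OS accepts fractional seconds on input.
stamp <- as.POSIXct(when, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Time of day with three decimal places. format() truncates rather than
# rounds, so the last digit can differ by one from the input.
print(format(stamp, "%H:%M:%OS3"))

# Elapsed seconds between consecutive fixes.
print(diff(as.numeric(stamp)))
```

Parsing to POSIXct rather than a bare time of day avoids strptime() silently attaching the current date to each value.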
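Answer 0 (score: 0)
There are better ways to do this with a proper XML/HTML parser, but I see no one has posted one. The following will work, but in general you should use a parser rather than string matching.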
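library(qdap)
# Assumption: `dat` holds the lines of the .kml file, e.g. dat <- readLines(file_name)
# extract the text between each pair of opening and closing tags
x <- unlist(genXtract(dat, "<when>", "</when>"))
y <- unlist(genXtract(dat, "<gx:coord>", "</gx:coord>"))
# drop the lines where nothing was matched
x[sapply(x, function(x) !identical(x, character(0)))]
y[sapply(y, function(x) !identical(x, character(0)))]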
## > x[sapply(x, function(x) !identical(x, character(0)))]
## <when> : </when>14 <when> : </when>16
## "2013-06-29T17:09:04.564Z" "2013-06-29T17:09:06.135Z"
## <when> : </when>18 <when> : </when>20
## "2013-06-29T17:09:08.135Z" "2013-06-29T17:09:09.135Z"
## > y[sapply(y, function(x) !identical(x, character(0)))]
## <gx:coord> : </gx:coord>15
## "-79.305048 43.710639 72.9000015258789"
## <gx:coord> : </gx:coord>17
## "-79.304971 43.710653 67.4000015258789"
## <gx:coord> : </gx:coord>19
## "-79.305193 43.710535 78.19999694824219"
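For comparison, here is a parser-based sketch using the XML package. One likely reason a plain "//when" query returns nothing is that `when` lives in the document's default KML namespace, and XPath 1.0 only matches namespaced elements through a prefix. The prefix `k` below is an arbitrary name chosen here, and the KML string is a trimmed-down stand-in for the real file, so treat this as an illustration under those assumptions rather than a tested solution for every My Tracks export.

```r
library(XML)

# A trimmed-down KML fragment modelled on the excerpt in the question.
kml <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document><Placemark><gx:MultiTrack><gx:Track>
<when>2013-06-29T17:09:04.564Z</when>
<gx:coord>-79.305048 43.710639 72.9000015258789</gx:coord>
<when>2013-06-29T17:09:06.135Z</when>
<gx:coord>-79.304971 43.710653 67.4000015258789</gx:coord>
</gx:Track></gx:MultiTrack></Placemark></Document></kml>'

doc <- xmlParse(kml, asText = TRUE)

# Bind both namespaces to explicit prefixes for use in XPath.
# "k" is a hypothetical prefix for the default KML namespace.
ns <- c(k  = "http://www.opengis.net/kml/2.2",
        gx = "http://www.google.com/kml/ext/2.2")

when   <- xpathSApply(doc, "//gx:Track/k:when",   xmlValue, namespaces = ns)
coords <- xpathSApply(doc, "//gx:Track/gx:coord", xmlValue, namespaces = ns)

# Both vectors are in document order, so they line up pairwise.
print(data.frame(when = when, coord = coords))
```

With a real file, `xmlParse(file_name)` would replace the inline string; the pairing relies on each `when` having exactly one matching `gx:coord` in the same track.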