How to use R's xpathApply to extract KML coordinates and put them in a data frame

Asked: 2013-12-29 04:39:36

Tags: xml r xpath kml

I created a path in Google Earth, then copied and pasted it into a KML file following these instructions (https://developers.google.com/kml/faq#validation, "How do I create a KML file?").

With R's XML package, parsing the file with xmlInternalTreeParse works without problems:

doc2<-xmlInternalTreeParse("ROUTE_3.kml")

But this is what I get when I try to use xpathApply:

xpathApply(doc2,"/kml//coordinates",xmlValue)
list()

After I removed the attributes from the kml tag, I got the following:

    xpathApply(doc2,"/kml//coordinates",xmlValue)
    [[1]]
    [1] "4.538678046760991,43.96218242485241,0 4.536099605055323,43.96220903572051,0              
    4.53771014982657,43.96415063050954,0 4.536106012183452,43.96535632643623,0  
    4.538664824256699,43.9660402294286,0 4.539486616025195,43.96777930035288,0 
    4.54165951159373,43.96623221715382,0 4.543909553814832,43.96588360581748,0 
    4.541906820403621,43.96447824521096,0 4.543519784610379,43.96288529313735,0 
    4.540449258644572,43.9633940089841,0 4.544185719673153,43.9516337999984,0 
    4.536212701406948,43.94157791460842,0 4.539125112498221,43.96125976359349,0"

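I checked the original KML file with http://www.kmlvalidator.com/home.htm, which reported that the file is "valid and conforms to best practices". I am new to XPath (and to XML in general), so any advice on how to handle this while keeping the kml tag attributes would be appreciated.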

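Now that I have the coordinates as an element of a list, is there a clever way to create a three-column data frame with lon, lat and elv as column headers? I tried the following, but I am sure there is a better way (credit: "Split column at delimiter in data frame"). If you have a more direct solution, please let me know. Thanks.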

ll<-xpathApply(doc2,"/kml//coordinates",xmlValue)
s<-ll[[1]]
ss<-strsplit(s,split=" ")

df <- data.frame(do.call('rbind', strsplit(as.character(ss[[1]]),',',fixed=TRUE)))
colnames(df)<-c("lon", "lat", "elv")
df
                lon               lat elv
1  4.538678046760991 43.96218242485241   0
2  4.536099605055323 43.96220903572051   0
3   4.53771014982657 43.96415063050954   0
4  4.536106012183452 43.96535632643623   0
5  4.538664824256699  43.9660402294286   0
6  4.539486616025195 43.96777930035288   0
7   4.54165951159373 43.96623221715382   0
8  4.543909553814832 43.96588360581748   0
9  4.541906820403621 43.96447824521096   0
10 4.543519784610379 43.96288529313735   0
11 4.540449258644572  43.9633940089841   0
12 4.544185719673153  43.9516337999984   0
13 4.536212701406948 43.94157791460842   0
14 4.539125112498221 43.96125976359349   0
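
One possible alternative to the two-step strsplit above, using base R only, is to split on commas and whitespace in a single pass and fill a three-column matrix row by row. This is a minimal sketch, not a definitive solution: the string `s` below is a shortened, made-up stand-in for the value returned by xpathApply, and the code assumes every tuple carries all three values (KML allows the altitude to be omitted, in which case this would fail the length check).

```r
# Hypothetical stand-in for ll[[1]]: coordinate tuples separated by spaces/newlines
s <- "\n 4.538678046760991,43.96218242485241,0 4.536099605055323,43.96220903572051,0\n4.53771014982657,43.96415063050954,0 \n"

# Trim the ends, then split on any run of commas and/or whitespace
values <- as.numeric(strsplit(trimws(s), "[,[:space:]]+")[[1]])

# Assumption: every tuple has exactly lon, lat and elv
stopifnot(length(values) %% 3 == 0)

# Fill row by row so that each row is one lon/lat/elv tuple
df <- as.data.frame(matrix(values, ncol = 3, byrow = TRUE))
colnames(df) <- c("lon", "lat", "elv")
str(df)
```

Unlike the character columns produced by the rbind approach, the columns here are numeric, which is usually what is wanted for plotting or distance calculations.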

Here is the original KML file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
  <name>KmlFile</name>
    <StyleMap id="m_ylw-pushpin">
    <Pair>
        <key>normal</key>
        <styleUrl>#s_ylw-pushpin</styleUrl>
    </Pair>
    <Pair>
        <key>highlight</key>
        <styleUrl>#s_ylw-pushpin_hl</styleUrl>
    </Pair>
</StyleMap>
<Style id="s_ylw-pushpin">
    <IconStyle>
        <scale>1.1</scale>
        <Icon>
            <href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
        </Icon>
        <hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
    </IconStyle>
</Style>
<Style id="s_ylw-pushpin_hl">
    <IconStyle>
        <scale>1.3</scale>
        <Icon>
            <href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
        </Icon>
        <hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
    </IconStyle>
</Style>
<Placemark>
    <name>ROUTE_3</name>
    <styleUrl>#m_ylw-pushpin</styleUrl>
    <LineString>
        <tessellate>1</tessellate>
        <coordinates>
4.538678046760991,43.96218242485241,0 
4.536099605055323,43.96220903572051,0 
4.53771014982657,43.96415063050954,0
4.536106012183452,43.96535632643623,0 
4.538664824256699,43.9660402294286,0 
4.539486616025195,43.96777930035288,0 
4.54165951159373,43.96623221715382,0 
4.543909553814832,43.96588360581748,0 
4.541906820403621,43.96447824521096,0 
4.543519784610379,43.96288529313735,0 
4.540449258644572,43.9633940089841,0 
4.544185719673153,43.9516337999984,0 
4.536212701406948,43.94157791460842,0      
4.539125112498221,43.96125976359349,0 
        </coordinates>
    </LineString>
</Placemark>
</Document>
</kml>

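Update: I did a little reading, in particular the Details section of the XML package documentation page titled "Find matching nodes in an internal XML tree/DOM". I now understand that the kml tag attributes declare namespaces, so I corrected the xpathApply call to: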

xpathApply(doc2,"/kml:kml//kml:coordinates",xmlValue)

Note that the path now includes the kml: namespace prefix.
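
The kml: prefix works here because the file itself declares xmlns:kml on the root element. For files that only declare the default namespace, two other options are sketched below: passing an explicit prefix-to-URI mapping through the `namespaces` argument, or matching on `local-name()` so that the namespace is ignored. The inline document and the prefix `k` are made up for illustration.

```r
library(XML)

# Small made-up KML document with the same default namespace as the real file
kml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:kml="http://www.opengis.net/kml/2.2">
<Document><Placemark><LineString>
<coordinates>4.5,43.9,0 4.6,44.0,0</coordinates>
</LineString></Placemark></Document>
</kml>'

doc <- xmlParse(kml_text, asText = TRUE)

# Unprefixed names only match elements in no namespace, so nothing is found
no_ns <- xpathApply(doc, "//coordinates", xmlValue)

# Option 1: map an arbitrary prefix (here "k") to the KML namespace URI
with_ns <- xpathApply(doc, "//k:coordinates", xmlValue,
                      namespaces = c(k = "http://www.opengis.net/kml/2.2"))

# Option 2: ignore namespaces and match on the local element name
by_local <- xpathApply(doc, "//*[local-name()='coordinates']", xmlValue)

length(no_ns)
with_ns[[1]]
by_local[[1]]
```

The explicit mapping is the more precise of the two, since `local-name()` would also match a coordinates element from any other namespace.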

Now I can use the KML file without modifying it. Here is an example wrapped in a function:

library(XML)
KML_geo_path_coordinates_to_dataframe <- function(kml_file) {
  # this requires the XML library
  doc2 <- xmlInternalTreeParse(kml_file)
  # the namespace issue (kml:) is explained in the getNodeSet (XML) R documentation under Details
  ll <- xpathApply(doc2, "/kml:kml//kml:coordinates", xmlValue)
  # ll is a list; take out the element needed: one long string of
  # coordinate tuples separated by whitespace
  s <- ll[[1]]
  # clean up: trim leading and trailing whitespace (spaces, tabs, newlines);
  # deleting "\n" outright would glue tuples together when they sit on separate lines
  s <- gsub(pattern = "^\\s+|\\s+$", replacement = "", x = s)

  # split into "lon,lat,elv" tuples on any run of whitespace
  ss <- strsplit(s, split = "\\s+")
  # split each tuple at the commas and bind the pieces into rows
  df <- data.frame(do.call('rbind', strsplit(as.character(ss[[1]]), ',', fixed = TRUE)))
  colnames(df) <- c("lon", "lat", "elv")

  return(df)
}
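
The function above reads only the first coordinates element. If a file held several placemarks, one way to collect all of them might look like the sketch below. Everything in it is illustrative: the inline document, the prefix `k`, and the `path` index column are assumptions, not part of the original file.

```r
library(XML)

# Made-up KML document with two placemarks (a line and a point)
kml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Placemark><name>A</name><LineString><coordinates>
    1.0,2.0,0 1.5,2.5,0
  </coordinates></LineString></Placemark>
  <Placemark><name>B</name><Point><coordinates>3.0,4.0,10</coordinates></Point></Placemark>
</Document>
</kml>'

doc <- xmlParse(kml_text, asText = TRUE)
ns  <- c(k = "http://www.opengis.net/kml/2.2")

# One string per <coordinates> element, in document order
coord_strings <- xpathSApply(doc, "//k:coordinates", xmlValue, namespaces = ns)

# Parse each string and tag its rows with the index of the element it came from
parts <- lapply(seq_along(coord_strings), function(i) {
  tuples <- scan(text = coord_strings[i], what = "", quiet = TRUE)  # split on whitespace
  coords <- read.table(text = tuples, sep = ",",
                       col.names = c("lon", "lat", "elv"))
  data.frame(path = i, coords)
})

all_coords <- do.call(rbind, parts)
all_coords
```

The `path` column makes it possible to split the result back into individual geometries later, for example with `split(all_coords, all_coords$path)`.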

1 Answer:

Answer 0 (score: 0)

Implementing @Gavin's excellent suggestion (assuming the file is named map.kml):

library(rgdal)

setwd("<directory containing kml file>")

system(paste("ogrinfo", "map.kml")) # diagnostic to identify the layers
# Had to open data source read-only.
# INFO: Open of `map.kml'
#       using driver `KML' successful.
# 1: KmlFile (Line String)          <- This is the layer name

map <- readOGR(dsn="map.kml",layer="KmlFile")
df  <- data.frame(map@lines[[1]]@Lines[[1]]@coords)
colnames(df) <- c("lon","lat")
df

#         lon      lat
# 1  4.538678 43.96218
# 2  4.536100 43.96221
# 3  4.537710 43.96415
# 4  4.536106 43.96536
# 5  4.538665 43.96604
# ...

A few notes:

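  1. The KML driver for readOGR(...) needs the file name (optionally with a path) as dsn, and the text of the KML name tag as layer. The system call at the beginning identifies the layer.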

  2. readOGR(...) throws away the z dimension. So if you need the elevation, this approach will not work for you.

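  3. Where the coordinates sit inside the returned object depends on the geometry type and the number of elements. In your case there is just a single path, hence map@lines[[1]]@Lines[[1]]@coords.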

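  4. Your file actually has an error on line 2 (the closing quote is missing in the xmlns:gx namespace declaration). You need to fix it or the file will not import.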