Using R

Time: 2016-02-26 16:41:46

Tags: xml r xml-parsing openstreetmap

I have an XML (OSM) file that looks like this (small example):

  <way id="86015" version="1" timestamp="2016-02-26T15:01:32Z">
    <nd ref="85642"/>
    <nd ref="85641"/>
    <nd ref="86016"/>
    <nd ref="85642"/>
  </way>
  <relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
    <member type="way" ref="2" role="outer"/>
    <member type="way" ref="12" role="outer"/>
    <member type="way" ref="17" role="outer"/>
    <member type="way" ref="22" role="outer"/>
    <member type="way" ref="27" role="outer"/>
    <member type="way" ref="60" role="outer"/>
    <member type="way" ref="65" role="outer"/>
    <member type="way" ref="71" role="outer"/>
    <member type="way" ref="75" role="outer"/>
    <member type="way" ref="79" role="outer"/>
    <member type="way" ref="84" role="outer"/>
    <member type="way" ref="92" role="outer"/>
    <member type="way" ref="108" role="outer"/>
    <member type="way" ref="112" role="outer"/>
    <member type="way" ref="132" role="outer"/>
    <member type="way" ref="150" role="outer"/>
    <member type="way" ref="166" role="outer"/>
    <member type="way" ref="173" role="outer"/>
    <member type="way" ref="178" role="outer"/>
    <tag k="type" v="multipolygon"/>
    <tag k="note" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
  </relation>

What I want to do is use the XML package in R to apply some changes to the file, especially to the <relation> part.

1) I want to change the v= attribute from

<tag k="type" v="multipolygon"/>

to

<tag k="type" v="boundary"/>

2) I want to add a new node to every <relation> parent node:

<tag k='boundary' v='postal_code' />

3) I want to change the k= attribute from

<tag k="note" v="00000 ExampleCity"/>

to

<tag k="city" v="00000 ExampleCity"/>

I can find all <relation> elements like this (doc is the file):

getNodeSet(doc,"//relation")

Or I can get all the tags of all <relation> elements.

But I can't figure out how to actually overwrite and add the parts I need.

1 Answer:

Answer 0: (score: 0)

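As mentioned, consider XSLT, a special-purpose declarative language designed to transform XML documents to fit end-use needs. While R does not maintain a general-purpose XSLT processor, it can interface with other languages and software that do, such as Python and Excel. Even for the latter, R can use the RDCOMClient library to replicate what an Excel macro does: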

XSLT script (save as an external .xsl or .xslt file, to be used below)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>  

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- CHANGE @v ATTRIBUTE -->
  <xsl:template match="tag[@k='type']">
    <xsl:copy>      
      <xsl:copy-of select="@k"/>
      <xsl:attribute name="v">boundary</xsl:attribute>
    </xsl:copy>
  </xsl:template>

  <!-- CHANGE @k ATTRIBUTE -->
  <xsl:template match="tag[@k='note']">
    <xsl:copy>      
      <xsl:attribute name="k">city</xsl:attribute>
      <xsl:copy-of select="@v"/>
    </xsl:copy>
  </xsl:template>

  <!-- ADD NODE -->
  <xsl:template match="relation">
    <xsl:copy>      
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="member"/>
      <xsl:apply-templates select="tag"/>
      <tag k='boundary' v='postal_code' />
    </xsl:copy>
  </xsl:template>      
</xsl:transform>

Python script (using the lxml module)

import lxml.etree as ET

# LOAD ORIGINAL XML AND XSLT SCRIPT
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')

# TRANSFORM XML INTO A NEW DOM OBJECT
transform = ET.XSLT(xslt)
newdom = transform(dom)

# CONVERT TO STRING
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True,  xml_declaration=True)

# OUTPUT TO FILE (binary mode, because tostring() returns bytes here)
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)

R script (calls the .py script above, assuming python is on the system PATH variable)

system('python "C:\\Path\\To\\Python\\Script.py"')
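
If running an XSLT processor is more than the task needs, the three edits can also be made directly on the parsed tree. The following is only a sketch under stated assumptions, not part of the approach above: it uses Python's standard-library xml.etree.ElementTree instead of lxml, a shortened stand-in for the input file (SAMPLE), and a hypothetical helper name, retag_relations. Unlike the XSLT templates, which match every tag element in the document, it only touches tag elements that are children of a relation.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the OSM file from the question (assumption).
SAMPLE = """<osm version="0.6">
  <relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
    <member type="way" ref="2" role="outer"/>
    <tag k="type" v="multipolygon"/>
    <tag k="note" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
  </relation>
</osm>"""


def retag_relations(root):
    """Apply the three edits from the question to every <relation>, in place."""
    for relation in root.iter("relation"):
        for tag in relation.findall("tag"):
            if tag.get("k") == "type":
                tag.set("v", "boundary")   # 1) change the v attribute
            elif tag.get("k") == "note":
                tag.set("k", "city")       # 3) change the k attribute
        # 2) append the new tag as the last child of the relation
        ET.SubElement(relation, "tag", {"k": "boundary", "v": "postal_code"})
    return root


root = retag_relations(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

For a real file, ET.parse('Input.xml') followed by tree.write('Output.xml', encoding='UTF-8', xml_declaration=True) would replace the in-memory string; the file names are the same placeholders used in the script above.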

Alternatively, Excel can run the XSLT, and R can replicate that process.

Excel (using MSXML objects, late-bound here)

Public Sub RunXSLT()
    Dim xmlDoc As Object, xslDoc As Object, newDoc As Object

    Set xmlDoc = CreateObject("MSXML2.DOMDocument")
    Set xslDoc = CreateObject("MSXML2.DOMDocument")
    Set newDoc = CreateObject("MSXML2.DOMDocument")

    ' Turn async off BEFORE loading so Load completes synchronously
    xmlDoc.async = False
    xmlDoc.Load "C:\Path\To\Input.xml"

    xslDoc.async = False
    xslDoc.Load "C:\Path\To\XSLTScript.xsl"
    xmlDoc.transformNodeToObject xslDoc, newDoc
    newDoc.Save "C:\Path\To\Output.xml"

    Set newDoc = Nothing
    Set xslDoc = Nothing
    Set xmlDoc = Nothing

End Sub

R script (using RDCOMClient to replicate the above)

library(RDCOMClient)

xmlfile = COMCreate("MSXML2.DOMDocument")
xslfile = COMCreate("MSXML2.DOMDocument")
newxmlfile = COMCreate("MSXML2.DOMDocument")

xmlstr = 'C:\\Path\\To\\Input.xml'
xslstr = 'C:\\Path\\To\\XSLTScript.xsl'
newxmlstr = 'C:\\Path\\To\\Output.xml'

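# LOADING XML & XSLT FILES
# (COM properties are set with [[ ]]; "xmlfile.async = FALSE" would only create a new R variable)
xmlfile[["async"]] = FALSE
xmlfile$Load(xmlstr)

xslfile[["async"]] = FALSE
xslfile$Load(xslstr)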

# TRANSFORMING XML FILE USING XSLT INTO NEW FILE
xmlfile$transformNodeToObject(xslfile, newxmlfile)
newxmlfile$Save(newxmlstr)

Final XML output

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="CGImap 0.0.2">
  <node id="298884272" lat="54.0901447" lon="12.2516513" user="SvenHRO" 
        uid="46882" visible="true" version="1" changeset="676636" 
        timestamp="2008-09-21T21:37:45Z"/>
  <way id="86015" version="1" timestamp="2016-02-26T15:01:32Z">
    <nd ref="85642"/>
    <nd ref="85641"/>
    <nd ref="86016"/>
    <nd ref="85642"/>
  </way>
  <relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
    <member type="way" ref="2" role="outer"/>
    <member type="way" ref="12" role="outer"/>
    <member type="way" ref="17" role="outer"/>
    <member type="way" ref="22" role="outer"/>
    <member type="way" ref="27" role="outer"/>
    <member type="way" ref="60" role="outer"/>
    <member type="way" ref="65" role="outer"/>
    <member type="way" ref="71" role="outer"/>
    <member type="way" ref="75" role="outer"/>
    <member type="way" ref="79" role="outer"/>
    <member type="way" ref="84" role="outer"/>
    <member type="way" ref="92" role="outer"/>
    <member type="way" ref="108" role="outer"/>
    <member type="way" ref="112" role="outer"/>
    <member type="way" ref="132" role="outer"/>
    <member type="way" ref="150" role="outer"/>
    <member type="way" ref="166" role="outer"/>
    <member type="way" ref="173" role="outer"/>
    <member type="way" ref="178" role="outer"/>
    <tag k="type" v="boundary"/>
    <tag k="city" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
    <tag k="boundary" v="postal_code"/>
  </relation>
</osm>
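
Whichever route produces the output, it can be worth confirming that every relation ended up with the three changes. The following is a minimal sketch of such a check, assuming Python's standard-library xml.etree.ElementTree; the helper name check_relation and the shortened OUTPUT string are illustrative stand-ins, not part of the scripts above.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the transformed file (assumption).
OUTPUT = """<osm version="0.6" generator="CGImap 0.0.2">
  <relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
    <member type="way" ref="2" role="outer"/>
    <tag k="type" v="boundary"/>
    <tag k="city" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
    <tag k="boundary" v="postal_code"/>
  </relation>
</osm>"""


def check_relation(relation):
    """Return a list of problems found in one transformed <relation>."""
    tags = {t.get("k"): t.get("v") for t in relation.findall("tag")}
    problems = []
    if tags.get("type") != "boundary":
        problems.append("type is not 'boundary'")
    if tags.get("boundary") != "postal_code":
        problems.append("missing boundary=postal_code")
    if "note" in tags:
        problems.append("note was not renamed to city")
    return problems


root = ET.fromstring(OUTPUT)
# Map each relation id to the list of problems found (empty list = all good).
report = {rel.get("id"): check_relation(rel) for rel in root.iter("relation")}
print(report)
```

An empty list for a relation id means all three edits are present; any other entry names what is still missing.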