I have an XML (OSM) file that looks like this (small example):
<way id="86015" version="1" timestamp="2016-02-26T15:01:32Z">
<nd ref="85642"/>
<nd ref="85641"/>
<nd ref="86016"/>
<nd ref="85642"/>
</way>
<relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
<member type="way" ref="2" role="outer"/>
<member type="way" ref="12" role="outer"/>
<member type="way" ref="17" role="outer"/>
<member type="way" ref="22" role="outer"/>
<member type="way" ref="27" role="outer"/>
<member type="way" ref="60" role="outer"/>
<member type="way" ref="65" role="outer"/>
<member type="way" ref="71" role="outer"/>
<member type="way" ref="75" role="outer"/>
<member type="way" ref="79" role="outer"/>
<member type="way" ref="84" role="outer"/>
<member type="way" ref="92" role="outer"/>
<member type="way" ref="108" role="outer"/>
<member type="way" ref="112" role="outer"/>
<member type="way" ref="132" role="outer"/>
<member type="way" ref="150" role="outer"/>
<member type="way" ref="166" role="outer"/>
<member type="way" ref="173" role="outer"/>
<member type="way" ref="178" role="outer"/>
<tag k="type" v="multipolygon"/>
<tag k="note" v="00000 ExampleCity"/>
<tag k="plz" v="00000"/>
</relation>
What I want to do is use the XML package in R to apply some changes to the file, especially to the <relation> part.
1) I want to change the v= attribute of
<tag k="type" v="multipolygon"/>
to
<tag k="type" v="boundary"/>
2) I want to add the following child node to every <relation> parent node:
<tag k='boundary' v='postal_code' />
3) I want to change the k= attribute of
<tag k="note" v="00000 ExampleCity"/>
to
<tag k="city" v="00000 ExampleCity"/>
I can find all <relation> nodes with the following (doc is the name of the file):
getNodeSet(doc,"//relation")
or get all the tags inside the <relation> nodes, but I can't figure out how to actually overwrite and add the parts I need.
Answer 0 (score: 0)
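As mentioned, consider XSLT, a special-purpose declarative programming language designed to transform XML documents to fit end-use needs. Although R does not maintain a comprehensive XSLT processor, it can interface with other languages and software that do, such as Python and Excel. Even for the latter, R can use the RDCOMClient library to mimic an Excel macro:
XSLT script (save as an external .xsl or .xslt file, to be used below)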
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- CHANGE @v ATTRIBUTE -->
<xsl:template match="tag[@k='type']">
<xsl:copy>
<xsl:copy-of select="@k"/>
<xsl:attribute name="v">boundary</xsl:attribute>
</xsl:copy>
</xsl:template>
<!-- CHANGE @k ATTRIBUTE -->
<xsl:template match="tag[@k='note']">
<xsl:copy>
<xsl:attribute name="k">city</xsl:attribute>
<xsl:copy-of select="@v"/>
</xsl:copy>
</xsl:template>
<!-- ADD NODE -->
<xsl:template match="relation">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="member"/>
<xsl:apply-templates select="tag"/>
<tag k='boundary' v='postal_code' />
</xsl:copy>
</xsl:template>
</xsl:transform>
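To make the logic of the stylesheet easier to follow, here is a rough Python model of what it does: an identity rule copies every node unchanged, and more specific rules take over for the nodes that need editing. This is only an illustrative sketch using the standard-library xml.etree.ElementTree, not how an XSLT processor is implemented. The function names copy_node and apply_templates and the SAMPLE string are made up for this example, and unlike the stylesheet it keeps the children of <relation> in their original order rather than emitting members before tags.

```python
import xml.etree.ElementTree as ET


def copy_node(elem):
    """Identity rule: copy the element, its attributes, and its children."""
    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text, new.tail = elem.text, elem.tail
    for child in elem:
        new.append(apply_templates(child))
    return new


def apply_templates(elem):
    """Use the most specific rule for elem, else fall back to the identity copy."""
    if elem.tag == "tag" and elem.get("k") == "type":
        # CHANGE @v ATTRIBUTE: keep @k, overwrite @v
        return ET.Element("tag", {"k": elem.get("k"), "v": "boundary"})
    if elem.tag == "tag" and elem.get("k") == "note":
        # CHANGE @k ATTRIBUTE: overwrite @k, keep @v
        # (assumption: a missing @v becomes an empty string here)
        return ET.Element("tag", {"k": "city", "v": elem.get("v", "")})
    if elem.tag == "relation":
        # ADD NODE: copy the relation, then append the extra tag
        new = copy_node(elem)
        new.append(ET.Element("tag", {"k": "boundary", "v": "postal_code"}))
        return new
    return copy_node(elem)


# Hypothetical, shortened sample based on the question's data
SAMPLE = """<osm>
<relation id="1">
<member type="way" ref="2" role="outer"/>
<tag k="type" v="multipolygon"/>
<tag k="note" v="00000 ExampleCity"/>
<tag k="plz" v="00000"/>
</relation>
</osm>"""

result = apply_templates(ET.fromstring(SAMPLE))
tags = [(t.get("k"), t.get("v")) for t in result.iter("tag")]
print(tags)
```

The order of the checks in apply_templates plays the role of XSLT template priority: the specific tag and relation rules are tried first, and everything else falls through to the plain copy.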
Python script (using the lxml module)
import lxml.etree as ET
# LOAD ORIGINAL XML AND XSLT SCRIPT
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')
# TRANSFORM XML INTO A NEW DOM OBJECT
transform = ET.XSLT(xslt)
newdom = transform(dom)
# CONVERT TO STRING
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
# OUTPUT TO FILE (binary mode, since tostring() returns bytes when an encoding is given)
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
R script (calls the .py script above, assuming python is on the system PATH variable)
system('python "C:\\Path\\To\\Python\\Script.py"')
Alternatively, Excel can run the XSLT, and R can replicate that process.
Excel macro (using the MSXML object, late-bound here)
Public Sub RunXSLT()
Dim xmlDoc As Object, xslDoc As Object, newDoc As Object
Set xmlDoc = CreateObject("MSXML2.DOMDocument")
Set xslDoc = CreateObject("MSXML2.DOMDocument")
Set newDoc = CreateObject("MSXML2.DOMDocument")
' Set async to False before loading so each document is fully parsed before use
xmlDoc.async = False
xmlDoc.Load "C:\Path\To\Input.xml"
xslDoc.async = False
xslDoc.Load "C:\Path\To\XSLTScript.xsl"
xmlDoc.transformNodeToObject xslDoc, newDoc
newDoc.Save "C:\Path\To\Output.xml"
Set newDoc = Nothing
Set xslDoc = Nothing
Set xmlDoc = Nothing
End Sub
R script (using RDCOMClient to replicate the above)
library(RDCOMClient)
xmlfile = COMCreate("MSXML2.DOMDocument")
xslfile = COMCreate("MSXML2.DOMDocument")
newxmlfile = COMCreate("MSXML2.DOMDocument")
xmlstr = 'C:\\Path\\To\\Input.xml'
xslstr = 'C:\\Path\\To\\XSLTScript.xsl'
newxmlstr = 'C:\\Path\\To\\Output.xml'
# LOADING XML & XSLT FILES (COM properties are set with [[ ]], before loading)
xmlfile[["async"]] = FALSE
xmlfile$Load(xmlstr)
xslfile[["async"]] = FALSE
xslfile$Load(xslstr)
# TRANSFORMING XML FILE USING XSLT INTO NEW FILE
xmlfile$transformNodeToObject(xslfile, newxmlfile)
newxmlfile$Save(newxmlstr)
Final XML output
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="CGImap 0.0.2">
<node id="298884272" lat="54.0901447" lon="12.2516513" user="SvenHRO"
uid="46882" visible="true" version="1" changeset="676636"
timestamp="2008-09-21T21:37:45Z"/>
<way id="86015" version="1" timestamp="2016-02-26T15:01:32Z">
<nd ref="85642"/>
<nd ref="85641"/>
<nd ref="86016"/>
<nd ref="85642"/>
</way>
<relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
<member type="way" ref="2" role="outer"/>
<member type="way" ref="12" role="outer"/>
<member type="way" ref="17" role="outer"/>
<member type="way" ref="22" role="outer"/>
<member type="way" ref="27" role="outer"/>
<member type="way" ref="60" role="outer"/>
<member type="way" ref="65" role="outer"/>
<member type="way" ref="71" role="outer"/>
<member type="way" ref="75" role="outer"/>
<member type="way" ref="79" role="outer"/>
<member type="way" ref="84" role="outer"/>
<member type="way" ref="92" role="outer"/>
<member type="way" ref="108" role="outer"/>
<member type="way" ref="112" role="outer"/>
<member type="way" ref="132" role="outer"/>
<member type="way" ref="150" role="outer"/>
<member type="way" ref="166" role="outer"/>
<member type="way" ref="173" role="outer"/>
<member type="way" ref="178" role="outer"/>
<tag k="type" v="boundary"/>
<tag k="city" v="00000 ExampleCity"/>
<tag k="plz" v="00000"/>
<tag k="boundary" v="postal_code"/>
</relation>
</osm>
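If XSLT feels like too much machinery for three small edits, the same result can probably be reached by editing the tree in place. The sketch below uses Python's standard-library xml.etree.ElementTree rather than XSLT or R's XML package, so treat it as an alternative technique, not a translation of the scripts above. The function name retag_relations and the SAMPLE string are invented for illustration. Two differences are worth noting: it only touches <tag> nodes directly under <relation> (the stylesheet's tag templates match <tag> anywhere in the document), and it skips adding the boundary tag if one is already present, which is an extra safeguard that the stylesheet does not have.

```python
import xml.etree.ElementTree as ET


def retag_relations(root):
    """Apply the three edits from the question to every <relation> under root."""
    for relation in root.iter("relation"):
        for tag in relation.findall("tag"):
            if tag.get("k") == "type":
                tag.set("v", "boundary")   # 1) change the v= attribute
            elif tag.get("k") == "note":
                tag.set("k", "city")       # 3) change the k= attribute
        # 2) add the new tag, unless a previous run already added it
        if relation.find("tag[@k='boundary']") is None:
            ET.SubElement(relation, "tag", {"k": "boundary", "v": "postal_code"})
    return root


# Hypothetical, shortened sample based on the question's data
SAMPLE = """<osm version="0.6">
<way id="86015"><nd ref="85642"/></way>
<relation id="1">
<member type="way" ref="2" role="outer"/>
<tag k="type" v="multipolygon"/>
<tag k="note" v="00000 ExampleCity"/>
<tag k="plz" v="00000"/>
</relation>
</osm>"""

root = retag_relations(ET.fromstring(SAMPLE))
relation_tags = [(t.get("k"), t.get("v")) for t in root.iter("tag")]
print(relation_tags)

# With real files (paths are placeholders):
# tree = ET.parse('Input.xml')
# retag_relations(tree.getroot())
# tree.write('Output.xml', encoding='UTF-8', xml_declaration=True)
```

As with the XSLT route, R could run a script like this through system(), assuming python is on the PATH.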