我有以下XML文件 test.graphml ,我试图使用R中的XML包进行操作。
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="directed">
<node id="n0"/>
<node id="n1"/>
<node id="n2"/>
<node id="n3"/>
<node id="n4"/>
<edge source="n0" target="n1"/>
<edge source="n0" target="n2"/>
<edge source="n2" target="n3"/>
<edge source="n1" target="n3"/>
<edge source="n3" target="n4"/>
</graph>
</graphml>
我想将节点n0,n1,n2和n3嵌套到一个新的图节点中,如下所示。
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="directed">
<graph id="g1">
<node id="n0"/>
<node id="n1"/>
<node id="n2"/>
<node id="n3"/>
</graph>
<node id="n4"/>
<edge source="n0" target="n1"/>
<edge source="n0" target="n2"/>
<edge source="n2" target="n3"/>
<edge source="n1" target="n3"/>
<edge source="n3" target="n4"/>
</graph>
</graphml>
我编写的代码包含由于缺乏XML处理经验而无法解决的未知和错误。我非常感谢一些指示,这将有助于我继续。
library(XML)
# Read file
x <- xmlParse("test.graphml")
ns <- c(graphml ="http://graphml.graphdrawing.org/xmlns")
# Create new graph node
ng <- xmlNode("graph", attrs = c("id" = "g1"))
# Add n0-n3 as children of new graph node
n0_n1_n2_n3 <- getNodeSet(x,"//graphml:node[@id = 'n0' or @id='n1' or @id='n2' or @id='n3']", namespaces = ns)
ng <- append.xmlNode(ng, n0_n1_n2_n3)
# Get only graph node
g <- getNodeSet(x,"//graphml:graph", namespaces = ns)
# Remove nodes n0-n3 from the only graph node
# How I do this?
# This did not work: removeNodes(g, n0_n1_n2_n3)
# Add new graph node as child of only graph node
g <- append.xmlNode(g, ng)
#! Error message:
Error in UseMethod("append") :
no applicable method for 'append' applied to an object of class "XMLNodeSet"
答案 0 :(得分:1)
考虑XSLT,这是转换XML文件的专用语言。由于您需要修改XML(在选择的子组中添加父节点)并且必须在未声明的命名空间前缀(xmlns="http://graphml.graphdrawing.org/xmlns"
)中导航,因此XSLT是最佳解决方案。
但是,到目前为止,R还没有完全兼容的XSL模块来运行XSLT 1.0脚本,就像其他通用语言(Java,PHP,Python)一样。尽管如此,R可以使用system()
调用外部程序(包括上述语言)或专用XSLT处理器(Xalan, Saxon),或调用命令行解释器,包括PowerShell和终端xsltproc。以下是后面的解决方案。
XSLT (另存为.xsl,将在R脚本中引用)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:doc="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="doc:graphml">
<xsl:copy>
<xsl:copy-of select="document('')/*/@xsi:schemaLocation"/>
<xsl:apply-templates select="doc:graph"/>
</xsl:copy>
</xsl:template>
<xsl:template match="doc:graph">
<xsl:element name="{local-name()}" namespace="http://graphml.graphdrawing.org/xmlns">
<xsl:apply-templates select="@*"/>
<xsl:element name="graph" namespace="http://graphml.graphdrawing.org/xmlns">
<xsl:attribute name="id">g1</xsl:attribute>
<xsl:apply-templates select="doc:node[position() < 5]"/>
</xsl:element>
<xsl:apply-templates select="doc:node[@id='n4']|doc:edge"/>
</xsl:element>
</xsl:template>
<xsl:template match="doc:graph/@*">
<xsl:attribute name="{local-name()}"><xsl:value-of select="."/></xsl:attribute>
</xsl:template>
<xsl:template match="doc:node|doc:edge">
<xsl:element name="{local-name()}" namespace="http://graphml.graphdrawing.org/xmlns">
<xsl:attribute name="{local-name(@*)}"><xsl:value-of select="@*"/></xsl:attribute>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
PowerShell 脚本(适用于Windows PC用户,另存为XMLTransform.ps1)
param ($xml, $xsl, $output)
if (-not $xml -or -not $xsl -or -not $output) {
Write-Host "& .\xslt.ps1 [-xml] xml-input [-xsl] xsl-input [-output] transform-output"
exit;
}
trap [Exception]{
Write-Host $_.Exception;
}
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;
$xslt.Load($xsl);
$xslt.Transform($xml, $output);
Write-Host "generated" $output;
R 脚本(调用命令行操作)
library(XML)
# WINDOWS USERS
ps <- '"C:\\Path\\To\\XMLTransform.ps1"' # POWER SHELL SCRIPT
input <- '"C:\\Path\\To\\Input.xml"' # XML SOURCE
xsl <- '"C:\\Path\\To\\XSLTScript.xsl"' # XSLT SCRIPT
output <- '"C:\\Path\\To\\Output.xml"' # BLANK, EMPTY FILE PATH TO BE CREATED
system(paste('Powershell.exe -executionpolicy remotesigned -File',
ps, input, xsl, output)) # NOTE SECURITY BYPASS ARGS
doc <- xmlParse("C:\\Path\\To\\Output.xml")
# UNIX (MAC/LINUX) USERS
system("xsltproc /path/to/XSLTScript.xsl /path/to/input.xml -o /path/to/output.xml")
doc <- xmlParse("/path/to/output.xml")
print(doc)
# <?xml version="1.0" encoding="utf-8"?>
# <graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
# <graph id="G" edgedefault="directed">
# <graph id="g1">
# <node id="n0"/>
# <node id="n1"/>
# <node id="n2"/>
# <node id="n3"/>
# </graph>
# <node id="n4"/>
# <edge source="n0"/>
# <edge source="n0"/>
# <edge source="n2"/>
# <edge source="n1"/>
# <edge source="n3"/>
# </graph>
# </graphml>