Add child nodes to an existing node with R XML

Time: 2016-07-31 13:55:15

Tags: r xml graphml

I have the following XML file, test.graphml, which I am trying to manipulate with the XML package in R.

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
     http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <node id="n4"/>
    <edge source="n0" target="n1"/>
    <edge source="n0" target="n2"/>
    <edge source="n2" target="n3"/>
    <edge source="n1" target="n3"/>
    <edge source="n3" target="n4"/>
  </graph>
</graphml>

I want to nest nodes n0, n1, n2 and n3 inside a new graph node, as shown below.

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
     http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="directed">
    <graph id="g1">
       <node id="n0"/>
       <node id="n1"/>
       <node id="n2"/>
       <node id="n3"/>
    </graph>
    <node id="n4"/>
    <edge source="n0" target="n1"/>
    <edge source="n0" target="n2"/>
    <edge source="n2" target="n3"/>
    <edge source="n1" target="n3"/>
    <edge source="n3" target="n4"/>
  </graph>
</graphml>

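The code I have written contains unknowns and errors that I cannot resolve because I lack experience with XML processing. I would greatly appreciate some pointers to help me move forward.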

library(XML)

# Read file
x <- xmlParse("test.graphml")
ns <- c(graphml ="http://graphml.graphdrawing.org/xmlns")

# Create new graph node
ng <- xmlNode("graph", attrs = c("id" = "g1"))

# Add n0-n3 as children of new graph node 
n0_n1_n2_n3 <- getNodeSet(x,"//graphml:node[@id = 'n0' or @id='n1' or @id='n2' or @id='n3']", namespaces = ns)
ng <- append.xmlNode(ng, n0_n1_n2_n3)

# Get the only graph node
g <- getNodeSet(x,"//graphml:graph", namespaces = ns)

# Remove nodes n0-n3 from the only graph node
# How do I do this?
# This did not work: removeNodes(g, n0_n1_n2_n3)

# Add the new graph node as a child of the only graph node
g <- append.xmlNode(g, ng)
#! Error message:
# Error in UseMethod("append") :
#   no applicable method for 'append' applied to an object of class "XMLNodeSet"
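For reference, the in-place route attempted above can be sketched with the XML package's internal-node functions. Two things trip up that code. First, `getNodeSet()` returns a list of nodes (class `XMLNodeSet`), so the single outer graph has to be taken out with `[[1]]`. Second, `xmlNode()` and `append.xmlNode()` are meant for R-level `XMLNode` objects, not the internal nodes that `xmlParse()` and `getNodeSet()` return. Note also that `removeNodes()` takes the nodes to remove as its first argument; its second argument is `free`, not a node set. The sketch below works under stated assumptions: it uses a compact inline copy of test.graphml, detaches every element child of the outer graph with `removeNodes()`, creates `g1` with `newXMLNode()`, and re-attaches the children in their original order with `addChildren()` so that `g1` comes first.

```r
library(XML)

# Assumption: a compact inline copy of test.graphml (no whitespace between tags),
# so the example is self-contained; use xmlParse("test.graphml") for the real file.
txt <- paste0(
  '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
  '<graph id="G" edgedefault="directed">',
  '<node id="n0"/><node id="n1"/><node id="n2"/><node id="n3"/><node id="n4"/>',
  '<edge source="n0" target="n1"/><edge source="n0" target="n2"/>',
  '<edge source="n2" target="n3"/><edge source="n1" target="n3"/>',
  '<edge source="n3" target="n4"/>',
  '</graph></graphml>')
doc <- xmlParse(txt, asText = TRUE)
ns  <- c(g = "http://graphml.graphdrawing.org/xmlns")

# getNodeSet() returns a list (XMLNodeSet); [[1]] is the outer <graph> node itself
outer <- getNodeSet(doc, "/g:graphml/g:graph", namespaces = ns)[[1]]

# Detach all element children of the outer graph. With the default free = FALSE
# they are unlinked but not freed, so they can be re-attached below.
kids <- getNodeSet(doc, "/g:graphml/g:graph/*", namespaces = ns)
invisible(removeNodes(kids))

# The outer graph is now empty, so the new nested graph becomes its first child
g1 <- newXMLNode("graph", attrs = c(id = "g1"), parent = outer)

# Re-attach in the original order: n0-n3 go into g1, everything else back into G
nested <- c("n0", "n1", "n2", "n3")
for (k in kids) {
  id <- xmlGetAttr(k, "id", default = "")
  if (xmlName(k) == "node" && id %in% nested) {
    addChildren(g1, k)
  } else {
    addChildren(outer, k)
  }
}

cat(saveXML(doc))
```

With the indented file from the question, the outer graph may also hold whitespace text nodes; the XPath `*` step skips them, so they stay where they are and should only affect the indentation of the `saveXML()` output.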

1 answer:

Answer 0 (score: 1)

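Consider XSLT, the special-purpose language designed for transforming XML files. Because you need to restructure the XML (wrapping a selected group of nodes in a new parent node) and must navigate a default namespace that has no prefix (xmlns="http://graphml.graphdrawing.org/xmlns"), XSLT is an ideal solution.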
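The default-namespace problem shows up in R's XPath queries too. As a minimal sketch (a small inline document, with the prefix `g` chosen arbitrarily): an unprefixed step such as `//node` means "no namespace" in XPath 1.0 and so matches nothing here, while binding a prefix to the GraphML URI, or testing `local-name()`, does match.

```r
library(XML)

# Small inline document with a default (unprefixed) namespace
doc <- xmlParse(paste0(
  '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
  '<graph id="G"><node id="n0"/><node id="n1"/></graph></graphml>'),
  asText = TRUE)

# Unprefixed names in XPath 1.0 refer to "no namespace", so this finds nothing
plain <- getNodeSet(doc, "//node")

# Binding any prefix (here "g") to the default namespace URI makes the match work
ns <- c(g = "http://graphml.graphdrawing.org/xmlns")
prefixed <- getNodeSet(doc, "//g:node", namespaces = ns)

# local-name() compares only the local part of the name and ignores the namespace
by_local <- getNodeSet(doc, "//*[local-name()='node']")

cat(length(plain), length(prefixed), length(by_local), "\n")
```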

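However, at the time of writing, R has no fully compliant XSL module for running XSLT 1.0 scripts, unlike other general-purpose languages (Java, PHP, Python). Nevertheless, R can use system() to call external programs (including the languages above), dedicated XSLT processors (Xalan, Saxon), or command-line interpreters, including PowerShell and the terminal's xsltproc. The solution below uses the latter.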

XSLT (save as .xsl; it is referenced in the R script)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:doc="http://graphml.graphdrawing.org/xmlns"  
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    
     xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">    
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="doc:graphml">
    <xsl:copy>
      <xsl:copy-of select="document('')/*/@xsi:schemaLocation"/>
      <xsl:apply-templates select="doc:graph"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="doc:graph">
    <xsl:element name="{local-name()}" namespace="http://graphml.graphdrawing.org/xmlns">
      <xsl:apply-templates select="@*"/>
      <xsl:element name="graph" namespace="http://graphml.graphdrawing.org/xmlns">
        <xsl:attribute name="id">g1</xsl:attribute>
        <xsl:apply-templates select="doc:node[position() &lt; 5]"/>
      </xsl:element>
      <xsl:apply-templates select="doc:node[@id='n4']|doc:edge"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="doc:graph/@*">
    <xsl:attribute name="{local-name()}"><xsl:value-of select="."/></xsl:attribute>
  </xsl:template>  

   <xsl:template match="doc:node|doc:edge">
    <xsl:element name="{local-name()}" namespace="http://graphml.graphdrawing.org/xmlns">
      <!-- copy all attributes (id, or source and target), not just the first one -->
      <xsl:copy-of select="@*"/>
    </xsl:element>
   </xsl:template>
</xsl:stylesheet>

PowerShell script (for Windows PC users; save as XMLTransform.ps1)

param ($xml, $xsl, $output)

if (-not $xml -or -not $xsl -or -not $output) {
    Write-Host "& .\XMLTransform.ps1 [-xml] xml-input [-xsl] xsl-input [-output] transform-output"
    exit;
}

trap [Exception]{
    Write-Host $_.Exception;
}

$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;
$xslt.Load($xsl);
$xslt.Transform($xml, $output);

Write-Host "generated" $output;

R script (calls the command-line tools)

library(XML)

# WINDOWS USERS
ps <- '"C:\\Path\\To\\XMLTransform.ps1"'  # POWERSHELL SCRIPT
input <- '"C:\\Path\\To\\Input.xml"'      # XML SOURCE
xsl <- '"C:\\Path\\To\\XSLTScript.xsl"'   # XSLT SCRIPT
output <- '"C:\\Path\\To\\Output.xml"'    # OUTPUT FILE PATH, CREATED BY THE TRANSFORM

system(paste('Powershell.exe -executionpolicy remotesigned -File',
             ps, input, xsl, output))              # NOTE SECURITY BYPASS ARGS
doc <- xmlParse("C:\\Path\\To\\Output.xml")

# UNIX (MAC/LINUX) USERS
system("xsltproc -o /path/to/output.xml /path/to/XSLTScript.xsl /path/to/input.xml")
doc <- xmlParse("/path/to/output.xml")

print(doc)
# <?xml version="1.0" encoding="utf-8"?>
# <graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
#   <graph id="G" edgedefault="directed">
#     <graph id="g1">
#       <node id="n0"/>
#       <node id="n1"/>
#       <node id="n2"/>
#       <node id="n3"/>
#     </graph>
#     <node id="n4"/>
#     <edge source="n0" target="n1"/>
#     <edge source="n0" target="n2"/>
#     <edge source="n2" target="n3"/>
#     <edge source="n1" target="n3"/>
#     <edge source="n3" target="n4"/>
#   </graph>
# </graphml>
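On Unix-like systems, the single `system()` string used above could also be replaced by `system2()`, which takes the arguments as a vector and returns the exit status. A minimal sketch: `xsltproc_args()` and `run_xslt()` are hypothetical helper names, the paths are the same placeholders as above, and `-o` is placed before the stylesheet to follow the `xsltproc [options] stylesheet file` order from xsltproc's usage text.

```r
library(XML)

# Hypothetical helper: xsltproc arguments, options first, each path shell-quoted
xsltproc_args <- function(xsl, input, output) {
  c("-o", shQuote(output), shQuote(xsl), shQuote(input))
}

# Hypothetical wrapper: run xsltproc through system2() and parse the result
run_xslt <- function(xsl, input, output) {
  if (!nzchar(Sys.which("xsltproc"))) stop("xsltproc was not found on the PATH")
  status <- system2("xsltproc", xsltproc_args(xsl, input, output))
  if (status != 0) stop("xsltproc failed with exit status ", status)
  xmlParse(output)
}
```

Usage would then be `doc <- run_xslt("/path/to/XSLTScript.xsl", "/path/to/input.xml", "/path/to/output.xml")`. Checking the exit status means a failed transform stops with an error rather than leaving `xmlParse()` to read a missing or stale output file.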