Merging XML files with the same structure and different data

Time: 2009-12-23 19:16:16

Tags: xml merge

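I am trying to merge two files that have the same structure and some data in common. If a node has the same name in both files, a new node should be created that contains the children of both original nodes. The original files are as follows: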

file1.xml
<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
    <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
        <CUSTOMER ID='M1'/>
        <CUSTOMER ID='M2'/>
        <CUSTOMER ID='M3'/>
    </SECURITY>
    <SECURITY CUSIP='CUSIP3' DESCRIPT='CUSIP3'>
        <CUSTOMER ID='M4'/>
        <CUSTOMER ID='M5'/>
        <CUSTOMER ID='M6'/>
    </SECURITY>
</BROADRIDGE>

file2.xml
<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
    <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
        <CUSTOMER ID='B1'/>
        <CUSTOMER ID='B2'/>
        <CUSTOMER ID='B3'/>
    </SECURITY>
    <SECURITY CUSIP='CUSIP2' DESCRIPT='CUSIP2'>
        <CUSTOMER ID='B4'/>
        <CUSTOMER ID='B5'/>
        <CUSTOMER ID='B6'/>
    </SECURITY>
</BROADRIDGE>

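The idea is to create a new XML file with the same structure that contains the information from both files, merging the SECURITY nodes that have the same CUSIP attribute. In this case, the result should be as follows: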

<?xml version="1.0" encoding="UTF-8"?>
<BROADRIDGE>
    <SECURITY CUSIP="CUSIP1">
        <CUSTOMER ID="M1"/>
        <CUSTOMER ID="M2"/>
        <CUSTOMER ID="M3"/>
        <CUSTOMER ID='B1'/>
        <CUSTOMER ID='B2'/>
        <CUSTOMER ID='B3'/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP3">
        <CUSTOMER ID="M4"/>
        <CUSTOMER ID="M5"/>
        <CUSTOMER ID="M6"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP2">
        <CUSTOMER ID="B4"/>
        <CUSTOMER ID="B5"/>
        <CUSTOMER ID="B6"/>
    </SECURITY>
</BROADRIDGE>

I have defined the following XML to join them:

<?xml version="1.0"?>                                  
<MASTERFILE>
   <FILE>\file1.xml</FILE>
   <FILE>\file2.xml</FILE>
</MASTERFILE>

And the following XSL to do the merge:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/MASTERFILE">
        <BROADRIDGE>
            <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
            <xsl:for-each select="$securities">
                <xsl:if test="generate-id(.) = generate-id($securities[@CUSIP=current()/@CUSIP])">
                    <SECURITY>
                        <xsl:attribute name="CUSIP" ><xsl:value-of select="@CUSIP"/></xsl:attribute>
                        <xsl:for-each select="CUSTOMER">
                            <CUSTOMER>
                                <xsl:attribute name="ID" ><xsl:value-of select="@ID"/></xsl:attribute>
                            </CUSTOMER>
                        </xsl:for-each>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

But I get the following:

<?xml version="1.0" encoding="UTF-8"?>
<BROADRIDGE>
    <SECURITY CUSIP="CUSIP1">
        <CUSTOMER ID="M1"/>
        <CUSTOMER ID="M2"/>
        <CUSTOMER ID="M3"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP3">
        <CUSTOMER ID="M4"/>
        <CUSTOMER ID="M5"/>
        <CUSTOMER ID="M6"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP2">
        <CUSTOMER ID="B4"/>
        <CUSTOMER ID="B5"/>
        <CUSTOMER ID="B6"/>
    </SECURITY>
</BROADRIDGE>

Any idea why it is not merging the CUSTOMER nodes from both files for the SECURITY with CUSIP = CUSIP1?

4 Answers:

Answer 0 (score: 1)

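The generate-id() function is guaranteed to return a different value for every node that takes part in a given transformation. When you call it on nodes from different documents, the results will never be equal.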

You should compare the string values of the CUSIP attributes across the documents rather than the generated IDs.
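
As a rough sketch of that string comparison in XSLT 1.0, the question's stylesheet can be adjusted so that each group collects the CUSTOMER children of every SECURITY whose CUSIP string matches, not only those of the current node. The variable name same is made up for illustration, and the order in which the files contribute (file1 before file2) depends on the document order the processor assigns across documents, so treat the output order as an assumption.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/MASTERFILE">
        <BROADRIDGE>
            <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
            <xsl:for-each select="$securities">
                <!-- "same" (hypothetical name): every SECURITY, from any file,
                     whose CUSIP string equals the current one -->
                <xsl:variable name="same" select="$securities[@CUSIP = current()/@CUSIP]"/>
                <!-- emit one group per CUSIP: only when the current node is the first of its group -->
                <xsl:if test="generate-id(.) = generate-id($same[1])">
                    <SECURITY CUSIP="{@CUSIP}">
                        <!-- copy the CUSTOMER children of all matching SECURITY nodes -->
                        <xsl:copy-of select="$same/CUSTOMER"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>
```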

If you can use XSLT 2.0 (which is much better than 1.0), this will work:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output indent="yes"/>
        <xsl:template match="/MASTERFILE">
                <BROADRIDGE>
                        <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
                        <xsl:for-each select="distinct-values($securities/@CUSIP)">
                                <SECURITY>
                                        <xsl:attribute name="CUSIP">
                                                <xsl:value-of select="."/>
                                        </xsl:attribute>

                                        <xsl:for-each select="distinct-values($securities[@CUSIP = current()]/CUSTOMER/@ID)">
                                                <CUSTOMER>
                                                  <xsl:attribute name="ID">
                                                  <xsl:value-of select="."/>
                                                  </xsl:attribute>
                                                </CUSTOMER>
                                        </xsl:for-each>
                                </SECURITY>
                        </xsl:for-each>
                </BROADRIDGE>
        </xsl:template>
</xsl:stylesheet>
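
A possibly shorter XSLT 2.0 variant, offered only as a sketch, uses xsl:for-each-group to group the SECURITY nodes by their CUSIP value. It assumes the same MASTERFILE input as the question and an XSLT 2.0 processor. Unlike the distinct-values() version above, it copies the CUSTOMER elements as they are, so duplicate IDs would not be removed.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:template match="/MASTERFILE">
        <BROADRIDGE>
            <!-- one group per distinct CUSIP across all listed files -->
            <xsl:for-each-group select="document(FILE)/BROADRIDGE/SECURITY" group-by="@CUSIP">
                <SECURITY CUSIP="{current-grouping-key()}">
                    <!-- CUSTOMER children of every SECURITY in this group -->
                    <xsl:copy-of select="current-group()/CUSTOMER"/>
                </SECURITY>
            </xsl:for-each-group>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>
```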

Answer 1 (score: 1)

(See my comment on the OP about a "one-way merge".) Here is my (very inefficient) solution to the merge problem:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:variable name="set1" select="document('file1.xml')/BROADRIDGE/SECURITY"/>
    <xsl:variable name="set2" select="document('file2.xml')/BROADRIDGE/SECURITY"/>

    <xsl:template match="/">
        <BROADRIDGE>
            <!-- walk over all relevant nodes -->
            <xsl:for-each select="$set1 | $set2">
                <xsl:variable name="position" select="position()"/>
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- if we see this CUSIP for the first time, --> 
                <xsl:if test="count(($set1 | $set2)[position() &lt; $position][@CUSIP = $cusip])=0">
                    <SECURITY>                            
                        <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
                        <!-- copy nodes from both sets with matching attribute -->
                        <xsl:copy-of select="$set1[@CUSIP = $cusip]/*"/>
                        <xsl:copy-of select="$set2[@CUSIP = $cusip]/*"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

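Note that the stylesheet does not assume any particular input document; it simply loads the two files as variables. The XSLT design could be improved by parameterizing the URLs of the XML documents to load.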
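
One way that parameterization might look, as a sketch only: the parameter names url1 and url2 and the helper variable all are invented for illustration, and the defaults simply repeat the file names used above. How the values are supplied depends on the processor (with xsltproc, for instance, via --stringparam).

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- hypothetical parameters; the defaults match the files used in this answer -->
    <xsl:param name="url1" select="'file1.xml'"/>
    <xsl:param name="url2" select="'file2.xml'"/>

    <xsl:variable name="set1" select="document($url1)/BROADRIDGE/SECURITY"/>
    <xsl:variable name="set2" select="document($url2)/BROADRIDGE/SECURITY"/>
    <xsl:variable name="all" select="$set1 | $set2"/>

    <xsl:template match="/">
        <BROADRIDGE>
            <xsl:for-each select="$all">
                <xsl:variable name="position" select="position()"/>
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- only the first SECURITY with a given CUSIP starts a group -->
                <xsl:if test="not($all[position() &lt; $position][@CUSIP = $cusip])">
                    <SECURITY CUSIP="{$cusip}">
                        <xsl:copy-of select="$all[@CUSIP = $cusip]/*"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>
```

Because the argument to document() is a string here, relative URLs are resolved against the location of the stylesheet rather than the source document.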

To apply the merge to more than two documents, you can create a file, say master.xml, that lists all the files to process, like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="merge.xslt"?>
<files>
  <file>file1.xml</file>
  <file>file2.xml</file>
  ...
  <file>fileN.xml</file>    
</files>

In file1.xml, I have this:

<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
  <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
    <CUSTOMER ID='M1'/>
    <CUSTOMER ID='M2'/>
    <CUSTOMER ID='M3'/>
  </SECURITY>
  <SECURITY CUSIP='CUSIP3' DESCRIPT='CUSIP3'>
    <CUSTOMER ID='M4'/>
    <CUSTOMER ID='M5'/>
    <CUSTOMER ID='M6'/>
  </SECURITY>
</BROADRIDGE>

In file2.xml, I have:

<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
  <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
    <CUSTOMER ID='B1'/>
    <CUSTOMER ID='B2'/>
    <CUSTOMER ID='B3'/>
  </SECURITY>
  <SECURITY CUSIP='CUSIP2' DESCRIPT='CUSIP2'>
    <CUSTOMER ID='B4'/>
    <CUSTOMER ID='B5'/>
    <CUSTOMER ID='B6'/>
  </SECURITY>
</BROADRIDGE>

merge.xslt is a modified version of the previous stylesheet and can now handle a variable number of files (the ones listed in master.xml):

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <xsl:call-template name="merge-files"/>
</xsl:template>

<!-- loop through file names, load documents -->
<xsl:template name="merge-files">
  <xsl:param name="files" select="/files/file/text()"/>
  <xsl:param name="num-files" select="count($files)"/>
  <xsl:param name="curr-file" select="0"/>
  <xsl:param name="set" select="/*[0]"/>
  <xsl:choose> <!-- if we still have files, concat them to $set -->
    <xsl:when test="$curr-file &lt; $num-files">
      <xsl:call-template name="merge-files">
        <xsl:with-param name="files" select="$files"/>
        <xsl:with-param name="num-files" select="$num-files"/>
        <xsl:with-param name="curr-file" select="$curr-file + 1"/>
        <xsl:with-param name="set" select="$set | document($files[$curr-file+1])/BROADRIDGE/SECURITY"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise> <!-- no more files, start merging. -->
      <xsl:call-template name="merge">
        <xsl:with-param name="nodes" select="$set"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- perform the actual merge -->
<xsl:template name="merge">
  <xsl:param name="nodes"/>
  <BROADRIDGE>
    <xsl:for-each select="$nodes"> <!-- look at all possible nodes to merge -->
      <xsl:variable name="position" select="position()"/>
      <xsl:variable name="cusip" select="@CUSIP"/>

      <!-- when we encounter this id for the 1st time -->
      <xsl:if test="count($nodes[position() &lt; $position][@CUSIP = $cusip])=0"> 
        <SECURITY>
          <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
          <!-- copy all node data related to this cusip here -->
          <xsl:for-each select="$nodes[@CUSIP = $cusip]">
            <xsl:copy-of select="*"/>
          </xsl:for-each>
        </SECURITY>
      </xsl:if>
    </xsl:for-each>
  </BROADRIDGE>
</xsl:template>

</xsl:stylesheet>

Running it gives me this output:

<BROADRIDGE>
  <SECURITY CUSIP="CUSIP1">
    <CUSTOMER ID="M1"/>
    <CUSTOMER ID="M2"/>
    <CUSTOMER ID="M3"/>
    <CUSTOMER ID="B1"/>
    <CUSTOMER ID="B2"/>
    <CUSTOMER ID="B3"/>
  </SECURITY>
  <SECURITY CUSIP="CUSIP3">
    <CUSTOMER ID="M4"/>
    <CUSTOMER ID="M5"/>
    <CUSTOMER ID="M6"/>
  </SECURITY>
  <SECURITY CUSIP="CUSIP2">
    <CUSTOMER ID="B4"/>
    <CUSTOMER ID="B5"/>
    <CUSTOMER ID="B6"/>
  </SECURITY>
</BROADRIDGE>

Answer 2 (score: 1)

Either you are overcomplicating this, or there are other aspects of the problem that you have not mentioned:

<xsl:variable name="file1" select="document(/MASTERFILE/FILE[1])"/>
<xsl:variable name="file2" select="document(/MASTERFILE/FILE[2])"/>

<xsl:template match="/">
   <BROADRIDGE>
      <xsl:apply-templates select="$file1/BROADRIDGE/SECURITY"/>
      <xsl:copy-of select="$file2/BROADRIDGE/SECURITY[not(@CUSIP=$file1/BROADRIDGE/SECURITY/@CUSIP)]"/>
   </BROADRIDGE>
</xsl:template>

<xsl:template match="SECURITY">
   <SECURITY>
      <xsl:copy-of select="*"/>
      <xsl:copy-of select="$file2/BROADRIDGE/SECURITY[@CUSIP=current()/@CUSIP]/*"/>
   </SECURITY>
</xsl:template>

Answer 3 (score: 0)

Roland, thanks for your examples. Based on the first code you posted, I developed the following template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:variable name="nodes" select="document(/MASTERFILE/FILE)/BROADRIDGE/SECURITY"/>
    <xsl:template match="/">
        <BROADRIDGE>
            <!-- walk over all relevant nodes -->
            <xsl:for-each select="$nodes">
                <xsl:variable name="position" select="position()"/>
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- if we see this CUSIP for the first time, --> 
                <xsl:if test="count($nodes[position() &lt; $position][@CUSIP = $cusip])=0">
                    <SECURITY>                            
                        <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
                        <xsl:attribute name="DESCRIPT"><xsl:value-of select="@DESCRIPT"/></xsl:attribute>
                        <!-- copy nodes from both sets with matching attribute -->
                        <xsl:copy-of select="$nodes[@CUSIP = $cusip]/*"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

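I simply pass the list of files to the document() function, so it builds a node-set containing all the SECURITY nodes from all the files. When I apply it to the following XML: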

<?xml version="1.0"?>
<MASTERFILE>
   <FILE>\file1.xml</FILE>
   <FILE>\file2.xml</FILE>
   <FILE>\file3.xml</FILE>
</MASTERFILE>

it works perfectly. Thanks for your samples.