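Merging XML files while ignoring duplicate elements

Time: 2017-01-17 22:57:02

Tags: xml linux xslt xslt-1.0

I want to merge 2 XML files, but I don't want to change any elements that already exist in the original file. What is the best way to do this on Linux?

Note: several posts about using XSLT seem close to what I need, but I don't have a general XSLT toolchain installed (and I don't have permission to install one). That said, I do have xsltproc installed, though I'm not sure it will help. If xsltproc will do the job, please provide suitable command-line examples.

Here is a snippet of the original file: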

<?xml version="1.0" encoding="utf-8"?>
<config xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
   <Comment>This file was automatically generated.</Comment>
   <FieldAttrs>
      <Name>FieldAttrsAll</Name>
      <Field>
         <Name>wLegExchInstIds</Name>
         <Fid>6203</Fid>
         <Type>StringVector</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>

      <Field>
         <Name>wPartitionId</Name>
         <Fid>5886</Fid>
         <Type>Integer</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>
   </FieldAttrs>
</config>

Here is the new file that I need to merge in:

<?xml version="1.0" encoding="utf-8"?>
<config xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
   <Comment>This file was automatically generated.</Comment>
   <FieldAttrs>
      <Name>FieldAttrsAll</Name>
      <Field>
         <Name>wLegExchInstIds</Name>
         <Fid>6203</Fid>
         <Type>StringVector</Type>
         <CheckModified>false</CheckModified>
         <PublishField>false</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>    
      <Field>
         <Name>wPartitionId</Name>
         <Fid>5886</Fid>
         <Type>Integer</Type>
         <CheckModified>false</CheckModified>
         <PublishField>false</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>    
      <Field>
         <Name>wUnverifiedPriceIndicator</Name>
         <Fid>5885</Fid>
         <Type>Bool</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>true</ClearDaily>
      </Field>
      <Field>
         <Name>wCorrIsIrregular</Name>
         <Fid>5884</Fid>
         <Type>Bool</Type>
         <CheckModified>false</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>true</ClearDaily>
      </Field>

   </FieldAttrs>
</config>

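Note two things in particular:

  1. The new file changes the values of some existing elements.
  2. The new file adds new elements.

Given the files above, I want the output to keep the original values of the existing elements and add only the new elements, like this: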

    <config xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
      <Comment>This file was automatically generated.</Comment>
       <FieldAttrs>
          <Name>FieldAttrsAll</Name>
          <Field>
             <Name>wLegExchInstIds</Name>
             <Fid>6203</Fid>
             <Type>StringVector</Type>
             <CheckModified>true</CheckModified>
             <PublishField>true</PublishField>
             <ClearDaily>false</ClearDaily>
          </Field>
    
          <Field>
             <Name>wPartitionId</Name>
             <Fid>5886</Fid>
             <Type>Integer</Type>
             <CheckModified>true</CheckModified>
             <PublishField>true</PublishField>
             <ClearDaily>false</ClearDaily>
          </Field>
    
          <Field>
             <Name>wUnverifiedPriceIndicator</Name>
             <Fid>5885</Fid>
             <Type>Bool</Type>
             <CheckModified>true</CheckModified>
             <PublishField>true</PublishField>
             <ClearDaily>true</ClearDaily>
          </Field>
          <Field>
             <Name>wCorrIsIrregular</Name>
             <Fid>5884</Fid>
             <Type>Bool</Type>
             <CheckModified>false</CheckModified>
             <PublishField>true</PublishField>
             <ClearDaily>true</ClearDaily>
          </Field>    
       </FieldAttrs>
    </config>
    

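2 answers:

Answer 0 (score: 1)

Consider the following XSLT, which uses the document() function to read nodes from an external XML file. This approach actually starts from the larger XML file and uses the values in the shorter XML file to drop the duplicates, rather than adding the new nodes to the shorter file:

XSLT (save as an .xsl file; the second XML file it references is assumed to be saved in the same directory. Here ShorterXML.xml is the original file and LongerXML.xml is the new file.)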

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

 <!-- Identity Transform -->
 <xsl:template match="@*|node()">
   <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>      
   </xsl:copy>
 </xsl:template>  

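 <!-- Rebuild FieldAttrs: Fields from the original file first, then the rest of the input -->
 <xsl:template match="FieldAttrs">
   <xsl:copy>
     <xsl:copy-of select="Name"/>
     <!-- Fields from the original (shorter) file, copied unchanged -->
     <xsl:copy-of select="document('ShorterXML.xml')/config/FieldAttrs/Field"/>
     <!-- Remaining children of FieldAttrs in the input (longer) file -->
     <xsl:apply-templates/>
   </xsl:copy>
 </xsl:template>

 <!-- Drop input Fields whose Name already exists in the original file -->
 <xsl:template match="Field[Name=document('ShorterXML.xml')/config/FieldAttrs/Field/Name]"/>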

</xsl:transform>

Linux command line (only one of the XML files is passed as input; the other is read from the same directory by the stylesheet)

xsltproc transform.xsl LongerXML.xml -o output.xml

Output

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
  <Comment>This file was automatically generated.</Comment>
  <FieldAttrs>
    <Name>FieldAttrsAll</Name>
    <Field>
      <Name>wLegExchInstIds</Name>
      <Fid>6203</Fid>
      <Type>StringVector</Type>
      <CheckModified>true</CheckModified>
      <PublishField>true</PublishField>
      <ClearDaily>false</ClearDaily>
    </Field>
    <Field>
      <Name>wPartitionId</Name>
      <Fid>5886</Fid>
      <Type>Integer</Type>
      <CheckModified>true</CheckModified>
      <PublishField>true</PublishField>
      <ClearDaily>false</ClearDaily>
    </Field>
    <Name>FieldAttrsAll</Name>
    <Field>
      <Name>wUnverifiedPriceIndicator</Name>
      <Fid>5885</Fid>
      <Type>Bool</Type>
      <CheckModified>true</CheckModified>
      <PublishField>true</PublishField>
      <ClearDaily>true</ClearDaily>
    </Field>
    <Field>
      <Name>wCorrIsIrregular</Name>
      <Fid>5884</Fid>
      <Type>Bool</Type>
      <CheckModified>false</CheckModified>
      <PublishField>true</PublishField>
      <ClearDaily>true</ClearDaily>
    </Field>
  </FieldAttrs>
</config>
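
In the output above, `<Name>FieldAttrsAll</Name>` appears twice: the FieldAttrs template copies Name explicitly, and `<xsl:apply-templates/>` then processes it a second time through the identity template. One possible adjustment, sketched here on the assumption that Name and Field are the only children of FieldAttrs that need to be carried over, is to apply templates to the Field children only. The file names are the same as in the answer; this is a variant for illustration, not the answerer's posted stylesheet.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

 <!-- Identity transform: copy everything not handled below -->
 <xsl:template match="@*|node()">
   <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="FieldAttrs">
   <xsl:copy>
     <!-- Name is copied exactly once, here -->
     <xsl:copy-of select="Name"/>
     <!-- Fields from the original (shorter) file, values unchanged -->
     <xsl:copy-of select="document('ShorterXML.xml')/config/FieldAttrs/Field"/>
     <!-- Only Field children are processed, so Name is not emitted again -->
     <xsl:apply-templates select="Field"/>
   </xsl:copy>
 </xsl:template>

 <!-- Suppress input Fields whose Name already exists in the original file -->
 <xsl:template match="Field[Name=document('ShorterXML.xml')/config/FieldAttrs/Field/Name]"/>

</xsl:transform>
```

It can be run with the same xsltproc command line as above. Any other children of FieldAttrs in the longer file would be dropped by this variant, which is why the assumption about Name and Field matters.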

Answer 1 (score: 0)

I was able to merge the two files in the given way using xsh, a wrapper around XML::LibXML (which is itself built on libxml2):

my $old := open old.xml ;
$field := hash Name //Field ;
open new.xml ;
for //Field {
    $exists = xsh:lookup('field', Name) ;
    if not($exists) copy . into $old/config/FieldAttrs ;
}
save :f merged.xml $old ;
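
The same direction (start from the old file and append only the Fields whose Name is not already present) can also be sketched in XSLT 1.0 for xsltproc. This is an illustrative sketch rather than part of the answer: the parameter name `new` and the stylesheet file name `append.xsl` are assumptions, while `old.xml` and `new.xml` are the file names used in the xsh script.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

 <!-- Assumed parameter: location of the new file, resolved relative to this stylesheet -->
 <xsl:param name="new" select="'new.xml'"/>

 <!-- Identity transform: the old file is copied through unchanged -->
 <xsl:template match="@*|node()">
   <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="FieldAttrs">
   <!-- Names of the Fields that already exist in the old file -->
   <xsl:variable name="existing" select="Field/Name"/>
   <xsl:copy>
     <!-- Keep every existing child exactly as it is -->
     <xsl:apply-templates select="@*|node()"/>
     <!-- Append Fields from the new file whose Name is not yet present -->
     <xsl:copy-of select="document($new)/config/FieldAttrs/Field[not(Name = $existing)]"/>
   </xsl:copy>
 </xsl:template>

</xsl:transform>
```

Here the old file is the input, for example `xsltproc -o merged.xml append.xsl old.xml`, and a different new file can be named with `--stringparam new other.xml`. The file name is held in a parameter and used only inside `select` expressions because XSLT 1.0 does not allow variable references in `match` patterns.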
