Remove duplicate records from XML

Posted: 2014-11-11 10:20:44

Tags: xslt

<?xml version="1.0" encoding="utf-8"?>
<Employee_Data>
    <Employee>
        <NEW_HIRE_OR_REHIRE />
        <RETIREMENT_BENEFITS />
        <ADDRESS />
        <PERSONAL>
            <Record>
                <SSN>327408678</SSN>
                <WDEmpID>10032417</WDEmpID>
                <Initiator />
                <Effective>20141014</Effective>
                <SeqNum>320</SeqNum>
                <Last_Name>VAN TREECK</Last_Name>
                <First_Name>DENISE</First_Name>
                <Middle_Name>J</Middle_Name>
                <Social_Suffix />
                <Birth_Date>19560422</Birth_Date>
                <Gender>F</Gender>
                <Ethnicity>0</Ethnicity>
                <Marital_Status_Date>19781202</Marital_Status_Date>
                <Marital_Status>M</Marital_Status>
                <CITIZENSHIP />
                <Military_Service_Status />
                <Disability />
                <Indicator>PersonalRecordChangesIndicator</Indicator>
            </Record>
        </PERSONAL>
        <STATUS />
        <POSITION>
            <Record>
                <SSN>327408678</SSN>
                <WDEmpID>10032417</WDEmpID>
                <Initiator />
                <Effective>20141006</Effective>
                <SeqNum>250</SeqNum>
                <ActionCode />
                <DEFAULTJOBCODE>YYY01</DEFAULTJOBCODE>
                <SAPJOBCODE>30000715</SAPJOBCODE>
                <PayEntity>NC1</PayEntity>
                <DIVISION>CORP</DIVISION>
                <ORGANIZATION>GES</ORGANIZATION>
                <COMPANYNUMBER>01</COMPANYNUMBER>
                <LEDGERDEPARTMENT>CHEN050D</LEDGERDEPARTMENT>
                <SALARY_GRADE>LC008</SALARY_GRADE>
                <TEMPORARY_POSITION_INDICATOR_FOR_POSITION_RECORD />
                <LocationCode>AP10</LocationCode>
                <LocationDepartment>050D</LocationDepartment>
                <WDEmpID>10032417</WDEmpID>
                <Indicator>Position and Location Change Indicator</Indicator>
            </Record>
            <Record>
                <SSN>327408678</SSN>
                <WDEmpID>10032417</WDEmpID>
                <Initiator />
                <Effective>20141006</Effective>
                <SeqNum>250</SeqNum>
                <ActionCode />
                <DEFAULTJOBCODE>YYY01</DEFAULTJOBCODE>
                <SAPJOBCODE>30000715</SAPJOBCODE>
                <PayEntity>NC1</PayEntity>
                <DIVISION>CORP</DIVISION>
                <ORGANIZATION>GES</ORGANIZATION>
                <COMPANYNUMBER>01</COMPANYNUMBER>
                <LEDGERDEPARTMENT>CHEN050D</LEDGERDEPARTMENT>
                <SALARY_GRADE>LC008</SALARY_GRADE>
                <TEMPORARY_POSITION_INDICATOR_FOR_POSITION_RECORD />
                <LocationCode>AP10</LocationCode>
                <LocationDepartment>050D</LocationDepartment>
                <WDEmpID>10032417</WDEmpID>
                <Indicator>Position and Location Department change Indicator
                </Indicator>
            </Record>
        </POSITION>
        <COMPENSATION />
        <SALES_TERRITORY />
        <TERMINATION />
    </Employee>
</Employee_Data>   

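This XML is generated by an XSLT. I basically want to check whether the two records under the POSITION tag are similar and, if they are, keep only one of them. In this case, the output should look like this: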

<?xml version="1.0" encoding="utf-8"?>
<Employee_Data>
    <Employee>
        <NEW_HIRE_OR_REHIRE />
        <RETIREMENT_BENEFITS />
        <ADDRESS />
        <PERSONAL>
            <Record>
                <SSN>327408678</SSN>
                <WDEmpID>10032417</WDEmpID>
                <Initiator />
                <Effective>20141014</Effective>
                <SeqNum>320</SeqNum>
                <Last_Name>VAN TREECK</Last_Name>
                <First_Name>DENISE</First_Name>
                <Middle_Name>J</Middle_Name>
                <Social_Suffix />
                <Birth_Date>19560422</Birth_Date>
                <Gender>F</Gender>
                <Ethnicity>0</Ethnicity>
                <Marital_Status_Date>19781202</Marital_Status_Date>
                <Marital_Status>M</Marital_Status>
                <CITIZENSHIP />
                <Military_Service_Status />
                <Disability />
                <Indicator>PersonalRecordChangesIndicator</Indicator>
            </Record>
        </PERSONAL>
        <STATUS />
        <POSITION>
            <Record>
                <SSN>327408678</SSN>
                <WDEmpID>10032417</WDEmpID>
                <Initiator />
                <Effective>20141006</Effective>
                <SeqNum>250</SeqNum>
                <ActionCode />
                <DEFAULTJOBCODE>YYY01</DEFAULTJOBCODE>
                <SAPJOBCODE>30000715</SAPJOBCODE>
                <PayEntity>NC1</PayEntity>
                <DIVISION>CORP</DIVISION>
                <ORGANIZATION>GES</ORGANIZATION>
                <COMPANYNUMBER>01</COMPANYNUMBER>
                <LEDGERDEPARTMENT>CHEN050D</LEDGERDEPARTMENT>
                <SALARY_GRADE>LC008</SALARY_GRADE>
                <TEMPORARY_POSITION_INDICATOR_FOR_POSITION_RECORD />
                <LocationCode>AP10</LocationCode>
                <LocationDepartment>050D</LocationDepartment>
                <WDEmpID>10032417</WDEmpID>
                <Indicator>Position and Location Change Indicator</Indicator>
            </Record>
        </POSITION>
        <COMPENSATION />
        <SALES_TERRITORY />
        <TERMINATION />
    </Employee>
</Employee_Data>   

1 answer:

Answer 0 (score: 0)

To remove the duplicate records, you can use Muenchian grouping:

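<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Index every Record in the document by its SeqNum value -->
  <xsl:key name="seqNumByRecord" match="Record" use="SeqNum"/>
  <!-- Identity transform: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty template: matches every Record that is not the first Record
       returned by the key for its SeqNum (i.e. every duplicate) and
       outputs nothing, so the duplicate is dropped -->
  <xsl:template match=
    "Record[not(generate-id() = generate-id(key('seqNumByRecord', SeqNum)[1]))]"/>
</xsl:stylesheet>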

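The first template is the identity transform: it matches every node and attribute and copies it unchanged. The second template is an empty template that matches every duplicate, that is, every Record that is not the first Record returned by key('seqNumByRecord', SeqNum) for its SeqNum value. Because the template is empty, those Records produce no output and are removed. Note that the key indexes every Record in the document, not only the ones under POSITION, so Records in different sections that happen to share a SeqNum are treated as duplicates too. For a detailed explanation of Muenchian grouping, I recommend Jeni Tennison's article http://www.jenitennison.com/xslt/grouping/muenchian.html; you can also find many good answers on Stack Overflow.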
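For readers who want to see the grouping logic outside XSLT, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is an illustration of the same idea, not the answer's stylesheet: a set of already-seen SeqNum values stands in for the key, and the function name dedupe_records_by_seqnum is hypothetical. Like the stylesheet's key, it looks at every Record in the document, keeps the first one for each SeqNum in document order, and assumes every Record has a SeqNum child.

```python
import xml.etree.ElementTree as ET


def dedupe_records_by_seqnum(root):
    """Remove every <Record> except the first one for each SeqNum value.

    Hypothetical helper modelling the Muenchian stylesheet: the set of seen
    SeqNum values plays the role of the xsl:key, and "is this the first
    Record seen for its SeqNum?" plays the role of the generate-id() test.
    Assumes every Record has a SeqNum child.
    """
    seen = set()
    duplicates = []
    # root.iter() walks the tree in document order.
    for record in root.iter("Record"):
        seq = record.findtext("SeqNum")
        if seq in seen:
            duplicates.append(record)   # later member of a group: drop
        else:
            seen.add(seq)               # first member of a group: keep
    # ElementTree has no parent axis, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for record in duplicates:
        parent_of[record].remove(record)
    return root


# Trimmed-down version of the question's input (only SeqNum and Indicator kept).
sample = """<Employee_Data><Employee>
  <PERSONAL><Record><SeqNum>320</SeqNum></Record></PERSONAL>
  <POSITION>
    <Record><SeqNum>250</SeqNum><Indicator>Position and Location Change Indicator</Indicator></Record>
    <Record><SeqNum>250</SeqNum><Indicator>Position and Location Department change Indicator</Indicator></Record>
  </POSITION>
</Employee></Employee_Data>"""

root = dedupe_records_by_seqnum(ET.fromstring(sample))
print([r.findtext("SeqNum") for r in root.iter("Record")])  # ['320', '250']
print(root.find(".//POSITION/Record/Indicator").text)       # Position and Location Change Indicator
```

The surviving POSITION Record is the first one, which is what the expected output in the question shows.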

For variety, here is a second solution:

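<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity transform: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Drop every Record whose SeqNum already occurs on an earlier Record,
       so the first Record of each group is kept -->
  <xsl:template match="Record[SeqNum = preceding::Record/SeqNum]"/>
</xsl:stylesheet>

In this approach, the empty template that removes duplicates matches every Record whose SeqNum equals the SeqNum of some earlier Record (the preceding:: axis). The first Record of each group is therefore kept, as in the Muenchian version and in the expected output; testing against following::Record/SeqNum instead would keep the last Record, whose Indicator text differs. As the article mentioned above explains, the Muenchian method is considered more efficient: the key is an index built once, whereas in a straightforward implementation the preceding:: test compares each Record with all the Records before it, so the work grows roughly quadratically with the number of Records.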
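To make the efficiency point concrete, here is a small Python sketch (standard library xml.etree.ElementTree, not XSLT; the names dedupe_by_scanning and make_position are hypothetical) that removes duplicates without an index, by comparing each Record with every Record before it and keeping the first of each group. With all-distinct SeqNums it makes n(n-1)/2 comparisons. Real XSLT processors may optimize axis-based patterns, so treat this as a rough model of the cost, not a measurement.

```python
import xml.etree.ElementTree as ET


def dedupe_by_scanning(root):
    """Remove every <Record> whose SeqNum already appeared on an earlier Record.

    Hypothetical helper: no index is used, so each Record is compared with
    every Record before it in document order. Returns the number of SeqNum
    comparisons made, to show how the work grows with document size.
    """
    records = list(root.iter("Record"))            # document order
    seqs = [r.findtext("SeqNum") for r in records]
    comparisons = 0
    duplicates = []
    for i in range(len(records)):
        for j in range(i):                         # all earlier Records
            comparisons += 1
            if seqs[j] == seqs[i]:
                duplicates.append(records[i])
                break
    # ElementTree has no parent axis, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for record in duplicates:
        parent_of[record].remove(record)
    return comparisons


def make_position(n):
    """Build a hypothetical <POSITION> with n Records, all with distinct SeqNums."""
    position = ET.Element("POSITION")
    for i in range(n):
        record = ET.SubElement(position, "Record")
        ET.SubElement(record, "SeqNum").text = str(i)
    return position


for n in (100, 1000):
    print(n, dedupe_by_scanning(make_position(n)))
# 100 4950
# 1000 499500
```

Ten times as many Records costs about a hundred times as many comparisons, whereas an indexed lookup such as the one xsl:key provides needs roughly one lookup per Record.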