Question

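I don't work with XML often, but I need to clean up some data. Below I've posted sample lines as they appear in Notepad++. I need to delete every whole line whose LoanID is a duplicate. The files contain about 200,000 lines, and about 200 of the LoanIDs are duplicates.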

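Because only one "column" is duplicated rather than the whole line, I can't use the TextFX plugin. BorrowerID, for example, may contain duplicates. Only LoanID must not contain duplicates.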

Line 1:

<ns1:Loan><ns1:Identifiers><ns1:LoanID>876298</ns1:LoanID><ns1:IsRegulatedLoan>ND,6</ns1:IsRegulatedLoan><ns1:Originator>TestBank</ns1:Originator><ns1:ServicerID>Testbank NV</ns1:ServicerID><ns1:BorrowerID>26547</ns1:BorrowerID><ns1:PropertyID>364239</ns1:PropertyID>

Line 2:

<ns1:Loan><ns1:Identifiers><ns1:LoanID>819305</ns1:LoanID><ns1:IsRegulatedLoan>ND,6</ns1:IsRegulatedLoan><ns1:Originator>TestBank</ns1:Originator><ns1:ServicerID>Testbank NV</ns1:ServicerID><ns1:BorrowerID>195797</ns1:BorrowerID>

Answer 1

Processing XML at the "line" level is not a good idea, because line endings have no special meaning in XML and can easily change.

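For this kind of operation, most people would use XSLT. XSLT has a learning curve, but if you are going to work with XML it is an essential part of your toolkit, so it is well worth mastering. Typical code (in XSLT 2.0) looks like this: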

<!-- Group the loans by LoanID and keep only the first loan in each group -->
<xsl:for-each-group select="ns1:Loan" group-by="ns1:Identifiers/ns1:LoanID">
  <xsl:copy-of select="current-group()[1]"/>
</xsl:for-each-group>

Given a group of duplicates, this discards all but the first.
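
The fragment above has to sit inside a template to run. As a rough sketch of what a complete stylesheet might look like, the following assumes that all ns1:Loan elements are children of a single parent element, that LoanID is nested as ns1:Identifiers/ns1:LoanID (as the sample lines suggest), and that the ns1 prefix is bound to some namespace URI. The URI `urn:example:loans` is a placeholder, not taken from the question, and must be replaced with the one declared in the real file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. "urn:example:loans" is a hypothetical placeholder:
     replace it with the URI that ns1 is bound to in the input file. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns1="urn:example:loans">

  <xsl:output method="xml" indent="no"/>

  <!-- Identity template: copy anything not handled by a more specific template. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: the loans are direct children of one parent element.
       This template rebuilds any element that has ns1:Loan children. -->
  <xsl:template match="*[ns1:Loan]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Children that are not loans are processed first, unchanged. -->
      <xsl:apply-templates select="node()[not(self::ns1:Loan)]"/>
      <!-- Group loans by LoanID; emit only the first loan of each group. -->
      <xsl:for-each-group select="ns1:Loan"
                          group-by="ns1:Identifiers/ns1:LoanID">
        <xsl:copy-of select="current-group()[1]"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats with this sketch: non-Loan siblings (including whitespace between loans) are written before the loans rather than in their original positions, and a loan without a LoanID belongs to no group and is therefore dropped. `xsl:for-each-group` requires an XSLT 2.0 processor such as Saxon; it is not available in XSLT 1.0.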

Remove lines in XML where a certain column contains duplicates
