XSLT strips tags at all levels

Date: 2019-05-06 16:02:14

Tags: xml xslt

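I have some XML that I need to transform with XSLT. When I wrote the XSLT the data was in one format, but the format has since changed, so I need to change the XSLT accordingly.

The XSLT should create a raw text tag, then strip the metadata from the elements inside the sentence <S> tags and append it to the element name (i.e. <ENAMEX type="PERSON"... becomes ENAMEX_PERSON). Previously the whole XML was <DOC> ... </DOC>, but now it is <NORMDOC> <DOC> ... </DOC> ... </NORMDOC>, so I fixed that in the select patterns. Now, however, it strips all the tags before <TXT>, which it did not do when the select pattern was just DOC/. How can I change my XSLT so that the stripping happens only inside TXT?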

Input

<NORMDOC>
<DOC>
<DOCID>123</DOCID>
<FI fitype="B" xref="12345">
<FIName>BA</FIName>
<FITIN>456</FITIN>
</FI>
<OIs>
<OI xref="54321">
<OIName>BA</OIName>
</OI>
</OIs>
<Subjects>
<Subject stype="PER" xref="111111">
<SubjectFullName type="L">DISNEY/WALT</SubjectFullName>
<SubjectLastName type="L">DISNEY</SubjectLastName>
<SubjectFirstName type="L">WALT</SubjectFirstName>


<SubjectPhone type="Work">1234567890</SubjectPhone>
<SubjectPhone type="Residence">9876543210</SubjectPhone>
</Subject>
</Subjects>
<TXT>
<S sid="123-SENT-001">INTRODUCTION  this is being filed to report suspicious activity between customer<WH/>&apos;<WH/>s personal account and his animation business.</S> <S sid="123-SENT-002">The following suspect was identified: <ENAMEX type="PERSON" id="PER-123-000">WALT DISNEY</ENAMEX>.</S> <S sid="123-SENT-003">The reportable amount is <NUMEX type="MONEY" id="MON-123-001">$123,456</NUMEX>.</S> <S sid="123-SENT-004">The suspicious activity took place between <TIMEX type="DATE" id="DAT-123-002">06/01/1923</TIMEX> and <TIMEX type="DATE" id="DAT-123-003">12/15/1966</TIMEX> at studios in <LOCEX type="LOCATION" id="LOC-123-004">Los Angeles</LOCEX>, <LOCEX type="STATE" id="STA-123-005">CA</LOCEX> (<ENAMEX type="BRANCH" id="BRA-123-006">Sixth &amp; Central</ENAMEX>; <LOCEX type="LOCATION" id="LOC-123-007">Wilshire</LOCEX>-<LOCEX type="LOCATION" id="LOC-123-008">La Brea</LOCEX>; <ENAMEX type="ORGANIZATION" id="ORG-123-009">La Brea-Rosewood</ENAMEX>; Melrose-Fairfax) and theatres in <LOCEX type="LOCATION" id="LOC-123-010">Los Angeles</LOCEX>, CA.</S>
</TXT>
</DOC>
<ENTINFO ID="ACC-123-081" TYPE="ACCOUNT" NORM="222222222" REFID="ACC-123-081" ACCT-TYPE="CHK" MENTION="account: animation studio checking account 222222222" />
</NORMDOC>

XSLT

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="xml" indent="yes" />

  <xsl:template match="/">
    <DOC>
      <xsl:apply-templates select="NORMDOC/DOC/*" />
      <xsl:apply-templates select="NORMDOC/DOC/TXT" mode="extra"/>
   </DOC>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:value-of select="current()"/>
     </xsl:copy>
  </xsl:template>

  <xsl:template match="TXT">
    <RAW_TXT>
      <xsl:value-of select="current()"/>
     </RAW_TXT>
  </xsl:template>

  <xsl:template match="TXT" mode="extra">
  <TXT>
    <xsl:for-each select="*">
      <xsl:element name="{local-name()}">
        <xsl:for-each select="*">
          <xsl:variable name="type" select="@type"/>
          <xsl:element name="{concat(name(), '_', $type)}">
          <xsl:value-of select="current()"/>
        </xsl:element>
        </xsl:for-each>
      </xsl:element>
    </xsl:for-each>
  </TXT>
  </xsl:template>
</xsl:stylesheet>

Actual output

<DOC>
   <DOCID>123</DOCID>
   <FI>
BA
456
</FI>
   <OIs>

BA

</OIs>
   <Subjects>

DISNEY/WALT
DISNEY
WALT


1234567890
9876543210

</Subjects>
   <RAW_TXT>
INTRODUCTION  this is being filed to report suspicious activity between customer's personal account and his animation business. The following suspect was identified: WALT DISNEY. The reportable amount is $123,456. The suspicious activity took place between 06/01/1923 and 12/15/1966 at studios in Los Angeles, CA (Sixth &amp; Central; Wilshire-La Brea; La Brea-Rosewood; Melrose-Fairfax) and theatres in Los Angeles, CA.
</RAW_TXT>
   <TXT>
      <S>
         <WH_/>
         <WH_/>
      </S>
      <S>
         <ENAMEX_PERSON>WALT DISNEY</ENAMEX_PERSON>
      </S>
      <S>
         <NUMEX_MONEY>$123,456</NUMEX_MONEY>
      </S>
      <S>
         <TIMEX_DATE>06/01/1923</TIMEX_DATE>
         <TIMEX_DATE>12/15/1966</TIMEX_DATE>
         <LOCEX_LOCATION>Los Angeles</LOCEX_LOCATION>
         <LOCEX_STATE>CA</LOCEX_STATE>
         <ENAMEX_BRANCH>Sixth &amp; Central</ENAMEX_BRANCH>
         <LOCEX_LOCATION>Wilshire</LOCEX_LOCATION>
         <LOCEX_LOCATION>La Brea</LOCEX_LOCATION>
         <ENAMEX_ORGANIZATION>La Brea-Rosewood</ENAMEX_ORGANIZATION>
         <LOCEX_LOCATION>Los Angeles</LOCEX_LOCATION>
      </S>
   </TXT>
</DOC>

Expected output

<DOC>
   <DOCID>123</DOCID>
   <FI>
<FINAME>BA</FINAME><FITIN>456</FITIN>
</FI>
   <OIs>
<OINAME>BA</OINAME>
</OIs>
   <Subjects>
<SubjectFullName>DISNEY/WALT</SubjectFullName>
<SubjectLastName>DISNEY</SubjectLastName>
<SubjectFirstName>WALT</SubjectFirstName>
<SubjectPhone_Work>1234567890</SubjectPhone_Work>
<SubjectPhone_Residence>9876543210</SubjectPhone_Residence>
</Subjects>
   <RAW_TXT>
INTRODUCTION  this is being filed to report suspicious activity between customer's personal account and his animation business. The following suspect was identified: WALT DISNEY. The reportable amount is $123,456. The suspicious activity took place between 06/01/1923 and 12/15/1966 at studios in Los Angeles, CA (Sixth &amp; Central; Wilshire-La Brea; La Brea-Rosewood; Melrose-Fairfax) and theatres in Los Angeles, CA.
</RAW_TXT>
   <TXT>
      <S>
         <WH_/>
         <WH_/>
      </S>
      <S>
         <ENAMEX_PERSON>WALT DISNEY</ENAMEX_PERSON>
      </S>
      <S>
         <NUMEX_MONEY>$123,456</NUMEX_MONEY>
      </S>
      <S>
         <TIMEX_DATE>06/01/1923</TIMEX_DATE>
         <TIMEX_DATE>12/15/1966</TIMEX_DATE>
         <LOCEX_LOCATION>Los Angeles</LOCEX_LOCATION>
         <LOCEX_STATE>CA</LOCEX_STATE>
         <ENAMEX_BRANCH>Sixth &amp; Central</ENAMEX_BRANCH>
         <LOCEX_LOCATION>Wilshire</LOCEX_LOCATION>
         <LOCEX_LOCATION>La Brea</LOCEX_LOCATION>
         <ENAMEX_ORGANIZATION>La Brea-Rosewood</ENAMEX_ORGANIZATION>
         <LOCEX_LOCATION>Los Angeles</LOCEX_LOCATION>
      </S>
   </TXT>
</DOC>

2 answers:

Answer 0 (score: 1)

Overriding the identity rule is the best way to solve this problem. This stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="node()|@*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="NORMDOC">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="TXT">
    <RAW_TXT>
      <xsl:value-of select="."/>
    </RAW_TXT>
    <xsl:call-template name="identity"/>
  </xsl:template>

  <xsl:template match="TXT/S/text()|ENTINFO"/>
</xsl:stylesheet>

Output:

<DOC>
  <DOCID>123</DOCID>
  <FI fitype="B" xref="12345">
    <FIName>BA</FIName>
    <FITIN>456</FITIN>
  </FI>
  <OIs>
    <OI xref="54321">
      <OIName>BA</OIName>
    </OI>
  </OIs>
  <Subjects>
    <Subject stype="PER" xref="111111">
      <SubjectFullName type="L">DISNEY/WALT</SubjectFullName>
      <SubjectLastName type="L">DISNEY</SubjectLastName>
      <SubjectFirstName type="L">WALT</SubjectFirstName>
      <SubjectPhone type="Work">1234567890</SubjectPhone>
      <SubjectPhone type="Residence">9876543210</SubjectPhone>
    </Subject>
  </Subjects>
  <RAW_TXT>INTRODUCTION this is being filed to report suspicious activity between customer's personal account and his animation business.The following suspect was identified: WALT DISNEY.The reportable amount is $123,456.The suspicious activity took place between 06/01/1923 and 12/15/1966 at studios in Los Angeles, CA (Sixth &amp; Central; Wilshire-La Brea; La Brea-Rosewood; Melrose-Fairfax) and theatres in Los Angeles, CA.</RAW_TXT>
  <TXT>
    <S sid="123-SENT-001">
      <WH/>
      <WH/>
    </S>
    <S sid="123-SENT-002">
      <ENAMEX type="PERSON" id="PER-123-000">WALT DISNEY</ENAMEX>
    </S>
    <S sid="123-SENT-003">
      <NUMEX type="MONEY" id="MON-123-001">$123,456</NUMEX>
    </S>
    <S sid="123-SENT-004">
      <TIMEX type="DATE" id="DAT-123-002">06/01/1923</TIMEX>
      <TIMEX type="DATE" id="DAT-123-003">12/15/1966</TIMEX>
      <LOCEX type="LOCATION" id="LOC-123-004">Los Angeles</LOCEX>
      <LOCEX type="STATE" id="STA-123-005">CA</LOCEX>
      <ENAMEX type="BRANCH" id="BRA-123-006">Sixth &amp; Central</ENAMEX>
      <LOCEX type="LOCATION" id="LOC-123-007">Wilshire</LOCEX>
      <LOCEX type="LOCATION" id="LOC-123-008">La Brea</LOCEX>
      <ENAMEX type="ORGANIZATION" id="ORG-123-009">La Brea-Rosewood</ENAMEX>
      <LOCEX type="LOCATION" id="LOC-123-010">Los Angeles</LOCEX>
    </S>
  </TXT>
</DOC>

Note: a "bypass rule" is used for the NORMDOC element, so its children are processed without copying the wrapper itself; an empty rule strips the text-node children of S, as well as the ENTINFO element and its descendants; and a named template lets the TXT rule override the identity rule without losing the chance to reuse it.
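The output above keeps the original element names and attributes inside TXT, whereas the question asks for names such as ENAMEX_PERSON. As a minimal sketch, and only an assumption about how the two requirements could be combined, the same stylesheet can be extended with one extra rule for the element children of S. The `TXT/S/*` rule below is not part of the original answer; it borrows the `{name()}_{@type}` attribute value template from the other answer.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Identity rule: copies every node and attribute unchanged. -->
  <xsl:template match="node()|@*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Bypass rule: drop the NORMDOC wrapper but keep processing its children. -->
  <xsl:template match="NORMDOC">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Emit the flattened text first, then reuse the identity rule for TXT itself. -->
  <xsl:template match="TXT">
    <RAW_TXT>
      <xsl:value-of select="."/>
    </RAW_TXT>
    <xsl:call-template name="identity"/>
  </xsl:template>

  <!-- Empty rule: remove loose sentence text and the ENTINFO element. -->
  <xsl:template match="TXT/S/text()|ENTINFO"/>

  <!-- Hypothetical addition: rename each element child of S to NAME_type.
       An element without a type attribute (such as WH) becomes NAME_. -->
  <xsl:template match="TXT/S/*">
    <xsl:element name="{name()}_{@type}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

The added rule takes precedence over the identity rule because a pattern like `TXT/S/*` has a higher default priority than `node()`. This sketch still copies the sid attribute on S and all attributes outside TXT, so it does not reproduce every detail of the expected output in the question.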

Answer 1 (score: 1)

AFAICT, the following stylesheet returns the expected result:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/NORMDOC">
    <xsl:apply-templates select="DOC"/>
</xsl:template>

<xsl:template match="*">
    <xsl:copy>
        <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="TXT">
    <RAW_TXT>
        <xsl:value-of select="."/>
    </RAW_TXT>
    <xsl:copy>
        <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="S">
    <xsl:copy>
        <xsl:apply-templates select="*" mode="extra"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="*" mode="extra">
    <xsl:element name="{name()}_{@type}">
        <xsl:apply-templates/>
    </xsl:element>
</xsl:template>

</xsl:stylesheet>