Strip tags from XML while preserving the contained text and parent tags, using XSLT

Date: 2013-06-05 19:53:31

Tags: xml xslt strip-tags

I am trying to strip tags from the following XML:

<vocabularyModel>
<conceptDomain name="ActAccountType">
    <annotations>
        <documentation>
            <definition>
                <text>
                    <p>
                        <b>Description: </b>more txt here </p>
                    <p>
                        <i>Examples: </i>
                    </p>
                    <p/>
                    <ul>
                        <li>
                            <p>Patient billing accounts</p>
                        </li>
                        <li>
                            <p>Cost center</p>
                        </li>
                        <li>
                            <p>Cash</p>
                        </li>
                    </ul>
                </text>
            </definition>
        </documentation>
    </annotations>
</conceptDomain>
<conceptDomain name="ActAdjudicationInformationCode">
    <annotations>
        <documentation>
            <definition>
                <text>
                    <p>long text.</p>
                    <p>long text.</p>
                    <p>long text.</p>
                    <p>long text.</p>
                </text>
            </definition>
        </documentation>
    </annotations>
</conceptDomain>
<conceptDomain name="ActAdjudicationType">
    <annotations>
        <documentation>
            <definition>
                <text>
                    <p>
                        <b>Description: </b>more text.</p>
                    <p>
                        <i>Examples: </i>
                    </p>
                    <p/>
                    <ul>
                        <li>
                            <p>adjudicated with adjustments</p>
                        </li>
                        <li>
                            <p>adjudicated as refused</p>
                        </li>
                        <li>
                            <p>adjudicated as submitted</p>
                        </li>
                    </ul>
                </text>
            </definition>
        </documentation>
    </annotations>
</conceptDomain>
</vocabularyModel>

All child tags beneath each text element should be removed while their text content is kept, so the desired XML would look like this:

<vocabularyModel>
<conceptDomain name="ActAccountType">
    <annotations>
        <documentation>
            <definition>
                <text>
                   Description: more txt here 
                        Examples: 
                          Patient billing accounts
                          Cost center
                          Cash
                </text>
            </definition>
        </documentation>
    </annotations>
</conceptDomain>
<conceptDomain name="ActAdjudicationInformationCode">
    <annotations>
        <documentation>
            <definition>
                <text>
                    long text.
                    long text.
                    long text.
                    long text.
                </text>
            </definition>
        </documentation>
    </annotations>
</conceptDomain>
<conceptDomain name="ActAdjudicationReason">
    <annotations>
        <documentation>
            <definition>
                <text>
                    long text.
                    long text.
                    long text.
                    long text.
                </text>
            </definition>
        </documentation>
    </annotations>
    <specializesDomain name="ActReason"/>
</conceptDomain>
<conceptDomain name="ActAdjudicationType">
    <annotations>
        <documentation>
            <definition>
                <text>
                        Description: more text.
                        Examples: 
                            adjudicated with adjustments
                            adjudicated as refused
                            adjudicated as submitted
                </text>
            </definition>
        </documentation>
    </annotations>
</conceptDomain>
</vocabularyModel>

I tried the following, which I found elsewhere on this site and modified:

<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="p | b | li | ul | i">
    <xsl:apply-templates/>
</xsl:template>

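But this does not remove any of the elements, even when I restrict which elements the template matches. I have also tried several variations of the following: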

    <xsl:output method="xml"  indent="yes"/>


<xsl:template name="strip-tags">
    <xsl:param name="html"/>
    <xsl:choose>
        <xsl:when test="contains($html, '&lt;')">
            <xsl:value-of select="substring-before($html, '&lt;')"/>
            <xsl:call-template name="strip-tags">
                <xsl:with-param name="html" select="substring-after($html, '&gt;')"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$html"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>


<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="definition">
    <xsl:call-template name="strip-tags">
        <xsl:with-param name="html" select="text"/>
    </xsl:call-template>
</xsl:template>

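If I omit the identity transform, all tags are stripped; otherwise the output is just a copy of the original XML. Any help would be greatly appreciated. Scott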

1 answer:

Answer 0: (score: 0)

The output of the first stylesheet you show (once the missing xsl:stylesheet element is added) is

<vocabularyModel>
   <conceptDomain name="ActAccountType">
      <annotations>
         <documentation>
            <definition>
               <text>Description: more txt here Examples: Patient billing accountsCost centerCash</text>
            </definition>
         </documentation>
      </annotations>
   </conceptDomain>
   <conceptDomain name="ActAdjudicationInformationCode">
      <annotations>
         <documentation>
            <definition>
               <text>long text.long text.long text.long text.</text>
            </definition>
         </documentation>
      </annotations>
   </conceptDomain>
   <conceptDomain name="ActAdjudicationType">
      <annotations>
         <documentation>
            <definition>
               <text>Description: more text.Examples: adjudicated with adjustmentsadjudicated as refusedadjudicated as submitted</text>
            </definition>
         </documentation>
      </annotations>
   </conceptDomain>
</vocabularyModel>

That seems to be what you want. Perhaps your real input is in a namespace?
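
If the real input does declare a default namespace, the unprefixed names in `match="p | b | li | ul | i"` only match elements in no namespace, so the identity template would copy everything unchanged, which is the behaviour described in the question. A minimal sketch of a namespace-aware version of the first stylesheet follows. The URI `urn:example:vocabulary` and the prefix `v` are placeholders I made up for illustration; the real URI has to be taken from the `xmlns` declaration on the actual `vocabularyModel` element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "urn:example:vocabulary" is a hypothetical namespace URI.
     Replace it with the URI declared in the real input document. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:v="urn:example:vocabulary">

    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity transform: copy every node and attribute not matched below -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Unwrap the inline markup: drop the element itself, keep its children.
         The v: prefix binds each name to the (assumed) input namespace. -->
    <xsl:template match="v:p | v:b | v:li | v:ul | v:i">
        <xsl:apply-templates/>
    </xsl:template>

</xsl:stylesheet>
```

If the namespace URI is not known in advance, a pattern such as `*[local-name()='p' or local-name()='b']` would match by local name regardless of namespace, at the cost of also matching same-named elements from any other vocabulary.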