Question

Suppose the content of the citations node looks like this:

                <p>

            WAJWAJADS:

            </p>

<p>

            asdf

            </p>

<p>

            ALSOAS:

            </p>

<p>

            lorem ipsum...<br />
lorem<br />
blah blah <i>

            adfas &amp; dasdsaafs

            </i>, April 2011.<br />
lorem lorem dear lord the whitespace

            </p>

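Is there a way to use XSLT to transform this into properly formatted HTML?

normalize-space() just mashes everything together. The best I have managed is to for-each over all the p descendants, apply normalize-space() to each one, and wrap the result in a p element. However, any inner tags are still lost.

Is there a better way to parse this train wreck produced by a WYSIWYG editor? Unfortunately, I have no control over the XML that is generated.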

Answer 1

I have modified Martin Honnen's answer:

<xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1, string-length(.)) != '  '">
        <xsl:text> </xsl:text>
    </xsl:if>
</xsl:template>

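It tests whether the last character is a space and the last two characters are not two spaces; if that is true, it inserts a single space after the normalized text.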
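To make the rule easier to check by hand, here is a minimal Python sketch of the same logic. It is an approximation for illustration only: `normalize_space` and `answer1_text` are hypothetical helper names, and the regular expression stands in for XPath's normalize-space(), which treats space, tab, CR and LF as whitespace.

```python
import re

def normalize_space(s):
    # Approximates XPath normalize-space(): collapse runs of
    # space/tab/CR/LF into one space and trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def answer1_text(s):
    # Mirrors the template: emit the normalized text, then add one
    # trailing space only if the node ends in exactly one literal space.
    out = normalize_space(s)
    last_char = s[-1:]   # stands in for substring(., string-length(.))
    last_two = s[-2:]    # stands in for the last two characters
    if last_char == " " and last_two != "  ":
        out += " "
    return out

# A text node ending in a single space keeps that space,
# so "blah blah " followed by <i> is not glued to the tag.
print(repr(answer1_text("\nblah blah ")))
# A node ending in indentation (several spaces) gets no trailing space.
print(repr(answer1_text("\n   asdf\n\n   ")))
```

Note that the test only looks at literal space characters, so a text node that ends in a newline or tab before an inline tag would still lose its separator.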

Answer 2

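First, you need well-formed XML with a single root element.

Assuming you have that, you can apply the identity transform to copy the source tree to the result, strip the whitespace between tags, optionally produce HTML output (no XML declaration) with indentation, and use normalize-space() only on text nodes.

Try this stylesheet: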

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:strip-space elements="*"/>
    <xsl:output indent="yes" method="html"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()">
         <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>

</xsl:stylesheet>

Applied to the data you provided, the result will be:

<p>WAJWAJADS:</p>
<p>asdf</p>
<p>ALSOAS:</p>
<p>lorem ipsum...<br>lorem<br>blah blah<i>adfas &amp; dasdsaafs</i>, April 2011.<br>lorem lorem dear lord the whitespace
</p>
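Notice that the space between "blah blah" and the i element is gone in the output above. The following Python sketch reproduces that effect outside XSLT. It is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree` instead of an XSLT processor, the helper names `normalize_space` and `normalize_tree` are hypothetical, and the input is a shortened version of the sample data.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(s):
    # Approximates XPath normalize-space() for a text node (None -> "").
    return re.sub(r"[ \t\r\n]+", " ", s or "").strip(" ")

def normalize_tree(root):
    # Roughly what the identity transform plus the text() template does:
    # keep every element, normalize each piece of text independently,
    # and drop text that becomes empty.
    for el in root.iter():
        el.text = normalize_space(el.text) or None
        el.tail = normalize_space(el.tail) or None
    return root

src = ("<p>\n  blah blah <i>\n  adfas &amp; dasdsaafs\n  </i>"
       ", April 2011.<br />\n lorem\n</p>")
result = ET.tostring(normalize_tree(ET.fromstring(src)), encoding="unicode")
print(result)
# <p>blah blah<i>adfas &amp; dasdsaafs</i>, April 2011.<br />lorem</p>
```

Because each text node is trimmed on its own, a meaningful space at the boundary between text and an inline element is removed along with the indentation.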

You can see the result applied to your example in XSLT Fiddle.

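UPDATE 1: To add extra whitespace around each text node (and avoid words running together when the string value of a node is computed), you can replace the last template with: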

<xsl:template match="text()">
    <xsl:value-of select="concat(' ',normalize-space(.),' ')"/>
</xsl:template>

Result:

<html>
   <p> WAJWAJADS: </p>
   <p> asdf </p>
   <p> ALSOAS: </p>
   <p> lorem ipsum... <br> lorem <br> blah blah <i> adfas &amp; dasdsaafs </i> , April 2011. <br> lorem lorem dear lord the whitespace 
   </p>
</html>

See: http://xsltransform.net/3NzcBsE/1

UPDATE 2: To add a space or a newline after each copied element, place <xsl:text>&#xa;</xsl:text> (for a newline) or <xsl:text> </xsl:text> (for a space) right after </xsl:copy> in the first template:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <xsl:text>&#xa;</xsl:text>
</xsl:template>

Result:

<html> WAJWAJADS: asdf ALSOAS: lorem ipsum... lorem blah blahadfas & dasdsaafs , April 2011. lorem lorem dear lord the whitespace </html>

See: http://xsltransform.net/3NzcBsE/2

Answer 3

Use the identity transform template together with a template for text nodes that applies normalize-space():

<xsl:template match="text()"><xsl:value-of select="normalize-space()"/></xsl:template>

Answer 4

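This question would be easier to understand if the example contained real text instead of gibberish. "No extra whitespace between node start/end and text" is not a precise enough description of the expected result.

I am going to take a guess here and assume that you actually want to perform a "collapse runs of whitespace into a single space" operation on all text nodes. This can be done as follows: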

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()" priority="1">
    <xsl:variable name="temp" select="normalize-space(concat('x', ., 'x'))" />
    <xsl:value-of select="substring($temp, 2, string-length($temp) - 2)"/>
</xsl:template>

</xsl:stylesheet>

When applied to the following test input:

<chapter>


           <p>

    This         question          would         have

been       a     lot    <b>   easier      </b>      to understand 

        if     the      example   contained     

   <i>     real  </i>    text    instead   of 

   gibberish.

                     </p>


    <p>

    Here     is       an      example       of     preserving   zero     spaces 

    between    text   nodes:<br/>(continued)       on   a new   line. 




    </p>


        <p>

    Here  is       another      example       of     

    preserving   zero     spaces     within    a      text

    node:     <i>some     text  in      italic</i>       followed    

    by   normal      text. 


    </p>


</chapter>

The result will be:

<?xml version="1.0" encoding="UTF-8"?>
<chapter>
   <p> This question would have been a lot <b> easier </b> to understand if the example contained <i> real </i> text instead of gibberish. </p>
   <p> Here is an example of preserving zero spaces between text nodes:<br/>(continued) on a new line. </p>
   <p> Here is another example of preserving zero spaces within a text node: <i>some text in italic</i> followed by normal text. </p>
</chapter>

Note that when rendered as HTML, there is no difference between the input and the output.
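The text() template in the XSLT 1.0 stylesheet above relies on a sentinel trick: it wraps the text in 'x' characters so that normalize-space() cannot trim the leading and trailing whitespace, then cuts the sentinels off again. A minimal Python sketch of that idea follows. The names `normalize_space` and `collapse_keep_edges` are hypothetical, and the regular expression is only an approximation of the XPath function.

```python
import re

def normalize_space(s):
    # Approximates XPath normalize-space(): collapse runs of
    # space/tab/CR/LF into one space and trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def collapse_keep_edges(s):
    # Same steps as the template: pad with sentinels, normalize,
    # then drop the first and last character
    # (substring($temp, 2, string-length($temp) - 2)).
    temp = normalize_space("x" + s + "x")
    return temp[1:-1]

# Edge whitespace collapses to one space instead of disappearing.
print(repr(collapse_keep_edges("   easier      ")))
# Text with no edge whitespace gains none ("zero spaces" are preserved).
print(repr(collapse_keep_edges("some     text  in      italic")))
```

The design choice here is that every run of whitespace, including one at the start or end of a text node, becomes exactly one space, while a boundary with no whitespace stays tight. That matches how a browser collapses whitespace when rendering HTML.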

Transforming node content to remove whitespace
