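I have an HTML document that I need to transform with XSL.
The HTML document contains uses of &nbsp;, i.e.,
ation.</span>&nbsp;</p><br/>All ...
At first I ran into trouble because &nbsp; was not defined. So I defined it: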
<?xml version="1.0"?>
<!DOCTYPE html [
<!ENTITY nbsp "&#160;">
]>
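I did this by prepending that code to the HTML string before sending it to the transformation. After the transformation the ENTITY declaration is conveniently gone, and, yes, good, the transformation actually succeeds.
However! Because nbsp is defined as a space, every "&nbsp;" in the resulting HTML/XML is actually replaced by a space character.
That is not what I want. I need that part of the result to be identical to the source.
So I tried redefining it, like this: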
<?xml version="1.0"?>
<!DOCTYPE html [
<!ENTITY nbsp "&amp;nbsp;">
]>
But now, instead of a space, I see the characters "&amp;nbsp;" in the result.
If I try this:
<?xml version="1.0"?>
<!DOCTYPE html [
<!ENTITY nbsp "&nbsp;">
]>
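I get a recursive entity reference exception.
How can I include the special character '&' in the definition?
P.S. I run this transformation in Java 8 with the default engine (I guess that is Xalan?).
Thanks, all!
Here is a short example that reproduces the problem. Sorry for not providing it earlier.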
package com.astraia.app.mainframe;

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ShortExample
{
    public static void main(String[] args)
    {
        StringBuffer htmlMain = new StringBuffer(500);
        htmlMain.append("<html><head></head>")
                .append(" <body>)")
                .append(" <p data-tags=\"personal\"><strong>name: Nerea Morry, Id: 5678</strong><br/></p>")
                .append(" <p><span>some text</span>&nbsp;</p><br/>some more text")
                .append(" </body>")
                .append("</html>");

        StringBuffer xsl = new StringBuffer(500);
        xsl.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")
           .append("<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">")
           .append(" <xsl:output method=\"xml\" version=\"1.0\" encoding=\"UTF-8\" omit-xml-declaration=\"yes\" />")
           .append(" <xsl:template match=\"node()|@*\" >")
           .append(" <!-- Copy all nodes -->")
           .append(" <xsl:copy>")
           .append(" <xsl:apply-templates select=\"node()|@*\" />")
           .append(" </xsl:copy>")
           .append(" </xsl:template>")
           .append(" <!-- Anonymize all text within tags indicated as personal -->")
           .append(" <xsl:template match=\"*[@data-tags = 'personal' ]//text()[normalize-space(.) != '']\">ANONYMIZED TEXT</xsl:template>")
           .append(" </xsl:stylesheet>");

        String plainHtml = htmlMain.toString();
        String transformation = xsl.toString();
        // results in &nbsp; being replaced by a space
        printResult("results in &nbsp; being replaced by a space", plainHtml, "&#160;", transformation);
        // results in seemingly non-replaced escape code &amp;
        printResult("results in seemingly non-replaced escape code &amp;", plainHtml, "&amp;nbsp", transformation);
        // results in recursion exception
        printResult("results in recursion exception", plainHtml, "&nbsp;", transformation);
        // also results in recursion exception
        printResult("also results in recursion exception", plainHtml, "&#38;nbsp;", transformation);
        // but what will result in:
        // <html><head/> <body>) <p data-tags="personal"><strong>ANONYMIZED TEXT</strong><br/></p> <p><span>some text</span>&nbsp;</p><br/>some more text </body></html>
        // ?
    }

    public static void printResult(String message, String plainHtml, String definition, String transformation) {
        System.out.print(message);
        System.out.println(performTransformation(plainHtml, definition, transformation));
        System.out.println("\n-----");
    }

    public static String performTransformation(String plainHtml, String definition, String transformation)
    {
        String retval = null;
        try {
            StringWriter result = new StringWriter();
            StringBuffer header = new StringBuffer(100);
            header.append("<?xml version=\"1.0\"?>")
                  .append("<!DOCTYPE html [")
                  .append(" <!ENTITY nbsp REPLACE_ME>")
                  .append("]>\n");
            String headerText = header.toString().replace("REPLACE_ME", "\"" + definition + "\"");
            String wholeText = new StringBuffer(headerText).append(plainHtml).toString();
            TransformerFactory factory = TransformerFactory.newInstance();
            Source xslt = new StreamSource(new StringReader(transformation));
            Transformer transformer = factory.newTransformer(xslt);
            Source text = new StreamSource(new StringReader(wholeText));
            transformer.transform(text, new StreamResult(result));
            retval = result.toString();
        }
        catch (Exception e) {
            System.out.println(e.getMessage());
        }
        return retval;
    }
}
Here is the output of my small sample application:
results in &nbsp; being replaced by a space<html><head/> <body>) <p data-tags="personal"><strong>ANONYMIZED TEXT</strong><br/></p> <p><span>some text</span> </p><br/>some more text </body></html>
-----
results in seemingly non-replaced escape code &amp;<html><head/> <body>) <p data-tags="personal"><strong>ANONYMIZED TEXT</strong><br/></p> <p><span>some text</span>&amp;nbsp</p><br/>some more text </body></html>
-----
results in recursion exceptionjavax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),
null
ERROR: 'Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),'
-----
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),'
also results in recursion exceptionERROR: 'Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),'
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),'
javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Recursive entity reference "nbsp". (Reference path: nbsp -> nbsp -> nbsp),
null
-----
The difference between the 4 attempts is:
</span> </p><br/>some more text
</span>&amp;nbsp</p><br/>some more text
exception
exception
Answer 0 (score: 1)
I believe you have two options:
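1. Change the output method to html; this will output any non-breaking space as &nbsp;
2. Change the output encoding to ASCII; this will output any non-breaking space as &#160;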
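A rough sketch of both options follows. The class name NbspOutputDemo, the serialize helper, and the sample markup are made up for illustration. It uses the JDK's identity transformer instead of your stylesheet and sets the output properties in code rather than in xsl:output, and the exact serialized form may differ between XSLT processors.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NbspOutputDemo {

    // Hypothetical sample input: nbsp is declared as the character reference &#160;.
    static final String INPUT =
          "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE html [ <!ENTITY nbsp \"&#160;\"> ]>"
        + "<html><body><p><span>some text</span>&nbsp;</p></body></html>";

    // Copies INPUT unchanged and serializes it with the given output method and encoding.
    static String serialize(String method, String encoding) {
        try {
            // newTransformer() without a stylesheet gives an identity transformation.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, method);
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Option 1: html output method.
        System.out.println(serialize("html", "UTF-8"));
        // Option 2: xml output method with an ASCII encoding.
        System.out.println(serialize("xml", "US-ASCII"));
    }
}
```

With the JDK's built-in processor I would expect the first line to contain &nbsp; and the second to contain &#160;. In your own code the equivalent change would be to the method or encoding attribute of xsl:output.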
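Note: if you keep the output method as xml and the encoding as UTF-8, the serialized result should still contain the unescaped non-breaking space. Either something else in your processing chain prevents this, or you may be mistaking the character for a regular space (after all, the two are rendered the same way in most circumstances).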
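One way to check which character you actually got is to make non-ASCII characters visible before printing the result. A small sketch (the class NbspCheckDemo and the showNonAscii helper are hypothetical names, not part of your code, and the sample string stands in for your transformation result):

```java
public class NbspCheckDemo {

    // Renders every character above the printable ASCII range as a
    // backslash-u escape, so U+00A0 cannot be mistaken for a plain space.
    static String showNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the transformation result; it contains a real U+00A0.
        String transformed = "<span>some text</span>\u00A0</p>";
        System.out.println(showNonAscii(transformed));
    }
}
```

If the character after </span> in your real output shows up as an escape rather than a blank, the non-breaking space survived the transformation and only looks like a regular space on screen.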