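The following code does not convert the input data to XML correctly. I think so because I would not expect the Transformer to produce output containing invalid XML characters (I mean &#55357;).
Here is the code: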
package com.example.test.formatter;

import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import android.test.AndroidTestCase;
import android.util.Log;

public class XmlTest extends AndroidTestCase {
    public void testFormat() {
        try {
            String arbitraryInput = "Arbitrary input: \uD83D"; // we don't have control over this input

            DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
            Document document = documentBuilder.newDocument();

            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes"); // JAXP expects "yes" or "no" here

            StringWriter stringWriter = new StringWriter();
            StreamResult result = new StreamResult(stringWriter);
            DOMSource source = new DOMSource(document);

            // Build <root><key>arbitraryInput</key></root>
            Element root = document.createElement("root");
            Element subElement = document.createElement("key");
            subElement.setTextContent(arbitraryInput);
            root.appendChild(subElement);
            document.appendChild(root);

            stringWriter.getBuffer().setLength(0);
            transformer.transform(source, result);
            String parsed = stringWriter.toString(); // <root><key>Arbitrary input: &#55357;</key></root>
            Log.e("parsed", parsed);
        } catch (Throwable ex) {
            ex.printStackTrace();
        }
    }
}
I expected to get something like
<root><key>Arbitrary input: &amp;#55357;</key></root>
But I got:
<root><key>Arbitrary input: &#55357;</key></root>
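So what should I do to get valid XML output from the Transformer?
Thanks!
EDIT
I believe the output is invalid because of what happens when I try to process the generated XML with PHP: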
<?php
$data = "<root><key>Arbitrary input: &#55357;</key></root>";
$xmlDocument = new \DOMDocument();
$xmlDocument->loadXML($data);
I get a warning (or an exception, if the environment is configured to throw exceptions on warnings):
PHP Warning: DOMDocument::loadXML(): xmlParseCharRef: invalid xmlChar value 55357 in Entity, line: 1 in /tmp/test.php on line 6
PHP Stack trace:
PHP 1. {main}() /tmp/test.php:0
PHP 2. DOMDocument->loadXML() /tmp/test.php:6
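Note that everything works fine if I process the following with DOMDocument (PHP):
$data = "<root><key>Arbitrary input: &amp;#55357;</key></root>";
Either the Java Transformer or DOMDocument (PHP) is doing something wrong. Can you point out which one?
Thanks!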
Answer
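After some investigation: \uD83D really is an invalid character. The range \uD800 to \uDFFF is reserved by the Unicode standard for leading and trailing surrogates and will never have characters assigned to it. A lone \uD83D is only the first half of a surrogate pair (a high surrogate with no low surrogate after it), and the Char production of XML 1.0 excludes the whole range #xD800-#xDFFF, so it cannot appear in a well-formed document, neither directly nor as the character reference &#55357;.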
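To make the distinction concrete, here is a small sketch assuming plain Java SE. isXmlChar is a hypothetical helper that transcribes the XML 1.0 Char production; Character.isHighSurrogate and String.codePointAt are standard JDK methods. It shows that the lone \uD83D comes back from codePointAt as the unpaired value 55357, which the production rejects, while the same surrogate followed by \uDE00 combines into the valid code point U+1F600.

```java
public class SurrogateCheck {
    // XML 1.0 Char production (hypothetical helper, transcribed from the spec):
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        String lone = "\uD83D";       // high surrogate with nothing after it
        String pair = "\uD83D\uDE00"; // the same high surrogate followed by a low surrogate

        System.out.println(Character.isHighSurrogate(lone.charAt(0))); // true
        // codePointAt returns the bare char value when no low surrogate follows
        System.out.println(lone.codePointAt(0));                       // 55357
        System.out.println(isXmlChar(lone.codePointAt(0)));            // false

        // A complete pair decodes to one supplementary code point
        System.out.println(Integer.toHexString(pair.codePointAt(0)));  // 1f600
        System.out.println(isXmlChar(pair.codePointAt(0)));            // true
    }
}
```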
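If the character were valid, the way the Java Transformer encoded it (as a character reference) would be correct. But since it isn't, you are trying to assemble an invalid XML document.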
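If the goal is for the Transformer to always produce well-formed output from untrusted input, one possible workaround (an assumption on my part, not something this answer prescribes) is to clean the string before calling setTextContent. The sketch below uses a hypothetical sanitize helper that replaces every code point outside the XML 1.0 Char production with U+FFFD, the Unicode replacement character; dropping those code points instead would work just as well.

```java
public class XmlSanitizer {
    // XML 1.0 Char production (hypothetical helper, transcribed from the spec)
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every code point that XML 1.0 cannot represent with U+FFFD. */
    public static String sanitize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Valid surrogate pairs decode to one code point >= 0x10000;
            // an unpaired surrogate comes back as its own char value.
            int cp = input.codePointAt(i);
            sb.appendCodePoint(isXmlChar(cp) ? cp : 0xFFFD);
            i += Character.charCount(cp); // 2 for a decoded pair, 1 otherwise
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cleaned = sanitize("Arbitrary input: \uD83D");
        System.out.println(cleaned.equals("Arbitrary input: \uFFFD")); // true
    }
}
```

In the test from the question this would mean calling subElement.setTextContent(sanitize(arbitraryInput)). Either way the unpaired surrogate itself is lost, because XML 1.0 has no valid representation for it.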
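The construct
<root><key>Arbitrary input: &amp;#55357;</key></root>
clearly does not reflect the input data; it means that the value of key is the literal text
Arbitrary input: &#55357;
which is different from what you want.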
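The difference between the two serialized forms can be checked with the JDK's own parser. This sketch assumes Java SE's built-in JAXP implementation; parseText is a hypothetical helper that returns the text content of the document element. The escaped form parses, but its text is the literal characters &#55357;, not the surrogate. The unescaped character reference is rejected as not well-formed, much like the warning DOMDocument::loadXML() reported in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EscapedRefDemo {
    /** Parses xml and returns the text of its root element (hypothetical helper). */
    public static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Escaped ampersand: well-formed, text is the literal "&#55357;"
        System.out.println(parseText("<root><key>Arbitrary input: &amp;#55357;</key></root>"));
        // prints: Arbitrary input: &#55357;

        // Character reference to a lone surrogate: a fatal well-formedness error
        try {
            parseText("<root><key>Arbitrary input: &#55357;</key></root>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```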