Java XML Transformer and surrogates

Time: 2013-10-31 16:06:35

Tags: java android xml transformer surrogate-pairs

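The following code does not transform the input data to XML correctly. I think so because I don't expect the Transformer to produce output containing invalid XML characters (I am talking about the &#55357; reference).

Here is the code: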

package com.example.test.formatter;

import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import android.test.AndroidTestCase;
import android.util.Log;

public class XmlTest extends AndroidTestCase {

    public void testFormat() {

        try {
            String arbitraryInput = "Arbitrary input: \uD83D"; // we don't have control over this input

            DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
            Document document = documentBuilder.newDocument();

            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.INDENT, "no"); // valid values are "yes" and "no"; "true" is not one of them

            StringWriter stringWriter = new StringWriter();
            StreamResult result = new StreamResult(stringWriter);
            DOMSource source = new DOMSource(document);

            Element root = document.createElement("root");
            Element subElement = document.createElement("key");
            subElement.setTextContent(arbitraryInput);
            root.appendChild(subElement);

            document.appendChild(root);

            stringWriter.getBuffer().setLength(0);
            transformer.transform(source, result);

            String parsed = stringWriter.toString(); // <root><key>Arbitrary input: &#55357;</key></root>
            Log.e("parsed", parsed);
        }
        catch(Throwable ex) {
            ex.printStackTrace();
        }

    }

}

I was expecting to get something like:
<root><key>Arbitrary input: &amp; #55357;</key></root>

But I got:

<root><key>Arbitrary input: &#55357;</key></root>

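So what should I do if I want valid XML output from the Transformer?

Thanks!

Edit

I think the output is invalid because of what happens when I try to process the generated XML with PHP: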

<?php

$data = "<root><key>Arbitrary input: &#55357;</key></root>";

$xmlDocument = new \DOMDocument();
$xmlDocument->loadXML($data);

I get a warning (or an exception, if the environment is configured to throw exceptions on warnings):

PHP Warning:  DOMDocument::loadXML(): xmlParseCharRef: invalid xmlChar value 55357 in Entity, line: 1 in /tmp/test.php on line 6
PHP Stack trace:
PHP   1. {main}() /tmp/test.php:0
PHP   2. DOMDocument->loadXML() /tmp/test.php:6

Note that if I process the following input with DOMDocument (PHP) instead, everything works fine:

$data = " <root><key>Arbitrary input: &amp; #55357;</key></root>";

Either the Java Transformer or DOMDocument (PHP) is doing something wrong. Can you point out which one?

Thanks!

1 Answer:

Answer 0: (score: 1)

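After a lot of investigation: \uD83D on its own is indeed an invalid character. The range \uD800 to \uDFFF is reserved by the Unicode standard for lead and trail surrogates, and no character will ever be assigned to it. XML 1.0 also excludes this range from the characters it allows, so &#55357; (decimal for 0xD83D) is not a legal character reference, which is exactly what the PHP parser complains about.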
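
To make the surrogate rule concrete, here is a minimal sketch that checks whether a string contains a surrogate that is not part of a valid lead/trail pair. The class and method names (SurrogateCheck, hasUnpairedSurrogate) are made up for illustration; only the java.lang.Character calls are standard API.

```java
public class SurrogateCheck {

    // Returns true if s contains a lead (high) surrogate that is not followed
    // by a trail (low) surrogate, or a trail surrogate with no lead before it.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair: skip the trail half
                } else {
                    return true; // lead surrogate with nothing valid after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // trail surrogate without a preceding lead
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Lone lead surrogate, as in the question: prints true
        System.out.println(hasUnpairedSurrogate("Arbitrary input: \uD83D"));
        // Complete pair encoding U+1F600: prints false
        System.out.println(hasUnpairedSurrogate("Arbitrary input: \uD83D\uDE00"));
    }
}
```

The input from the question fails this check, which is why no correct escaping of it can exist.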

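The encoding used by the Java Transformer would be correct if only the character were valid. Since it is not, you are trying to assemble an invalid XML document.

The construct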

<root><key>Arbitrary input: &amp; #55357;</key></root>

clearly does not reflect the input data: it means that the value of key is

Arbitrary input: & #55357;

which is different from what you want.
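
One possible way out, which is not something the answer above prescribes, is to clean the input before it reaches setTextContent. The sketch below replaces every code point that XML 1.0 does not allow, including unpaired surrogates, with U+FFFD (the Unicode replacement character). The names XmlTextSanitizer, isXmlChar and sanitize are hypothetical, and dropping the offending characters instead of replacing them would be an equally reasonable choice.

```java
public class XmlTextSanitizer {

    private static final char REPLACEMENT = '\uFFFD';

    // Character ranges permitted by the XML 1.0 "Char" production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every code point that XML 1.0 forbids with U+FFFD.
    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // codePointAt returns an unpaired surrogate as itself,
            // so it falls into the forbidden D800..DFFF range below.
            int cp = s.codePointAt(i);
            out.appendCodePoint(isXmlChar(cp) ? cp : REPLACEMENT);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cleaned = sanitize("Arbitrary input: \uD83D");
        // The lone surrogate has been replaced: prints true
        System.out.println(cleaned.equals("Arbitrary input: \uFFFD"));
    }
}
```

In the test from the question this would be used as subElement.setTextContent(XmlTextSanitizer.sanitize(arbitraryInput)), so the Transformer never sees a character it cannot represent in valid XML.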