SAXParser fails to correctly read an org.w3c.dom.Document converted to a java.io.InputStream

Time: 2014-10-28 11:24:39

Tags: java dom sax

I am using the following code to parse an org.w3c.dom.Document with a javax.xml.parsers.SAXParser:

try
    {
        // --- Prepare our SAX parser ---
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        // parser.parse(xmlFile, xmlValidator); /* Does not validate unsaved changes */

        // --- Create a stream from our already parsed xml document ---
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        Source xmlSource = new DOMSource(xmlDocument);
        Result outputTarget = new StreamResult(outputStream);
        TransformerFactory.newInstance().newTransformer().transform(xmlSource, outputTarget);

        // --- Validate the xmlDocument ---
        parser.parse(new ByteArrayInputStream(outputStream.toByteArray()), xmlValidator);
    }
catch (ParserConfigurationException | SAXException | TransformerException | TransformerFactoryConfigurationError | IOException e)
    {
        e.printStackTrace();
    }

When parsing the document, I get the error message:

Line 1: Document root element 'MyRootName' must match DOCTYPE root 'null'.

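If I parse the xmlFile that xmlDocument is based on directly, everything works fine.

I have made sure that xmlDocument has been initialized and is valid. I even tried passing xmlDocument.getDocumentElement() to the DOMSource, after confirming that it is valid and is what I expect (i.e. the root node of the document, with the correct name).

Why does the SAXParser read the java.io.InputStream differently from the way it reads xmlFile from the file system?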

Edit

Related question (I tried all of its solutions, to no avail): how to create an InputStream from a Document or Node

I found the cause; details are here: Parsing xml with DOM, DOCTYPE gets erased

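1 answer:

Answer 0 (score: 1)

So the problem was not the parser: the Transformer was stripping the <!DOCTYPE ...> line out of the XML, which is why the validating parser reported a DOCTYPE root of 'null'. To fix this, set an output property on the transformer so that it writes the DTD declaration back into the output.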

    // --- Create a transformer and transform our Document into an InputStream ---
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    // By default the transformer strips out the DOCTYPE tag so we must re-add our DTD file declaration
    transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, xmlFile.getParent() + "\\" + xmlDocument.getDoctype().getSystemId());
    transformer.transform(xmlSource, outputTarget);
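
To see the effect in isolation, here is a minimal, self-contained sketch. The class name DoctypeRoundTrip, the sample document, and the DTD name "root.dtd" are made up for illustration and are not part of the original code. It parses a small document, serializes it once without and once with OutputKeys.DOCTYPE_SYSTEM, and prints both results. Whether the first output lacks the DOCTYPE may depend on the transformer implementation in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeRoundTrip {
    // Hypothetical sample input; "root.dtd" is never actually read from disk.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?><!DOCTYPE root SYSTEM \"root.dtd\"><root/>";

    // Parse without validation; external entities resolve to an empty stream.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Serialize the DOM, optionally re-adding the DOCTYPE system identifier.
    static String serialize(Document doc, boolean keepDoctype) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            DocumentType doctype = doc.getDoctype();
            if (keepDoctype && doctype != null && doctype.getSystemId() != null) {
                transformer.setOutputProperty(
                    OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("default:      " + serialize(doc, false));
        System.out.println("with DOCTYPE: " + serialize(doc, true));
    }
}
```

Comparing the two printed lines is a quick way to check what a validating SAXParser will actually receive before blaming the parser itself.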

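If you pass in only the DTD file name, the parser will look for it in the directory the program was started from, so it is better to specify the full path to the DTD file, as above.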
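
The hard-coded "\\" separator in the snippet above only suits Windows paths. As a hedged alternative, the system identifier could be resolved against the XML file's own location as a URI. The helper below is a sketch; the names DtdLocation and absoluteSystemId are hypothetical, and it assumes the identifier stored in the DOCTYPE is a syntactically valid URI reference (URI.resolve throws IllegalArgumentException otherwise).

```java
import java.io.File;

class DtdLocation {
    // Resolve a DOCTYPE system identifier against the XML file's location.
    // A relative identifier becomes an absolute file: URI next to the XML file;
    // an identifier that is already absolute is returned unchanged.
    static String absoluteSystemId(File xmlFile, String systemId) {
        return xmlFile.getAbsoluteFile().toURI().resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // "data/sample.xml" and "root.dtd" are illustrative values only.
        System.out.println(absoluteSystemId(new File("data/sample.xml"), "root.dtd"));
    }
}
```

The returned string could then be passed as the value of OutputKeys.DOCTYPE_SYSTEM in place of the concatenated path.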