DOM parser with setExpandEntityReferences(false) produces both an entity reference and an expanded text node

Time: 2013-03-08 13:34:37

Tags: java xml dom xml-parsing

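I want to parse an XML document and keep its DOM representation as close to the source as possible. In particular, I want to get ENTITY_REFERENCE nodes. Instead, I get an ENTITY_REFERENCE node followed by a TEXT_NODE that holds the expansion of that entity reference.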

import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Main {

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(false);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Internal subset declares entity "a"; the external DTD is stubbed out by the resolver below.
        final String xml =
                    "<?xml version=\"1.0\"?>" +
                    "<!DOCTYPE simple SYSTEM \"simple.dtd\" [" +
                    "<!ENTITY a \"abhijeet\">" +
                    "]>" +
                    "<simple> &a;   </simple>";

        builder.setEntityResolver(new EntityResolver() {

            @Override
            public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
                return new InputSource(new StringReader(""));
            }
        });
        Document document = builder.parse(new InputSource(new StringReader(xml)));

        final DOMImplementationLS domImplementationLS = (DOMImplementationLS) builder.getDOMImplementation();
        LSSerializer serializer = domImplementationLS.createLSSerializer();
        LSOutput output = domImplementationLS.createLSOutput();
        output.setCharacterStream(new PrintWriter(System.out));
        serializer.write(document, output);
    }

}

If you want to run the code, it is here: http://ideone.com/Rldi2S

The results on HotSpot with Java(TM) SE Runtime Environment builds 1.7.0_15-b03, 1.7.0_17-b02, and 1.6.0_26-b03 are all the same:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE simple SYSTEM "simple.dtd" [<!ENTITY a 'abhijeet'>
]>
<simple> &a;abhijeet   </simple>

"&a;" is the entity reference node, and it is followed by a text node containing its expansion, "abhijeet".

What I expected is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE simple SYSTEM "simple.dtd" [<!ENTITY a 'abhijeet'>
]>
<simple> &a;   </simple>

Is this a gap in my knowledge, a bug in my code, or is the parser broken?
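
One way to narrow this down might be to walk the parsed tree instead of relying on the serialized output. Per the DOM specification, an EntityReference node carries its replacement content as read-only children, so it helps to know whether the "abhijeet" text is a child of the reference or a sibling of it. The sketch below is only an illustration: the class name NodeDump and the helpers dump and describe are hypothetical names, not part of any library, and what it prints will depend on the parser implementation and JDK version in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeDump {

    // Hypothetical helper: adds one line per node, indented by depth in the tree.
    static void dump(Node node, int depth, List<String> out) {
        String type;
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:          type = "ELEMENT"; break;
            case Node.TEXT_NODE:             type = "TEXT"; break;
            case Node.ENTITY_REFERENCE_NODE: type = "ENTITY_REFERENCE"; break;
            default:                         type = "OTHER(" + node.getNodeType() + ")";
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(type).append(' ').append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            line.append(" \"").append(node.getNodeValue()).append('"');
        }
        out.add(line.toString());
        // Recurse into children; for an entity reference these are its expansion.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, depth + 1, out);
        }
    }

    // Hypothetical helper: parses with the same settings as the question's code
    // and returns the dump of the document element.
    static List<String> describe(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // As in the question: resolve the external DTD to an empty stream.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        Node root = builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> out = new ArrayList<>();
        dump(root, 0, out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE simple SYSTEM \"simple.dtd\" ["
                + "<!ENTITY a \"abhijeet\">"
                + "]>"
                + "<simple> &a;   </simple>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

If the "abhijeet" text appears indented beneath the ENTITY_REFERENCE line, it is a child of the reference and the duplication would seem to come from serialization. If it appears at the same depth as the reference, the parser itself added it as a sibling.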

0 answers:

No answers