DOM parser freezes on HTML that has a DOCTYPE declaration

Time: 2016-08-28 08:01:54

Tags: java parsing dom freeze doctype

This program reads two HTML files from my site and parses each of them. The first file (pass.html) has no DOCTYPE declaration, and it parses normally.

The second file (freeze.html) has a DOCTYPE declaration. The W3C validation service reports freeze.html as fully valid. However, when I try to parse freeze.html, the program freezes at .parse(is).

What is the problem?

import java.io.InputStream;
import java.net.URL;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

class DOMCallFreezes {
    public static void main(String[] args) throws Exception {
        new DOMCallFreezes().main();
    }

    void main() throws Exception {
        demo("pass.html");
        demo("freeze.html");
    }

    void demo(String htmlName) throws Exception {
        final String baseUrl = "http://x19290.appspot.com/dom-no-good/";
        URL url = new URL(baseUrl + htmlName);
        try (final InputStream is = url.openStream()) {
            final Document doc = newDocumentBuilder().parse(is);
            final DOMSource src = new DOMSource(doc);
            final StreamResult dst = new StreamResult(System.out);
            newTransformer().transform(src, dst);
        }
    }

    DocumentBuilder newDocumentBuilder() throws Exception {
        final DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        return f.newDocumentBuilder();
    }

    Transformer newTransformer() throws Exception {
        final TransformerFactory f = TransformerFactory.newInstance();
        return f.newTransformer();
    }
}

pass.html

<?xml version="1.0" encoding="US-ASCII"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>pass</title>
</head>
<body>
   <h1>no DOCTYPE declaration</h1>
   </body>
</html>

freeze.html

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>freeze</title>
</head>
<body>
    <h1>has DOCTYPE declaration</h1>
</body>
</html>

1 answer:

Answer 0 (score: 1)

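By default the parser tries to download the external DTD named in the DOCTYPE declaration (here http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd), and that request is most likely what the call is waiting on. The following settings tell the parser not to load the external DTD. Change the method newDocumentBuilder() to: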

DocumentBuilder newDocumentBuilder() throws Exception {
    final DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
    // Do not validate the document against its DTD.
    f.setValidating(false);
    // Xerces-specific feature: do not fetch the external DTD named in the DOCTYPE.
    f.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    return f.newDocumentBuilder();
}
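
As a variation on the same idea, here is a minimal sketch that can be run without any network access. It assumes the JDK's built-in Xerces-based parser (another parser may reject the feature URI with a ParserConfigurationException). The class name NoExternalDtdDemo, the parse helper, and the inline XHTML string are made up for illustration and are not part of the original program. It sets the same feature through setFeature and also installs an EntityResolver that returns an empty source, so that any external entity that does get requested is resolved locally.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NoExternalDtdDemo {
    // Builds a parser that never goes to the network for a DTD.
    static DocumentBuilder newDocumentBuilder() throws Exception {
        final DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        f.setValidating(false);
        // Xerces-specific feature (assumption: the JDK default parser is in use).
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        final DocumentBuilder b = f.newDocumentBuilder();
        // Fallback: answer any external entity request with an empty document.
        b.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return b;
    }

    // Hypothetical helper: parses XML held in a string instead of a URL stream.
    static Document parse(String xml) throws Exception {
        return newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        final String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
            + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">"
            + "<head><title>freeze</title></head>"
            + "<body><h1>has DOCTYPE declaration</h1></body>"
            + "</html>";
        final Document doc = parse(xhtml);
        // Prints the root element name.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

One caveat with either approach: once the DTD is skipped, named entities that XHTML defines there (such as &nbsp;) are no longer declared, so a document that uses them may fail to parse. The files in the question do not use any, so this does not affect them.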