Full exception stack trace:
Exception in thread "main" org.w3c.dom.DOMException: HIERARCHY_REQUEST_ERR: An attempt was made to insert a node where it is not permitted.
at org.apache.xerces.dom.CoreDocumentImpl.insertBefore(Unknown Source)
at org.apache.xerces.dom.NodeImpl.appendChild(Unknown Source)
at com.enniu.crawler.core.saxon.main(saxon.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
My code:
public class saxon {
    public static void main(String args[]) throws IOException, SAXException, ParserConfigurationException, XPathFactoryConfigurationException, XPathExpressionException {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = null;
        builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse("test.html");
        Document newDoc = builder.newDocument();
        XPathFactory xpf = XPathFactoryImpl.newInstance(XPathConstants.DOM_OBJECT_MODEL);
        XPath xPath = xpf.newXPath();
        XPathExpression compile = xPath.compile("//div[not (contains(class, 'sss'))]");
        Object result = compile.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            Node copyNode = newDoc.importNode(nodes.item(i), true);
            newDoc.appendChild(copyNode); // line 39
        }
        printXmlDocument(newDoc);
    }

    public static void printXmlDocument(Document document) {
        DOMImplementationLS domImplementationLS =
                (DOMImplementationLS) document.getImplementation();
        LSSerializer lsSerializer =
                domImplementationLS.createLSSerializer();
        String string = lsSerializer.writeToString(document);
        System.out.println(string);
    }
}
test.html:
<table>
<div>aa</div>
<div class="sss">ss</div>
<div>dd</div>
</table>
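Answer 0 (score: 2)
Because a well-formed XML document cannot have more than one root element, and a DOM Document node accepts only a single element child. The code above tries to build a document like this: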
<div>aa</div>
<div>dd</div>
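That document would have two roots, so the second call to appendChild on the Document throws HIERARCHY_REQUEST_ERR.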
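One possible way around this is to give the new document a single wrapper element and append the imported nodes to that element rather than to the Document itself. The sketch below is only an illustration under a few assumptions: it uses the JDK's built-in XPathFactory instead of Saxon, it parses the markup from a string rather than from test.html, it assumes the XPath was meant to test the attribute (@class), and the class name SingleRootCopy, the method copyUnderRoot, and the wrapper element name "root" are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SingleRootCopy {

    // Hypothetical helper: copies every node matched by `expression` into a
    // new document, under one wrapper element named `rootName`, so the new
    // document never ends up with more than one root.
    public static Document copyUnderRoot(String xml, String expression, String rootName)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Document newDoc = builder.newDocument();

        // The only element appended directly to the Document node.
        Element root = newDoc.createElement(rootName);
        newDoc.appendChild(root);

        // Assumption: the JDK's default XPath implementation, not Saxon.
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);

        for (int i = 0; i < nodes.getLength(); i++) {
            // importNode makes a deep copy owned by newDoc; append it to the
            // wrapper element instead of to the Document.
            root.appendChild(newDoc.importNode(nodes.item(i), true));
        }
        return newDoc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><div>aa</div><div class=\"sss\">ss</div><div>dd</div></table>";
        // Assumption: @class (the attribute) is what the original XPath intended.
        Document result = copyUnderRoot(xml, "//div[not(contains(@class, 'sss'))]", "root");
        // Number of div elements copied under the wrapper element.
        System.out.println(result.getDocumentElement().getChildNodes().getLength());
    }
}
```

With the sample markup from the question, the two div elements without class "sss" end up as children of the wrapper element, and the resulting Document can be passed to printXmlDocument as before.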