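I am reading a large number of XML files from a server, parsing them, and extracting a few tags from each one to store in a database. While reading these XML files, the DOM parser sometimes throws this exception: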
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
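I would like to handle this exception and continue parsing the 'bad' file so that I can still get data out of it. The server returns about 2000 XML files, and roughly 100 of them trigger this exception. The strange thing is that when I inspect those XML files by hand, all of their tags appear to be properly matched.
Here is my code: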
    for (int k = 0; k < listdata.length; k++) {
        String xmldata = listdata[k].getcategorydata();
        System.out.println("XML File:" + xmldata);
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory
                    .newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // xmldata is treated as a file path here
            File is = new File(xmldata);
            //InputSource is = new InputSource(new StringReader(xmldata));
            Document doc = builder.parse(is);
            Element docEle = doc.getDocumentElement();
            System.out.println("Root element of the document: "
                    + docEle.getNodeName());
            NodeList links = docEle.getElementsByTagName("Some tag Name");
            System.out.println("Total actionLink: " + links.getLength());
            if (links != null && links.getLength() > 0) {
                for (int l = 0; l < links.getLength(); l++) {
                    Node node = links.item(l);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        System.out.println("=====================");
                        Element e = (Element) node;
                        NodeList nodeList = e
                                .getElementsByTagName("Tag Name..");
                        // text of the first matching child element
                        pathname = nodeList.item(0).getChildNodes().item(0)
                                .getNodeValue();
                        System.out.println("Name: " + pathname);
                    }
                }
            }
        } catch (SAXParseException e) {
            // only the failing document's system ID and stack trace are reported
            System.out.println("Error" + e.getSystemId());
            e.printStackTrace();
        }
    }
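
For reference, the skip-and-continue behaviour I am after could be sketched roughly as below. This is only a minimal sketch under assumptions, not a fix: the names `SkipBadXml`, `Result`, and `parseAll` are hypothetical, and the XML is passed in as strings instead of file paths so that the example is self-contained. As far as I can tell, a DOM `DocumentBuilder` does not hand back a partial `Document` when parsing fails, so the sketch only records where each bad document broke and moves on to the next one; it does not recover data from the failing file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses each document independently and keeps going on errors.
class SkipBadXml {

    // Hypothetical holder for the outcome of one batch.
    static final class Result {
        final List<String> rootNames = new ArrayList<>(); // roots of documents that parsed
        final List<String> failures = new ArrayList<>();  // one message per rejected document
    }

    static Result parseAll(List<String> xmlDocuments) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Result result = new Result();
        for (int k = 0; k < xmlDocuments.size(); k++) {
            try {
                // A fresh builder per document, as in the original loop.
                DocumentBuilder builder = factory.newDocumentBuilder();
                // DefaultHandler rethrows fatal errors without printing them itself.
                builder.setErrorHandler(new DefaultHandler());
                Document doc = builder.parse(
                        new InputSource(new StringReader(xmlDocuments.get(k))));
                result.rootNames.add(doc.getDocumentElement().getNodeName());
            } catch (SAXParseException e) {
                // Line and column show where the parser gave up on this document.
                result.failures.add("document " + k + " at line " + e.getLineNumber()
                        + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            } catch (SAXException | IOException e) {
                result.failures.add("document " + k + ": " + e.getMessage());
            } catch (ParserConfigurationException e) {
                // A broken parser setup affects every document, so stop here.
                throw new IllegalStateException(e);
            }
        }
        return result;
    }
}
```

Calling `SkipBadXml.parseAll(Arrays.asList("<a><b>1</b></a>", "<a><b>1</b>", "<c/>"))` would record the second, unterminated document in `failures` while the first and third still end up in `rootNames`. The recorded line and column may also help to check whether the ~100 failing files really are complete when they reach the parser.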