JDom2 SAXBuilder - parse error with a valid XML file

Date: 2016-08-09 12:30:22

Tags: java xml sax jdom

I wrote an application that runs as a Linux daemon and parses XML files of roughly 100 MB each.

import java.io.File;
import java.io.IOException;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class MyReader {

    private Document doc;
    private Element rootNode;

    public MyReader(String filename) {
        try {
            // Parse the whole file into an in-memory JDOM document.
            // build() already returns a Document, so no cast is needed.
            doc = new SAXBuilder().build(new File(filename));
            rootNode = doc.getRootElement();
        } catch (NullPointerException | NumberFormatException | IOException | JDOMException e) {
            // A JDOMParseException (a subclass of JDOMException) lands here
            // when the XML is malformed or incomplete.
            e.printStackTrace();
        }
    }
}

I use this approach to process about 14 files a week, and occasionally one of them fails with this stack trace:

org.jdom2.input.JDOMParseException: Error on line 15410 of document file:/home/files/new/100.xml: XML document structures must start and end within the same entity.
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:228)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:277)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:264)
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1116)
    at com.scoobydoo.files.MyReader.<init>(MyReader.java:38)
    at application.Daemon.checkNewImportFiles(Daemon.java:225)
    at application.Daemon.startApplication(Daemon.java:68)
    at application.Daemon.run(Daemon.java:36)
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/files/new/100.xml; lineNumber: 15410; columnNumber: 35; XML document structures must start and end within the same entity.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.endEntity(XMLDocumentFragmentScannerImpl.java:904)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.endEntity(XMLDocumentScannerImpl.java:563)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.endEntity(XMLEntityManager.java:1399)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1811)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1460)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2824)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:217)
    ... 8 more

The strange part is that if I try to import the same file a second time, it imports without any error.

Of course I have checked the file and validated it with xmllint, which reported no problems.

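My guess is that SAXBuilder().build() opens an InputStream on the file and that the stream gets truncated for some reason. Do you know how I could check this, or what else might cause this problem?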
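One way to test the truncation guess is to record the file's size and modification time just before parsing and again when the parse fails. If the two differ, something was still writing to the file while it was being read. The following is only a minimal sketch: `FileSnapshot` is a hypothetical helper name, not part of JDOM or the JDK.

```java
import java.io.File;

/** Hypothetical helper: captures a file's size and modification time at one instant. */
final class FileSnapshot {
    final long length;
    final long lastModified;

    FileSnapshot(File file) {
        // Both calls return 0 if the file does not exist.
        this.length = file.length();
        this.lastModified = file.lastModified();
    }

    /** Returns true when size or modification time differs between the two snapshots. */
    boolean differsFrom(FileSnapshot other) {
        return length != other.length || lastModified != other.lastModified;
    }

    @Override
    public String toString() {
        return "length=" + length + ", lastModified=" + lastModified;
    }
}
```

In the constructor of `MyReader`, a snapshot could be taken before `build()` and a second one inside the `catch (JDOMException e)` block, logging both. A size that grew between the two would point at a file that was still being written rather than at the parser.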

Thanks in advance!

******** UPDATE ********

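Hi everyone, something dawned on me this morning: the files being processed are uploaded by users. My guess is that the process sometimes starts reading a file before it has been fully uploaded, so the parse fails because the file is still incomplete at that moment.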

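This matches the symptoms, including the fact that reading the file again later succeeds: by the time of my "manual check" the upload had already finished.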
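The hypothesis can be reproduced in isolation: feeding a parser only the first part of an otherwise well-formed document makes it fail at the point where the input ends. The sketch below uses the JDK's built-in SAX parser directly instead of JDOM so that it needs no extra dependency; `TruncationDemo` and `parseError` are names made up for this illustration, and the exact wording of the message depends on where the input is cut and on the parser implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustration only: shows that a cut-off document is rejected by the parser. */
final class TruncationDemo {

    /** Returns null if the XML is well-formed, otherwise a description of the parse error. */
    static String parseError(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Line and column point at the place where the input stopped making sense,
            // which for a partial file is usually its current end.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }
}
```

Calling `parseError` with a complete document and then with a prefix of the same document should return `null` in the first case and an error description in the second. In the stack trace from the question, the reported line 15410 would then simply be how far the upload had progressed when the daemon started reading.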

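I have made a change to verify that a file has been completely uploaded before trying to process it (I don't know why I didn't do that from the start). I'll report back if the problem shows up again.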
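The post does not say how the completeness check was implemented. One possible approach, shown here purely as an assumption, is to treat a file as complete once its size and modification time have stopped changing for a quiet period. `UploadGuard`, `waitUntilStable` and both parameters are hypothetical names, and the check is a heuristic: a stalled upload can look finished.

```java
import java.io.File;

/** Hypothetical guard: waits until a file has stopped changing before it is parsed. */
final class UploadGuard {

    /**
     * Returns true once the file is non-empty and its size and modification time
     * have stayed the same for quietMillis. Returns false if that does not happen
     * within timeoutMillis.
     */
    static boolean waitUntilStable(File file, long quietMillis, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long pollMillis = Math.max(1, quietMillis / 4);
        long lastLength = file.length();
        long lastModified = file.lastModified();
        long stableSince = System.currentTimeMillis();

        while (System.currentTimeMillis() < deadline) {
            Thread.sleep(pollMillis);
            long length = file.length();
            long modified = file.lastModified();
            long now = System.currentTimeMillis();
            if (length != lastLength || modified != lastModified) {
                // The file changed: remember the new state and restart the quiet period.
                lastLength = length;
                lastModified = modified;
                stableSince = now;
            } else if (length > 0 && now - stableSince >= quietMillis) {
                return true;
            }
        }
        return false;
    }
}
```

The daemon could call something like `UploadGuard.waitUntilStable(file, 5000, 60000)` before constructing `MyReader` and skip the file for this round when it returns false. If the upload side can be changed, a more robust alternative is to upload under a temporary name and rename the file into the watched directory only when the transfer is done, for example with `java.nio.file.Files.move` and `StandardCopyOption.ATOMIC_MOVE`.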

Thanks!

0 answers:

No answers yet.