"XML document structures must start and end within the same entity" in IntelliJ

Time: 2017-05-31 13:30:13

Tags: java xml intellij-idea

I am parsing XML files. The code works for some files but fails for others.

My code is:

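import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static String parseXml(String xmlFileName) {
    StringBuilder docText = new StringBuilder();

    try {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        // domFactory.setValidating(false);
        DocumentBuilder builder = domFactory.newDocumentBuilder();

        // Resolve pdf2xml.dtd to an empty DTD so the parser does not look for the file.
        builder.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                    throws SAXException, IOException {
                if (systemId != null && systemId.contains("pdf2xml.dtd")) {
                    return new InputSource(new ByteArrayInputStream(
                            "<?xml version='1.0' encoding='UTF-8'?>".getBytes(StandardCharsets.UTF_8)));
                }
                return null; // use the default resolution for anything else
            }
        });
        System.out.println("File is: " + xmlFileName);
        // Close the stream once parsing is done.
        try (InputStream in = new FileInputStream(xmlFileName)) {
            Document doc = builder.parse(in);
            System.out.println("Root of XML file: " + doc.getDocumentElement().getNodeName());
            NodeList nodes = doc.getElementsByTagName("text");
            /* Do something here */
        }
    } catch (ParserConfigurationException | SAXException | IOException e) {
        e.printStackTrace();
    }
    return docText.toString();
}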

I tried disabling validation with domFactory.setValidating(false); but that did not help.
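
A note on the validation attempt: `setValidating(false)` is already the default for `DocumentBuilderFactory`, and validation (checking the document against its DTD) is separate from well-formedness, which the parser always enforces. The sketch below illustrates this, assuming an input that is cut off before its closing tags. `WellFormednessDemo` and `parses` are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormednessDemo {
    /** Returns true if the string parses with validation switched off. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // same as the default
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness problems are fatal regardless of the validating flag.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A complete document parses: prints true
        System.out.println(parses("<pdf2xml><page><text>ok</text></page></pdf2xml>"));
        // Input that ends before the closing tags is rejected: prints false
        System.out.println(parses("<pdf2xml><page><text>Low "));
    }
}
```

The second call is rejected even though validation is off, which suggests the flag is unlikely to be related to the error reported here.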

I checked the XML file and it looks fine: all tags are properly closed (although I am new to XML).

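Stack trace:

[Fatal Error] :210:67: XML document structures must start and end within the same entity.
org.xml.sax.SAXParseException; lineNumber: 210; columnNumber: 67; XML document structures must start and end within the same entity.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)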

Here is the XML content:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<pdf2xml producer="poppler" version="0.34.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
    <fontspec id="0" size="13" family="Times" color="#4c4c4c"/>
    <fontspec id="1" size="34" family="Times" color="#4c4c4c"/>
    <fontspec id="2" size="28" family="Times" color="#4c4c4c"/>
    <fontspec id="3" size="19" family="Times" color="#4c4c4c"/>
    <fontspec id="4" size="16" family="Times" color="#4c4c4c"/>
<image top="52" left="45" width="828" height="461" src="./app/utils/resume/pdf/DISC-Aditya_Thakur-1_1.jpg"/>
<image top="1009" left="45" width="225" height="96" src="./app/utils/resume/pdf/DISC-Aditya_Thakur-1_2.jpg"/>
<text top="1140" left="45" width="415" height="15" font="0">Copyright 2016 Innermetrix Incorporated • All rights reserved</text>
<text top="416" left="45" width="243" height="36" font="1"><b>Aditya Thakur</b></text>
<text top="457" left="45" width="179" height="30" font="2">May 25, 2016</text>
<text top="551" left="45" width="747" height="21" font="3">This Innermetrix Disc Index is a modern interpretation of Dr. William Marston's</text>
<text top="578" left="45" width="770" height="21" font="3">behavioral dimensions. Marston's research uncovered four quadrants of behavior</text>
<text top="606" left="45" width="809" height="21" font="3">which help to understand a person's behavioral preferences.  This Disc Index will help</text>
<text top="633" left="45" width="703" height="21" font="3">you understand your behavioral style and how to maximize your potential.</text>
<text top="1027" left="293" width="217" height="18" font="4">Anthony Robbins Coaching</text>
<text top="1055" left="293" width="183" height="18" font="4">www.tonyrobbins.com</text>
<text top="1084" left="293" width="5" height="18" font="4"> </text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1188" width="918">
    <fontspec id="5" size="22" family="Times" color="#1c8cc4"/>
    <fontspec id="6" size="22" family="Times" color="#303030"/>
    <fontspec id="7" size="13" family="Times" color="#000000"/>
    <fontspec id="8" size="25" family="Times" color="#4c4c4c"/>
    <fontspec id="9" size="14" family="Times" color="#7f7f7f"/>
    <fontspec id="10" size="40" family="Times" color="#ffffff"/>
    <fontspec id="11" size="16" family="Times" color="#000000"/>
    <fontspec id="12" size="14" family="Times" color="#4c4c4c"/>
    <fontspec id="13" size="14" family="Times" color="#4c4c4c"/>
<image top="18" left="30" width="83" height="83" src="./app/utils/resume/pdf/DISC-Aditya_Thakur-2_1.png"/>
<text top="46" left="128" width="174" height="24" font="5"><b>The DISC Index</b></text>
<text top="46" left="317" width="228" height="24" font="6"><b>Executive Summary</b></text>
<text top="551" left="891" width="0" height="15" font="7">Aditya Thakur</text>
<text top="1140" left="45" width="415" height="15" font="0">Copyright 2016 Innermetrix Incorporated • All rights reserved</text>
<text top="1140" left="865" width="8" height="15" font="7">2</text>
<text top="152" left="196" width="526" height="27" font="8"><b>Natural and Adaptive Styles Comparison</b></text>
<text top="590" left="44" width="26" height="17" font="9">    0</text>
<text top="556" left="44" width="27" height="17" font="9">  10</text>
<text top="521" left="44" width="27" height="17" font="9">  20</text>
<text top="487" left="44" width="27" height="17" font="9">  30</text>
<text top="452" left="44" width="27" height="17" font="9">  40</text>
<text top="418" left="44" width="27" height="17" font="9">  50</text>
<text top="383" left="44" width="27" height="17" font="9">  60</text>
<text top="349" left="44" width="27" height="17" font="9">  70</text>
<text top="314" left="44" width="27" height="17" font="9">  80</text>
<text top="280" left="44" width="27" height="17" font="9">  90</text>
<text top="245" left="44" width="27" height="17" font="9">100</text>
<text top="620" left="156" width="29" height="42" font="10"><b>D</b></text>
<text top="675" left="147" width="56" height="18" font="11">56 / 77</text>
<text top="620" left="359" width="16" height="42" font="10"><b>I</b></text>
<text top="675" left="343" width="56" height="18" font="11">53 / 67</text>
<text top="620" left="552" width="22" height="42" font="10"><b>S</b></text>

//****************Line no 200 starts*****************//
<text top="618" left="320" width="72" height="18" font="11">Inspiring</text>
<text top="652" left="307" width="97" height="18" font="11">Enthusiastic</text>
<text top="686" left="322" width="67" height="18" font="11">Sociable</text>
<text top="721" left="329" width="54" height="18" font="11">Poised</text>
<text top="755" left="316" width="79" height="18" font="11">Charming</text>
<text top="789" left="311" width="89" height="18" font="11">Convincing</text>
<text top="823" left="317" width="78" height="18" font="11">Reflective</text>
<text top="857" left="299" width="112" height="18" font="11">Matter-of-fact</text>
<text top="892" left="311" width="88" height="18" font="11">Withdrawn</text>
<text top="926" left="333" width="46" height="18" font="14"><b>Aloof</b></text>
<text top="999" left="328" width="54" height="21" font="19"><b>Low I</b></text>
<text top="242" left="495" width="134" height="27" font="16"><b>Stabilizing</b></text>
<text top="305" left="536" width="53" height="21" font="17"><b>Pace:</b></text>
<text top="351" left="474" width="176" height="18" font="11">How you tend to pace</text>
<text top="373" left="507" width="111" height="18" font="11">things in your</text>
<text top="394" left="510" width="104" height="18" font="11">environment</text>
<text top="462" left="531" width="63" height="21" font="20"><b>High S</b></text>
<text top="550" left="531" width="63" height="18" font="14"><b>Patient</b></text>
<text top="584" left="517" width="91" height="18" font="11">Predictable</text>
<text top="618" left="533" width="59" height="18" font="11">Passive</text>
<text top="652" left="514" width="97" height="18" font="11">Complacent</text>
//****************Line no 210 ends*****************//
//**********Last 5 line *************************//
<text top="778" left="45" width="792" height="18" font="11">___________________________________________________________________________________________________________</text>
<text top="806" left="45" width="792" height="18" font="11">___________________________________________________________________________________________________________</text>
<text top="835" left="45" width="792" height="18" font="11">___________________________________________________________________________________________________________</text>
</page>
</pdf2xml>

Line 210 is: <text top="999" left="328" width="54" height="21" font="19"><b>Low I</b></text>
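
One way to see what the parser actually received at the reported position (210:67) is to parse an in-memory copy of the file and, on failure, print the reported line next to the total line count. The following is only a diagnostic sketch, not a fix. `ParseErrorLocator` and `locate` are hypothetical names, and the entity resolver assumes the contents of pdf2xml.dtd are not needed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ParseErrorLocator {
    /** Returns null if the bytes parse cleanly, otherwise a description of where parsing stopped. */
    static String locate(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption: the DTD content is not needed, so every external entity
            // (such as pdf2xml.dtd) is replaced with an empty one.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            builder.parse(new ByteArrayInputStream(xmlBytes));
            return null;
        } catch (SAXParseException e) {
            // Look up the reported line in the same bytes that were parsed.
            String[] lines = new String(xmlBytes, StandardCharsets.UTF_8).split("\r?\n", -1);
            int line = e.getLineNumber();
            String text = (line >= 1 && line <= lines.length) ? lines[line - 1] : "<no such line>";
            return "line " + line + " of " + lines.length
                    + ", column " + e.getColumnNumber() + ": " + text;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return "parse failed: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ParseErrorLocator <file.xml>");
            return;
        }
        // Read the file once so the report reflects exactly the bytes that were parsed.
        String report = locate(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(report == null ? "parsed without errors" : report);
    }
}
```

If the reported line turns out to be the last line of the data that was read, the input probably ended there, which would point at how the file is produced or read rather than at the markup itself. In the line quoted above, column 67 appears to fall inside the element text rather than at a tag boundary, which would be consistent with that, though it is not conclusive.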

Thanks in advance.

0 answers:

No answers yet