Current state END_ELEMENT is not among the states CHARACTERS, COMMENT, CDATA, SPACE, ENTITY_REFERENCE, DTD valid for getText()

Time: 2017-10-05 09:38:13

Tags: java stax

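I'm very new to Java, but I'm doing this project for school. I have a 4 GB XML file (a Wikipedia dump) that I need to parse. I'm using StAX, and my code runs fine for more than 400,000 lines (almost 50 MB), but then I get this error: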

  

Exception in thread "main" java.lang.IllegalStateException: Current state END_ELEMENT is not among the states CHARACTERS, COMMENT, CDATA, SPACE, ENTITY_REFERENCE, DTD valid for getText()
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getText(XMLStreamReaderImpl.java:1081)
    at tagremoving1.TagRemoving1.main(TagRemoving1.java:65)

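I read somewhere that I should check for null or empty elements before calling getText(), so I did. The program then got further, but it stopped again with the same error. I have looked almost everywhere and I don't know what is wrong. Here is my code: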

    XMLInputFactory factory = XMLInputFactory.newInstance();
    File file = new File("source.xml");
    FileInputStream fileReader = new FileInputStream(file);
    factory.setProperty(XMLInputFactory.IS_COALESCING, true);
    factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, true);
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
    PrintWriter writer1 = new PrintWriter("result.txt", "UTF-8");

    XMLStreamReader reader = factory.createXMLStreamReader(fileReader);
    int counter = 1;
    while(reader.hasNext()){

        if(reader.next() == 1){ //If it is START_ELEMENT
            String name = reader.getLocalName();
            switch(name){
                case "page":
                    writer1.println("\r\npage" + counter + ":");  
                    counter++;
                    break;

                case "title":
                    reader.next();
                    if(reader != null && !"".equals(reader.toString())) 
                            writer1.println("Title: " + reader.getText());
                    break;

                case "text":
                    reader.next();
                    if(reader != null && !"".equals(reader.toString()))
                        writer1.println("Text: " + reader.getText());
                    break;

                default:
                    break;
            }
        }

    }
    writer1.flush();
    writer1.close();

Any suggestions?

1 Answer:

Answer 0: (score: 0)

OK, I figured it out!

I added one more condition, reader.hasText(), to the final 'if', and then everything worked. getText() is only valid while the reader is positioned on a text-bearing event, and for an empty element the event right after the start tag is END_ELEMENT, which is exactly the state named in the exception.
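For anyone who lands here later, this is a minimal sketch of what the reader.hasText() check might look like. It is not the exact code from my project: the class name HasTextSketch, the helper extract, and the small inline XML string are made up for illustration, and it reads from a String instead of the 4 GB file so it can be run on its own.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original project.
class HasTextSketch {

    // Hypothetical helper: returns "Title: ..." / "Text: ..." lines found in the XML.
    static String extract(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, true);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("title") || name.equals("text")) {
                        String label = name.equals("title") ? "Title: " : "Text: ";
                        // Move to whatever follows the start tag.
                        reader.next();
                        // For an empty element such as <text></text> this event is
                        // END_ELEMENT, where getText() throws IllegalStateException,
                        // so only call getText() when hasText() says it is allowed.
                        if (reader.hasText()) {
                            out.append(label).append(reader.getText()).append('\n');
                        }
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The <text> element is deliberately empty to exercise the hasText() guard.
        System.out.print(extract("<page><title>A</title><text></text></page>"));
    }
}
```

Compared with the checks in the question, reader != null and reader.toString() say nothing about the current event, so they never prevented the call; hasText() asks the reader directly whether the current event carries text.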