Unable to parse XML tags nested inside other tags with a SAX parser in Groovy

Asked: 2014-12-08 15:17:21

Tags: xml parsing groovy saxparser

I have this sample XML format that I am trying to parse.

<records>
      <ae_documentTitleBegin /><ae_subDocumentTitleGenerated generatedTitle="Introduction" /><ae_750584b7e5364775bf21d91c5020b965_clauseBegin /><ae_clauseTitleBegin />Introduction<ae_clauseTitleEnd /><ae_clauseBodyBegin />ABL <ae_definedTermInstanceBegin />CREDIT AGREEMENT<ae_definedTermInstanceEnd /><ae_documentTitleEnd />
      <car name='HSV Maloo' make='Holden' year='2006'>
        <ae_definedTermTitleBegin />Australia<ae_definedTermTitleEnd />
        <ae_clauseTitleBegin />1.02 <u>Accounting Terms </u>.<ae_clauseTitleEnd />

      </car>
      <car name='P50' make='Peel' year='1962'>
        <ae_definedTermTitleBegin />Isle of Man<ae_definedTermTitleEnd />
        <ae_clauseTitleBegin />Smallest Street-Legal Car at 99cm wide and 59 kg in weight<ae_clauseTitleEnd />
      </car>
      <car name='Royale' make='Bugatti' year='1931'>
        <ae_definedTermTitleBegin />France<ae_definedTermTitleEnd />
        <ae_clauseTitleBegin />Most Valuable Car at $15 million<ae_clauseTitleEnd />
      </car>
    </records>

The SAX parser I implemented looks like this:

import javax.xml.parsers.SAXParserFactory
import org.xml.sax.helpers.DefaultHandler
import org.xml.sax.*

class SAXXMLParser extends DefaultHandler {
    ArrayList<String> DefinedTermTitles = new ArrayList<>();
    ArrayList<String> ClauseTitles = new ArrayList<>();
    ArrayList<String> DocumentTitles = new ArrayList<>();
    String currentMessage;
    boolean countryFlag = false;
    StringBuilder message = new StringBuilder();

    void startElement(String ns, String localName, String qName, Attributes atts) {
        switch (qName) {
            case 'ae_clauseTitleBegin':
                //messages.add(currentMessage)
                countryFlag = true;
                break

            case 'ae_documentTitleBegin':
                //messages.add(currentMessage)
                countryFlag = true;
                break

            case 'ae_definedTermTitleBegin':
                //messages.add(currentMessage)
                countryFlag = true; 
                break           
         }      
    }   

    void characters(char[] chars, int offset, int length) {
        if (countryFlag) {
            message.append(new String(chars, offset, length));
            //println(currentMessage)
        }
    }

    void endElement(String ns, String localName, String qName) {
        switch (qName) {        
            case 'ae_clauseTitleEnd':
                ClauseTitles.add(message.toString());
                countryFlag = false;
                message.setLength(0);
                break
            case 'ae_documentTitleEnd':
                DocumentTitles.add(message.toString());
                countryFlag = false;
                message.setLength(0);
                break
            case 'ae_definedTermTitleEnd':
                DefinedTermTitles.add(message.toString());
                countryFlag = false; 
                message.setLength(0);
                break
         }
    }
}

The output I get is:

Calling XML Parser
[Australia, Isle of Man, France] <<-- DefinedTermTitles

[Introduction, 1.02 Accounting Terms ., Smallest Street-Legal Car at 99cm wide and 59 kg in weight, Most Valuable Car at $15 million] <<-- ClauseTitles

[] <<-- DocuemntTitles
ENd of XML Parser

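This is wrong. As the second list shows, "Introduction" ended up in ClauseTitles, but it should have gone into the third list, DocumentTitles (since I append the text of the document title tags to that list). The value itself is also incomplete: it should be "Introduction ABL CREDIT AGREEMENT". I don't know why this happens. My guess is that it is because there are tags inside tags. I need a way to get only the text and ignore the tags.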
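
A likely cause, offered as a sketch rather than a confirmed diagnosis: every `ae_...Begin` and `ae_...End` marker is an empty element, so SAX fires `startElement` and `endElement` for it back to back, and the text between two markers arrives through `characters()` at the level of the parent element. In the handler above, all three marker types share one flag and one buffer, so `ae_clauseTitleEnd` inside the document title resets both, and `ae_documentTitleEnd` then stores an empty string. The version below keeps one buffer per open region instead. The class name `RegionTitleHandler`, the `titles` and `openRegions` maps, and the shortened sample XML are hypothetical names and data introduced for illustration.

```groovy
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler

// Hypothetical rewrite of the handler: one buffer per open region
// instead of a single shared flag and buffer.
class RegionTitleHandler extends DefaultHandler {
    // Marker name without its Begin/End suffix -> finished texts of that region type.
    Map<String, List<String>> titles = [
        ae_documentTitle   : [],
        ae_clauseTitle     : [],
        ae_definedTermTitle: []
    ]
    // Regions that are currently open, keyed the same way.
    Map<String, StringBuilder> openRegions = [:]

    @Override
    void startElement(String ns, String localName, String qName, Attributes atts) {
        // The markers are empty elements, so handling both kinds here is enough.
        if (qName.endsWith('Begin')) {
            String key = qName.substring(0, qName.length() - 'Begin'.length())
            if (titles.containsKey(key)) {
                openRegions[key] = new StringBuilder()
            }
        } else if (qName.endsWith('End')) {
            String key = qName.substring(0, qName.length() - 'End'.length())
            StringBuilder finished = openRegions.remove(key)
            if (finished != null) {
                titles[key] << finished.toString().trim()
            }
        }
    }

    @Override
    void characters(char[] chars, int offset, int length) {
        // Text belongs to every region that is open, so nested regions each get a copy.
        openRegions.values().each { it.append(chars, offset, length) }
    }
}

// Shortened version of the sample from the question (assumed input).
def xml = '''<records>
  <ae_documentTitleBegin /><ae_clauseTitleBegin />Introduction<ae_clauseTitleEnd /><ae_clauseBodyBegin />ABL <ae_definedTermInstanceBegin />CREDIT AGREEMENT<ae_definedTermInstanceEnd /><ae_documentTitleEnd />
  <car name='HSV Maloo' make='Holden' year='2006'>
    <ae_definedTermTitleBegin />Australia<ae_definedTermTitleEnd />
    <ae_clauseTitleBegin />1.02 <u>Accounting Terms </u>.<ae_clauseTitleEnd />
  </car>
</records>'''

def handler = new RegionTitleHandler()
def parser = SAXParserFactory.newInstance().newSAXParser()
parser.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')), handler)

handler.titles.each { name, values -> println "${name}: ${values}" }
```

With this approach the text inside `<u>` is kept and the tag itself is ignored, because only the `Begin` and `End` markers are inspected. The text is concatenated exactly as it appears in the source, so the document title has no space between "Introduction" and "ABL"; inserting a separator at marker boundaries would be a separate design choice.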

0 answers:

No answers