I have this sample XML that I am trying to parse.
<records>
<ae_documentTitleBegin /><ae_subDocumentTitleGenerated generatedTitle="Introduction" /><ae_750584b7e5364775bf21d91c5020b965_clauseBegin /><ae_clauseTitleBegin />Introduction<ae_clauseTitleEnd /><ae_clauseBodyBegin />ABL <ae_definedTermInstanceBegin />CREDIT AGREEMENT<ae_definedTermInstanceEnd /><ae_documentTitleEnd />
<car name='HSV Maloo' make='Holden' year='2006'>
<ae_definedTermTitleBegin />Australia<ae_definedTermTitleEnd />
<ae_clauseTitleBegin />1.02 <u>Accounting Terms </u>.<ae_clauseTitleEnd />
</car>
<car name='P50' make='Peel' year='1962'>
<ae_definedTermTitleBegin />Isle of Man<ae_definedTermTitleEnd />
<ae_clauseTitleBegin />Smallest Street-Legal Car at 99cm wide and 59 kg in weight<ae_clauseTitleEnd />
</car>
<car name='Royale' make='Bugatti' year='1931'>
<ae_definedTermTitleBegin />France<ae_definedTermTitleEnd />
<ae_clauseTitleBegin />Most Valuable Car at $15 million<ae_clauseTitleEnd />
</car>
</records>
The SAX parser I implemented looks like this:
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.helpers.DefaultHandler
import org.xml.sax.*

class SAXXMLParser extends DefaultHandler {
    ArrayList<String> DefinedTermTitles = new ArrayList<>();
    ArrayList<String> ClauseTitles = new ArrayList<>();
    ArrayList<String> DocumentTitles = new ArrayList<>();
    String currentMessage;
    boolean countryFlag = false;
    StringBuilder message = new StringBuilder();

    void startElement(String ns, String localName, String qName, Attributes atts) {
        switch (qName) {
            case 'ae_clauseTitleBegin':
                //messages.add(currentMessage)
                countryFlag = true;
                break
            case 'ae_documentTitleBegin':
                //messages.add(currentMessage)
                countryFlag = true;
                break
            case 'ae_definedTermTitleBegin':
                //messages.add(currentMessage)
                countryFlag = true;
                break
        }
    }

    void characters(char[] chars, int offset, int length) {
        if (countryFlag) {
            message.append(new String(chars, offset, length));
            //println(currentMessage)
        }
    }

    void endElement(String ns, String localName, String qName) {
        switch (qName) {
            case 'ae_clauseTitleEnd':
                ClauseTitles.add(message.toString());
                countryFlag = false;
                message.setLength(0);
                break
            case 'ae_documentTitleEnd':
                DocumentTitles.add(message.toString());
                countryFlag = false;
                message.setLength(0);
                break
            case 'ae_definedTermTitleEnd':
                DefinedTermTitles.add(message.toString());
                countryFlag = false;
                message.setLength(0);
                break
        }
    }
}
The output I get is:
Calling XML Parser
[Australia, Isle of Man, France] <<-- DefinedTermTitles
[Introduction, 1.02 Accounting Terms ., Smallest Street-Legal Car at 99cm wide and 59 kg in weight, Most Valuable Car at $15 million] <<-- ClauseTitles
[] <<-- DocumentTitles
ENd of XML Parser
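This is wrong. As you can see, Introduction shows up in the second list (ClauseTitles), when it should have gone into the third list, DocumentTitles (since I append the text between the document title tags to that list). The value itself is also incorrect: it should be Introduction ABL CREDIT AGREEMENT. I don't know why this happens. My guess is that it is because there are tags inside tags. I need a way to get just the text, ignoring the tags.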
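For reference, here is a rough sketch of one way the nested markers could be handled. It is only an illustration, not a confirmed fix. The handler above shares a single flag and a single buffer across all three marker types, so the ae_clauseTitleEnd that appears inside the document title flushes "Introduction" into ClauseTitles and switches capturing off before the rest of the title arrives. The sketch keeps a separate buffer per marker type and appends text to every region that is currently open. It is written in plain Java rather than Groovy so it can be compiled on its own. The names MarkerTextHandler, KINDS, results, open and parse are made up for this example. Note that text pieces are concatenated without a separator, so the sample yields "IntroductionABL CREDIT AGREEMENT" with no space after "Introduction".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one buffer per marker kind, so regions may overlap.
public class MarkerTextHandler extends DefaultHandler {
    // Marker prefixes taken from the sample XML; "Begin"/"End" are appended below.
    private static final String[] KINDS = {
        "ae_documentTitle", "ae_clauseTitle", "ae_definedTermTitle"
    };

    // Finished texts, keyed by marker kind.
    public final Map<String, List<String>> results = new HashMap<>();
    // Regions that are currently open, keyed by marker kind.
    private final Map<String, StringBuilder> open = new HashMap<>();

    public MarkerTextHandler() {
        for (String kind : KINDS) {
            results.put(kind, new ArrayList<>());
        }
    }

    @Override
    public void startElement(String ns, String localName, String qName, Attributes atts) {
        // The markers are empty elements, so startElement alone is enough:
        // SAX reports endElement for the same tag right afterwards.
        for (String kind : KINDS) {
            if (qName.equals(kind + "Begin")) {
                open.put(kind, new StringBuilder());
            } else if (qName.equals(kind + "End")) {
                StringBuilder done = open.remove(kind);
                if (done != null) {
                    results.get(kind).add(done.toString());
                }
            }
        }
    }

    @Override
    public void characters(char[] chars, int offset, int length) {
        // Text belongs to every open region; other tags (e.g. <u>) are simply ignored.
        for (StringBuilder buffer : open.values()) {
            buffer.append(chars, offset, length);
        }
    }

    // Convenience helper for the example: parse an XML string with this handler.
    public static MarkerTextHandler parse(String xml) throws Exception {
        MarkerTextHandler handler = new MarkerTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><ae_documentTitleBegin /><ae_clauseTitleBegin />Introduction"
            + "<ae_clauseTitleEnd />ABL <ae_definedTermInstanceBegin />CREDIT AGREEMENT"
            + "<ae_definedTermInstanceEnd /><ae_documentTitleEnd /></records>";
        MarkerTextHandler handler = parse(xml);
        System.out.println(handler.results.get("ae_documentTitle")); // [IntroductionABL CREDIT AGREEMENT]
        System.out.println(handler.results.get("ae_clauseTitle"));   // [Introduction]
    }
}
```

With this layout "Introduction" still lands in the clause titles, because the markup really does wrap it in clause title markers, but it is also kept as part of the document title. If a space between the pieces is wanted, it would have to be inserted explicitly when a nested marker closes.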