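I have a very large XML file, so I'm using StAX as a streaming XML parser. However, the file contains some unescaped HTML tags, and I can't figure out how to handle them. For example: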
<ArticleTitle>Frequent <i>BRAF</i><sup>V600E</sup> mutation has no effect on tumor invasiveness in patients with Langerhans cell histiocytosis.</ArticleTitle>
I can't extract the title above with the following code:
while (xmlReader.hasNext()) {
    XMLEvent event = xmlReader.nextEvent();
    if (event.isStartElement() && event.asStartElement().getName().getLocalPart().equals("ArticleTitle")) {
        // This throws javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3886149,32]
        // Message: elementGetText() function expects text only elment but START_ELEMENT was encountered.
        String text = xmlReader.getElementText();
    }
}
How can I get the characters between <ArticleTitle> and </ArticleTitle>?
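A possible approach, sketched here rather than offered as the definitive fix: the error message shows that the parser already treats <i> and <sup> as ordinary child elements (START_ELEMENT events), and getElementText() only accepts text-only elements. One way around this is to keep calling nextEvent() after the <ArticleTitle> start tag, counting nesting depth so that </i> and </sup> do not end the loop, and stop at the matching </ArticleTitle>. The class and method names (ArticleTitleExtractor, extractTitles, readContent, escape), the keepMarkup flag, and the wrapper root element in main are all hypothetical. The sketch reads from a StringReader for brevity; for the real file you would presumably pass a file stream or reader to createXMLEventReader instead, which keeps the processing streaming.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class ArticleTitleExtractor {

    // Returns the content of every <ArticleTitle>. With keepMarkup=false only the
    // character data is kept; with keepMarkup=true inline child tags such as
    // <i> and <sup> are re-serialized (local names only, namespaces ignored).
    public static List<String> extractTitles(String xml, boolean keepMarkup)
            throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("ArticleTitle")) {
                    titles.add(readContent(reader, keepMarkup));
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    // Consumes events up to the end tag matching the start tag the caller has
    // already read. The depth counter keeps inner end tags from ending the loop.
    private static String readContent(XMLEventReader reader, boolean keepMarkup)
            throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        int depth = 1;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (keepMarkup) {
                    StartElement start = event.asStartElement();
                    sb.append('<').append(start.getName().getLocalPart());
                    Iterator<?> attrs = start.getAttributes();
                    while (attrs.hasNext()) {
                        Attribute a = (Attribute) attrs.next();
                        sb.append(' ').append(a.getName().getLocalPart())
                          .append("=\"").append(escape(a.getValue())).append('"');
                    }
                    sb.append('>');
                }
            } else if (event.isEndElement()) {
                depth--;
                if (depth == 0) {
                    break; // reached the matching </ArticleTitle>
                }
                if (keepMarkup) {
                    sb.append("</").append(event.asEndElement().getName().getLocalPart()).append('>');
                }
            } else if (event.isCharacters()) {
                // Text may arrive split across several events; appending handles that.
                String text = event.asCharacters().getData();
                sb.append(keepMarkup ? escape(text) : text);
            }
        }
        return sb.toString();
    }

    // Minimal escaping so re-serialized text and attribute values stay well-formed.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Root><ArticleTitle>Frequent <i>BRAF</i><sup>V600E</sup>"
                + " mutation has no effect.</ArticleTitle></Root>";
        System.out.println(extractTitles(xml, false).get(0));
        System.out.println(extractTitles(xml, true).get(0));
    }
}
```

Running main prints `Frequent BRAFV600E mutation has no effect.` followed by `Frequent <i>BRAF</i><sup>V600E</sup> mutation has no effect.`. Which variant you want depends on whether the inline formatting matters downstream; dropping the tags is simpler if you only need searchable text.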