I have an XML file that is streamed into an XML parser.
The file's content contains HTML markup, which I want to ignore:
<overview>
<p>Situated on a peninsula halfway up the west coast of India, Mumbai (formerly Bombay) is India's economic powerhouse and home to more millionaires than any other city on the Indian sub-continent.</p>
<p>The Portuguese established this old Hindu city as a colony in 1509.</p>
<p>Like many Indian cities, the streets of Mumbai are congested with cattle, carts and motor vehicles and the air is thick with smog.</p>
</overview>
The method that parses the overview is:
private String readOverview(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, TAG_OVERVIEW);
    String overview = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, TAG_OVERVIEW);
    return overview;
}
The error is: expected: END_TAG {null}overview (position:START_TAG <p>@6:10 in java.io.InputStreamReader@537c80f4)
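Answer 0 (score: 0)
If you are able to add CDATA sections to the XML file, the parser will treat the HTML tags inside them as plain text, so you should be able to ignore them.
Reference: XML CDATA - clearly explained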
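To illustrate the CDATA suggestion, here is a minimal sketch. It assumes the overview body has been wrapped in a CDATA section, and it uses the JDK's StAX reader (javax.xml.stream) as a stand-in for XmlPullParser so that it runs without Android. The class name CdataDemo and the sample strings are hypothetical. XmlPullParser.next() also reports CDATA content as a TEXT event, so a readText-style helper should see the markup as plain text in the same way.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; StAX is used here only as a stand-in for XmlPullParser.
final class CdataDemo {

    // Reads the text of a root <overview> element.
    // getElementText() fails if it meets a nested start tag, much like the
    // readText/require combination in the question.
    static String readOverview(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // move to the root start tag
            reader.require(XMLStreamConstants.START_ELEMENT, null, "overview");
            return reader.getElementText();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read overview", e);
        }
    }

    public static void main(String[] args) {
        // The HTML is inside CDATA, so the parser never sees <p> as an element.
        String wrapped = "<overview><![CDATA[<p>Hello</p>]]></overview>";
        System.out.println(readOverview(wrapped));
    }
}
```

Without the CDATA wrapper, the same call fails on the nested <p> start tag, which mirrors the error in the question.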
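Answer 1 (score: 0)
The trick is to understand how XmlPullParser works. Once you understand that, you can write a function that finds the <p> tags and handles them as needed. In this case, it builds a List<String>.
Example: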
// Collect the text of each <p> child of the current element.
private List<String> readHtml(XmlPullParser parser) throws IOException, XmlPullParserException {
    List<String> result = new ArrayList<String>();
    // The enclosing start tag is required in the calling function.
    // Loop until the enclosing element's end tag is reached.
    while (parser.next() != XmlPullParser.END_TAG) {
        // Ignore every event that is not a start tag (for example, whitespace text).
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        // Name of the tag the parser is currently on.
        String current_tag_name = parser.getName();
        if (current_tag_name.equals(TAG_P)) {
            // readText (not shown) is assumed to consume the matching </p>.
            String curr_line = readText(parser);
            // Add the line here so a skipped tag cannot re-add a stale value.
            if (curr_line != null) {
                result.add(curr_line);
            }
        } else {
            // skip (not shown) is assumed to consume the whole unwanted element.
            skip(parser);
        }
    }
    return result;
}
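If the goal is simply to ignore the nested tags and keep their text, a related approach is to track nesting depth and append every text event until the enclosing element closes. The sketch below uses the JDK's StAX reader as a stand-in so it runs without Android; with XmlPullParser the same loop would switch on START_TAG, END_TAG and TEXT instead. NestedTextDemo, readNestedText and overviewText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; StAX is used here only as a stand-in for XmlPullParser.
final class NestedTextDemo {

    // The reader must be positioned on a start tag. Returns all text inside
    // that element, dropping any nested tags, and stops on its end tag.
    static String readNestedText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int depth = 1; // we are inside the element we started on
        while (depth > 0) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++; // entering a nested tag such as <p>
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    depth--; // leaving a nested tag, or the element itself
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    text.append(reader.getText());
                    break;
                default:
                    break; // comments, processing instructions, etc.
            }
        }
        return text.toString();
    }

    // Convenience wrapper for a document whose root is the element of interest.
    static String overviewText(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // move to the root start tag
            return readNestedText(reader);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read nested text", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(overviewText("<overview><p>One.</p><p>Two.</p></overview>"));
    }
}
```

Note that this keeps whitespace between tags and joins adjacent paragraphs without a separator, so a real implementation may want to insert a line break whenever a </p> is closed.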
Answer 2 (score: 0)
That error occurs when the parser reaches a tag it does not expect; in my case the cause was unmatched tags.
I needed my parser to read unmatched HTML tags without throwing an error, and this is what worked for me:
parser.setFeature("http://xmlpull.org/v1/doc/features.html#relaxed", true);
This worked for me on emulators as far back as Android 4.1.1 (Jelly Bean).
If you want to ignore the HTML tags, the CDATA option is a better solution.