How to ignore HTML elements with XmlPullParser?

Posted: 2015-02-18 07:09:51

Tags: java android xml

I have an XML file that is streamed to an XML parser.

The XML file contains HTML markup, which I want to ignore:

    <overview>
      <p>Situated on a peninsula halfway up the west coast of India, Mumbai (formerly Bombay) is India's economic powerhouse and home to more millionaires than any other city on the Indian sub-continent.</p>
      <p>The Portuguese established this old Hindu city as a colony in 1509.</p>
      <p>Like many Indian cities, the streets of Mumbai are congested with cattle, carts and motor vehicles and the air is thick with smog.</p>
    </overview>

The method that parses the overview is:

    private String readOverview(XmlPullParser parser) throws IOException, XmlPullParserException {
        parser.require(XmlPullParser.START_TAG, ns, TAG_OVERVIEW);
        String overview = readText(parser);
        parser.require(XmlPullParser.END_TAG, ns, TAG_OVERVIEW);
        return overview;
    }

The error is: expected: END_TAG {null}overview (position:START_TAG <p>@6:10 in java.io.InputStreamReader@537c80f4)
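Assuming readText() is the helper from Android's "Parsing XML Data" guide (call next(); if the event is TEXT, keep getText() and then call nextTag()), the error follows from the event sequence. The first event after <overview> is the whitespace text before the first <p>, so readText() returns only that whitespace, nextTag() stops on START_TAG <p>, and require(END_TAG, ns, TAG_OVERVIEW) fails. The sketch below records those events. It uses the JDK's StAX API (javax.xml.stream) instead of XmlPullParser so it runs on a plain JVM; StAX calls the events START_ELEMENT/END_ELEMENT/CHARACTERS rather than START_TAG/END_TAG/TEXT, and the class name EventTrace is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EventTrace {
    // Records the pull events of a document. Whitespace-only text is labelled
    // WHITESPACE so it stays visible in the trace.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    events.add("START " + r.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    events.add("END " + r.getLocalName());
                } else if (event == XMLStreamConstants.SPACE
                        || (event == XMLStreamConstants.CHARACTERS && r.isWhiteSpace())) {
                    events.add("WHITESPACE");
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    events.add("TEXT");
                }
                // END_DOCUMENT, comments, etc. are not recorded
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return events;
    }
}
```

For "<overview>\n  <p>One.</p>\n  <p>Two.</p>\n</overview>" the trace starts START overview, WHITESPACE, START p: the event right after the whitespace is a start tag, not the END_TAG that require() expects.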

3 answers:

Answer 0 (score: 0)

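If you can change the XML file to wrap the HTML in a CDATA section, the parser treats that content as plain text rather than as tags, so the HTML markup is effectively ignored.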

Reference: XML CDATA - clearly explained
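A minimal sketch of the CDATA approach, again using the JDK's StAX API (javax.xml.stream) rather than XmlPullParser so it runs on a plain JVM; the class and method names are hypothetical. Once the HTML is wrapped in <![CDATA[ ... ]]>, the <p> tags are no longer elements, so getElementText() can read everything up to </overview> as one string. It throws if it meets a real child element, much like the require(END_TAG, ...) check in the question.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CdataOverview {
    // Returns the content of the first <overview> element as one string.
    // Assumes the HTML inside <overview> is wrapped in <![CDATA[ ... ]]>.
    static String readOverview(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("overview")) {
                    // Joins all text and CDATA up to </overview>; throws if it
                    // meets a child element such as an unwrapped <p>.
                    return r.getElementText();
                }
            }
            return null; // no <overview> element found
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML or unexpected child element", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<overview><![CDATA[<p>First.</p><p>Second.</p>]]></overview>";
        System.out.println(readOverview(xml)); // prints <p>First.</p><p>Second.</p>
    }
}
```

On Android, XmlPullParser.next() is documented to coalesce element content into a single TEXT event, so with the HTML wrapped in CDATA the question's readOverview() should work unchanged and return the raw HTML as a string.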

Answer 1 (score: 0)

The trick is to understand how XmlPullParser works.

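Once you understand that, you can write a function that finds the <p> tags and handles them however you need; in this case, by collecting their text into a List<String>.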

Example:

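    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.v1.XmlPullParserException;

    // Extracts the text of every <p> child of the current element.
    // The caller has already checked the opening tag (e.g. with parser.require()).
    // TAG_P, readText() and skip() are defined elsewhere in the class
    // (as in Android's "Parsing XML Data" guide).
    private List<String> readHtml(XmlPullParser parser) throws IOException, XmlPullParserException {
        List<String> result = new ArrayList<String>();

        // Loop until the END_TAG that closes the parent element
        while (parser.next() != XmlPullParser.END_TAG) {
            // Ignore anything that is not a START_TAG (e.g. whitespace TEXT between tags)
            if (parser.getEventType() != XmlPullParser.START_TAG) {
                continue;
            }
            if (parser.getName().equals(TAG_P)) {
                // readText() consumes the paragraph text and its closing </p>
                String line = readText(parser);
                if (line != null) {
                    result.add(line);
                }
            } else {
                // Skip any other element together with its children
                skip(parser);
            }
        }
        return result;
    }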
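To try the same loop outside Android, here is a rough port of readHtml() to the JDK's StAX API (javax.xml.stream), which follows a similar pull model. getElementText() stands in for the readText() helper, and skip() counts nesting depth the way the Android guide's helper does. The class name, the hard-coded "overview"/"p" names and the exact shape of skip() are illustrative assumptions, not part of the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ParagraphReader {
    // Collects the text of each <p> directly inside the root <overview>.
    static List<String> readParagraphs(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // move to the root element
            r.require(XMLStreamConstants.START_ELEMENT, null, "overview");
            // Same loop shape as readHtml(): stop at the END_ELEMENT of <overview>
            while (r.next() != XMLStreamConstants.END_ELEMENT) {
                if (r.getEventType() != XMLStreamConstants.START_ELEMENT) {
                    continue; // whitespace between tags
                }
                if (r.getLocalName().equals("p")) {
                    result.add(r.getElementText()); // also consumes </p>
                } else {
                    skip(r);
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return result;
    }

    // Skips the current element and everything nested inside it.
    private static void skip(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth != 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

Like readText() in the Android guide, getElementText() only accepts text-only paragraphs; a <p> containing inline tags such as <b> would throw.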

Answer 2 (score: 0)

That error is occurring because the parser is reading unmatched tags.

I needed my parser to read unmatched HTML tags without throwing an error, and this is what worked for me:

    parser.setFeature("http://xmlpull.org/v1/doc/features.html#relaxed", true);

This worked for me on emulators as far back as Android 4.1.1 (Jelly Bean).

If you want to ignore the HTML tags, the CDATA option is a better solution.
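If the XML file cannot be changed to use CDATA, another option (not from this answer) is to keep the text and drop the tags: read events until the </overview> that matches the opening tag, tracking nesting depth, and append only character data. A minimal sketch with the JDK's StAX API rather than XmlPullParser; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class OverviewFlattener {
    // Returns the text inside the root <overview> element with all child
    // tags removed, e.g. "<p>A</p><p>B</p>" becomes "AB".
    static String readOverviewText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // move to the root element
            r.require(XMLStreamConstants.START_ELEMENT, null, "overview");
            int depth = 1; // we are inside <overview>
            while (depth > 0) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        depth++; // e.g. <p>: descend, but keep no markup
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        depth--; // depth 0 means we just read </overview>
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        text.append(r.getText());
                        break;
                    default:
                        break; // comments, processing instructions
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return text.toString().trim();
    }
}
```

Unlike the CDATA route, this keeps the file as it is but loses the paragraph boundaries unless you append a separator at each </p>.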