Question

I have an XML file that is streamed to an XML parser.

The content of the XML file contains HTML markup, which I want to ignore:

    <overview>
      <p>Situated on a peninsula halfway up the west coast of India, Mumbai (formerly Bombay) is India's economic powerhouse and home to more millionaires than any other city on the Indian sub-continent.</p>
      <p>The Portuguese established this old Hindu city as a colony in 1509.</p>
      <p>Like many Indian cities, the streets of Mumbai are congested with cattle, carts and motor vehicles and the air is thick with smog.</p>
    </overview>

The method that parses the overview is:

    private String readOverview(XmlPullParser parser) throws IOException, XmlPullParserException {
        parser.require(XmlPullParser.START_TAG, ns, TAG_OVERVIEW);
        String overview = readText(parser);
        parser.require(XmlPullParser.END_TAG, ns, TAG_OVERVIEW);
        return overview;
    }

The error is: expected: END_TAG {null}overview (position:START_TAG <p>@6:10 in java.io.InputStreamReader@537c80f4).

Answer 1

If you can add CDATA tags to the XML file, you should be able to ignore the HTML tags.

Reference: XML CDATA, clearly explained
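As a rough sketch of what "adding CDATA" means on the side that produces the XML: the HTML inside `<overview>` gets wrapped in `<![CDATA[ ... ]]>`, so the parser hands it back as plain text instead of nested `<p>` elements. The `CdataWrapper` class and its `wrap` method are hypothetical names made up for illustration; they are not part of XmlPullParser or any library.

```java
// Hypothetical helper for whoever generates the XML feed (illustration only).
final class CdataWrapper {
    private CdataWrapper() {}

    /** Wraps raw markup in a CDATA section so an XML parser treats it as text. */
    static String wrap(String html) {
        // A literal "]]>" would close the section early, so it is split
        // across two adjacent CDATA sections.
        return "<![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

With this, `"<overview>" + CdataWrapper.wrap("<p>Situated on a peninsula...</p>") + "</overview>"` gives an element whose only child is text, so a `readOverview` like the one in the question should reach the end tag without hitting a `<p>` start tag (assuming `readText` reads a single text event). The returned string still contains the `<p>` markup, so it has to be stripped or rendered afterwards.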

Answer 2

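The trick is to understand how XmlPullParser works.

Once you understand it, you can implement a function that finds the <p> tags and handles them as needed. In this case, it builds a List<String>.

Example: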

    // Extract the text of each <p> tag
    private List<String> readHtml(XmlPullParser parser) throws IOException, XmlPullParserException {
        List<String> result = new ArrayList<String>();
        // The required tag is checked in the calling function

        // Loop until the enclosing end tag is reached
        while (parser.next() != XmlPullParser.END_TAG) {
            // Ignore anything that is not a start tag (e.g. whitespace text)
            if (parser.getEventType() != XmlPullParser.START_TAG) {
                continue;
            }
            // Name of the current tag
            String current_tag_name = parser.getName();
            if (current_tag_name.equals(TAG_P)) {
                // Text of the current <p> element
                String curr_line = readText(parser);
                if (curr_line != null) {
                    result.add(curr_line);
                }
            } else {
                skip(parser);
            }
        }
        return result;
    }
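A variation on the same idea, if all you want is the plain text of `<overview>` regardless of which tags are nested inside: count the nesting depth and collect every text event until the depth drops back to zero. The sketch below uses the JDK's `javax.xml.stream` (StAX) reader rather than XmlPullParser, purely so it runs on a desktop JVM without Android; with XmlPullParser the same loop would be driven by `next()`, `START_TAG`, `END_TAG`, `TEXT` and `getText()`. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustration only: StAX stand-in for the XmlPullParser loop.
final class OverviewText {
    private OverviewText() {}

    /** Returns all text inside the first element named tag, ignoring nested tags. */
    static String readElementText(String xml, String tag) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            int depth = 0; // 0 means we are outside the target element
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Enter the target element, or go one level deeper inside it
                    if (depth > 0 || reader.getLocalName().equals(tag)) {
                        depth++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    // Leaving the target element itself ends the scan
                    if (depth > 0 && --depth == 0) {
                        break;
                    }
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && depth > 0) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }
}
```

For example, `OverviewText.readElementText("<overview><p>A</p><p>B</p></overview>", "overview")` collects the text of both paragraphs and drops the `<p>` tags. Unlike the List<String> version, this loses the paragraph boundaries, so it only fits if you do not need them.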

Answer 3

That error is occurring because the parser is reading unmatched tags.

I needed my parser to read unmatched HTML tags without throwing an error, and this is what worked for me:

    parser.setFeature("http://xmlpull.org/v1/doc/features.html#relaxed", true);

This worked for me on emulators as far back as Android 4.1.1 (Jelly Bean).

If you want to ignore the HTML tags, the CDATA option is a better solution.

How to ignore HTML elements using XmlPullParser?
