使用XML Pull Parser以XML格式解析CDATA部分

时间:2013-08-03 06:44:41

标签: android xml-parsing xmlpullparser

示例XML

<feed xmlns="http://www.w3.org/2005/Atom">
    <title>NDTV News - Top Stories</title>
    <link>http://www.ndtv.com/</link>
    <description>Latest entries</description>
    <language>en</language>
    <pubDate>Wed, 31 Jul 2013 22:33:00 GMT</pubDate>
    <lastBuildDate>Wed, 31 Jul 2013 22:33:00 GMT</lastBuildDate>
    <entry>
    <title>Narendra Modi to be BJP's PM candidate, announcement before crucial assembly polls: sources</title>
    <link>http://feedproxy.google.com/~r/NdtvNews-TopStories/~3/XN7dMIDe5YI/story01.htm</link>
    <published>Wed, 31 Jul 2013 13:58:31 GMT</published>
    <author>
    <name>user42715</name>
    </author>
    <content type="html"><![CDATA[<div align="center"><a href="http://www.ndtv.com/news/images/topstory_thumbnail/  Shatrughan_Sinha_agency_120.jpg"><img border="0" src="http://www.ndtv.com/news/images/topstory_thumbnail/Shatrughan_Sinha_agency_120.jpg" alt="2013-07-29-08-43-05" /></a></div><p><span style="font-size: large;">The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.</span><br /><br /><span style="font-size: large;"> The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September. </span><br /><br /><span style="font-size: large;">The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.   </span><br /><br /></p>]]></content>
   </entry>
</feed>

使用以下代码,我能够检索标签内的值和值。

XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        private XmlPullParser parser = factory.newPullParser();
        private InputStream urlStream = downloadUrl(urlString);
        parser.setInput(urlStream, null);
        int eventType = parser.getEventType();
        boolean done = false;

        while (eventType != XmlPullParser.END_DOCUMENT && !done) {
            tagName = parser.getName();

            switch (eventType) {
            case XmlPullParser.START_DOCUMENT:                  
                break;
            case XmlPullParser.START_TAG:
                if (tagName.equals("entry")) {                      
                }
                if (tagName.equals("title")) {
                    title = parser.nextText().toString();
                    Log.i(TITLE, title);
                }
                if (tagName.equals("published")) {
                    pubDate = parser.nextText().toString();
                    Log.i(PUBLISHEDDATE, pubDate);
                }

                if (tagName.equals("author")) {
                    readAuthor(parser);
                    Log.i(AUTHOR, author);
                }

                break;
            case XmlPullParser.END_TAG:
                if (tagName.equals("feed")) {
                    done = true;
                } else if (tagName.equals("entry")) {

                    rssFeed = new RssFeedStructure(title);
                    rssFeedList.add(rssFeed);
                }
                break;
            }
            eventType = parser.next();
        }

        private String readAuthor(XmlPullParser parser) throws IOException,
            XmlPullParserException {
            parser.nextTag();
            parser.require(XmlPullParser.START_TAG, null, "name");
            author = parser.nextText().toString();
            parser.require(XmlPullParser.END_TAG, null, "name");
            return author;
        }

如何从标签中检索

标签中的“href”值和文本值(BJP可能会涂抹Narendra Modi .....)。

1 个答案:

答案 0 :(得分:1)

您可以使用JSoup。下载@ http://jsoup.org/download。将jar添加到libs文件夹。

要解析器,我将rss feed复制到assests文件夹中的xml文件。 (localy)

XmlPullParser xpp = factory.newPullParser();
InputStream is = this.getAssets().open("xmlparser.xml");
xpp.setInput(is, "UTF_8");

您可以使用以下内容,因为您有网址。我已经展示了如何提取网址和内容。你需要像平常那样提取其他标签的内容。

  XmlPullParser xpp = factory.newPullParser();

    xpp.setInput(urlStream, null);

    boolean insideItem = false;

    // Returns the type of current event: START_TAG, END_TAG, etc..
    int eventType = xpp.getEventType();
    while (eventType != XmlPullParser.END_DOCUMENT) {
        if (eventType == XmlPullParser.START_TAG) {

            if (xpp.getName().equalsIgnoreCase("entry")) {
                insideItem = true;
            }
             else if (xpp.getName().equalsIgnoreCase("content")) {
                    if (insideItem)
                    {
                        Document doc = Jsoup.parse(xpp.nextText());

                        Elements links = doc.select("a[href]"); // a with href
                          for (Element link : links) {
                                Log.i("........",""+link.attr("abs:href"));
                            }

                        Element divcontent = doc.select("span").first();

                        Log.i("..........",""+divcontent.text());

                    }
                }
        } else if (eventType == XmlPullParser.END_TAG
                && xpp.getName().equalsIgnoreCase("entry")) {
            insideItem = false;
        }

        eventType = xpp.next(); // move to next element
    }

} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (XmlPullParserException e1) {
    e1.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
}

记录:

08-03 08:03:04.413: I/........(1524): http://www.ndtv.com/news/images/topstory_thumbnail/   Shatrughan_Sinha_agency_120.jpg
08-03 08:03:04.423: I/..........(1524): The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.

编辑:循环遍历元素

Elements divcontent = doc.select("span");
for(int k= 1;k<divcontent.size();k++)
{
     String spancontent =divcontent.get(k).text();
     Log.i("..........",spancontent);
}