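Parsing an RSS feed in an Android application

Time: 2012-04-21 18:18:33

Tags: android rss

I am trying to retrieve data from an RSS feed. My program works well, with one exception. The feed's items are structured as follows: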

<title></title>
<link></link>
<description></description>

I can retrieve the data, but when a title contains '&amp;', the returned string stops at the character before it. For example, this title:

<title>A&amp;T To Play Four Against Bears</title>

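I only get back 'A' when I expect to get back 'A&T To Play Four Against Bears'.

Can anyone tell me whether I can modify my existing RSSReader class to account for the presence of &amp;? Here is the class: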

import android.util.Log;

import java.net.URL;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RSSReader {

private static RSSReader instance = null;

private RSSReader() {
}

public static RSSReader getInstance() {
    if (instance == null) {
        instance = new RSSReader();
    }
    return instance;
}

public ArrayList<Story> getStories(String address) {
    ArrayList<Story> stories = new ArrayList<Story>();
    try {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        URL u = new URL(address);
        Document doc = builder.parse(u.openStream());
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            Story currentStory = new Story(getElementValue(element, "title"),
                    getElementValue(element, "description"),
                    getElementValue(element, "link"),
                    getElementValue(element, "pubDate"));
            stories.add(currentStory);
        }//for
    }//try
    catch (Exception ex) {
        if (ex instanceof java.net.ConnectException) {
        }
    }
    return stories;
}

private String getCharacterDataFromElement(Element e) {
    try {
        Node child = e.getFirstChild();
        if (child instanceof CharacterData) {
            CharacterData cd = (CharacterData) child;
            return cd.getData();
        }
    } catch (Exception ex) {
        Log.i("myTag2", ex.toString());
    }
    return "";
} //private String getCharacterDataFromElement

protected float getFloat(String value) {
    if (value != null && !value.equals("")) {
        return Float.parseFloat(value);
    } else {
        return 0;
    }
}

protected String getElementValue(Element parent, String label) {
    return getCharacterDataFromElement((Element) parent.getElementsByTagName(label).item(0));
}

}

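Any ideas on how to solve this?

2 answers:

Answer 0: (score: 1)

I tested the RSS feed with the parser I use, and it parses as shown below. It looks parseable, but as I wrote in the comments, because the feed both uses CDATA and escapes its content, some text comes through as "A&amp;T". You can replace those occurrences after parsing the XML.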

D/*** TITLE      : A&T To Play Four Against Longwood
D/*** DESCRIPTION: A&amp;T baseball takes a break from conference play this weekend.
D/*** TITLE      : Wilkerson Named MEAC Rookie of the Week
D/*** DESCRIPTION: Wilkerson was 6-for-14 for the week of April 9-15.
D/*** TITLE      : Lights, Camera, Action
D/*** DESCRIPTION: A&amp;T baseball set to play nationally televised game on ESPNU.
D/*** TITLE      : Resilient Aggies Fall To USC Upstate
D/*** DESCRIPTION: Luke Tendler extends his hitting streak to 10 games.
D/*** TITLE      : NCCU Defeats A&T In Key Conference Matchup
D/*** DESCRIPTION: Kelvin Freeman leads the Aggies with three hits.
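
The "A&amp;T" left in the descriptions above can be cleaned up after parsing, as suggested. The following is a minimal sketch, assuming only a handful of entity references need handling. The names EntityCleanup and unescapeLeftovers are hypothetical and not part of the parser shared in this answer.

```java
final class EntityCleanup {
    private EntityCleanup() {}

    /** Replaces a few leftover XML entity references in already-parsed text. */
    static String unescapeLeftovers(String text) {
        if (text == null) {
            return "";
        }
        return text
                .replace("&#39;", "'")
                .replace("&quot;", "\"")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                // "&amp;" goes last so that "&amp;lt;" becomes "&lt;"
                // and is not unescaped a second time.
                .replace("&amp;", "&");
    }
}
```

It could be applied to each field right after parser.nextText(), for example currentMessage.setDescription(EntityCleanup.unescapeLeftovers(parser.nextText())).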

I'm sharing most of the RSS feed parser I used, so you can compare it with yours.

XmlPullFeedParser.java

package com.nesim.test.rssparser;

import java.util.ArrayList;
import java.util.List;

import org.xmlpull.v1.XmlPullParser;

import android.util.Log;
import android.util.Xml;

public class XmlPullFeedParser extends BaseFeedParser {

  public XmlPullFeedParser(String feedUrl) {
    super(feedUrl);
  }

  public List<Message> parse() {
    List<Message> messages = null;
    XmlPullParser parser = Xml.newPullParser();
    try {
      // auto-detect the encoding from the stream
      parser.setInput(this.getInputStream(), null);
      int eventType = parser.getEventType();
      Message currentMessage = null;
      boolean done = false;
      while (eventType != XmlPullParser.END_DOCUMENT && !done){
        String name = null;
        switch (eventType){
          case XmlPullParser.START_DOCUMENT:
            messages = new ArrayList<Message>();
            break;
          case XmlPullParser.START_TAG:
            name = parser.getName();
            if (name.equalsIgnoreCase(ITEM)){
              currentMessage = new Message();
            } else if (currentMessage != null){
              if (name.equalsIgnoreCase(LINK)){
                currentMessage.setLink(parser.nextText());
              } else if (name.equalsIgnoreCase(DESCRIPTION)){
                currentMessage.setDescription(parser.nextText());
              } else if (name.equalsIgnoreCase(PUB_DATE)){
                currentMessage.setDate(parser.nextText());
              } else if (name.equalsIgnoreCase(TITLE)){
                currentMessage.setTitle(parser.nextText());
              } else if (name.equalsIgnoreCase(DATES)){
                currentMessage.setDates(parser.nextText());
              } 
            }
            break;
          case XmlPullParser.END_TAG:
            name = parser.getName();
            if (name.equalsIgnoreCase(ITEM) && currentMessage != null){
              messages.add(currentMessage);
            } else if (name.equalsIgnoreCase(CHANNEL)){
              done = true;
            }
            break;
        }
        eventType = parser.next();
      }
    } catch (Exception e) {
      Log.e("AndroidNews::PullFeedParser", e.getMessage(), e);
      throw new RuntimeException(e);
    }
    return messages;
  }
}

BaseFeedParser.java

package com.nesim.test.rssparser;

import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

public abstract class BaseFeedParser implements FeedParser {

  // names of the XML tags
  static final String CHANNEL = "channel";
  static final String PUB_DATE = "pubDate";
  static final  String DESCRIPTION = "description";
  static final  String LINK = "link";
  static final  String TITLE = "title";
  static final  String ITEM = "item";
  static final  String DATES = "dates";
  private final URL feedUrl;

  protected BaseFeedParser(String feedUrl){
    try {
      this.feedUrl = new URL(feedUrl);
    } catch (MalformedURLException e) {
      throw new RuntimeException(e);
    }
  }

  protected InputStream getInputStream() {
    try {
      return feedUrl.openConnection().getInputStream();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}

FeedParser.java

package com.nesim.test.rssparser;

import java.util.List;

public interface FeedParser {
  List<Message> parse();
}
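
The Message class used by XmlPullFeedParser is not shown in this answer. A hypothetical minimal version, assuming it is a plain holder with one String field per setter called above, might look like this. The getters and the null handling are assumptions, and the package declaration is omitted.

```java
class Message {
    private String title = "";
    private String link = "";
    private String description = "";
    private String date = "";
    private String dates = "";

    // Setters matching the calls made in XmlPullFeedParser.parse().
    public void setTitle(String title) { this.title = clean(title); }
    public void setLink(String link) { this.link = clean(link); }
    public void setDescription(String description) { this.description = clean(description); }
    public void setDate(String date) { this.date = clean(date); }
    public void setDates(String dates) { this.dates = clean(dates); }

    // Getters are an assumption; the answer does not show how fields are read.
    public String getTitle() { return title; }
    public String getLink() { return link; }
    public String getDescription() { return description; }
    public String getDate() { return date; }
    public String getDates() { return dates; }

    // Treats null as empty and trims surrounding whitespace.
    private static String clean(String value) {
        return value == null ? "" : value.trim();
    }
}
```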

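Answer 1: (score: 0)

It seems you haven't changed your code the way I suggested. If you insist on parsing it that way, you need to fetch the XML first and manipulate it so that it parses correctly. At the end of this message I've also included a class that fetches the XML as text. Please change your code, try it, and post the results.

If you change these lines, you will succeed.

Remove these lines from the getStories function: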

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL u = new URL(address);
Document doc = builder.parse(u.openStream());

Add the following in place of the removed lines:

WebRequest response = new WebRequest("http://www.ncataggies.com/rss.dbml?db_oem_id=24500&RSS_SPORT_ID=74515&media=news",PostType.GET);
String htmltext = response.Get();

int firstItemIndex = htmltext.indexOf("<item>");
String htmltextHeader = htmltext.substring(0,firstItemIndex);
String htmltextBody = htmltext.substring(firstItemIndex);

htmltextBody = htmltextBody.replace("<title>", "<title><![CDATA[ ");
htmltextBody = htmltextBody.replace("</title>", "]]></title>");

htmltextBody = htmltextBody.replace("<link>", "<link><![CDATA[ ");
htmltextBody = htmltextBody.replace("</link>", "]]></link>");

htmltextBody = htmltextBody.replace("<guid>", "<guid><![CDATA[ ");
htmltextBody = htmltextBody.replace("</guid>", "]]></guid>");
htmltextBody = htmltextBody.replace("&amp;", "&");
htmltext = htmltextHeader + htmltextBody;

Document doc = XMLfunctions.XMLfromString(htmltext);
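
XMLfunctions.XMLfromString is not shown in this answer. A hypothetical minimal version, assuming it simply parses a string into a DOM Document and returns null when parsing fails, could be sketched as:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XMLfunctions {
    private XMLfunctions() {}

    /** Parses XML text into a DOM Document; returns null if it cannot be parsed. */
    static Document XMLfromString(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Wrap the string in a reader so no network or file access is needed.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Assumption: callers check for null instead of handling exceptions.
            return null;
        }
    }
}
```

Because the titles are wrapped in CDATA before this call, a raw & inside them no longer breaks parsing.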

WebRequest.java

package com.nesim.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.UnknownHostException;
import java.nio.charset.Charset;

import org.apache.http.HttpResponse;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;


public class WebRequest {
  public enum PostType{
    GET, POST;
  }

  public String _url;
  public String response = "";
  public PostType _postType;
  CookieStore _cookieStore = new BasicCookieStore();

  public WebRequest(String url) {
    _url = url;
    _postType = PostType.POST;
  }

  public WebRequest(String url, CookieStore cookieStore) {
    _url = url;
    _cookieStore = cookieStore;
    _postType = PostType.POST;
  }

  public WebRequest(String url, PostType postType) {
    _url = url;
    _postType = postType;
  }

  public String Get() {
    HttpClient httpclient = new DefaultHttpClient();

    try {
      // Create local HTTP context
      HttpContext localContext = new BasicHttpContext();

      // Bind custom cookie store to the local context
      localContext.setAttribute(ClientContext.COOKIE_STORE, _cookieStore);

      HttpResponse httpresponse;
      if (_postType == PostType.POST)
      {
        HttpPost httppost = new HttpPost(_url);
        httpresponse = httpclient.execute(httppost, localContext);
      }
      else
      {
        HttpGet httpget = new HttpGet(_url);
        httpresponse = httpclient.execute(httpget, localContext);
      }

      StringBuilder responseString = inputStreamToString(httpresponse.getEntity().getContent());

      response = responseString.toString();
    }
    catch (UnknownHostException e) {
      e.printStackTrace();
    }
    catch (Exception e) {
      e.printStackTrace();
    }
    finally {
      // When HttpClient instance is no longer needed,
      // shut down the connection manager to ensure
      // immediate deallocation of all system resources
      httpclient.getConnectionManager().shutdown();
    }

    return response;
  }

  private StringBuilder inputStreamToString(InputStream is) throws IOException {
    String line = "";
    StringBuilder total = new StringBuilder();

    // Wrap a BufferedReader around the InputStream
    BufferedReader rd = new BufferedReader(new InputStreamReader(is,Charset.forName("iso-8859-9")));
    // Read response until the end
    while ((line = rd.readLine()) != null) {
      total.append(line);
    }

    // Return full string
    return total;
  }
}

Important:

Don't forget to change the package name on the first line of WebRequest.java:

package com.nesim.test;

Result:

After making these changes, you will get the following:

D/title:  Two Walk-Off Moments Lead To Two A&T Losses
D/description: The Lancers win in their last at-bat in both games of Saturday&#39;s doubleheader.
D/title:  A&T To Play Four Against Longwood
D/description: A&T baseball takes a break from conference play this weekend.
D/title:  Wilkerson Named MEAC Rookie of the Week
D/description: Wilkerson was 6-for-14 for the week of April 9-15.
D/title:  Lights, Camera, Action
D/description: A&T baseball set to play nationally televised game on ESPNU.
D/title:  Resilient Aggies Fall To USC Upstate
D/description: Luke Tendler extends his hitting streak to 10 games.

Your parsing returns the following:

D/title  : Two Walk-Off Moments Lead To Two A
D/description: The Lancers win in their last at-bat in both games of Saturday&#39;s doubleheader.
D/title  : A
D/description: A&amp;T baseball takes a break from conference play thisweekend.
D/title  : Wilkerson Named MEAC Rookie of the Week
D/description: Wilkerson was 6-for-14 for the week of April 9-15.
D/title  : Lights, Camera, Action
D/description: A&amp;T baseball set to play nationally televised game on ESPNU.
D/title  : Resilient Aggies Fall To USC Upstate
D/description: Luke Tendler extends his hitting streak to 10 games.
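
The truncated titles are consistent with getCharacterDataFromElement reading only the first child node: some DOM parsers split the text around an entity reference such as &amp; into several text nodes. The following sketch concatenates all text and CDATA children instead, assuming that is the cause here. DomText, allCharacterData and firstTitle are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomText {
    private DomText() {}

    /** Concatenates every text and CDATA child of the element, not only the first. */
    static String allCharacterData(Element e) {
        if (e == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            // Skip comments and other node types; keep text and CDATA only.
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(((CharacterData) child).getData());
            }
        }
        return sb.toString();
    }

    /** Demo helper: parses XML text and returns the text of its first title element. */
    static String firstTitle(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return allCharacterData((Element) doc.getElementsByTagName("title").item(0));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }
}
```

In RSSReader, getElementValue could call a method like allCharacterData in place of getCharacterDataFromElement. Where DOM Level 3 is available, element.getTextContent() performs a similar concatenation.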