I'm trying to retrieve data from an RSS feed. My program works well, with one exception. The items in the feed are structured as follows:
<title></title>
<link></link>
<description></description>
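I can retrieve the data, but when a title contains '&amp;', the returned string stops at the character before it. For example, for this title:
<title>A&amp;T To Play Four Against Bears</title>
I only get back 'A', when I expect to get back 'A&T To Play Four Against Bears'.
Can anyone tell me whether I can modify my existing RSSReader class to account for the presence of &amp;?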
import android.util.Log;
import java.net.URL;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class RSSReader {
    private static RSSReader instance = null;

    private RSSReader() {
    }

    public static RSSReader getInstance() {
        if (instance == null) {
            instance = new RSSReader();
        }
        return instance;
    }

    public ArrayList<Story> getStories(String address) {
        ArrayList<Story> stories = new ArrayList<Story>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            URL u = new URL(address);
            Document doc = builder.parse(u.openStream());
            NodeList nodes = doc.getElementsByTagName("item");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                Story currentStory = new Story(getElementValue(element, "title"),
                        getElementValue(element, "description"),
                        getElementValue(element, "link"),
                        getElementValue(element, "pubDate"));
                stories.add(currentStory);
            }//for
        }//try
        catch (Exception ex) {
            if (ex instanceof java.net.ConnectException) {
            }
        }
        return stories;
    }

    private String getCharacterDataFromElement(Element e) {
        try {
            Node child = e.getFirstChild();
            if (child instanceof CharacterData) {
                CharacterData cd = (CharacterData) child;
                return cd.getData();
            }
        } catch (Exception ex) {
            Log.i("myTag2", ex.toString());
        }
        return "";
    } //private String getCharacterDataFromElement

    protected float getFloat(String value) {
        if (value != null && !value.equals("")) {
            return Float.parseFloat(value);
        } else {
            return 0;
        }
    }

    protected String getElementValue(Element parent, String label) {
        return getCharacterDataFromElement((Element) parent.getElementsByTagName(label).item(0));
    }
}
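Any ideas on how to solve this?
Answer 0 (score: 1)
I tested the RSS feed with the parser I use, and it parses as shown below. The feed appears to be parseable, but as I wrote in my comment, because CDATA is used and the content is also escaped, some text such as "A&amp;T" comes through. You can replace those after parsing the XML.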
D/*** TITLE : A&T To Play Four Against Longwood
D/*** DESCRIPTION: A&T baseball takes a break from conference play this weekend.
D/*** TITLE : Wilkerson Named MEAC Rookie of the Week
D/*** DESCRIPTION: Wilkerson was 6-for-14 for the week of April 9-15.
D/*** TITLE : Lights, Camera, Action
D/*** DESCRIPTION: A&T baseball set to play nationally televised game on ESPNU.
D/*** TITLE : Resilient Aggies Fall To USC Upstate
D/*** DESCRIPTION: Luke Tendler extends his hitting streak to 10 games.
D/*** TITLE : NCCU Defeats A&T In Key Conference Matchup
D/*** DESCRIPTION: Kelvin Freeman leads the Aggies with three hits.
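The post-parse replacement mentioned above could look something like the following. This is only a sketch: the names `EntityCleaner` and `unescapeBasicEntities` are made up for illustration, and it handles just the five predefined XML entities, replacing `&amp;` last so that text such as `&amp;lt;` is not unescaped twice.

```java
/** Hypothetical helper; not part of the parser shared in this answer. */
final class EntityCleaner {
    private EntityCleaner() {
    }

    /** Replaces the five predefined XML entities that survived parsing. */
    static String unescapeBasicEntities(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&"); // last, to avoid unescaping twice
    }
}
```

With the pull parser below, it could be applied where a value is stored, for example `currentMessage.setTitle(EntityCleaner.unescapeBasicEntities(parser.nextText()))`.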
I'm sharing most of the RSS feed parser I used, so you can compare it with yours.
XmlPullFeedParser.java
package com.nesim.test.rssparser;

import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import android.util.Log;
import android.util.Xml;

public class XmlPullFeedParser extends BaseFeedParser {
    public XmlPullFeedParser(String feedUrl) {
        super(feedUrl);
    }

    public List<Message> parse() {
        List<Message> messages = null;
        XmlPullParser parser = Xml.newPullParser();
        try {
            // auto-detect the encoding from the stream
            parser.setInput(this.getInputStream(), null);
            int eventType = parser.getEventType();
            Message currentMessage = null;
            boolean done = false;
            while (eventType != XmlPullParser.END_DOCUMENT && !done) {
                String name = null;
                switch (eventType) {
                    case XmlPullParser.START_DOCUMENT:
                        messages = new ArrayList<Message>();
                        break;
                    case XmlPullParser.START_TAG:
                        name = parser.getName();
                        if (name.equalsIgnoreCase(ITEM)) {
                            currentMessage = new Message();
                        } else if (currentMessage != null) {
                            // nextText() returns the element's whole text content
                            if (name.equalsIgnoreCase(LINK)) {
                                currentMessage.setLink(parser.nextText());
                            } else if (name.equalsIgnoreCase(DESCRIPTION)) {
                                currentMessage.setDescription(parser.nextText());
                            } else if (name.equalsIgnoreCase(PUB_DATE)) {
                                currentMessage.setDate(parser.nextText());
                            } else if (name.equalsIgnoreCase(TITLE)) {
                                currentMessage.setTitle(parser.nextText());
                            } else if (name.equalsIgnoreCase(DATES)) {
                                currentMessage.setDates(parser.nextText());
                            }
                        }
                        break;
                    case XmlPullParser.END_TAG:
                        name = parser.getName();
                        if (name.equalsIgnoreCase(ITEM) && currentMessage != null) {
                            messages.add(currentMessage);
                        } else if (name.equalsIgnoreCase(CHANNEL)) {
                            done = true;
                        }
                        break;
                }
                eventType = parser.next();
            }
        } catch (Exception e) {
            Log.e("AndroidNews::PullFeedParser", e.getMessage(), e);
            throw new RuntimeException(e);
        }
        return messages;
    }
}
BaseFeedParser.java
package com.nesim.test.rssparser;

import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

public abstract class BaseFeedParser implements FeedParser {
    // names of the XML tags
    static final String CHANNEL = "channel";
    static final String PUB_DATE = "pubDate";
    static final String DESCRIPTION = "description";
    static final String LINK = "link";
    static final String TITLE = "title";
    static final String ITEM = "item";
    static final String DATES = "dates";

    private final URL feedUrl;

    protected BaseFeedParser(String feedUrl) {
        try {
            this.feedUrl = new URL(feedUrl);
        } catch (MalformedURLException e) {
            throw new RuntimeException(e);
        }
    }

    protected InputStream getInputStream() {
        try {
            return feedUrl.openConnection().getInputStream();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
FeedParser.java
package com.nesim.test.rssparser;

import java.util.List;

public interface FeedParser {
    List<Message> parse();
}
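The `Message` class used by `XmlPullFeedParser` is not shown in this answer. A minimal sketch, assuming it is a plain holder with one String field per feed element and the setter names called in `parse()` (the trimming, the getters, and `toString()` are additions for illustration), might look like this:

```java
/** Assumed shape of Message; only the setter names come from the parser code. */
class Message {
    private String title = "";
    private String link = "";
    private String description = "";
    private String date = "";
    private String dates = "";

    public void setTitle(String title) { this.title = clean(title); }
    public void setLink(String link) { this.link = clean(link); }
    public void setDescription(String description) { this.description = clean(description); }
    public void setDate(String date) { this.date = clean(date); }
    public void setDates(String dates) { this.dates = clean(dates); }

    public String getTitle() { return title; }
    public String getLink() { return link; }
    public String getDescription() { return description; }
    public String getDate() { return date; }
    public String getDates() { return dates; }

    /** Treats null as empty and strips surrounding whitespace. */
    private static String clean(String value) {
        return value == null ? "" : value.trim();
    }

    @Override
    public String toString() {
        return title + " (" + link + ")";
    }
}
```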
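Answer 1 (score: 0)
It seems you did not change your code to the one I provided. If you insist on parsing it that way, you first need to fetch the XML as text and manipulate it so that it parses correctly. I have also included a class at the end of this message that fetches the XML as text. Please change your code, try it, and post the results.
If you change these lines, it will work.
Remove these lines from the getStories function: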
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL u = new URL(address);
Document doc = builder.parse(u.openStream());
Add the following in place of the removed lines:
WebRequest response = new WebRequest("http://www.ncataggies.com/rss.dbml?db_oem_id=24500&RSS_SPORT_ID=74515&media=news", WebRequest.PostType.GET);
String htmltext = response.Get();
// Split the feed at the first <item> so that the channel header is left untouched.
int firstItemIndex = htmltext.indexOf("<item>");
String htmltextHeader = htmltext.substring(0, firstItemIndex);
String htmltextBody = htmltext.substring(firstItemIndex);
// Wrap the text of these elements in CDATA so that entities inside them are treated as literal text.
htmltextBody = htmltextBody.replace("<title>", "<title><![CDATA[ ");
htmltextBody = htmltextBody.replace("</title>", "]]></title>");
htmltextBody = htmltextBody.replace("<link>", "<link><![CDATA[ ");
htmltextBody = htmltextBody.replace("</link>", "]]></link>");
htmltextBody = htmltextBody.replace("<guid>", "<guid><![CDATA[ ");
htmltextBody = htmltextBody.replace("</guid>", "]]></guid>");
// Unescape &amp; so that the CDATA-wrapped text reads "A&T" rather than "A&amp;T".
htmltextBody = htmltextBody.replace("&amp;", "&");
htmltext = htmltextHeader + htmltextBody;
Document doc = XMLfunctions.XMLfromString(htmltext);
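`XMLfunctions.XMLfromString` is not shown in this answer. As a rough sketch, assuming it simply parses a String into a DOM `Document` with the standard `DocumentBuilder` (the body below is a guess at its shape, not the original helper, and throwing `IllegalArgumentException` on bad input is my choice):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Assumed stand-in for the XMLfunctions helper referenced above. */
final class XMLfunctions {
    private XMLfunctions() {
    }

    /** Parses XML text into a DOM Document; fails loudly on malformed input. */
    static Document XMLfromString(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Convert CDATA sections to text nodes and merge them with adjacent text.
            factory.setCoalescing(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML text", e);
        }
    }
}
```

Parsing from a `StringReader` means the character encoding has already been decided when the text was downloaded, so the parser does not need to detect it again.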
WebRequest.java
package com.nesim.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.UnknownHostException;
import java.nio.charset.Charset;
import org.apache.http.HttpResponse;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;

public class WebRequest {
    public enum PostType {
        GET, POST;
    }

    public String _url;
    public String response = "";
    public PostType _postType;
    CookieStore _cookieStore = new BasicCookieStore();

    public WebRequest(String url) {
        _url = url;
        _postType = PostType.POST;
    }

    public WebRequest(String url, CookieStore cookieStore) {
        _url = url;
        _cookieStore = cookieStore;
        _postType = PostType.POST;
    }

    public WebRequest(String url, PostType postType) {
        _url = url;
        _postType = postType;
    }

    // Executes the request and returns the response body, or "" if the request fails.
    public String Get() {
        HttpClient httpclient = new DefaultHttpClient();
        try {
            // Create local HTTP context
            HttpContext localContext = new BasicHttpContext();
            // Bind custom cookie store to the local context
            localContext.setAttribute(ClientContext.COOKIE_STORE, _cookieStore);
            HttpResponse httpresponse;
            if (_postType == PostType.POST) {
                HttpPost httppost = new HttpPost(_url);
                httpresponse = httpclient.execute(httppost, localContext);
            } else {
                HttpGet httpget = new HttpGet(_url);
                httpresponse = httpclient.execute(httpget, localContext);
            }
            StringBuilder responseString = inputStreamToString(httpresponse.getEntity().getContent());
            response = responseString.toString();
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // When HttpClient instance is no longer needed,
            // shut down the connection manager to ensure
            // immediate deallocation of all system resources
            httpclient.getConnectionManager().shutdown();
        }
        return response;
    }

    private StringBuilder inputStreamToString(InputStream is) throws IOException {
        String line = "";
        StringBuilder total = new StringBuilder();
        // Wrap a BufferedReader around the InputStream
        BufferedReader rd = new BufferedReader(new InputStreamReader(is, Charset.forName("iso-8859-9")));
        // Read response until the end; readLine() strips the line terminator,
        // so add one back to keep words on adjacent lines from running together
        while ((line = rd.readLine()) != null) {
            total.append(line).append('\n');
        }
        // Return full string
        return total;
    }
}
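The class above depends on the Apache HTTP client, which was removed from the Android SDK in Android 6.0. If that is a concern, a GET-only version built on `HttpURLConnection` could be sketched as follows. The names `SimpleFetcher`, `fetch`, and `readAll` and the 10-second timeouts are assumptions for illustration, and the charset is passed in rather than hard-coded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

/** Hypothetical GET-only alternative to WebRequest. */
final class SimpleFetcher {
    private SimpleFetcher() {
    }

    /** Downloads the body at the given address and decodes it with the given charset. */
    static String fetch(String address, Charset charset) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
        try {
            connection.setConnectTimeout(10000);
            connection.setReadTimeout(10000);
            try (InputStream in = connection.getInputStream()) {
                return readAll(in, charset);
            }
        } finally {
            connection.disconnect();
        }
    }

    /** Reads the whole stream, writing a newline after every line. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder total = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                total.append(line).append('\n');
            }
        }
        return total.toString();
    }
}
```

On Android this would still need to run off the main thread, just like the original class.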
Important:
Don't forget to change the package name on the first line of WebRequest.java:
package com.nesim.test;
Result:
After making these changes, you will get the following:
D/title: Two Walk-Off Moments Lead To Two A&T Losses
D/description: The Lancers win in their last at-bat in both games of Saturday's doubleheader.
D/title: A&T To Play Four Against Longwood
D/description: A&T baseball takes a break from conference play this weekend.
D/title: Wilkerson Named MEAC Rookie of the Week
D/description: Wilkerson was 6-for-14 for the week of April 9-15.
D/title: Lights, Camera, Action
D/description: A&T baseball set to play nationally televised game on ESPNU.
D/title: Resilient Aggies Fall To USC Upstate
D/description: Luke Tendler extends his hitting streak to 10 games.
Your parsing returns the following:
D/title : Two Walk-Off Moments Lead To Two A
D/description: The Lancers win in their last at-bat in both games of Saturday's doubleheader.
D/title : A
D/description: A&T baseball takes a break from conference play thisweekend.
D/title : Wilkerson Named MEAC Rookie of the Week
D/description: Wilkerson was 6-for-14 for the week of April 9-15.
D/title : Lights, Camera, Action
D/description: A&T baseball set to play nationally televised game on ESPNU.
D/title : Resilient Aggies Fall To USC Upstate
D/description: Luke Tendler extends his hitting streak to 10 games.
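The truncated titles above are consistent with `getCharacterDataFromElement` reading only the first child node of each element: some DOM implementations, reportedly including the one on Android, split the text around an entity such as `&amp;` into several nodes. If that is what is happening here, a smaller change than rewriting the XML text would be to concatenate all text children. A sketch, with `DomText` and `textOf` as made-up names:

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

/** Hypothetical helper for reading an element's text from a DOM tree. */
final class DomText {
    private DomText() {
    }

    /** Joins the data of every direct text or CDATA child of the element. */
    static String textOf(Element element) {
        if (element == null) {
            return "";
        }
        StringBuilder text = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            // CDATASection extends Text, so both kinds of node are covered here.
            if (child instanceof Text) {
                text.append(((Text) child).getData());
            }
        }
        return text.toString();
    }
}
```

In the RSSReader from the question, `getElementValue` could then return `DomText.textOf((Element) parent.getElementsByTagName(label).item(0))`. Where the platform supports DOM Level 3, `Element.getTextContent()` should give a similar result.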