I am trying to fetch the RSS feed of this site:
http://www.phonearena.com/feed
This is my DOMParser class:
import java.net.MalformedURLException;
import java.net.URL;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DOMParser {
    private RSSFeed _feed = new RSSFeed();

    public RSSFeed parseXml(String xml) {
        URL url = null;
        try {
            url = new URL(xml);
        } catch (MalformedURLException e1) {
            e1.printStackTrace();
        }
        try {
            DocumentBuilderFactory dbf;
            dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new InputSource(url.openStream()));
            doc.getDocumentElement().normalize();
            NodeList nl = doc.getElementsByTagName("item");
            NodeList itemChildren = null;
            Node currentItem = null;
            Node currentChild = null;
            int length = nl.getLength();
            for (int i = 0; i < length; i++) {
                currentItem = nl.item(i);
                RSSItem _item = new RSSItem();
                NodeList nchild = currentItem.getChildNodes();
                int clength = nchild.getLength();
                for (int j = 0; j < clength; j++) {
                    currentChild = nchild.item(j);
                    String theString = null;
                    String nodeName = currentChild.getNodeName();
                    // Text nodes between elements have no children, so guard
                    // against a NullPointerException that would abort the parse.
                    Node firstChild = currentChild.getFirstChild();
                    if (firstChild != null) {
                        theString = firstChild.getNodeValue();
                    }
                    if (theString != null) {
                        if ("title".equals(nodeName)) {
                            _item.setTitle(theString);
                        }
                        else if ("description".equals(nodeName)) {
                            _item.setDescription(theString);
                            // Parse the html description to get the image url
                            String html = theString;
                            org.jsoup.nodes.Document docHtml = Jsoup.parse(html);
                            Elements imgEle = docHtml.select("img");
                            _item.setImage(imgEle.attr("src"));
                        }
                        else if ("pubDate".equals(nodeName)) {
                            String formatedDate = theString.replace(" +0000", "");
                            _item.setDate(formatedDate);
                        }
                    }
                }
                _feed.addItem(_item);
            }
        } catch (Exception e) {
            // Log the failure instead of silently swallowing it.
            e.printStackTrace();
        }
        return _feed;
    }
}
Everything works except the image I am trying to extract with jsoup.
Can anyone tell me what I am doing wrong or what I have missed?
Answer 0 (score: 0)
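The variable theString needs to be unescaped before it is passed to Jsoup. If the markup is still entity-escaped (for example &lt;img ...&gt;), Jsoup treats it as plain text and select("img") finds nothing.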
// Requires org.jsoup.nodes.Element and org.jsoup.parser.Parser.
else if ("description".equals(nodeName)) {
    _item.setDescription(theString);
    // Unescape, then parse the html description to get the image url
    Element imgEle = Jsoup.parse(
            Parser.unescapeEntities(
                    Parser.xmlParser().parseInput(theString, "").outerHtml(),
                    true))
            .select("img").first();
    // first() returns null when the description contains no img element
    if (imgEle != null) {
        _item.setImage(imgEle.attr("src"));
    }
}
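
If unescaping alone does not help, it may be worth checking what theString actually contains before it reaches Jsoup. The question's code reads only getFirstChild().getNodeValue(), so when a description is split into several child nodes (for example plain text followed by a CDATA section), only the first piece is returned. The sketch below uses only the JDK's DOM parser and a made-up item; the class name DescriptionText, the helper names, and the sample XML are assumptions for illustration and are not taken from the feed in the question.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DescriptionText {

    // Parses a small XML snippet and returns its first <description> element.
    static Node description(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("description").item(0);
    }

    // What the question's code reads: the value of the first child node only.
    static String firstChildValue(String xml) throws Exception {
        Node first = description(xml).getFirstChild();
        return first == null ? null : first.getNodeValue();
    }

    // The concatenated text of all child nodes, text and CDATA alike.
    static String fullText(String xml) throws Exception {
        return description(xml).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical item: plain text followed by a CDATA section holding the image markup.
        String xml = "<item><description>Intro <![CDATA[<img src=\"a.png\">]]></description></item>";
        System.out.println("first child: " + firstChildValue(xml));
        System.out.println("full text:   " + fullText(xml));
    }
}
```

With this sample, the first child holds only the leading text, while getTextContent() also includes the img markup. If the real feed behaves the same way, passing currentChild.getTextContent() to Jsoup instead of the first child's value is one option to try.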