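I've been trying to fetch an RSS feed in Java, but I keep getting a 403 error. I searched around, and apparently it is caused by an empty User-Agent header.
This is what I have tried so far: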
    try {
        url = new URL("http://*****.com/feed/");
        InputStream is = null;
        try {
            URLConnection con = url.openConnection();
            con.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
            con.connect();
            is = con.getInputStream();
            feed = FeedParser.parse(con.getURL());
        } catch (IOException e) {
            System.out.println("error");
            try {
                throw e;
            } catch (IOException e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }
        } finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (FeedIOException e) {
        e.printStackTrace();
    } catch (FeedXMLParseException e) {
        e.printStackTrace();
    } catch (UnsupportedFeedException e) {
        e.printStackTrace();
    }
    int items = feed.getItemCount();
    for (int i = 1; i <= items; i++) {
        FeedItem item = feed.getItem(i - 1);
        System.out.println(i + " Title: " + item.getTitle());
    }
I can't get it to work, and I'm sure I'm not doing it correctly. The library I use to parse the RSS feed is feed4j.
Thanks in advance.
Answer 0 (score: 0)
Feed4j does not support setting request properties. FeedParser.parse(URL) fetches the feed from the URL itself, so the User-Agent you set on your own URLConnection is not used for the request that feed4j makes. You cannot do this unless you modify the FeedParser class to something like:

    public static Feed parse(URL url, String userAgent) throws IOException, FeedIOException, FeedXMLParseException, UnsupportedFeedException {
        URLConnection con = url.openConnection();
        // Send the User-Agent only when the caller supplies one.
        if (userAgent != null) {
            con.addRequestProperty("User-Agent", userAgent);
        }
        con.connect();
        // Parse from this connection's stream and close it when done.
        try (InputStream is = con.getInputStream()) {
            SAXReader saxReader = new SAXReader();
            Document document = saxReader.read(is);
            int code = FeedRecognizer.recognizeFeed(document);
            switch (code) {
            case FeedRecognizer.RSS_1_0:
                return TypeRSS_1_0.feed(url, document);
            case FeedRecognizer.RSS_2_0:
                return TypeRSS_2_0.feed(url, document);
            case FeedRecognizer.ATOM_0_3:
                return TypeAtom_0_3.feed(url, document);
            case FeedRecognizer.ATOM_1_0:
                return TypeAtom_1_0.feed(url, document);
            default:
                throw new UnsupportedFeedException();
            }
        } catch (DocumentException e) {
            throw new FeedXMLParseException(e);
        }
    }
The same change is also on GitHub.
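
If patching feed4j is not an option, one possible workaround is to download the feed yourself with the header set and check the HTTP status before parsing. The following is only a rough sketch using the JDK alone; the class name FeedFetcher and both method names are made up for illustration and are not part of feed4j.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of feed4j.
final class FeedFetcher {

    private FeedFetcher() {
    }

    // Creates a connection that carries the given User-Agent.
    // Nothing is sent over the network until the caller connects or reads.
    static URLConnection openWithUserAgent(URL url, String userAgent) throws IOException {
        URLConnection con = url.openConnection();
        if (userAgent != null) {
            con.setRequestProperty("User-Agent", userAgent);
        }
        return con;
    }

    // Downloads the response body and fails with the status code
    // (for example 403) when the server does not answer 200.
    static byte[] fetch(URL url, String userAgent) throws IOException {
        URLConnection con = openWithUserAgent(url, userAgent);
        if (con instanceof HttpURLConnection) {
            int status = ((HttpURLConnection) con).getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
        }
        try (InputStream is = con.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = is.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }
}
```

Reporting the status code in the exception makes it easier to tell a 403 caused by the User-Agent apart from other I/O failures. The downloaded bytes still have to be handed to a parser that accepts a stream or document, which is what the modified parse(URL, String) above does internally.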