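I'm working on Android and want to use the jsoup library to extract the comments from a web page. I tried the code below, but it doesn't work. Can anyone help?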
public void fun() {
    Document doc = null;
    try {
        doc = Jsoup.connect("http://tribune.com.pk/story/1164751/federal-govt-dodged-chinese-govt-cpec/").timeout(10 * 1000).get();
    } catch (IOException e) {
        e.printStackTrace();
    }
    Elements pa = doc.getElementsByClass("span-12 last");
    int count = 1;
    for (Element iter : pa) {
        System.out.println( iter.text());
        count = count + 1;
    }
}
Answer 0 (score: 0)
I used this:
public void fetchData(View v){
    Toast.makeText(getApplicationContext(),
            "Data is fetching from The Hindu wait some time ",
            Toast.LENGTH_LONG).show();
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                // get the Document object from the site. Enter the link of
                // site you want to fetch
                /*
                 * Document document = Jsoup.connect(
                 * "http://javalanguageprogramming.blogspot.in/") .get();
                 */
                Document document = Jsoup.connect(
                        "http://www.thehindu.com/").get();
                title = document.text().toString();
                // Get the title of blog using title tag
                /* title = document.select("h1.title").text().toString(); */
                // set the title of text view
                // Get all the elements with h3 tag and has attribute
                // a[href]
                /*
                 * Elements elements = document.select("div.post-outer")
                 * .select("h3").select("a[href]"); int length =
                 * elements.size();
                 */
                Elements elements = document.select("div.fltrt")
                        .select("h3").select("a[href]");
                int length = elements.size();
                for (int i = 0; i < length; i++) {
                    // store each post heading in the string
                    posts += elements.get(i).text();
                }
                // Run this on ui thread because another thread cannot touch
                // the views of main thread
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        // set both the text views
                        titleText.setText(title);
                        postText.setText(posts);
                    }
                });
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }).start();
}
Answer 1 (score: 0)
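There are 2 problems here: the site appears to require a userAgent string and returns a 403 error without one, and the comments sit in elements with the class li-comment rather than span-12 last. This code works for me: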
Document doc = null;
try {
    doc = Jsoup.connect("http://tribune.com.pk/story/1164751/federal-govt-dodged-chinese-govt-cpec/").timeout(10 * 1000)
            .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0")
            .get();
} catch (IOException e) {
    e.printStackTrace();
}
Elements el = doc.getElementsByClass("li-comment");
for (Element e : el) {
    System.out.println(e.text());
    System.out.println("-----------------");
}
If there are no comments on the page, you should also handle the case where li-comment is empty or doesn't exist.
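One way to handle that case is sketched below. This is only an illustration, not the definitive approach: the class name CommentExtractor, the helper extractComments, and the inline HTML are hypothetical stand-ins for the real page, and the sketch assumes the comments really do use the li-comment class. It also guards against doc being null, which is what happens when the request fails and the exception is only printed.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CommentExtractor {

    // Hypothetical helper: returns the text of every element with class
    // "li-comment". Returns an empty list when the document is null (for
    // example, the request failed) or when the page has no such elements.
    public static List<String> extractComments(Document doc) {
        List<String> comments = new ArrayList<>();
        if (doc == null) {
            return comments;
        }
        // getElementsByClass returns an empty collection, not null,
        // when nothing matches, so the loop simply does not run.
        Elements items = doc.getElementsByClass("li-comment");
        for (Element item : items) {
            String text = item.text();
            if (!text.isEmpty()) {
                comments.add(text);
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        // Assumed markup for illustration only; the real page may differ.
        Document withComments = Jsoup.parse(
                "<ul><li class=\"li-comment\">First</li>"
                + "<li class=\"li-comment\">Second</li></ul>");
        Document withoutComments = Jsoup.parse("<p>No comments yet</p>");

        System.out.println(extractComments(withComments));
        System.out.println(extractComments(withoutComments).isEmpty());
        System.out.println(extractComments(null).isEmpty());
    }
}
```

Returning an empty list instead of null lets the caller decide what to show, for example a "no comments" message, without extra null checks.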