Unable to find comments from a URL using Jsoup

Date: 2016-08-19 08:33:44

Tags: android jsoup

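I'm working on Android. I want to use the jsoup library to extract the comments from a web page. I tried the code below, but it doesn't work. Can anyone help?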

    public void fun() {
        Document doc = null;
        try {
            doc = Jsoup.connect("http://tribune.com.pk/story/1164751/federal-govt-dodged-chinese-govt-cpec/")
                    .timeout(10 * 1000)
                    .get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Elements pa = doc.getElementsByClass("span-12 last");
        int count = 1;
        for (Element iter : pa) {
            System.out.println(iter.text());
            count = count + 1;
        }
    }

2 answers:

Answer 0 (score: 0)

I used this in a button click handler:

// Called from a Button's android:onClick. titleText and postText are TextView
// fields of the Activity (not shown in this answer).
// Needs: android.view.View, android.widget.Toast, java.io.IOException,
// org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
public void fetchData(View v) {

    Toast.makeText(getApplicationContext(),
            "Fetching data from The Hindu, please wait...",
            Toast.LENGTH_LONG).show();

    // Android does not allow network access on the main thread, so fetch in a background thread
    new Thread(new Runnable() {

        @Override
        public void run() {
            try {
                // Download and parse the site you want to fetch
                Document document = Jsoup.connect("http://www.thehindu.com/").get();

                // Text of the page's <title> tag
                final String title = document.title();

                // All links (a[href]) inside h3 tags inside div.fltrt
                Elements elements = document.select("div.fltrt")
                        .select("h3").select("a[href]");

                // Collect each post heading, one per line
                StringBuilder builder = new StringBuilder();
                for (Element element : elements) {
                    builder.append(element.text()).append('\n');
                }
                final String posts = builder.toString();

                // Only the UI thread may touch the views, so post the update to it
                runOnUiThread(new Runnable() {

                    @Override
                    public void run() {
                        titleText.setText(title);
                        postText.setText(posts);
                    }
                });

            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }).start();
}
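The scraping part of the thread above does not depend on Android, so it can be pulled into a plain method and tried offline with Jsoup.parse on a local HTML string before pointing it at the live site. A minimal sketch, assuming jsoup is on the classpath; the class name HeadingExtractor, the method postHeadings and the sample markup are hypothetical, and the div.fltrt selector only fits whatever markup The Hindu served when the answer was written.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeadingExtractor {

    // Same chained selection as the answer: div.fltrt -> h3 -> a[href]
    static List<String> postHeadings(Document document) {
        List<String> headings = new ArrayList<>();
        for (Element link : document.select("div.fltrt").select("h3").select("a[href]")) {
            headings.add(link.text());
        }
        return headings;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real page
        String html = "<div class='fltrt'><h3><a href='/a'>First</a></h3>"
                + "<h3><a href='/b'>Second</a></h3><h3>No link</h3></div>"
                + "<div><h3><a href='/c'>Outside</a></h3></div>";
        Document doc = Jsoup.parse(html);
        System.out.println(postHeadings(doc)); // prints [First, Second]
    }
}
```

Once the selector returns what you expect here, the same method can be called from the background thread with the downloaded Document.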

Answer 1 (score: 0)

There are 2 problems here:

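  1. Your program crashes because the server expects a User-Agent string (set with userAgent()). Without one it returns a 403 error, so get() throws an exception, doc stays null, and the next call on doc throws a NullPointerException.
  2. The comments are in elements with the class "li-comment".

This code works for me: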

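    // Needs: java.io.IOException, org.jsoup.Jsoup, org.jsoup.nodes.Document,
    //        org.jsoup.nodes.Element, org.jsoup.select.Elements
    Document doc = null;
    try {
        // Send a browser-like User-Agent; without it the server answers 403
        doc = Jsoup.connect("http://tribune.com.pk/story/1164751/federal-govt-dodged-chinese-govt-cpec/")
                .timeout(10 * 1000)
                .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0")
                .get();
    } catch (IOException e) {
        e.printStackTrace();
    }
    if (doc != null) {
        // Each comment is rendered in an element with class "li-comment"
        Elements el = doc.getElementsByClass("li-comment");
        for (Element e : el) {
            System.out.println(e.text());
            System.out.println("-----------------");
        }
    }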
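If the site still refuses the request, it helps to see the HTTP status directly instead of a later NullPointerException. jsoup throws HttpStatusException (a subclass of IOException) for non-2xx responses unless ignoreHttpErrors(true) is set, so catching it separately reports the 403 as such. A sketch under those assumptions: the class CommentPageFetcher and its methods are hypothetical, the User-Agent string is the one from the answer, and there is no guarantee the site still behaves this way today.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CommentPageFetcher {

    // Browser-like User-Agent taken from the answer
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0";

    // Builds the request (10 s timeout, custom User-Agent) without sending it
    static Connection buildConnection(String url) {
        return Jsoup.connect(url)
                .timeout(10 * 1000)
                .userAgent(USER_AGENT);
    }

    // Returns the parsed page, or null if the server refused it or the network failed
    static Document fetch(String url) {
        try {
            return buildConnection(url).get();
        } catch (HttpStatusException e) {
            // Thrown for non-2xx responses such as a 403
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) {
        // Performs a real network request
        Document doc = fetch("http://tribune.com.pk/story/1164751/federal-govt-dodged-chinese-govt-cpec/");
        if (doc == null) {
            System.out.println("Could not load the page");
            return;
        }
        System.out.println(doc.getElementsByClass("li-comment").size() + " comments found");
    }
}
```

Keeping the request setup in its own method also lets you check the headers being sent without touching the network.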
    

You should also handle the case where the page has no comments, i.e. there are no li-comment elements at all.
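A sketch of that check, assuming jsoup is on the classpath: getElementsByClass never returns null, it returns an empty Elements list when nothing matches, so checking isEmpty() on the collected result (plus guarding against a null Document from a failed download) covers both cases. CommentExtractor, comments and the sample HTML strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CommentExtractor {

    // Collects the text of every "li-comment" element; returns an empty list when there are none
    static List<String> comments(Document doc) {
        List<String> result = new ArrayList<>();
        if (doc == null) {
            return result; // page could not be loaded
        }
        Elements elements = doc.getElementsByClass("li-comment");
        for (Element e : elements) {
            result.add(e.text());
        }
        return result;
    }

    public static void main(String[] args) {
        Document withComments = Jsoup.parse(
                "<ul><li class='li-comment'>First!</li><li class='li-comment'>Nice article</li></ul>");
        Document withoutComments = Jsoup.parse("<p>Comments are closed.</p>");

        for (Document doc : new Document[] {withComments, withoutComments}) {
            List<String> found = comments(doc);
            if (found.isEmpty()) {
                System.out.println("No comments on this page");
            } else {
                found.forEach(System.out::println);
            }
        }
    }
}
```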