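Extracting text from a div using Jsoup

Time: 2015-02-24 13:26:21

Tags: java android jsoup

With this code, the application should extract the text of a div from the site and display it on screen, but that does not happen, and no error appears in Logcat. What am I doing wrong?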

package com.androidbegin.jsouptutorial;

import java.io.IOException;
import java.io.InputStream;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import android.os.AsyncTask;
import android.os.Bundle;
import android.app.Activity;
import android.app.ProgressDialog;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;

import android.widget.TextView;

public class MainActivity extends Activity {
    TextView txtdesc;

    // URL Address
    String url = "http://uat.sophiejuliete.com.br/tendencias/";
    ProgressDialog mProgressDialog;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // Locate the Buttons in activity_main.xml
        Button titlebutton = (Button) findViewById(R.id.titlebutton);
        txtdesc = (TextView) findViewById(R.id.desctxt);


        // Capture button click
        titlebutton.setOnClickListener(new OnClickListener() {
            public void onClick(View arg0) {
                // Execute Title AsyncTask
                new Title().execute();
            }
        });

    }


    private class Title extends AsyncTask<Void, Void, String> {

        @Override
        protected void onPreExecute() {
            super.onPreExecute();
            mProgressDialog = new ProgressDialog(MainActivity.this);
            mProgressDialog.setTitle("Android Basic JSoup Tutorial");
            mProgressDialog.setMessage("Loading...");
            mProgressDialog.setIndeterminate(false);
            mProgressDialog.show();
        }

        @Override
        protected String doInBackground(Void... params) {
            String desc = null;
            try {
                // Connect to the web site
                Document document = Jsoup.connect(url).get();
                // Select the div(s) whose class attribute is postWrapper
                Elements description = document.select("div[class=postWrapper]");
                // Get the combined text of the matched element(s)
                desc = description.text();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return desc;
        }

        @Override
        protected void onPostExecute(String result) {
            // Set description into TextView
            txtdesc.setText(result);
            mProgressDialog.dismiss();
        }

    }



}

This is the structure of the site that needs to be parsed:

<div class="postWrapper" id="post162">
        <div class="postTitle">


            <h2>
                <a href="http://uat.sophiejuliete.com.br/tendencias/agarradinhos-as-orelhas/">
                    Agarradinhos às orelhas                </a>
            </h2>

            <div class="fb-custom-share" data-url="http://uat.sophiejuliete.com.br/tendencias/agarradinhos-as-orelhas/">
                Compartilhar
            </div>

            <div class="date">
                26 de janeiro de 2015            </div>

        </div>

        <div class="postContent"><p>Agarradinhos às orelhas, os solitários e brincos curtos são ideais tanto para o dia como para a noite.</p>
<p>E melhor ainda ficam bem em qualquer formato de rosto.</p>
<p>Basta apenas escolher o modelo conforme a ocasião que você vai utilizar.</p>
<p>&nbsp;</p>
<p><a href="http://sophiejuliete.com.br/shop/brincos.html"><img style="display: block; margin-left: auto; margin-right: auto;" src="http://uat.sophiejuliete.com.br/media/wysiwyg/Agarradinhos_s_orelhas.jpg" alt=""></a></p></div>
    </div>

1 answer:

Answer 0 (score: 0)

Try

desc = description.text();

instead of

desc = description.attr("postContent");
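The reason, as far as the posted markup shows, is that postContent is the value of the class attribute rather than an attribute name, so attr("postContent") has nothing to return. Here is a minimal sketch, assuming a trimmed copy of the question's markup parsed from a string; the class name AttrVsText and the helper postContent() are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class AttrVsText {
    // Trimmed copy of the markup from the question (assumption: the live page may differ).
    static final String HTML =
        "<div class=\"postWrapper\" id=\"post162\">"
        + "<div class=\"postContent\">"
        + "<p>E melhor ainda ficam bem em qualquer formato de rosto.</p>"
        + "</div></div>";

    // Hypothetical helper: parse the string and select the content div.
    static Elements postContent() {
        Document document = Jsoup.parse(HTML);
        return document.select("div[class=postContent]");
    }

    public static void main(String[] args) {
        Elements description = postContent();
        // No element has an attribute named "postContent", so attr() returns an empty string.
        System.out.println("attr(\"postContent\") -> [" + description.attr("postContent") + "]");
        // attr() does work for a real attribute name such as "class".
        System.out.println("attr(\"class\") -> " + description.attr("class"));
        // text() returns the combined text of the matched element(s).
        System.out.println("text() -> " + description.text());
    }
}
```

Parsing a string with Jsoup.parse keeps the sketch independent of the network. On the real page the same calls would run on the Document returned by Jsoup.connect(url).get().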

Example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// The wrapper class name is arbitrary
public class JsoupExample {
    public static void main(String[] args) throws Exception {
        String url = "http://uat.sophiejuliete.com.br/tendencias/";
        Document document = Jsoup.connect(url).timeout(10000).get();
        // Select the div(s) whose class attribute is postContent
        Elements description = document.select("div[class=postContent]");
        // Get the combined text of the matched element(s)
        String desc = description.text();
        System.out.println(desc);
        // prints out "Agarradinhos às orelhas, os solitários e brincos..."
    }
}
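A side note on the selector used above: div[class=postContent] compares against the whole class attribute, while div.postContent matches on a single class name. For the markup in the question both select the same div, but they diverge once an element carries more than one class. A small sketch of the difference, assuming hypothetical markup in which the second div has an extra class (the class SelectorComparison and the helper countMatches() are illustrative names):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorComparison {
    // Hypothetical markup: the second div has an additional class, unlike the page in the question.
    static final String HTML =
        "<div class=\"postContent\">first</div>"
        + "<div class=\"postContent featured\">second</div>";

    // Hypothetical helper: number of elements matched by a CSS query.
    static int countMatches(String cssQuery) {
        Document document = Jsoup.parse(HTML);
        return document.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Matches only when the whole class attribute value equals "postContent".
        System.out.println("div[class=postContent] -> " + countMatches("div[class=postContent]"));
        // Matches any div that has postContent among its classes.
        System.out.println("div.postContent -> " + countMatches("div.postContent"));
    }
}
```

If the site later adds classes to the post divs, the class selector is the more tolerant choice; the attribute form is fine as long as the class attribute stays exactly as posted.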

Update

Now that the JSoup part is fixed, you may be running into a problem with the AsyncTask. Try using String as the result type, something like this:

private class Title extends AsyncTask<Void, Void, String> {

    ...

    @Override
    protected String doInBackground(Void... params) {
        String desc = null;
        try {
            // Connect to the web site
            Document document = Jsoup.connect(url).get();
            // Select the div(s) whose class attribute is postContent
            Elements description = document.select("div[class=postContent]");
            // Get the combined text of the matched element(s)
            desc = description.text();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return desc;
    }

    @Override
    protected void onPostExecute(String result) {
        // Set description into TextView
        TextView txtdesc = (TextView) findViewById(R.id.desctxt);
        txtdesc.setText(result);
        mProgressDialog.dismiss();
    }

}

Update 2

Declare txtdesc globally in MainActivity, as a field

TextView txtdesc;

initialize it in onCreate()

txtdesc = (TextView) findViewById(R.id.desctxt);

and remove the declaration in onPostExecute(), so that only txtdesc.setText(result); remains


@Override
protected void onPostExecute(String result) {
    // Set description into TextView
    txtdesc.setText(result);
    mProgressDialog.dismiss();
}