Scraping tags with JSOUP

Time: 2013-09-12 21:55:32

Tags: java android jsoup scrape scraper

I am trying to use JSOUP to extract the <TD> values from the following table:

<table class="datagrid">
        <tbody><tr>
            <th>Item No.</th>
            <th>Name</th>
            <th>Sex</th>
            <th>Location</th>
        </tr>

            <tr>
                <td><a href="redirector.cfm?ID=a33660a3-aae0-45e3-9703-d59d77717836&amp;page=1&amp;&amp;lname=&amp;fname=" title="501207593">501207593&nbsp;</a></td>
                <td>USER1</td>
                <td>M&nbsp;</td>
                <td>Unknown</td>
            </tr>

            <tr>
                <td><a href="redirector.cfm?ID=edf524da-8598-450f-9373-da87db8d6c84&amp;page=1&amp;&amp;lname=&amp;fname=" title="501302750">501302750&nbsp;</a></td>
                <td>USER2</td>
                <td>M&nbsp;</td>
                <td>Unknown</td>
            </tr>

            <tr>
                <td><a href="redirector.cfm?ID=a78abeea-7651-4ac1-bba2-0dcb272c8b77&amp;page=1&amp;&amp;lname=&amp;fname=" title="531201804">531201804&nbsp;</a></td>
                <td>USER3</td>
                <td>M&nbsp;</td>
                <td>Unknown</td>
            </tr>

    </tbody></table>

So far I have been able to extract data from the title tag, but I am not sure how to target the <TD> tags.

CURRENT SOURCE:

package com.example.test;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;



import android.app.Activity;
import android.app.ProgressDialog;
import android.os.AsyncTask;
import android.os.Bundle;
import android.view.View;
import android.widget.TextView;

public class MainActivity extends Activity {

    TextView tv;
    final String URL = "http://www.google.com";

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tv = (TextView) findViewById(R.id.TextView01);
        new MyTask().execute(URL);
    }

    private class MyTask extends AsyncTask<String, Void, String> {
        ProgressDialog prog;
        String title = "";

        @Override
        protected void onPreExecute() {
            prog = new ProgressDialog(MainActivity.this);
            prog.setMessage("Loading....");
            prog.show();
        }

        @Override
        protected String doInBackground(String... params) {
            try {
                Document doc = Jsoup.connect(params[0]).get();
                title = doc.title();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return title;
        }

        @Override
        protected void onPostExecute(String result) {
            super.onPostExecute(result);
            prog.dismiss();
            tv.setText(result);
        }
    }
}

1 Answer:

Answer 0 (score: 0)

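Use Elements tds = tableElement.getElementsByTag("td"); where tableElement is the Element representing the table. This returns a collection of all elements in the table that have the <td> tag. You can most likely retrieve the table with Element tableElement = document.getElementsByClass("datagrid").first() (assuming there is only one).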
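
The two calls above can be combined into a small helper, shown here as a minimal sketch rather than a definitive implementation. The class name TdScrapeSketch and the method extractRows are hypothetical and not part of the question's code. The sketch assumes the page contains a single table with class "datagrid", that rows without any <td> (such as the header row) should be skipped, and that the non-breaking spaces produced by &nbsp; are unwanted in the result.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper for illustration; the name is an assumption.
final class TdScrapeSketch {

    // Returns one list of cell texts per table row that contains <td> cells.
    static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Document document = Jsoup.parse(html);

        // Assumes there is only one element with class "datagrid".
        Element tableElement = document.getElementsByClass("datagrid").first();
        if (tableElement == null) {
            return rows; // no such table on the page
        }

        for (Element row : tableElement.getElementsByTag("tr")) {
            Elements tds = row.getElementsByTag("td");
            if (tds.isEmpty()) {
                continue; // header row: it only has <th> cells
            }
            List<String> cells = new ArrayList<>();
            for (Element td : tds) {
                // Replace any non-breaking space (from &nbsp;) with a
                // regular space, then trim, so "M&nbsp;" becomes "M".
                cells.add(td.text().replace('\u00a0', ' ').trim());
            }
            rows.add(cells);
        }
        return rows;
    }
}
```

In the question's doInBackground, the Document returned by Jsoup.connect(params[0]).get() could be walked the same way instead of parsing an HTML string. Grouping the cells by row keeps each Item No. together with its Name, Sex and Location, which a flat list of every <td> would not.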
