Android Jsoup: how do I parse a table?

Date: 2017-02-07 12:41:40

Tags: java android jsoup

I am trying to scrape a table from a web page, but I can't seem to get it to work.

<table cellpadding="0" cellspacing="0" border="0" class="pricetable sortable" id="sortabletable">
    <thead class="tableheader">
        <tr class="sortbottom">
            <th class="thtableheaderlogo unsortable">&nbsp;</th>
            <th class="thtableheaderprice"><div class="tableheaderprice">Pris</div></th>
            <th class="thtableheaderaddress"><div class="tableheaderaddress">Adresse</div></th>
            <th class="thtableheaderobserved unsortable"><div class="tableheaderobserved">Tidspunkt</div></th>
        </tr>
    </thead>
    <tfoot>
        <tr class="unsortable">
            <td colspan="4"><br />* Denne pris er indberettet af selskabet <a style="margin-left: 40px;" href="/indberet">Indberet pris</a></td>
        </tr>
    </tfoot>
    <tbody id="list_canvas">
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/f24.jpg" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/f24/f24-frederiksborgvej-1" class="octanelink">10.57</a></td>
            <td class="tablebodyaddress" title="Frederiksborgvej 1 3600 Frederikssund">&nbsp;<a href="/f24/f24-frederiksborgvej-1" class="octanelink">Frederiksborgvej 1 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/f24/f24-frederiksborgvej-1" class="octanelink">1 time 57 m </a></td>
        </tr>
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/q8.gif" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/q8/q8-jernbanegade-43" class="octanelink">10.67</a></td>
            <td class="tablebodyaddress" title="Jernbanegade 43 3600 Frederikssund">&nbsp;<a href="/q8/q8-jernbanegade-43" class="octanelink">Jernbanegade 43 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/q8/q8-jernbanegade-43" class="octanelink">1 time 57 m </a></td>
        </tr>
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/shell.gif" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">11.87</a></td>
            <td class="tablebodyaddress" title="Ny Østergade 12 3600 Frederikssund">&nbsp;<a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">Ny Østergade 12 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">1 time 57 m </a></td>
        </tr>
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/shell.gif" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/shell/shell-askelundsvej-1" class="octanelink">11.87</a></td>
            <td class="tablebodyaddress" title="Askelundsvej 1 3600 Frederikssund">&nbsp;<a href="/shell/shell-askelundsvej-1" class="octanelink">Askelundsvej 1 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/shell/shell-askelundsvej-1" class="octanelink">1 time 57 m </a></td>
        </tr>
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/circlek.png" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">10.00</a></td>
            <td class="tablebodyaddress" title="Frederiksværkvej 16 3600 Frederikssund">&nbsp;<a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">Frederiksværkvej 16 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">1 time 57 m </a></td>
        </tr>
    </tbody>
</table>

I am trying to grab the price and the address from each row of the table.

This is my code so far.

package com.example.android.soup;

import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.widget.TextView;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class MainActivity extends AppCompatActivity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
    }

    public void fetch(View View){
        String sNodes = "";
        TextView text = (TextView) findViewById(R.id.text1234);
        try
        {
            Document doc = Jsoup.parse("http://www.fdmbenzinpriser.dk/searchprices/1/3600");
            System.out.println(doc.getElementById("list_canvas"));
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        text.setText(sNodes);
    }
}

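2 answers:

Answer 0 (score: 1)

parse() parses a document from a String (https://jsoup.org/cookbook/input/parse-document-from-string). You passed it a URL, which is not a string of HTML, so nothing is ever downloaded. You have to get() the data from the URL instead. That is the problem. Here is a working example: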

Document doc = Jsoup.connect("http://www.fdmbenzinpriser.dk/searchprices/1/3600").get();
System.out.println(doc.getElementById("list_canvas"));

https://jsoup.org/cookbook/input/load-document-from-url
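
To make the difference concrete, here is a small offline sketch (it assumes jsoup is on the classpath; the class name ParseVsConnectDemo and the helper findListCanvas are made up for illustration). It shows that Jsoup.parse() treats the URL as literal text, so the lookup by id finds nothing, while the same lookup succeeds on a string that really contains HTML:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class, not part of the question's app.
final class ParseVsConnectDemo {

    // Parses the given string as HTML markup and returns the element with
    // id "list_canvas", or null if the parsed document has no such element.
    static Element findListCanvas(String markup) {
        Document doc = Jsoup.parse(markup);
        return doc.getElementById("list_canvas");
    }

    public static void main(String[] args) {
        String url = "http://www.fdmbenzinpriser.dk/searchprices/1/3600";

        // The URL is not fetched; it simply becomes the text of the body.
        System.out.println("Body text: " + Jsoup.parse(url).body().text());
        System.out.println("Found in URL string: " + (findListCanvas(url) != null));

        // A string that actually contains markup behaves as expected.
        String html = "<table><tbody id=\"list_canvas\">"
                + "<tr><td>10.57</td></tr></tbody></table>";
        System.out.println("Found in HTML string: " + (findListCanvas(html) != null));
    }
}
```

Note that Jsoup.connect(...).get() performs network I/O, so on Android it generally has to run off the main thread (for example on a background thread), with the result posted back to the TextView afterwards.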

Answer 1 (score: 0)

Since you are really interested in getting at the tbody tag, you could try

// Needs org.jsoup.nodes.Element and org.jsoup.select.Elements imported;
// doc is the Document that was loaded from the URL.
final Elements tbodyElements = doc.getElementsByTag("tbody");
for( int x = 0; x < tbodyElements.size(); x++ )
{
    if( tbodyElements.get(x).attr("id").equals("list_canvas") )
    {
        // You know you are inside the right tbody tag, find all the td elements in it
        final Elements tdElems = tbodyElements.get(x).getElementsByTag("td");
        for( int y = 0; y < tdElems.size(); y++ )
        {
             final Element tdElem = tdElems.get(y);
             // attr() returns a String, so test the CSS class with hasClass() instead
             if( tdElem.hasClass("tablebodylogo") )
             {
                 // this will get you the tags within tablebodylogo
                 final Elements childrenTDLogo = tdElem.children();
             }
             else if( tdElem.hasClass("tablebodyprice") )
             {
                 // this will get you the tags within tablebodyprice
                 final Elements childrenTDPrice = tdElem.children();
             }
             else if( tdElem.hasClass("tablebodyaddress") )
             {
                 // this will get you the tags within tablebodyaddress
                 final Elements childrenTDAddress = tdElem.children();
             }
             else if( tdElem.hasClass("tablebodydate") )
             {
                 // this will get you the tags within tablebodydate
                 final Elements childrenTDDate = tdElem.children();
             }
        }
    }
}
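
As a more compact alternative to the nested loops, the same walk can be sketched with jsoup's CSS selectors. This is only an illustration under some assumptions: jsoup is on the classpath, the class name PriceTableSketch and the method extractRows are made up, and the page is assumed to keep the markup shown in the question (price inside the link in td.tablebodyprice, address in the title attribute of td.tablebodyaddress).

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of the question's app.
final class PriceTableSketch {

    // Returns one {price, address} pair per row of the tbody with id "list_canvas".
    static List<String[]> extractRows(Document doc) {
        List<String[]> rows = new ArrayList<>();
        for (Element row : doc.select("#list_canvas tr")) {
            // Text of the link inside the price cell, e.g. "10.57"
            String price = row.select("td.tablebodyprice a").text();
            // The address cell carries the full address in its title attribute
            String address = row.select("td.tablebodyaddress").attr("title");
            rows.add(new String[] { price, address });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened copy of the first row from the question, parsed offline.
        String html = "<table><tbody id=\"list_canvas\"><tr>"
                + "<td class=\"tablebodyprice\">&nbsp;<a href=\"/f24/f24-frederiksborgvej-1\">10.57</a></td>"
                + "<td class=\"tablebodyaddress\" title=\"Frederiksborgvej 1 3600 Frederikssund\">"
                + "&nbsp;<a href=\"/f24/f24-frederiksborgvej-1\">Frederiksborgvej 1 3600 Frederikssund</a></td>"
                + "</tr></tbody></table>";
        for (String[] r : extractRows(Jsoup.parse(html))) {
            System.out.println(r[0] + " | " + r[1]);
        }
    }
}
```

In the app, the Document passed to extractRows would come from Jsoup.connect(...).get() rather than from an inline string.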

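Reading jsoup's official documentation will greatly improve your understanding of how to use org.jsoup.nodes.Element and org.jsoup.select.Elements, and will really help you. It is an amazing library for parsing HTML documents, though I don't think it is the best choice for scraping online HTML pages. Still, I hope this helps. Feel free to ask for clarification.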