I am trying to scrape a table from a web page, but I can't seem to get it to work.
<table cellpadding="0" cellspacing="0" border="0" class="pricetable sortable" id="sortabletable">
<thead class="tableheader">
<tr class="sortbottom">
<th class="thtableheaderlogo unsortable"> </th>
<th class="thtableheaderprice"><div class="tableheaderprice">Pris</div></th>
<th class="thtableheaderaddress"><div class="tableheaderaddress">Adresse</div></th>
<th class="thtableheaderobserved unsortable"><div class="tableheaderobserved">Tidspunkt</div></th>
</tr>
</thead>
<tfoot>
<tr class="unsortable">
<td colspan="4"><br />* Denne pris er indberettet af selskabet <a style="margin-left: 40px;" href="/indberet">Indberet pris</a></td>
</tr>
</tfoot>
<tbody id="list_canvas">
<tr>
<td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/f24.jpg" alt="" style="width:32px; height: 18px;" /></td>
<td class="tablebodyprice"> <a href="/f24/f24-frederiksborgvej-1" class="octanelink">10.57</a></td>
<td class="tablebodyaddress" title="Frederiksborgvej 1 3600 Frederikssund"> <a href="/f24/f24-frederiksborgvej-1" class="octanelink">Frederiksborgvej 1 3600 Frederikssund</a></td>
<td class="tablebodydate"><a href="/f24/f24-frederiksborgvej-1" class="octanelink">1 time 57 m </a></td>
</tr>
<tr>
<td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/q8.gif" alt="" style="width:32px; height: 18px;" /></td>
<td class="tablebodyprice"> <a href="/q8/q8-jernbanegade-43" class="octanelink">10.67</a></td>
<td class="tablebodyaddress" title="Jernbanegade 43 3600 Frederikssund"> <a href="/q8/q8-jernbanegade-43" class="octanelink">Jernbanegade 43 3600 Frederikssund</a></td>
<td class="tablebodydate"><a href="/q8/q8-jernbanegade-43" class="octanelink">1 time 57 m </a></td>
</tr>
<tr>
<td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/shell.gif" alt="" style="width:32px; height: 18px;" /></td>
<td class="tablebodyprice"> <a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">11.87</a></td>
<td class="tablebodyaddress" title="Ny Østergade 12 3600 Frederikssund"> <a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">Ny Østergade 12 3600 Frederikssund</a></td>
<td class="tablebodydate"><a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">1 time 57 m </a></td>
</tr>
<tr>
<td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/shell.gif" alt="" style="width:32px; height: 18px;" /></td>
<td class="tablebodyprice"> <a href="/shell/shell-askelundsvej-1" class="octanelink">11.87</a></td>
<td class="tablebodyaddress" title="Askelundsvej 1 3600 Frederikssund"> <a href="/shell/shell-askelundsvej-1" class="octanelink">Askelundsvej 1 3600 Frederikssund</a></td>
<td class="tablebodydate"><a href="/shell/shell-askelundsvej-1" class="octanelink">1 time 57 m </a></td>
</tr>
<tr>
<td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/circlek.png" alt="" style="width:32px; height: 18px;" /></td>
<td class="tablebodyprice"> <a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">10.00</a></td>
<td class="tablebodyaddress" title="Frederiksværkvej 16 3600 Frederikssund"> <a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">Frederiksværkvej 16 3600 Frederikssund</a></td>
<td class="tablebodydate"><a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">1 time 57 m </a></td>
</tr>
</tbody>
</table>
I am trying to grab the price and the address from each row of the table.
This is my current code.
package com.example.android.soup;
import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.widget.TextView;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class MainActivity extends AppCompatActivity {
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
}
public void fetch(View View){
String sNodes = "";
TextView text = (TextView) findViewById(R.id.text1234);
try
{
Document doc = Jsoup.parse("http://www.fdmbenzinpriser.dk/searchprices/1/3600");
System.out.println(doc.getElementById("list_canvas"));
}
catch (Exception e)
{
e.printStackTrace();
}
text.setText(sNodes);
}
}
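Answer 0 (score: 1)
parse() parses a document from a String (https://jsoup.org/cookbook/input/parse-document-from-string). You passed it a URL, which is not a string of HTML, so jsoup never fetched the page. You have to connect() to the URL and get() the data instead. That is the problem. Here is a working example: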
Document doc = Jsoup.connect("http://www.fdmbenzinpriser.dk/searchprices/1/3600").get();
System.out.println(doc.getElementById("list_canvas"));
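To make the difference concrete, here is a minimal sketch that needs no network access. The class name ParseVersusConnect and the helper hasElement are made up for illustration, and the one-row table is a trimmed stand-in for the real page. It shows that Jsoup.parse(String) treats the URL as literal text, so getElementById("list_canvas") finds nothing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseVersusConnect {

    // Hypothetical helper: true when the document contains an element with this id.
    static boolean hasElement(Document doc, String id) {
        return doc.getElementById(id) != null;
    }

    public static void main(String[] args) {
        String url = "http://www.fdmbenzinpriser.dk/searchprices/1/3600";

        // Jsoup.parse(String) treats its argument as HTML source, so the URL
        // simply becomes the text of an otherwise empty <body>.
        Document fromUrlString = Jsoup.parse(url);
        System.out.println(fromUrlString.body().text());
        System.out.println(hasElement(fromUrlString, "list_canvas")); // false

        // Parsing actual markup (here a trimmed stand-in for the page) finds the element.
        Document fromHtml = Jsoup.parse(
                "<table><tbody id=\"list_canvas\"><tr><td>10.57</td></tr></tbody></table>");
        System.out.println(hasElement(fromHtml, "list_canvas")); // true
    }
}
```

With Jsoup.connect(url).get() the document is built from the HTML the server returns, which is why the lookup succeeds in the example above.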
Answer 1 (score: 0)
Since what you really want is access to the tbody tag, you can try
// Requires: import org.jsoup.nodes.Element; import org.jsoup.select.Elements;
final Elements tbodyElements = doc.getElementsByTag("tbody");
for( int x = 0; x < tbodyElements.size(); x++ )
{
if( tbodyElements.get(x).attr("id").equals("list_canvas") )
{
// You know you are inside tbody tag, find all the td elements in it
final Elements tdElems = tbodyElements.get(x).getElementsByTag("td");
for( int y = 0; y < tdElems.size(); y++ )
{
final Element tdElem = tdElems.get(y);
// attr(...) returns a String, not a boolean, so test the CSS class with hasClass(...)
if( tdElem.hasClass("tablebodylogo") )
{
// this will get you tags within tablebodylogo
final Elements childrenTDLogo = tdElem.children();
}
else if( tdElem.hasClass("tablebodyprice") )
{
// this will get you tags within tablebodyprice
final Elements childrenTDPrice = tdElem.children();
}
else if( tdElem.hasClass("tablebodyaddress") )
{
// this will get you tags within tablebodyaddress
final Elements childrenTDAddress = tdElem.children();
}
else if( tdElem.hasClass("tablebodydate") )
{
// this will get you tags within tablebodydate
final Elements childrenTDDate = tdElem.children();
}
}
}
}
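The same traversal can probably be written more compactly with jsoup's CSS selectors. The sketch below is only an illustration under some assumptions: the class name PriceTableSketch, the method extractRows, and the "price | address" output format are made up here, and SAMPLE_HTML is two rows from the question trimmed down to the price and address cells so the example runs without a network connection.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PriceTableSketch {

    // Two rows taken from the question's table, reduced to the cells of interest.
    static final String SAMPLE_HTML =
            "<table id=\"sortabletable\"><tbody id=\"list_canvas\">"
            + "<tr>"
            + "<td class=\"tablebodyprice\"> <a class=\"octanelink\">10.57</a></td>"
            + "<td class=\"tablebodyaddress\"> <a class=\"octanelink\">"
            + "Frederiksborgvej 1 3600 Frederikssund</a></td>"
            + "</tr>"
            + "<tr>"
            + "<td class=\"tablebodyprice\"> <a class=\"octanelink\">10.67</a></td>"
            + "<td class=\"tablebodyaddress\"> <a class=\"octanelink\">"
            + "Jernbanegade 43 3600 Frederikssund</a></td>"
            + "</tr>"
            + "</tbody></table>";

    // Returns one "price | address" string per row of tbody#list_canvas.
    static List<String> extractRows(Document doc) {
        List<String> rows = new ArrayList<>();
        for (Element row : doc.select("tbody#list_canvas > tr")) {
            Element price = row.select("td.tablebodyprice").first();
            Element address = row.select("td.tablebodyaddress").first();
            if (price == null || address == null) {
                continue; // skip rows that lack either cell
            }
            // text() returns the cell's combined text with whitespace normalized and trimmed
            rows.add(price.text() + " | " + address.text());
        }
        return rows;
    }

    public static void main(String[] args) {
        // For the live page, the document would come from Jsoup.connect(url).get() instead.
        Document doc = Jsoup.parse(SAMPLE_HTML);
        for (String row : extractRows(doc)) {
            System.out.println(row);
        }
    }
}
```

Selecting by class keeps the code independent of column order, at the cost of depending on the page's class names staying the same.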
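Reading jsoup's official documentation will greatly improve your understanding of how to use org.jsoup.nodes.Element and org.jsoup.select.Elements, and it will really help you here. jsoup is an amazing library for parsing HTML documents, although I don't think it is the best choice for scraping online HTML pages. Still, I hope this helps. Feel free to ask for clarification.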