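Any ideas on how to scrape a web page that contains multiple tables? I am connecting to the web page
and it contains a table, but there are multiple tables on the same page.
I also can't figure out how to read the tables...
HTML: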
<p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p>
<div class="storyStats">
<table>
<thead>
<tr>
<th>RANK</th>
<th>CENTRES</th>
<th>TEAM</th>
<th>POS</th>
<th>GP</th>
<th>G</th>
<th>A</th>
<th>PTS</th>
<th>+/-</th>
<th>PIM</th>
<th>PPP</th>
</tr>
</thead>
<tbody>
<tr class="bg1">
<td>1.</td>
<td><a href="/nhl/teams/players/?name=steven+stamkos">Steven Stamkos</a></td>
<td>Tampa Bay</td>
<td>C</td>
<td align="right">81</td>
<td align="right">50</td>
<td align="right">51</td>
<td align="right">101</td>
<td align="right">-2</td>
<td align="right">56</td>
<td align="right">38</td>
</tr>
Iterator<Element> trSIter = doc.select("table")
        .iterator();
while (trSIter.hasNext()) {
    Element trEl = trSIter.next().child(0);
    Elements tdEls = trEl.children();
    Iterator<Element> tdIter = tdEls.select("tr").iterator();
    System.out.println("><1><><"+tdIter);
    boolean firstRow = true;
    while (tdIter.hasNext()) {
        Element tr = (Element) tdIter.next();
        while (tdIter.hasNext()) {
            int tdCount = 1;
            Element tdEl = tdIter.next();
            //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();
            Elements tdsEls = tdEl.select("td");
            System.out.println("><2><><"+tdsEls);
            Iterator<Element> columnIt = tdsEls.iterator();
            while (columnIt.hasNext()) {
                Element column = columnIt.next();
                switch (tdCount++) {
                    case 1:
                        name =column.select("a").first().text();
                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;
Answer 0 (score: 1)
With the code below, parsing the tables out of the HTML does not seem to be a problem.
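import java.io.IOException;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;
import android.widget.TextView;

import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupActivity extends Activity {
    Document doc;
    myHttpGet _myGet;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        final TextView tv = (TextView) findViewById(R.id.tv1);
        _myGet = new myHttpGet();
        try {
            // Note: this fetches on the UI thread. Newer Android versions throw
            // NetworkOnMainThreadException here, so run it on a background thread there.
            doc = _myGet.doHttpGet();
            // Every element that carries the class "storyStats".
            Elements tdsEls = doc.getElementsByClass("storyStats");
            //tv.setText(tdsEls.get(0).child(0).text());
            // Show how many child elements the first storyStats element has.
            tv.setText(String.valueOf(tdsEls.first().children().size()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private class myHttpGet {
        Document myDom;
        Connection myConnection;
        Response myResponse;

        // Fetches and parses the page; myDom stays null if the request or the parse fails.
        public Document doHttpGet() {
            myConnection = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815");
            try {
                myResponse = myConnection.execute();
                try {
                    myDom = myResponse.parse();
                    return myDom;
                } catch (IOException e) {
                    Log.e("napster","Parse Error");
                }
            } catch (IOException e) {
                Log.e("napster","HTTP Error");
            }
            return myDom;
        }
    }
}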
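The code displays 5 in the TextView, which is the number of tables under the storyStats class in your HTML. If you need to go on and parse the contents of the tables, you can assign the tables to another Elements object and continue parsing from there.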
Elements es = tdsEls.first().children();
Anderson's answer shows how to parse the data. Hope that helps.
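As a rough sketch of what continuing from there could look like, the example below walks every table inside a storyStats div and collects the text of each row's td cells. It parses an inline HTML string instead of fetching the page, so it can run without a network connection. The class name StoryStatsTables, the method readTables, the "WINGERS" header and the "Example Player" row are placeholders invented for illustration; only the Stamkos values come from the HTML in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup or of the original answer.
class StoryStatsTables {

    // Returns one entry per table; each entry holds the data rows of that table,
    // and each row is the list of its <td> texts in column order.
    static List<List<List<String>>> readTables(String html) {
        Document doc = Jsoup.parse(html);
        List<List<List<String>>> tables = new ArrayList<>();
        for (Element table : doc.select("div.storyStats table")) {
            List<List<String>> rows = new ArrayList<>();
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                if (tds.isEmpty()) {
                    continue; // header rows use <th>, so they contain no <td> cells
                }
                List<String> cells = new ArrayList<>();
                for (Element td : tds) {
                    cells.add(td.text()); // keep empty cells so column positions stay stable
                }
                rows.add(cells);
            }
            tables.add(rows);
        }
        return tables;
    }

    public static void main(String[] args) {
        // Stand-in markup: two small tables inside one storyStats div (assumed layout).
        String html =
              "<div class=\"storyStats\">"
            + "<table><thead><tr><th>RANK</th><th>CENTRES</th><th>G</th></tr></thead>"
            + "<tbody><tr><td>1.</td><td><a href=\"#\">Steven Stamkos</a></td><td>50</td></tr></tbody></table>"
            + "<table><thead><tr><th>RANK</th><th>WINGERS</th><th>G</th></tr></thead>"
            + "<tbody><tr><td>1.</td><td><a href=\"#\">Example Player</a></td><td>30</td></tr></tbody></table>"
            + "</div>";

        List<List<List<String>>> tables = readTables(html);
        for (int i = 0; i < tables.size(); i++) {
            System.out.println("Table " + i + ": " + tables.get(i));
        }
    }
}
```

Collecting plain strings first keeps the table walking separate from any number parsing, which makes it easier to see which column a bad value came from.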
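Answer 1 (score: 0)
This should get you started. Each table has a blank record (a row with no td cells, such as the header row) that you have to account for. You also need to decide which stats you want and where they sit in the table. You can get the stats with tds.get(). Let me know if it works for you.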
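Document doc = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815").get();
// Visit every table inside a div with class "storyStats".
for (Element table : doc.select("div.storyStats").select("table")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        // Skip rows without td cells (the blank record mentioned above).
        if (tds.size() > 0) {
            // Index 1 is the player name and index 5 is the G column in the question's table.
            System.out.println(tds.get(1).text() + ":" + tds.get(5).text());
        }
    }
}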
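One possible way to avoid hard-coding positions such as 1 and 5 is to look a column up by its header text and parse the cell at that position. The sketch below uses only the standard library and takes the header and cell texts as plain lists, on the assumption that they were already read out of the th and td elements. The class name StatRow and the method stat are hypothetical names chosen for this example; the sample values are the ones from the question's HTML.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of jsoup.
class StatRow {

    // Returns the numeric value found under the named header for one row.
    // Throws IllegalArgumentException when the header is missing or the row is too short.
    static double stat(List<String> headers, List<String> cells, String name) {
        int index = headers.indexOf(name);
        if (index < 0 || index >= cells.size()) {
            throw new IllegalArgumentException("No column named " + name);
        }
        // Double.parseDouble accepts forms such as "101", "-2" and "1."
        return Double.parseDouble(cells.get(index).trim());
    }

    public static void main(String[] args) {
        // Header and cell texts copied from the table in the question.
        List<String> headers = Arrays.asList(
            "RANK", "CENTRES", "TEAM", "POS", "GP", "G", "A", "PTS", "+/-", "PIM", "PPP");
        List<String> cells = Arrays.asList(
            "1.", "Steven Stamkos", "Tampa Bay", "C", "81", "50", "51", "101", "-2", "56", "38");

        String player = cells.get(headers.indexOf("CENTRES"));
        System.out.println(player + " PTS=" + stat(headers, cells, "PTS")
            + " +/-=" + stat(headers, cells, "+/-"));
    }
}
```

Text columns such as TEAM or POS would make Double.parseDouble throw a NumberFormatException, so only numeric headers should be passed to stat.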