How to parse a page with multiple tables

Time: 2012-02-08 09:42:43

Tags: java android jsoup

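Any ideas on how to scrape a web page that contains multiple tables? I am connecting to the web page.

Below is one of the tables, but there are several tables on the same page.

I also can't figure out how to read the tables...

HTML: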

    <p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p> 
<div class="storyStats"> 
<table> 
<thead> 
<tr> 
<th>RANK</th> 
<th>CENTRES</th> 
<th>TEAM</th> 
<th>POS</th> 
<th>GP</th> 
<th>G</th> 
<th>A</th> 
<th>PTS</th> 
<th>+/-</th> 
<th>PIM</th> 
<th>PPP</th> 
</tr> 
</thead> 
<tbody> 
<tr class="bg1"> 
<td>1.</td> 
<td><a href="/nhl/teams/players/?name=steven+stamkos">Steven&nbsp;Stamkos</a></td> 

<td>Tampa Bay</td> 
<td>C</td> 
<td align="right">81</td> 
<td align="right">50</td> 
<td align="right">51</td> 
<td align="right">101</td> 
<td align="right">-2</td> 
<td align="right">56</td> 
<td align="right">38</td> 
</tr> 

Java:


Iterator<Element> trSIter = doc.select("table")
            .iterator();
    while (trSIter.hasNext()) {
        Element trEl = trSIter.next().child(0);
        Elements tdEls = trEl.children();
        Iterator<Element> tdIter = tdEls.select("tr").iterator();
        System.out.println("><1><><"+tdIter);
        boolean firstRow = true;
        while (tdIter.hasNext()) {

            Element tr = (Element) tdIter.next();


            while (tdIter.hasNext()) {
                int tdCount = 1;
                Element tdEl = tdIter.next();
                //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();

                Elements tdsEls = tdEl.select("td");
                System.out.println("><2><><"+tdsEls);
                Iterator<Element> columnIt = tdsEls.iterator();

                while (columnIt.hasNext()) {

                    Element column = columnIt.next();
                    switch (tdCount++) {
                    case 1:
                        name =column.select("a").first().text();

                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;

2 answers:

Answer 0: (score: 1)

With the code below, parsing the tables out of the HTML does not seem to be a problem.

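import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;
import android.widget.TextView;

public class JsoupActivity extends Activity {
    Document doc;
    myHttpGet _myGet;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        final TextView tv = (TextView) findViewById(R.id.tv1);
        _myGet = new myHttpGet();
        // Caution: apps targeting Android 3.0 or later get a
        // NetworkOnMainThreadException here; run the fetch on a background thread.
        try {
            doc = _myGet.doHttpGet();
            // The div with class "storyStats" wraps the tables.
            Elements tdsEls = doc.getElementsByClass("storyStats");
            //tv.setText(tdsEls.get(0).child(0).text());
            // Show how many direct children the first storyStats div has.
            tv.setText(String.valueOf(tdsEls.first().children().size()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private class myHttpGet {
        Document myDom;
        Connection myConnection;
        Response myResponse;

        // Fetches and parses the page; returns null if the request or the parse fails.
        public Document doHttpGet() {
            myConnection = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815");
            try {
                myResponse = myConnection.execute();
            } catch (IOException e) {
                Log.e("napster", "HTTP Error");
                return myDom;
            }
            try {
                myDom = myResponse.parse();
            } catch (IOException e) {
                Log.e("napster", "Parse Error");
            }
            return myDom;
        }
    }

}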

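The code displays 5 in the TextView, which is the number of tables under the storyStats class in your HTML. If you need to go on and parse the contents of the tables, you can assign them to another Elements object and continue parsing from there.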

Elements es = tdsEls.first().children();

Anderson's answer shows how to parse the data. Hope this helps.
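
Since the page holds several tables, one way to keep them apart while parsing is to label each table by one of its header cells. The following is only a minimal sketch of that idea, not code from either answer. It parses an inline HTML string that imitates the markup in the question; the second table, its WINGERS header, and the player in it are made up for the example, and the names TablesByHeader and namesByTable are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TablesByHeader {

    // Maps each table's second header cell (e.g. "CENTRES") to the
    // player names found in the second column of that table.
    public static Map<String, List<String>> namesByTable(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Element table : doc.select("div.storyStats table")) {
            Elements headers = table.select("th");
            if (headers.size() < 2) {
                continue; // no usable header, skip this table
            }
            String label = headers.get(1).text();
            List<String> names = new ArrayList<>();
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Header rows contain th cells only, so they yield no td.
                if (tds.size() > 1) {
                    names.add(tds.get(1).text());
                }
            }
            // Assumption: labels are unique; a repeated label overwrites the earlier table.
            result.put(label, names);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input shaped like the question's markup.
        String html =
            "<div class=\"storyStats\">"
          + "<table><thead><tr><th>RANK</th><th>CENTRES</th></tr></thead>"
          + "<tbody><tr><td>1.</td><td>Steven Stamkos</td></tr></tbody></table>"
          + "<table><thead><tr><th>RANK</th><th>WINGERS</th></tr></thead>"
          + "<tbody><tr><td>1.</td><td>Example Player</td></tr></tbody></table>"
          + "</div>";
        System.out.println(namesByTable(html));
    }
}
```

Keying on the header text rather than on the table's position keeps the lookup stable if the page reorders its tables, at the cost of depending on the header wording.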

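Answer 1: (score: 0)

This should get you started. Each table has a blank record that you have to account for: the header row contains no td cells, which is why the code checks tds.size() first. You also need to decide which stats you want and where they sit in the table. You can read each stat with tds.get(). Let me know if it works for you.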

    Document doc = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815").get();

    for (Element table : doc.select("div.storyStats").select("table")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() > 0) {
                System.out.println(tds.get(1).text() + ":" + tds.get(5).text());
            }
        }
    }
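
The loop above yields cell texts, which still have to be converted into typed values. As a rough sketch of that step, the helper below takes the texts of the td cells of one row and builds a record from them. The PlayerRow class and its field names are hypothetical, and the column positions are assumed from the header shown in the question (RANK, name, TEAM, POS, GP, G, A, PTS, +/-, PIM, PPP). It strips the trailing dot from the rank, replaces the non-breaking space in the name, and returns null for rows with too few cells.

```java
import java.util.Arrays;
import java.util.List;

public class PlayerRow {
    public final int rank;
    public final String name;
    public final String team;
    public final int goals;
    public final int assists;
    public final int points;
    public final int plusMinus;

    PlayerRow(int rank, String name, String team,
              int goals, int assists, int points, int plusMinus) {
        this.rank = rank;
        this.name = name;
        this.team = team;
        this.goals = goals;
        this.assists = assists;
        this.points = points;
        this.plusMinus = plusMinus;
    }

    // Builds a PlayerRow from the texts of the 11 td cells of one row.
    // Returns null when the row has fewer cells than expected,
    // for example a header row, which has no td cells at all.
    public static PlayerRow fromCells(List<String> cells) {
        if (cells.size() < 11) {
            return null;
        }
        int rank = Integer.parseInt(clean(cells.get(0)).replace(".", "")); // "1." -> 1
        return new PlayerRow(
                rank,
                clean(cells.get(1)),
                clean(cells.get(2)),
                Integer.parseInt(clean(cells.get(5))),   // G
                Integer.parseInt(clean(cells.get(6))),   // A
                Integer.parseInt(clean(cells.get(7))),   // PTS
                Integer.parseInt(clean(cells.get(8))));  // +/- (may be negative)
    }

    // Replaces non-breaking spaces with plain spaces and trims the result.
    static String clean(String s) {
        return s.replace('\u00a0', ' ').trim();
    }

    public static void main(String[] args) {
        // Cell texts of the sample row from the question.
        PlayerRow row = fromCells(Arrays.asList(
                "1.", "Steven\u00a0Stamkos", "Tampa Bay", "C",
                "81", "50", "51", "101", "-2", "56", "38"));
        System.out.println(row.name + ": " + row.points + " pts, " + row.plusMinus);
    }
}
```

With jsoup, the cell list for one row could be collected by calling text() on each element of tds inside the inner loop. A cell that is not a valid integer makes Integer.parseInt throw NumberFormatException, so real input may need a try/catch around the call.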