Unable to extract content from HTML using jsoup in Java?

Time: 2014-02-25 06:21:33

Tags: java jsoup

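I am trying to use jsoup to extract the content of the <td> tags that have the classes css-sched-table-title and css-sched-waypoints from the HTML below, but I can't work out how to do it. Can someone help me solve this?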

Document doc = Jsoup.parse("somelink.html");
Elements row = doc.select(".css-sched-table-title td");
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext()) {
    Element element = iterator.next();
    String value = element.text();
    System.out.println("value : " + value);
}

  <tr>
        <td ALIGN="CENTER" COLSPAN="16"  CLASS="css-sched-table-title"><b>Saturday - </b><b>Afternoon</b></td>
    </tr>
    <tr VALIGN="BOTTOM">
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Townline and Southern</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and Blueridge</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and South Fraser</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Ar. Bourquin Exchange</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Lv. Bourquin Exchange</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Downtown Abbotsford</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">McMillan and Old Yale</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Sandy Hill and Old Clayburn</TD>
    </tr>

1 Answer:

Answer 0 (score: 1)

There is only one td tag with the class css-sched-table-title, but there is a list of td tags with the class css-sched-waypoints.

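Also, the correct syntax is Elements row = doc.select("td.css-sched-waypoints"); (see the jsoup selector syntax reference). The selector in the question, ".css-sched-table-title td", looks for td elements nested inside an element with that class, whereas "td.css-sched-waypoints" matches td elements that carry the class themselves.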
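
To make the difference between the two selector forms concrete, here is a minimal sketch. The class name SelectorDemo, the helper countMatches, and the shortened markup are illustrative assumptions, not part of the original post; the markup is wrapped in <table> tags so that jsoup keeps the cells.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Shortened copy of the question's markup (assumption: wrapped in <table>
    // so that the HTML parser keeps the <tr>/<td> elements).
    static final String HTML =
        "<table>"
        + "<tr><td CLASS=\"css-sched-table-title\"><b>Saturday - </b><b>Afternoon</b></td></tr>"
        + "<tr><td>&nbsp;</td><td CLASS=\"css-sched-waypoints\">Townline and Southern</td></tr>"
        + "</table>";

    // Hypothetical helper: number of elements in HTML that match the CSS query.
    static int countMatches(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Descendant selector: <td> elements inside an element that has the class.
        System.out.println(countMatches(".css-sched-table-title td"));
        // Compound selector: <td> elements that carry the class themselves.
        System.out.println(countMatches("td.css-sched-table-title"));
        System.out.println(countMatches("td.css-sched-waypoints"));
    }
}
```

With this markup the descendant form should find nothing, because the title cell contains only <b> elements, while each compound form should find one cell.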

Note: the HTML file does not work as it is, because jsoup does not interpret it as valid table HTML content. I had to enclose the content above in <table></table> tags.
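
The effect of the missing <table> wrapper can be checked with a small sketch like the one below. The class name TableWrapDemo, the helper waypoints, and the two-cell sample row are assumptions made for illustration; only Jsoup.parse(String) and select are taken from the jsoup API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableWrapDemo {
    // Two waypoint cells from the question, without any enclosing <table>.
    static final String ROWS =
        "<tr><TD>&nbsp;</TD>"
        + "<TD CLASS=\"css-sched-waypoints\">Townline and Southern</TD>"
        + "<TD CLASS=\"css-sched-waypoints\">Clearbrook and Blueridge</TD></tr>";

    // Hypothetical helper: parse the markup and select the waypoint cells.
    static Elements waypoints(String html) {
        return Jsoup.parse(html).select("td.css-sched-waypoints");
    }

    public static void main(String[] args) {
        // Bare rows: the HTML parser drops <tr>/<td> outside a table context.
        System.out.println("bare rows: " + waypoints(ROWS).size());

        // Wrapped rows: the cells survive parsing and can be selected.
        for (Element cell : waypoints("<table>" + ROWS + "</table>")) {
            System.out.println("wrapped: " + cell.text());
        }
    }
}
```

If the real page already contains the enclosing <table> and only the excerpt in the question lacks it, no wrapping is needed.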

When I tried the following code with your HTML file:

Elements row = doc.select("td.css-sched-waypoints");
Element title = doc.select("td.css-sched-table-title").first();

System.out.println(title.text());
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext()) {
    Element element = iterator.next();
    String id = element.attr("id");
    String classes = element.attr("class");
    String value = element.text();
    System.out.println("Id : " + id + ", classes : " + classes
            + ", value : " + value);
}

I get:

Saturday - Afternoon
Id : , classes : css-sched-waypoints, value : Townline and Southern
Id : , classes : css-sched-waypoints, value : Clearbrook and Blueridge
Id : , classes : css-sched-waypoints, value : Clearbrook and South Fraser
Id : , classes : css-sched-waypoints, value : Ar. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Lv. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Downtown Abbotsford
Id : , classes : css-sched-waypoints, value : McMillan and Old Yale
Id : , classes : css-sched-waypoints, value : Sandy Hill and Old Clayburn
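
One more thing that may be worth checking in the question's code: Jsoup.parse(String) treats its argument as HTML markup, not as a file name, so Jsoup.parse("somelink.html") produces a document whose only content is that literal text. To read a local file, the Jsoup.parse(File, String) overload can be used. The sketch below assumes UTF-8 encoding; the class name ParseFileDemo and the helpers writeTemp and titleFromFile are hypothetical, and a temporary file stands in for the real page.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseFileDemo {
    // Hypothetical helper: write markup to a temporary file that stands in
    // for the real somelink.html.
    static File writeTemp(String html) {
        try {
            File tmp = File.createTempFile("sched", ".html");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: parse the file (assumed UTF-8) and return the
    // title cell's text, or an empty string if no such cell exists.
    static String titleFromFile(File file) {
        try {
            Document doc = Jsoup.parse(file, "UTF-8");
            Element title = doc.select("td.css-sched-table-title").first();
            return title == null ? "" : title.text();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The string is parsed as markup, so the document holds just this text.
        System.out.println(Jsoup.parse("somelink.html").body().text());

        File page = writeTemp("<table><tr><td CLASS=\"css-sched-table-title\">"
                + "<b>Saturday - </b><b>Afternoon</b></td></tr></table>");
        System.out.println(titleFromFile(page));
    }
}
```

The null check on first() matters here: when nothing matches, first() returns null and calling text() on it would throw a NullPointerException.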