(Jsoup) How do I parse a specific column and row?

Date: 2017-11-15 18:38:50

Tags: java html-table jsoup

Basically, I have this bus timetable:

<table id="smsBusResults" width="100%" cellpadding="0" cellspacing="0" border="0">
            <tbody><tr>
                <th>Linha</th>
                <th>Hora Prevista</th>
                <th>Tempo de Espera</th>
            </tr>
                            <tr class="even">
                <td>        <ul class="linhasAssoc">
                <li><a target="_self" class="linha_502" title="" href="/pt/viajar/linhas/?linha=502 ">502 </a></li>
                </ul>
    &nbsp;MATOSINHOS M</td>
                <td><i>17:09</i></td>
                <td>2min</td>
            </tr>
                            <tr class="even">
                <td>        <ul class="linhasAssoc">
                <li><a target="_self" class="linha_201" title="" href="/pt/viajar/linhas/?linha=201 ">201 </a></li>
                </ul>
    &nbsp;VISO - C2</td>
                <td><i>17:13</i></td>
                <td>5min</td>
            </tr>
                            <tr class="even">
                <td>        <ul class="linhasAssoc">
                <li><a target="_self" class="linha_203" title="" href="/pt/viajar/linhas/?linha=203 ">203 </a></li>
                </ul>
    &nbsp;CAST. QUEIJO</td>
                <td><i>17:18</i></td>
                <td>10min</td>
            </tr>
                            <tr class="even">
                <td>        <ul class="linhasAssoc">
                <li><a target="_self" class="linha_502" title="" href="/pt/viajar/linhas/?linha=502 ">502 </a></li>
                </ul>
    &nbsp;MATOSINHOS M</td>
                <td><i>17:20</i></td>
                <td>12min</td>
            </tr>
                            <tr class="even">
                <td>        <ul class="linhasAssoc">
                <li><a target="_self" class="linha_201" title="" href="/pt/viajar/linhas/?linha=201 ">201 </a></li>
                </ul>
    &nbsp;VISO - C2</td>
                <td><i>17:22</i></td>
                <td>15min</td>
            </tr>
                        </tbody></table>
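"Linha" means "bus number and destination name", "Hora Prevista" is the ETA, and "Tempo de Espera" is the waiting time.

For example, the first bus is bus 502 bound for Matosinhos; it should arrive at 17:09, so the waiting time is 2 minutes.

How can I print only the name of the first bus (column 1, row 0)?

What I have tried: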

import java.io.IOException;
import java.util.ArrayList;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.stcp.pt/pt/itinerarium/soapclient.php?codigo=ACRD1").get();

        ArrayList<String> nomesLinhas = new ArrayList<>();
        Elements smsBusResults = doc.select("smsBusResults");
        Elements filas = smsBusResults.select("tr");
        Elements colunas = smsBusResults.select("td");

        for (int i = 0; i < filas.size(); i++) {
            Element fila = filas.get(i);
            Elements cols = fila.select("td");

            System.out.println(cols.get(1).text());
        }
    }
}

1 answer:

Answer 0 (score: 0)

You can do this with a selector method. If your document is named doc, the following code will give you row 0, column 1:

doc.select()
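
As a rough illustration of picking one cell by row and column, here is a minimal sketch. It is not the answerer's original code: the class name BusTableExample, the helper cellText, and the trimmed-down SAMPLE_HTML are all hypothetical, and the HTML is parsed from a string instead of being fetched from the STCP URL. Two details differ from the attempt in the question: the id selector needs a leading "#" ("smsBusResults" on its own would look for a tag with that name), and the header row is skipped because it contains only <th> cells, so calling get(1) on its empty <td> list would throw.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class BusTableExample {

    // Hypothetical, shortened copy of the table from the question (two data rows only).
    static final String SAMPLE_HTML =
        "<table id=\"smsBusResults\"><tbody>"
        + "<tr><th>Linha</th><th>Hora Prevista</th><th>Tempo de Espera</th></tr>"
        + "<tr class=\"even\">"
        + "<td><ul class=\"linhasAssoc\"><li><a class=\"linha_502\" href=\"#\">502</a></li></ul> MATOSINHOS M</td>"
        + "<td><i>17:09</i></td><td>2min</td></tr>"
        + "<tr class=\"even\">"
        + "<td><ul class=\"linhasAssoc\"><li><a class=\"linha_201\" href=\"#\">201</a></li></ul> VISO - C2</td>"
        + "<td><i>17:13</i></td><td>5min</td></tr>"
        + "</tbody></table>";

    // Returns the text of the cell at (row, col), counting data rows only, both zero-based.
    static String cellText(String html, int row, int col) {
        Document doc = Jsoup.parse(html);
        // "#smsBusResults" matches the element by id; ":has(td)" drops the <th>-only header row.
        Elements rows = doc.select("table#smsBusResults tr:has(td)");
        Element cell = rows.get(row).select("td").get(col);
        return cell.text();
    }

    public static void main(String[] args) {
        System.out.println(cellText(SAMPLE_HTML, 0, 0)); // bus number and destination
        System.out.println(cellText(SAMPLE_HTML, 0, 1)); // ETA
        System.out.println(cellText(SAMPLE_HTML, 0, 2)); // waiting time
    }
}
```

Against the live page, the same selector could be applied to the Document returned by Jsoup.connect(...).get(), assuming the markup still matches the table shown in the question.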