Question

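I am trying to use JSoup to get all the elements in an HTML table row. However, it keeps omitting columns and prints only about half of the columns in the table. For example, given an HTML table like this: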

<table id="resultTable" width="100%" border="1" cellpadding="1" cellspacing="0" bordercolor="#FFFFFF">
    <tbody>
        <tr class="table">
            <td align="right">1</td>
            <td align="right">4</td>
            <td><div>NAME</div></td>
            <td><div>Country</div></td>
            <td><div>Club</div></td>
            <td><div>SR</div></td>
            <td align="right"><div>56.00 (5)</div></td>
            <td align="right"><div>51.62 (3)</div></td>
            <td align="right"><div>1:47.62</div></td>
        </tr>
    </tbody>
</table>

I try to print the cells:

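import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(html);  // html is a String containing the full HTML page
Elements tableRows = doc.select("tr.table");  // every <tr class="table"> row

for (Element tableRow : tableRows) {
    System.out.println(tableRow.text());  // combined text of the row's cells
}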

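For some reason, the only cells that ever get printed are the 1st, 3rd and 9th.

You can find the full HTML here. (It is a website that displays live timing for alpine ski races.) For brevity I have included only one <tr> tag, but the site contains hundreds of them. Also, I don't know whether it matters, but each <div> calls JavaScript functions on onmouseover and onmouseout.

Is the problem that the HTML is bad? I thought JSoup takes care of or cleans up bad HTML. Or am I not using JSoup correctly?

Thanks for your help.

Edit: Fixed it. Because I was using an Android WebView, I didn't realize I was loading the mobile site rather than the desktop version.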

Answer 1

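First determine what your html actually is: the source downloaded from the URL, or the full HTML copied from the browser. If it is the source, you will get nothing, because the page is dynamic and the table is loaded from http://live-timing.com/includes/aj_race.php?r=163390&&m=1&&u=5 (you can try fetching the results directly from that URL). If it is the full HTML, there is nothing wrong with your selector syntax; you can try it at https://try.jsoup.org/ and the result is correct.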
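One way to check which HTML you are actually parsing is to count the <td> cells JSoup finds in each row, rather than printing the row text as a whole. The following is a minimal sketch, assuming jsoup is on the classpath; the class name RowCellCheck, the helper rowsAsCells and the constant SAMPLE_HTML are hypothetical names used only for illustration, and the sample row is the one from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class RowCellCheck {
    // Sample row from the question, kept inside <table> so the parser
    // preserves the <tr>/<td> structure. (Hypothetical test input.)
    static final String SAMPLE_HTML =
        "<table id=\"resultTable\"><tbody><tr class=\"table\">"
        + "<td align=\"right\">1</td>"
        + "<td align=\"right\">4</td>"
        + "<td><div>NAME</div></td>"
        + "<td><div>Country</div></td>"
        + "<td><div>Club</div></td>"
        + "<td><div>SR</div></td>"
        + "<td align=\"right\"><div>56.00 (5)</div></td>"
        + "<td align=\"right\"><div>51.62 (3)</div></td>"
        + "<td align=\"right\"><div>1:47.62</div></td>"
        + "</tr></tbody></table>";

    // Returns one list per tr.table row, holding the text of each <td> in it.
    static List<List<String>> rowsAsCells(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        for (Element row : doc.select("tr.table")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : row.select("td")) {
                cells.add(cell.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> rows = rowsAsCells(SAMPLE_HTML);
        for (int i = 0; i < rows.size(); i++) {
            // Print the cell count so missing columns are easy to spot.
            System.out.println("Row " + (i + 1) + ": "
                + rows.get(i).size() + " cells " + rows.get(i));
        }
    }
}
```

With the sample row, this reports 9 cells. If the same check on your own html string reports fewer cells per row, the string probably does not contain the markup you expect (for example, a different version of the page), and the selector is not the cause.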

JSoup does not print HTML table rows
