Question

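I'm using an HTMLDocument.Iterator to iterate over all the tags of a given type in an HTMLDocument. However, the iterator seems to skip tags that are nested inside a p tag. For example: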

<html>
  <body>
    <a href = "somesite"> some site </a>
    <p>
      <a href = "someothersite"> some other site </a>
    </p>
  </body>
</html>

The iterator picks up the first a tag (some site), but it never reaches the a tag inside the p tag (someothersite).

Here is the code:

private void getLinks() throws MalformedURLException {
    HTMLDocument.Iterator it = content.getIterator(HTML.Tag.A);
    it.next();
    while (it.isValid()) {
        // Do something
        it.next();
    }
}

Can anyone explain why?

Answer 1

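Maybe the isValid() check is what breaks your loop. Try running the iterator without that check and see whether it reaches the second anchor tag.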
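
To check whether the iterator really stops early, it can help to run it in isolation on the sample markup. The following is a minimal sketch, not the asker's actual setup: the class name AnchorLister and the method listHrefs are made up for illustration, the document is parsed from a string with HTMLEditorKit, and the href attributes are written without spaces around the equals sign. In the OpenJDK implementation the iterator returned by getIterator appears to be positioned on the first match already, so this loop does not call next() before the first isValid() check.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the original question.
public class AnchorLister {

    // Parses the given HTML string and collects the href of every <a> tag
    // that HTMLDocument.Iterator reports.
    static List<String> listHrefs(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);

        List<String> hrefs = new ArrayList<>();
        // No next() before the loop: the iterator is assumed to start
        // on the first matching tag (or to be invalid if there is none).
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
             it.isValid();
             it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        String html =
            "<html><body>"
            + "<a href=\"somesite\"> some site </a>"
            + "<p><a href=\"someothersite\"> some other site </a></p>"
            + "</body></html>";
        // Print each href the iterator visits, one per line.
        for (String href : listHrefs(html)) {
            System.out.println(href);
        }
    }
}
```

If both links show up here but not in the original program, the difference is more likely in how content was loaded or in the extra next() call before the loop than in the nesting inside the p tag.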
