Question

I am using Jsoup to filter links out of an HTML body.

For hash (fragment) links such as en.wikipedia.org/wiki/Cloud_computing#cite_note-1, I tried doc.select("a[href*=#]").remove(); and it works well on the hash links in the page's HTML source, for example: <a href="#cite_ref-1">

But when I use doc.select("a[href]*=/]").remove(); on links in the page's HTML source such as

<a href="/wiki/Light">CH</a>

some links are still not filtered out. How is that possible?

Answer 1

You have a typo: there is a stray ] after href.

doc.select("a[href]*=/]").remove();

should be

doc.select("a[href*=/]").remove();

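Note, however, that this removes every link whose href contains a /. Is that what you want, or do you want to remove only the links whose href starts with /? In that case, you need this: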

doc.select("a[href^=/]").remove();
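
To make the difference between *= (contains) and ^= (starts with) concrete, here is a minimal sketch. The class name LinkFilterSketch, the helper remainingLinks, and the sample HTML (including the example.com URL) are made up for illustration and are not from the question; it assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LinkFilterSketch {

    // Hypothetical helper: parse the HTML, remove every element matched by
    // cssQuery, and return how many <a href> elements are left.
    static int remainingLinks(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        doc.select(cssQuery).remove();
        return doc.select("a[href]").size();
    }

    public static void main(String[] args) {
        // Made-up sample: one hash link, one root-relative link, one absolute link.
        String html = "<p><a href=\"#cite_ref-1\">^</a> "
                + "<a href=\"/wiki/Light\">CH</a> "
                + "<a href=\"http://example.com/page\">ext</a></p>";

        // *= matches any href that contains "/", so both the relative and
        // the absolute link are removed; only the hash link is left.
        System.out.println(remainingLinks(html, "a[href*=/]")); // 1

        // ^= matches only hrefs that start with "/", so the absolute link
        // and the hash link are kept.
        System.out.println(remainingLinks(html, "a[href^=/]")); // 2
    }
}
```

So if the goal is to strip only site-internal links like /wiki/Light while keeping external ones, ^= is probably the selector you want.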