Question

I am parsing an HTML string that contains both HTML markup and JavaScript, using the following:

public Document parse(String content) {
    return Jsoup.parse(content, "", Parser.xmlParser());
}

The problem is that the content of the JavaScript elements ends up on a single line.

I also tried

public Document parse(String content) {
    return Jsoup.parse(content, "", Parser.htmlParser());
}

which works for the JavaScript, but the HTML elements are then output without their closing tags. For example:

<link rel="shortcut icon" href="../../static/public/img/favicon.ico" data-th-remove="all"></link>

is parsed as

<link rel="shortcut icon" href="../../static/public/img/favicon.ico" data-th-remove="all">

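This does not work when I run my application.

How can I fix this? Is there a way to parse HTML and JavaScript together with JSOUP?

Note: I have just opened the following issue on the JSOUP GitHub: https://github.com/jhy/jsoup/issues/774

Regards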

Answer 1

The link element has no end tag in HTML. It normally appears only in the head. See https://developer.mozilla.org/de/docs/Web/HTML/Element/link for details.

So when you use Parser.htmlParser(), JSoup is behaving as expected.

Could you explain in more detail why you cannot handle the unclosed link tag?
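
If whatever consumes the output needs XML-style closed tags, one option that may help is to keep Parser.htmlParser() for parsing (so script content is treated as raw data) and switch only the output syntax to XML. The sketch below assumes jsoup is on the classpath; the class name XmlSyntaxOutput and the method name toXmlSyntax are made up for illustration, and the exact serialization (for example, whether script content is wrapped in CDATA) can differ between jsoup versions.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical helper, not part of jsoup: parses as HTML, serializes with XML syntax.
public class XmlSyntaxOutput {

    public static String toXmlSyntax(String content) {
        // The HTML parser treats <script> bodies as data, so line breaks inside them are kept.
        Document doc = Jsoup.parse(content, "", Parser.htmlParser());

        // XML output syntax makes void elements such as <link> self-closing on output.
        // Pretty-printing is turned off to avoid re-indenting the markup.
        doc.outputSettings()
           .syntax(Document.OutputSettings.Syntax.xml)
           .prettyPrint(false);

        return doc.html();
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<link rel=\"shortcut icon\" href=\"favicon.ico\"></link>"
                + "<script>var a = 1;\nvar b = 2;</script>"
                + "</head><body></body></html>";
        System.out.println(toXmlSyntax(html));
    }
}
```

With this setup the link element should come out self-closed rather than as a bare start tag, while the script body keeps its line breaks. Whether that is enough depends on what the application expects, which is why the question above still matters.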