Question

For example:

<html>
   <head></head>
   <body sometag='"'></body>
</html>

When I parse this HTML with Jsoup:

Document doc = Jsoup.parse(html);
doc.outputSettings().prettyPrint(false);
System.out.println(doc.toString());

the output becomes:

<html>
   <head></head>
   <body sometag="&quot;"></body>
</html>

Note the ' and the ": I don't want jsoup to rewrite them, I just need it to extract some text. Is there a way to keep jsoup from doing this? Thanks a lot.

Answer 1

Just don't use the HTML parser. Use the XML parser instead.

Document doc = Jsoup.parse(html, "", Parser.xmlParser());

Answer 2

I played around with different String escaping, and the simplest approach is the following, though it may not be what you're after:

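String html = "<html> <head> </head> <body sometag='\"'> </body> </html>";

Document doc = Jsoup.parse(html);
// Emit only the small XHTML entity set when serializing.
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
// StringEscapeUtils comes from Apache Commons Lang; it turns &quot; back into ".
System.out.println( StringEscapeUtils.unescapeXml( doc.toString() ) );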
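To see why this may not be what you're after, here is a minimal sketch that uses only the Java standard library. The class and method names are hypothetical stand-ins, not jsoup or Commons Lang APIs, and the sketch assumes a serializer that always wraps attribute values in double quotes. Under that assumption a literal " in the value has to be written as &quot;, and unescaping the whole serialized text afterwards leaves three double quotes in a row:

```java
class AttributeQuoteSketch {

    // Hypothetical stand-in for a serializer that always uses double quotes,
    // so a literal " inside the value must be written as &quot;.
    static String serializeDoubleQuoted(String name, String value) {
        String escaped = value.replace("&", "&amp;").replace("\"", "&quot;");
        return name + "=\"" + escaped + "\"";
    }

    // Hypothetical stand-in for unescaping the serialized text as a whole.
    // Only the two entities produced above are handled.
    static String unescapeWholeText(String markup) {
        return markup.replace("&quot;", "\"").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String serialized = serializeDoubleQuoted("sometag", "\"");
        System.out.println(serialized);                    // sometag="&quot;"
        System.out.println(unescapeWholeText(serialized)); // sometag="""
    }
}
```

The second line of output contains the literal quote character again, but the attribute is no longer well-formed markup, so the result is suitable for display rather than for parsing again.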

jsoup: parsing HTML tag attributes
