Question

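I am using JSoup to fetch the content of a URL as HTML. But in place of characters such as ' - ' (hyphen) and ' (apostrophe), I get strange symbols. I do not see these symbols when I view the page source.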

Here is the code I am using:

String url = "http://www.novotreeminds.com/job-details.html#chief";

org.jsoup.nodes.Document document = org.jsoup.Jsoup.connect(url).get();
document = Jsoup.connect(url).timeout(20000)
            .method(Connection.Method.GET)
            .ignoreContentType(true).execute().parse();

document.outputSettings(new Document.OutputSettings().prettyPrint(false));
System.out.println(document);

In the extracted content, instead of

Experience: 6 - 10 years

I get:

Experience: 6 [strange symbol] 10 years

The same thing happens with apostrophes. In some places I also see a square symbol instead of the strange symbol above.

Thanks, Akhila

Hi @AHungerArtist,

I have already tried the following code (specifying the character encoding used by the URL):

File input = new File("/home/Documents/NovoTree Minds.html");
Document doc = Jsoup.parse(input, "iso-8859-1", "");

But I see the same result.

Thanks, Akhila
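
For readers hitting the same symptom, here is a minimal sketch of one possible cause. It is only an assumption, since the page's real encoding is not confirmed in this thread: if the page stores the dash as an en dash (U+2013) in windows-1252, it is the single byte 0x96. Decoding that byte as ISO-8859-1 gives an invisible control character (often drawn as a square), and decoding it as UTF-8 gives the replacement character U+FFFD. The class name `CharsetDemo` and its helper methods are hypothetical and use only the Java standard library, not JSoup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question's code.
class CharsetDemo {

    // Decode raw bytes with the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Assumption for illustration: the dash is an en dash (U+2013)
    // stored in windows-1252, which encodes it as the single byte 0x96.
    static byte[] sampleBytes() {
        return "6 \u2013 10".getBytes(Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        byte[] raw = sampleBytes();

        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asCp1252 = decode(raw, Charset.forName("windows-1252"));

        // Print the code point that ends up where the dash should be (index 2).
        System.out.printf("ISO-8859-1   -> U+%04X%n", (int) asLatin1.charAt(2)); // U+0096
        System.out.printf("UTF-8        -> U+%04X%n", (int) asUtf8.charAt(2));   // U+FFFD
        System.out.printf("windows-1252 -> U+%04X%n", (int) asCp1252.charAt(2)); // U+2013
    }
}
```

If this assumption holds for the page in question, passing "windows-1252" instead of "iso-8859-1" as the charset name to Jsoup.parse(input, charsetName, baseUri) may be worth trying; checking the raw byte values in the saved file would confirm or rule it out.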

JSoup returns symbols in place of hyphens and apostrophes

0 answers: