Question

I want to parse a local HTML file for links using Jsoup, but it is not working. The code is:

public static Set<String> getAllLinksFromPage(String file) throws IOException{
    final Set<String> result = new HashSet<String>();
    File input = new File(file);

    Document doc = Jsoup.parse(file);

    Elements links = doc.select("a[href]");
    for(Element link : links) {
        result.add(links.attr("abs:href"));
    }

    return result; 

}

The output is: []

So what is the problem?

Answer 1

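There are a few mistakes in the code you pasted:

1. You call Jsoup.parse(String html) instead of Jsoup.parse(File in, String charset). The String overload treats its argument as HTML markup, so the file name itself gets parsed as the document, no links are found, and the result is empty.
2. You pass the wrong variable: file is the input string (the file name, I assume), while the input variable holds the reference to the file. The call should be Document doc = Jsoup.parse(input, "UTF-8");
3. There is a typo in result.add(links.attr("abs:href")); it reads the "abs:href" attribute from the whole list of links instead of the link currently being visited while iterating over links. It should be result.add(link.attr("abs:href"));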

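After applying all the changes, your method should look like this:

public static Set<String> getAllLinksFromPage(String file) throws IOException {
    final Set<String> result = new HashSet<String>();
    File input = new File(file);

    // Parse the file's contents, not the file name string
    Document doc = Jsoup.parse(input, "UTF-8");

    Elements links = doc.select("a[href]");
    for (Element link : links) {
        // Read the attribute from the current element, not from the whole list
        result.add(link.attr("abs:href"));
    }

    return result;
}

I tested it with an HTML file mirroring this page (I simply saved it as /tmp/test.html and passed that path to your function), and it returned the links found on that page.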
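
A side note on the "abs:href" key used in the method: it asks jsoup for the link resolved against the document's base URI, so relative links depend on what that base is. The sketch below only imitates that kind of resolution with java.net.URI from the standard library; it does not call jsoup. The class name AbsHrefSketch, the resolve helper, and the example.com URLs are all made up for illustration.

```java
import java.net.URI;

// Hypothetical illustration class; not part of jsoup.
class AbsHrefSketch {

    // Resolve an href against a base URI using the standard library.
    // This mirrors the general idea of absolute-URL resolution only;
    // it is an assumption-level stand-in, not jsoup's implementation.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "https://example.com/docs/index.html"; // assumed base URI

        // Relative to the base document's directory
        System.out.println(resolve(base, "guide.html"));
        // Relative to the site root
        System.out.println(resolve(base, "/about"));
        // Already absolute, so the base is ignored
        System.out.println(resolve(base, "https://jsoup.org/"));
    }
}
```

If relative links in a local file need a meaningful base, jsoup also offers the overload Jsoup.parse(File in, String charsetName, String baseUri), where the third argument would play the role of base in the sketch above.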
