Question

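I wrote a method that analyzes a website: it finds all the unique links on the page and computes the total size of all images in bytes. It works for some sites, but for others (for example "https://www.nasa.gov") it does not. Can anyone explain why?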

/**
 * @param url - url to the page to be parsed
 * @return - a hashset of unique links found in the page
 * @throws IOException - when a problem with the connection occurs
 */
private static HashSet<String> AnalyzeUrl(String url) throws IOException
{
    Document doc = Jsoup.connect(url).get();

    HashSet<String> uniqueImages = new HashSet<>();
    HashSet<String> uniqueLinks = new HashSet<>();

    // Get unique images
    Elements images = doc.getElementsByTag("img");
    for (Element image : images)
        uniqueImages.add(image.attr("abs:src"));

    // Get unique links
    Elements links = doc.getElementsByTag("a");
    for (Element link : links)
        uniqueLinks.add(link.attr("abs:href"));

    // Get total size of images
    int totalSize = 0;
    for (String imageUrl : uniqueImages)
        totalSize += Jsoup.connect(imageUrl).ignoreContentType(true).execute().bodyAsBytes().length;

    // Show information
    String information = "Unique images found: " + uniqueImages.size() + "\n" +
                         "Total size of images: " + totalSize + " bytes \n" +
                         "Unique links found: " + uniqueLinks.size() + "\n";

    Alert alert = new Alert(Alert.AlertType.INFORMATION, information, ButtonType.OK);
    alert.showAndWait();

    return uniqueLinks;
}

Answer 1

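Your problem may be caused by the way the redirect is performed. If the site redirects with JavaScript, the connection will not follow that redirect, because jsoup does not execute JavaScript. You would need to inspect the site and provide more information. Hope this helps.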
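
Separately from the redirect question, one more thing may be worth ruling out. As far as I know, attr("abs:src") returns an empty string when the attribute is missing or cannot be resolved to an absolute URL, and Jsoup.connect refuses an empty or non-HTTP URL, so one such image tag would abort the whole size loop. I cannot confirm that this is what happens on the NASA page. The sketch below uses only the standard library; the class FetchableUrls, its two methods, and the sample image URL are hypothetical names made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: not part of jsoup or of the question's code.
public class FetchableUrls {

    // True only for absolute http/https URLs that have a host.
    // Empty strings, data: URIs and strings that do not parse as a URI are rejected.
    static boolean isFetchable(String url) {
        if (url == null || url.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(url.trim());
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            // e.g. a URL containing unescaped spaces
            return false;
        }
    }

    // Keeps only the URLs that pass isFetchable, preserving order and uniqueness.
    static Set<String> keepFetchable(Iterable<String> urls) {
        Set<String> result = new LinkedHashSet<>();
        for (String url : urls) {
            if (isFetchable(url)) {
                result.add(url);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values only; the image path is an assumption, not a real resource.
        List<String> candidates = Arrays.asList(
                "",
                "data:image/gif;base64,R0lGOD",
                "https://www.nasa.gov/logo.png");
        System.out.println(keepFetchable(candidates));
    }
}
```

In the question's method, the size loop could then iterate over keepFetchable(uniqueImages) instead of uniqueImages. If the page really is assembled by JavaScript, this will not recover the missing images; it only keeps the loop from failing on entries that cannot be fetched.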
