Jsoup fails to resolve relative image URLs with abs

Date: 2013-10-31 11:01:53

Tags: java image url jsoup relative

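I am trying to get image links from a URL. Some of the image URLs are relative, so I use abs to resolve them. However, the resolution fails: abs does not turn the relative URL into an absolute one.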

My code without abs:

String linktopro = "http://www.washingtonpost.com/politics/promises-promises-a-big-obama-health-insurance-promise-that-never-stood-a-chance/2013/10/31/4a465f78-41fd-11e3-b028-de922d7a3f47_story.html";

Document doc = Jsoup.connect(linktopro).userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6").timeout(30000).get();
Elements wp_columns = doc.select("div[class=wp-column ten margin-right main-content]");

for(Element wp_column : wp_columns)
{
    String wp_column_string = wp_column+"";
    Document wp_column_doc = Jsoup.parse(wp_column_string);
    Elements imgs = wp_column_doc.select("img");

    for(Element img : imgs)
    {
        out.println(img.attr("src")+"<br/>");
    }
}

Output without abs:

/rf/image_606w/2010-2019/Wires/Online/2013-10-31/AP/Images/Obama Health Care.JPEG-09824.jpg
http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif
http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif
http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif
http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif

My code with abs:

String linktopro = "http://www.washingtonpost.com/politics/promises-promises-a-big-obama-health-insurance-promise-that-never-stood-a-chance/2013/10/31/4a465f78-41fd-11e3-b028-de922d7a3f47_story.html";

Document doc = Jsoup.connect(linktopro).userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6").timeout(30000).get();
Elements wp_columns = doc.select("div[class=wp-column ten margin-right main-content]");

for(Element wp_column : wp_columns)
{
    String wp_column_string = wp_column+"";
    Document wp_column_doc = Jsoup.parse(wp_column_string);
    Elements imgs = wp_column_doc.select("img");
    for(Element img : imgs)
    {
        out.println(img.attr("abs:src")+"<br/>");
    }
}

Output with abs:

http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif
http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif
http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif
http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif

As you can see, the first image link, the only useful one, has disappeared. I don't know why this happens.

1 Answer:

Answer 0 (score: 1)

Have you tried this?

 out.println(img.absUrl("src")+"<br/>");
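
A possible explanation, offered as a sketch rather than a confirmed diagnosis: jsoup resolves absUrl("src") and attr("abs:src") against the element's base URI, and the code in the question re-parses each column with Jsoup.parse(wp_column_string), which gives the new document no base URI. With nothing to resolve against, a relative src would come back as an empty string. The example below assumes that this is the cause. The class name AbsUrlSketch, the shortened page URL, and the file name photo.jpg are made up for illustration (the real file name contains spaces and is left out here).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AbsUrlSketch {

    // Parses the fragment WITHOUT a base URI, as the question's code does.
    static List<String> withoutBase(String html) {
        return collect(Jsoup.parse(html));
    }

    // Parses the same fragment WITH the page URL as its base URI.
    static List<String> withBase(String html, String baseUri) {
        return collect(Jsoup.parse(html, baseUri));
    }

    // Collects the absolute form of every img src in the document.
    private static List<String> collect(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element img : doc.select("img")) {
            urls.add(img.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical fragment: one relative and one absolute image URL.
        String html = "<div>"
                + "<img src=\"/rf/image_606w/photo.jpg\">"
                + "<img src=\"http://www.washingtonpost.com/rw/sites/twpweb/img/blogs/spacer.gif\">"
                + "</div>";
        // Hypothetical page URL standing in for linktopro.
        String page = "http://www.washingtonpost.com/politics/story.html";

        System.out.println("without base: " + withoutBase(html));
        System.out.println("with base:    " + withBase(html, page));
    }
}
```

Under these assumptions, the first call returns an empty string for the relative src, while the second returns http://www.washingtonpost.com/rf/image_606w/photo.jpg. In the question's code, the equivalent change would be Jsoup.parse(wp_column_string, linktopro), or skipping the re-parse and calling wp_column.select("img") directly, since elements from the fetched document keep its base URI.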