Question

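I am trying to parse a given website and get some specific links from it. I have checked many jsoup questions here and tried the answers that looked like possible solutions, but none of them worked, so I'm starting to think the site I want to parse may be a special case. Here is a section of the HTML. I want to get all the links from all the article elements in the page: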

<article id="item_65190842" class="media item_row ptm pbm nmt" itemscope itemtype="http://schema.org/Offer">  
<a title="Flera bilder" itemprop="url" tabindex="50" href="http://www.blocket.se/vastmanland/Volkswagen_Passat_2_0_TDI_DSG_140_Hk_Sportlin_65190842.htm?ca=11&w=3" class="pull-left item-link nohistory image_container  has_multiple_images" data-js="item_link"><ul class="object-attribute-badges"></ul><img src="https://cdn.blocket.com/static/0/lithumbs/41/4164545596.jpg" title="Flera bilder" alt="Flera bilder" width="169px" height="126px" class="item_image"/></a>

<div class="media-body desc" itemprop="itemOffered" itemscope><header class="clearfix"><div class="pull-left "><a class="label label-default mrxs" itemprop="url" onclick="return xt_click(this,'C','11','Butiksbadge','N')" href="http://www.blocket.se/bildepan-i-morgongava?ca=11">Butik</a>Västmanland</div><time datetime="2016-02-10 13:47:01" pubdate itemprop="datePublished" class="pull-right">Idag  13:47</time></header><h1 class="h5 media-heading ptxs" itemprop="name"><a href="http://www.blocket.se/vastmanland/Volkswagen_Passat_2_0_TDI_DSG_140_Hk_Sportlin_65190842.htm?ca=11&w=3" title="Volkswagen Passat 2.0 TDI DSG 140 Hk Sportlin" itemprop="url" tabindex="50" class="item_link">Volkswagen Passat 2.0 TDI DSG 140 Hk Sportlin -08</a></h1><p itemprop="price" class="list_price font-large">62 900:-</p><footer><div class="pull-right addon"></div></footer></div>
</article>

Specifically, I want to get the href of each <a title="Flera bilder"> tag.

This is the page I want to parse:

http://www.blocket.se/hela_sverige/bilar/

Answer 1

Assuming you already have a Jsoup Document object, this is what you need:

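import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Elements links = doc.select("a[title=Flera bilder]");
for (Element link : links) {
    // "abs:href" resolves the href against the document's base URI,
    // so this is the absolute link you need.
    String absHref = link.attr("abs:href");
    System.out.println(absHref);
}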
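A self-contained sketch of the same idea follows. It assumes jsoup is on the classpath, and `BlocketLinks` and `extractListingLinks` are hypothetical names chosen for illustration. Instead of downloading the page, it parses a trimmed copy of the question's markup with `Jsoup.parse(html, baseUri)`, and it narrows the selector to `article a[title=Flera bilder]` so that only links inside the listing articles are returned. The base URI is what lets `abs:href` turn a relative href into an absolute one. Whether the live page still uses exactly this markup is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
public class BlocketLinks {

    // Returns the absolute href of every <a title="Flera bilder"> that sits inside an <article>.
    // baseUri is used to resolve relative hrefs into absolute URLs.
    static List<String> extractListingLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        // Descendant selector: only anchors nested in <article> elements match.
        for (Element link : doc.select("article a[title=Flera bilder]")) {
            // "abs:href" resolves the href attribute against the document's base URI.
            result.add(link.attr("abs:href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed copy of the listing markup shown in the question.
        String html =
            "<article id=\"item_65190842\" class=\"media item_row ptm pbm nmt\">"
          + "<a title=\"Flera bilder\" href=\"http://www.blocket.se/vastmanland/"
          + "Volkswagen_Passat_2_0_TDI_DSG_140_Hk_Sportlin_65190842.htm?ca=11&w=3\">"
          + "<img src=\"https://cdn.blocket.com/static/0/lithumbs/41/4164545596.jpg\"></a>"
          + "</article>";
        for (String href : extractListingLinks(html, "http://www.blocket.se/hela_sverige/bilar/")) {
            System.out.println(href);
        }
    }
}
```

Against the live page you would get the Document with `Jsoup.connect("http://www.blocket.se/hela_sverige/bilar/").get()` instead of `Jsoup.parse`. That call sets the base URI from the request URL, so `abs:href` works without passing one yourself.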

See the Jsoup cookbook for further reference.
