How can I get a specific part of an "href"?

Asked: 2014-07-17 11:49:52

Tags: java html parsing jsoup

I am new to HTML parsing and am trying to get a specific part of each link's href:

document = Jsoup.connect("http://www.beyazperde.com/filmler/tum-filmleri/kullanici-puani/tur-13015/"+"?page=" + i).get();
Elements links = document.select("div.content a.no_underline");
for (Element link : links) 
{
    Element url = link.after("filmler/film-");
    System.out.println(url);
}

When I run it, I get this:

<a class="no_underline" title="" href="/filmler/film-10080/"> Cesury&uuml;rek </a>
<a class="no_underline" title="" href="/filmler/film-9393/"> Schindler’in Listesi </a>
<a class="no_underline" title="" href="/filmler/film-28359/"> Piyanist </a>

But I want only the numbers "10080", "9393", "28359", not the whole <a> tag. Is there a way to do this?

3 Answers:

Answer 0 (score: 1)

If you convert url to a string, you can do this with a regular expression:

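// url is an Element in the question's code, so convert it to a String first
String id = url.toString().replaceAll(".*href=\"/filmler/film-([0-9]*)/.*", "$1");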
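
The same idea can be applied to the href value alone rather than to the whole tag's HTML. The following is only a minimal sketch, assuming every film link has the form /filmler/film-<digits>/ as in the question's output. The class name FilmIdRegex and the method extractId are hypothetical names used for illustration, and the hard-coded hrefs stand in for values that would come from link.attr("href").

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilmIdRegex {
    // Assumption: film links look like "/filmler/film-10080/", as in the question's output.
    private static final Pattern FILM_ID = Pattern.compile("/filmler/film-(\\d+)/");

    // Hypothetical helper: returns the digits after "film-",
    // or an empty Optional when the href has a different shape.
    static Optional<String> extractId(String href) {
        Matcher m = FILM_ID.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample hrefs copied from the question's output.
        String[] hrefs = {"/filmler/film-10080/", "/filmler/film-9393/", "/filmler/film-28359/"};
        for (String href : hrefs) {
            System.out.println(extractId(href).orElse("no match"));
        }
    }
}
```

Returning an Optional makes the non-matching case explicit instead of silently returning the unchanged input, which is what replaceAll does when the pattern does not match.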

Answer 1 (score: 1)

You can get the href attribute and work with that.

for (Element link : links) 
{
    String url = link.attr("href");
    String result = url.split("-")[1].replace("/","");
    System.out.println(result);
}

Answer 2 (score: 0)

document = Jsoup.connect("http://www.beyazperde.com/filmler/tum-filmleri/kullanici-puani/tur-13015/"+"?page=" + i).get();
Elements links = document.select("div.content a.no_underline");
for (Element link : links) 
{
    Attributes attributes = link.attributes();
    String hrefVal = attributes.get("href");
    // use substring or any other logic to get your value
    // Element url = link.after("filmler/film-");
    System.out.println(hrefVal);
}
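
The "substring or any other logic" step is left open above. One possible way to fill it in is sketched below, assuming the id always sits between the marker "film-" and the next "/". The class name FilmIdSubstring and the method extractId are hypothetical, and the sample hrefs are taken from the question's output in place of real hrefVal values.

```java
class FilmIdSubstring {
    // Assumption: the id sits between the marker "film-" and the next "/".
    private static final String MARKER = "film-";

    // Hypothetical helper: returns the id, or null when the href does not contain the marker.
    static String extractId(String hrefVal) {
        int start = hrefVal.indexOf(MARKER);
        if (start < 0) {
            return null;
        }
        start += MARKER.length();
        int end = hrefVal.indexOf('/', start);
        // Fall back to the rest of the string if there is no trailing slash.
        return end < 0 ? hrefVal.substring(start) : hrefVal.substring(start, end);
    }

    public static void main(String[] args) {
        // Sample hrefs copied from the question's output.
        String[] hrefs = {"/filmler/film-10080/", "/filmler/film-9393/", "/filmler/film-28359/"};
        for (String hrefVal : hrefs) {
            System.out.println(extractId(hrefVal));
        }
    }
}
```

Searching for "film-" rather than "film" matters here, because the path segment "filmler" also starts with "film".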