Question

I'm new to HTML parsing and am trying the following:

document = Jsoup.connect("http://www.beyazperde.com/filmler/tum-filmleri/kullanici-puani/tur-13015/"+"?page=" + i).get();
Elements links = document.select("div.content a.no_underline");
for (Element link : links) 
{
    Element url = link.after("filmler/film-");
    System.out.println(url);
}

When I run it, I get this output:

<a class="no_underline" title="" href="/filmler/film-10080/"> Cesury&uuml;rek </a>
<a class="no_underline" title="" href="/filmler/film-9393/"> Schindler’in Listesi </a>
<a class="no_underline" title="" href="/filmler/film-28359/"> Piyanist </a>

But I only want the numbers "10080", "9393" and "28359", not the whole <a> tag. Is there a way to do this?

Answer 1

If you convert url to a String (for example with toString()), you can extract the number with a regular expression:

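// Element.toString() returns the outer HTML; replaceAll returns a new String, so keep the result
String id = url.toString().replaceAll(".*href=\"/filmler/film-([0-9]*)/.*", "$1");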
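One caveat with replaceAll: if the pattern does not match, it returns the input unchanged, so a non-film link would print its whole HTML. A minimal sketch using Pattern/Matcher.find() instead, which only reports an ID when the href actually matches. The class and method names (FilmIdFromHtml, extractId) are hypothetical, and the sample strings are adapted from the output in the question:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilmIdFromHtml {
    // Captures the digits between "/filmler/film-" and the next "/" inside an href.
    private static final Pattern FILM_ID =
            Pattern.compile("href=\"/filmler/film-([0-9]+)/");

    // Returns the film ID, or null if the string contains no matching href.
    static String extractId(String anchorHtml) {
        Matcher m = FILM_ID.matcher(anchorHtml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample anchors adapted from the question's output (\u2019 is the curly apostrophe).
        List<String> anchors = List.of(
                "<a class=\"no_underline\" title=\"\" href=\"/filmler/film-10080/\"> Cesury&uuml;rek </a>",
                "<a class=\"no_underline\" title=\"\" href=\"/filmler/film-9393/\"> Schindler\u2019in Listesi </a>",
                "<a class=\"no_underline\" title=\"\" href=\"/filmler/film-28359/\"> Piyanist </a>");
        for (String html : anchors) {
            System.out.println(extractId(html)); // prints 10080, 9393, 28359 on separate lines
        }
    }
}
```

Inside the question's loop this could be called as FilmIdFromHtml.extractId(link.outerHtml()), although reading link.attr("href") directly avoids running a regex over markup at all.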

Answer 2

You can read the href attribute and work with that instead.

for (Element link : links) 
{
    String url = link.attr("href");                     // e.g. "/filmler/film-10080/"
    // split("-")[1] is "10080/"; removing the "/" leaves "10080"
    String result = url.split("-")[1].replace("/","");
    System.out.println(result);
}

Answer 3

import org.jsoup.Jsoup;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

document = Jsoup.connect("http://www.beyazperde.com/filmler/tum-filmleri/kullanici-puani/tur-13015/" + "?page=" + i).get();
Elements links = document.select("div.content a.no_underline");
for (Element link : links) 
{
    Attributes attributes = link.attributes();
    String hrefVal = attributes.get("href");   // e.g. "/filmler/film-10080/"
    // use substring or any other logic to extract the number from hrefVal
    System.out.println(hrefVal);
}
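For the "use substring" step, one possible sketch using only String.lastIndexOf, indexOf and substring. It assumes every href has the form /filmler/film-<digits>/ seen in the question; FilmIdFromHref and extractId are hypothetical names:

```java
class FilmIdFromHref {
    private static final String PREFIX = "film-";

    // Returns the text between the last "film-" and the next "/" (or the end of the string),
    // or null when the href contains no "film-" at all.
    static String extractId(String href) {
        int start = href.lastIndexOf(PREFIX);
        if (start < 0) {
            return null; // not a film link
        }
        start += PREFIX.length();
        int end = href.indexOf('/', start);
        return end < 0 ? href.substring(start) : href.substring(start, end);
    }

    public static void main(String[] args) {
        String[] hrefs = {"/filmler/film-10080/", "/filmler/film-9393/", "/filmler/film-28359/"};
        for (String href : hrefs) {
            System.out.println(extractId(href)); // prints 10080, 9393, 28359 on separate lines
        }
    }
}
```

In the loop above, hrefVal would be passed as FilmIdFromHref.extractId(hrefVal); wrap the result in Integer.parseInt if a numeric ID is needed. Using lastIndexOf means hyphens earlier in the path do not shift the result, unlike split("-")[1].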

How can I get a specific part of "href"?
