I'm new to HTML parsing and am trying to get the film IDs out of the links on this page:
document = Jsoup.connect("http://www.beyazperde.com/filmler/tum-filmleri/kullanici-puani/tur-13015/"+"?page=" + i).get();
Elements links = document.select("div.content a.no_underline");
for (Element link : links)
{
Element url = link.after("filmler/film-");
System.out.println(url);
}
When I run it, I get this:
<a class="no_underline" title="" href="/filmler/film-10080/"> Cesuryürek </a>
<a class="no_underline" title="" href="/filmler/film-9393/"> Schindler’in Listesi </a>
<a class="no_underline" title="" href="/filmler/film-28359/"> Piyanist </a>
But I want "10080", "9393", and "28359": just the numbers, not the whole <a> tag. Is there a way to do this?
Answer 0 (score: 1)
If you convert url to a string, you can do this with a regular expression:
String id = url.toString().replaceAll(".*href=\"/filmler/film-([0-9]*)/.*", "$1");
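The same pattern can also be applied with `java.util.regex.Pattern` and `Matcher`, which pulls out the capture group directly instead of rewriting the whole tag string. This is only a minimal sketch: the class name `FilmIdRegex` and the helper `extractId` are made up for illustration, and it assumes every link has the `/filmler/film-<digits>/` shape shown in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilmIdRegex {
    // Matches the numeric ID in paths such as /filmler/film-10080/
    private static final Pattern FILM_ID = Pattern.compile("/filmler/film-(\\d+)/");

    // Hypothetical helper: returns the first film ID found in the text,
    // or Optional.empty() if the text contains no film link.
    // Works on a bare href value or on a whole <a> tag string.
    static Optional<String> extractId(String text) {
        Matcher m = FILM_ID.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String tag = "<a class=\"no_underline\" title=\"\" href=\"/filmler/film-28359/\"> Piyanist </a>";
        System.out.println(extractId(tag).orElse("no match")); // prints 28359
    }
}
```

Returning an `Optional` makes the no-match case explicit, whereas `replaceAll` silently returns the input unchanged when the pattern does not match.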
Answer 1 (score: 1)
You can get the href attribute and work with that.
for (Element link : links)
{
String url = link.attr("href");
String result = url.split("-")[1].replace("/","");
System.out.println(result);
}
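One caveat: `split("-")[1]` throws `ArrayIndexOutOfBoundsException` for an href with no hyphen, and returns junk for an href with hyphens in other places (the listing URL itself contains several). A guarded variant might look like the sketch below. The class `FilmIdSplit` and the helper `idFromHref` are hypothetical names, the third href is an invented non-film link for illustration, and the code runs on plain strings so it can be tried without a network call.

```java
import java.util.ArrayList;
import java.util.List;

class FilmIdSplit {
    // Hypothetical helper: applies the split from the loop above, but only to
    // hrefs that start with the film prefix; returns null for anything else.
    static String idFromHref(String href) {
        if (href == null || !href.startsWith("/filmler/film-")) {
            return null;
        }
        return href.split("-")[1].replace("/", "");
    }

    public static void main(String[] args) {
        // Two hrefs from the question plus one assumed non-film link.
        String[] hrefs = {"/filmler/film-10080/", "/filmler/film-9393/", "/filmler/tum-filmleri/"};
        List<String> ids = new ArrayList<>();
        for (String href : hrefs) {
            String id = idFromHref(href);
            if (id != null) {
                ids.add(id);
            }
        }
        System.out.println(ids); // prints [10080, 9393]
    }
}
```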
Answer 2 (score: 0)
document = Jsoup.connect("http://www.beyazperde.com/filmler/tum-filmleri/kullanici-puani/tur-13015/"+"?page=" + i).get();
Elements links = document.select("div.content a.no_underline");
for (Element link : links)
{
Attributes attributes = link.attributes();
String hrefVal = attributes.get("href");
//use substring or any other logic to get your value
// Element url = link.after("filmler/film-");
System.out.println(hrefVal);
}
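The "use substring" step left open in the comment could be filled in roughly as follows. This is a sketch under the assumption that the ID always sits between `filmler/film-` and the next `/`, as in the hrefs from the question; `FilmIdSubstring` and `idFromHref` are illustrative names, not part of jsoup.

```java
class FilmIdSubstring {
    private static final String PREFIX = "filmler/film-";

    // Hypothetical helper: returns the text between the prefix and the next '/',
    // or an empty string when the href does not contain the prefix.
    static String idFromHref(String hrefVal) {
        int start = hrefVal.indexOf(PREFIX);
        if (start < 0) {
            return "";
        }
        start += PREFIX.length();
        int end = hrefVal.indexOf('/', start);
        // If there is no trailing slash, take everything after the prefix.
        return end < 0 ? hrefVal.substring(start) : hrefVal.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(idFromHref("/filmler/film-28359/")); // prints 28359
    }
}
```

Inside the loop, `System.out.println(idFromHref(hrefVal));` would then print the ID rather than the full href.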