Jsoup - selecting values from the href attribute

Date: 2013-11-19 23:37:33

Tags: java html-parsing jsoup

The HTML (it is not from my own site, so I cannot change it) looks like this:

<div id="resulttable"> 
 <div class="dirlist"> 
  <div class="stationcol" style="width:428px;"> 
   <a href="http://whatever.com?id=xxx" title="Whatever" class="playbutton playimage" name="whatever" id="105867"></a> 
   <div class="videoBody"> 
    <div class="gridModule"> 
     <div class="surrogate"> 
      <div id="thumbnail105867" class="thumbnail"> 
       <a class="playbutton clickabletitle" name="whatever" id="105867" title="Whatever" href="http://whatever.com?id=xxx"> Bla </a>
</div></div></div></div></div></div></div>

This is my code:

Document doc = Jsoup.parse(result);
Elements hrefs = doc.select("div.stationcol a[href]");
StringBuilder links = new StringBuilder();

for (Element href : hrefs) {
    links.append(href.text());
}

String httplinks = links.toString();
System.out.println("TEST: " + httplinks);

The output looks like this:

I/System.out(10451): Link1http://www.whatever.c...Link2http://www.test.c...

What I actually need is an ArrayList containing the URLs, and possibly a separate ArrayList containing the titles.

Can anyone help me?

1 answer:

Answer 0 (score: 4)

Do you mean something like this?

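// Imports needed at the top of the file:
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Inside your method; `result` is the HTML string from the question.
ArrayList<String> titles = new ArrayList<String>();
ArrayList<String> urls = new ArrayList<String>();

Document doc = Jsoup.parse(result);
// "div.stationcol > a[href]" matches only <a> elements that are direct children of div.stationcol.
Elements links = doc.select("div.stationcol > a[href]");

for (Element e : links) {
    titles.add(e.attr("title")); // value of the title attribute
    urls.add(e.attr("href"));    // value of the href attribute, not the link text
}

System.out.println(titles);
System.out.println(urls);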

This prints the contents of the two ArrayLists. For the sample HTML, for example:

[Whatever]
[http://whatever.com?id=xxx]
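
The difference from the code in the question is twofold: text() returns the visible link text, while attr("href") returns the attribute value, and the child combinator ">" skips the second, nested anchor that the plain descendant selector also matches. Below is a rough, self-contained sketch of that, assuming jsoup is on the classpath. The class name HrefExtractor, the helper attrValues and the shortened SAMPLE_HTML are made up for illustration and are not part of jsoup or of the original page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HrefExtractor {

    // Shortened copy of the HTML from the question (assumption: single-quoted
    // attributes are used only to avoid escaping inside a Java string literal).
    static final String SAMPLE_HTML =
        "<div class='stationcol'>"
      + "<a href='http://whatever.com?id=xxx' title='Whatever' class='playbutton playimage'></a>"
      + "<div class='thumbnail'>"
      + "<a href='http://whatever.com?id=xxx' title='Whatever' class='playbutton clickabletitle'> Bla </a>"
      + "</div>"
      + "</div>";

    // Hypothetical helper: collects one attribute from every element the selector matches.
    static List<String> attrValues(Document doc, String cssQuery, String attributeName) {
        List<String> values = new ArrayList<>();
        for (Element e : doc.select(cssQuery)) {
            values.add(e.attr(attributeName));
        }
        return values;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);

        // Descendant selector: matches both anchors, including the nested one.
        System.out.println(attrValues(doc, "div.stationcol a[href]", "href").size()); // 2

        // Child selector: matches only the anchor directly under div.stationcol.
        System.out.println(attrValues(doc, "div.stationcol > a[href]", "href"));  // [http://whatever.com?id=xxx]
        System.out.println(attrValues(doc, "div.stationcol > a[href]", "title")); // [Whatever]
    }
}
```

If you want every link on the page rather than one per station, the descendant selector from the question is fine; just keep attr("href") instead of text().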