I have a website that contains a table similar to this one (the real table is larger):
</table>
<tr>
<td>
<table width="100%" cellspacing="-1" cellpadding="0" border="0" dir="rtl" style="padding-top: 25px;">
<tr>
<td align="right" style="padding-right: 25px;">
<span class="artist_name_txt">
<a href="/namelink">name</a>
<p class="diccografia">subname</p>
</span>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table width="100%" border="0" cellspacing="0" cellpadding="0" dir="rtl" style="padding-right: 25px; padding-left: 25px">
<tr>
<td class="songs" align="right">
<a href="/number1link" class="artist_player_songlist"> number1</a>
</td>
</tr>
<tr>
<td class="songs" align="right">
<a href="/number2link" class="artist_player_songlist">number2</a>
.......
</td>
</tr>
</table>
I need an idea for how to parse the site and extract this table into two arrays.
I have tried many approaches, and none of them really helped.
Answer 0 (score: 1)
You should read the JSoup Cookbook, especially the part on Selector syntax, which is very powerful.
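As a rough illustration of that syntax: `tag.class` selects elements by tag name and class, and a space between two selectors means "descendant of". The sketch below applies both to the artist block from the question. It assumes jsoup is on the classpath; the class name `ArtistHeader`, the method `extract`, and the shortened HTML string are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper, not part of jsoup or of the original answer.
final class ArtistHeader {

    // Shortened copy of the first inner table from the question.
    static final String HTML =
        "<table><tr><td align=\"right\">"
        + "<span class=\"artist_name_txt\">"
        + "<a href=\"/namelink\">name</a>"
        + "<p class=\"diccografia\">subname</p>"
        + "</span></td></tr></table>";

    // Returns { artist name, artist link, subname }; an entry is null if its element is missing.
    static String[] extract(String html) {
        Document doc = Jsoup.parse(html);
        // <a> elements anywhere below <span class="artist_name_txt">
        Element link = doc.select("span.artist_name_txt a").first();
        // <p class="diccografia">
        Element sub = doc.select("p.diccografia").first();
        return new String[] {
            link == null ? null : link.text(),
            link == null ? null : link.attr("href"),
            sub == null ? null : sub.text()
        };
    }

    public static void main(String[] args) {
        String[] header = extract(HTML);
        System.out.println("Name: " + header[0]);
        System.out.println("Link: " + header[1]);
        System.out.println("Subname: " + header[2]);
    }
}
```

Taking `first()` and checking for null keeps the sketch from failing on pages where the artist block is absent.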
Here is an example that collects the song names and links:
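import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final String html = ...
// use Jsoup.connect(url).get() instead if you load the page from a website
Document doc = Jsoup.parse(html);
List<String> names = new ArrayList<>();
List<String> links = new ArrayList<>();
// every song link is an <a> with the class "artist_player_songlist"
for (Element element : doc.select("a.artist_player_songlist")) {
    names.add(element.text());       // visible link text
    links.add(element.attr("href")); // link target
}
System.out.println("Names: " + names);
System.out.println("Links: " + links);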
Output:
Names: [number1, number2]
Links: [/number1link, /number2link]
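The question asks for arrays, while the example collects the values into lists. If arrays are really needed, the lists can be copied with `List.toArray`. This is a minimal sketch using only the standard library; `SongArrays` and `toArray` are hypothetical names, and the hard-coded lists stand in for the ones filled by the loop above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class SongArrays {

    // Copies a list of strings into a new String[] with the same order and length.
    static String[] toArray(List<String> values) {
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-ins for the lists produced by the Jsoup loop.
        List<String> names = Arrays.asList("number1", "number2");
        List<String> links = Arrays.asList("/number1link", "/number2link");

        String[] nameArray = toArray(names);
        String[] linkArray = toArray(links);

        // Index i in both arrays refers to the same <a> element.
        for (int i = 0; i < nameArray.length; i++) {
            System.out.println(nameArray[i] + " -> " + linkArray[i]);
        }
    }
}
```

Because both lists are filled in the same loop iteration, the two arrays stay aligned by index.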